Merge tracking with Subversion 1.6

I am now running Subversion 1.6 for my client’s SVN repositories. I upgraded mainly to take advantage of the Merge tracking introduced in Subversion 1.5, and improved in 1.6. In particular, Subversion now creates its own “mergeinfo” entries, so you no longer have to use the svnmerge.py script. Subversion 1.6 now has better detection of “tree conflicts” – essentially, problems with the local working copy caused by renames, missing files, etc.

I used the new release of VisualSVN Server to install and upgrade SVN in a painless way (see earlier post on VisualSVN Server). The new version installs over the top of the previous one, so back up your SVN repositories first.

An important part of the Subversion philosophy is “don’t break things”. Even though VisualSVN Server 2.0 runs Subversion 1.6, the underlying repository format is not upgraded automatically, meaning that the new merge tracking feature is still not (yet) available. To get this working we simply need to run “svnadmin upgrade PATH”, like this:

> svnadmin upgrade D:\Data\Svn\DevDoctor
Repository lock acquired.
Please wait; upgrading the repository may take some time...

Upgrade completed.

Following a working copy svn update, you can run svn merge. In this case, I used TortoiseSVN:

Here, we were merging from the development branch, so I selected Reintegrate a branch. Once the correct settings have been chosen, you can even “Test merge”, which gives you a report of what would happen if you went ahead with the merge. When the changes are successfully merged, Tortoise shows the results:

As normal, the merge needs to be committed, but the crucial difference after the commit succeeds is that Subversion itself has tracked the merge using the mergeinfo property:

You can see that an svn:mergeinfo property has been set on the folder, showing the branch from which the merges were done, and the revision numbers (here, 400-409).

All this means that merging (both from branches to trunk, and from trunk to branches) is all much less tricky and error-prone than before (with Subversion 1.4). Subversion 1.6 is also noticeably quicker than 1.4 for all operations so far.

Further reading: http://scm.jadeferret.com/subversion-16-new-features-explained/

Subversion made easy: VisualSVN Server

Subversion is fairly straightforward to install and maintain, but VisualSVN Server makes the process on Windows almost trivial:

  1. Download
  2. Run Installer
  3. err, that’s it

It bundles Apache and the SVN server, and comes with a nifty Admin console which makes it easy to change repo permissions. It even provides pretty repo browsing via XSLT in the web browser.

ACCU Conference 2005

Introduction

I attended six presentations at the ACCU Conference this year:

  1. IronPython – Python for .NET (Microsoft)
  2. Python at Google (Greg Stein)
  3. C++ and the CLR (Microsoft)
  4. Unit Testing XSLT (Jez Higgins)
  5. Modern 3-Tier Architectures (T-Mobile)
  6. .TEST Testing Framework (Parasoft)

I also made some random observations.

IronPython – Python for .NET

The experience of seeing Python being implemented on the CLR (by Jim Hugunin) has influence Microsoft’s design of .NET 2.0 – better support for dynamic languages, generative programming, dynamic classes, and so on, are all coming to .NET 2.0. In fact, so much so, that Microsoft head-hunted Hugunin last year.

IronPython targets the CLI, the open-standard implementation of the CLR, so it runs on Mono and other, non-Microsoft platforms. We saw a live demmonstration of IronPython importing classes written in C# from an Assembly, and calling methods and events on those classes. Even the IronPython source code itself if a Visual Studio project!

Python at Google

Greg Stein has worked for several major organisations, including Microsoft, Apache Foundation and now Google. He revealed that Python is fundamental to Google, and is used not only in back-end management, but as the main language for both Google Groups (http://groups.google.com/) and Google Code (http://code.google.com/). Some of the other areas for which Google uses Python include:

  • Wrappers for Version Control systems – to enforce check-in rules etc.
  • Build system – entirely in Python
  • Packaging of applications prior to deployment
  • Deployment to web-farms and clusters
  • Monitoring of servers (CPU Load, fan speed, NIC utilization, etc.)
  • Auto-restarting of applications
  • Log reporting and analysis

He referred to Python a the “special sauce” which makes a company more ‘tasty’ or appealing than another. Because Python lends itself to readability and maintainability, it also allows rapid changes to code, following shifting requirements. C and C++ tend to be used (in Google) only for specialist, speed-critical tasks. Otherwise, Python (or sometimes Java) is used – it’s simpler and quicker to write, and easier to modify.

Software Development

Greg made some interesting comments about software development. At Google, the typical project is “3 people, 3 months”, forcing many small applications to co-operate, rather than having just a single monolithic blob, aiding scalablility. Every Google developer has 20% of their time to “play” with their own ideas. Google Maps, and the Google Toolbar both started off with employees tinkering with code in the 20% “spare” time.

Employees are also encouraged to submit patches to any part of the Code Base, not just that for their project. THis more open approach contrasts with the style at Microsoft (apparently), where teams have access only to the source for their project and no other. Greg argues that this openness explains why Google innovates so well.

As an aside, he said that he has a machine running Unit Tests continuously on the latest code, so that as soon as code breaks, they are notified ;o)

C++ and the CLR

The bod from Redmond pitched C++ as the Systems programming language for the foreseeable future, underpinning the .NET Framework, and allowing developers to achieve applications of the highest performance.

The C++ compiler can compile manages code directly to a native platform executable, and allows free mixing of Managed and Unmanaged code throughout an application. In this way, the .NET Framework is treated as just another class library, which can be loaded on demand.

Performance can thus be highly tuned. We were shown a demo of Doom 2 running on the CLR, and running faster than the native code, due to the highly optimized code produced by the C++ compiler. For example, in Managed C++, when a variable goes out of scope, it is automatically set to null by the compiler, making the job of the Garbage Collector much easier. In C#, by contrast, this does not happen automatically, so that objects marked for GC may sit in the GC queue much longer, using up resources.

The Sampling Profiler was recommended as a good tool to use when beginning investigations into performance problems. No code changes are required, and it can be used on live systems with practially no detrimental effects to application speed.

Unit Testing XSLT

XSLT is a fully-specified programming language; it is Turing Complete. However, unlike most programming languages, it has very few testing and debugging tools. Often, in a mixed system of Java/C# + XML + HTML, the XSLT is something of an unknown: output must be inspected visually, not programatically, to attempt to verify correctness.

Jez Higgins demonstrated some simple code (written in XSLT) allow Unit-Testing of XSLT.

Basically, like any Unit Testing, you define expected output data for given input data, and test that the software (here, the XSLT transform) produces the correct output. There is an Open Source project called XSLTUnit (http://xsltunit.org/), which, although really proof-of-concept, is stable and useful. Starting with this framework, Jez added extra functionality so that Unit Testing of XSLT files can be included as part of an JUnit test script [the same could be done in other languages, e.g. C# and NUnit].

The XML looked intially complicated, but the scheme is actually remarkable simple:

    1. Define input data for a given XSLT file (in XML).
    2. Define expected outputs for the transform (in XML).*
    3. Write a special XSLT file which, upon transform, will produce Java code for each test case. The Java will inspect the XML output file.
    4. Write a small amount of Java to load in the XSLT and XML files, and perform the transformations.
    5. Using Ant and JUnit, compile and run the Java code generated by the XSLT to test the output data.

*Note: output from transform must be in XML

The beauty of this approach is that existing standard Unit Test tools (JUnit/NUnit/etc.) can be used to test XSLT! XSLT is used to generate code required to test XSLT itself; Java (or C#, etc.) is used dynamically as “scaffolding” to make use of standard testing frameworks.

Example:

import junit.framework.*; public class extends TestCase { public (String name) { super(name); } public static Test suite() { return new TestSuite(.class); } } // class public void () { Helper.runTest( “”, “”, “”); }

Once the initial test definitions are written, no more work is needed – no Java, no editing XML configuration files. Crucially, the fact that test definaitions are defined in XSLT allows non-programmers to write the Test definitions, in addition to the XSLT itself; this follows the standard pattern of Unit Testing, where the implementer also writes the Unit Tests.

In terms of the complications introduced by complex, once-unverifiable XSLT, this scheme could bring massive improvements.

Modern 3-Tier Architectures

The most salient point made by Nico Josuttis was with regard to Project Management: that in a large n-tier system, the architecture will change over its lifetime, and trying to lock down the technical design is recipe for disaster. Clients like to think that the architecture remains the same, but – even if that is what they are led to believe – developers must be willing to refactor or completely redesign.

The speaker argued that the back-end (database layer) should maintain data consistency itself, and therefore perform checks for:

  • Authentification (who am I)
  • Authorization (what I am allowed to do)
  • Validation (does the data make sense)

There was lots of discussion of the granularity of security and validation, and where exactly this should take place. Most people agreed that having two separate levels of validation is essential. This granularity issue was exemplified by comments from some people who work for large telecoms comapnies, whose databases contain thousands and (soon) millions of logins – one for each customer – not just a few logins used by the application for all connections, irrespective of the customer. This clearly has performance issues.

Project Management

Nico then spoke about development procedures within a large company, T-Mobile, where (typically) software is released every 3 months; they have 6 weeks development time, followed by 6 weeks of regression testing – a 50-50 split between coding and testing!

As for releasing software, he reported that in his experience, nightly builds are essential (echoing Greg Stein’s comments about the continuously running build machine). He suggested to plan for ten per cent of development effort to go into releasing software (build, deploy, integration, etc.).

He was also scathing about UML, MDA, and other “fad” techniques, at least as far as whole projects go. For small tasks they are okay, but a project driven from these modelling approaches is doomed, according to Josuttis.

Parasoft .TEST Testing Framework

Parasoft .TEST is a testing tool for .NET code. It has Visual Studio integration, but can also be run as a stand-alone app (for the QA people) and from the command-line for build process integration.

It addresses three main areas: Coding Standards, QA testing, and Unit Testing. For the latter, it integrates with NUnit, allowing existing Unit tests to be run as part of the testing cycle.

It works by inspecting the ILASM in existing Assemblies, so can work with Assemblies created with any .NET language. It can automatically generate NUnit test harness projects for the assembly and (with the VS-Plugin version) automatically add these to the Solution. This is probably the biggest bonus – all the tedious Unit Test scaffolding is done for you – all the developer needs to do is to fill in the implementation of each generated test. This then addresses one of the chief complaints of people using Unit Testing; namely, that it takes a long time to code up all the Test classes.

In addition, the suite generates internal Unit Tests for common boundary conditions (null reference, index out of bounds, etc., etc.) across all or part of the Assembly/Assemblies under test.

.TEST ships with about 250 pre-defined Coding Standards, but others can be defined by non-programmers, so that (for example) Security and Performance QA issues can be addressed at the code level, and as part of the testing cycle.

It was pretty impressive; seemingly all the gain and none of the pain of Unit Testing.

Other random observations

Python

There are Python Meetups in London (Putney); the UK Python mailing list is uk-python@python.org.

Version Control

Greg Stein said he wants to convert Google from Perforce to Subversion¬†for Source Code Control. Now, he might be a bit biased (he helped develop SVN), but if SVN is ready for Google, it’s ready for anyone, I reckon!

Assembly Dependency Viewer

This simple application gets any VS.NET project or assembly and displays all the assembly / project dependencies. All dependencies are shown in a tree view with images that indicate the assembly location (Bin, GAC or registered as COM+ component).

http://www.codeproject.com/dotnet/Assemblydependencies.asp