
Version Control System Shootout Redux

At the Firefox Summit last week, we convened a session death match to discuss which version control system the Mozilla Project would use going forward.

There had been some initial work[1] to specify our requirements for a new system, but with work on Mozilla 2.0 looming, it was time to get everyone in a room and make a final decision.

I've been asked a few times about the outcome of the discussion.

For those that didn't catch the Pay-per-view broadcast, here's a review of the event and decisions, complete with screencaps:

The session, led by Vlad, Brendan and myself, started by clarifying the scope of the discussion. The two issues at hand:

  1. What version control system do we use for Mozilla 2.0 development?
  2. Do we convert to an interim system, to move off of CVS, for Gecko 1.9/Firefox 3 development?

So many choices to consider!

For Mozilla 2.0 work, it was decided that a distributed system is needed rather than a more classical, centralized one. This is due to a number of requirements, including increased developer agility[2] and the ability for developers to share patches directly with each other. Additionally, distributed systems, to be coherent, much less usable/useful, need to solve the Complicated Merging Problem (tm) up front, and there will be a lot of complicated branching and merging in the 2.0 time frame that distributed systems, by virtue of their more advanced merging algorithms, support better.
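
To make the merging point a little more concrete, here is a toy sketch of a three-way merge at whole-file granularity. It is an illustration only: the function name `three_way_merge` and the dict-of-files snapshot model are assumptions made up for this example, and it is not how Mercurial, Bazaar, or Git actually implement merging (real tools merge within files and track renames, among other things).

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots against their common ancestor.

    Each snapshot is a hypothetical {path: content} dict; a missing
    path means the file does not exist in that snapshot.
    Returns (merged_snapshot, list_of_conflicting_paths).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o   # both sides agree (including "both deleted")
        elif o == b:
            result = t   # only their side changed this file
        elif t == b:
            result = o   # only our side changed this file
        else:
            conflicts.append(path)  # both changed it, differently
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


# Example: each side touches a different file, so nothing conflicts.
base = {"a.c": "v1", "b.c": "v1"}
ours = {"a.c": "v2", "b.c": "v1"}
theirs = {"a.c": "v1", "b.c": "v3", "c.c": "new"}
print(three_way_merge(base, ours, theirs))
```

The key ingredient is the common ancestor: without `base`, there is no way to tell "they changed it" from "we changed it", which is part of why history tracking matters so much for merging.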

This removed systems such as Subversion from consideration and focused the discussion on Mercurial, Bazaar, Git, and the like.


While fun to watch, the battle among centralized systems was short.

Git was largely removed from consideration because it lacks the Win32 performance and support that we require.

Vlad has been exploring Mercurial, which is used by the OpenSolaris project. There's been a test-import of the trunk and initial testing seems favorable in terms of performance and our requirements.

During the meeting, a number of people asked for our impressions of Bazaar. Initially, we didn't look too deeply into Bazaar, due to reports of performance issues from Sun's investigations with OpenSolaris. As it turned out, some Bazaar developers were in town, and we met with them during the Summit to discuss requirements, Bazaar's features, and other issues. Bazaar has some compelling features, but the performance and import story is still being investigated.


The battle of distributed systems is a bloody prospect...

We've made contact with both the Bazaar and Mercurial teams, and are beginning to work through import and usage scenarios. We'll post more information on the scenarios as we work them out, and it may turn out that one system becomes a natural choice as the details start to fall into place.

For Gecko 1.9/Firefox 3 work, we discussed whether moving away from CVS to something like Subversion in the Q1 time frame was feasible, and then whether it was desired. There was a lot of discussion in this area, but given that Subversion fell out of the running for Mozilla 2.0, we resolved that it made little sense to spend time converting CVS to Subversion or some other system, only to convert it again to Mercurial or Bazaar.


Moving to RCS sounded like a good idea until we put the Lizard in the ring...

The plan of record, therefore, is to continue using CVS for Mozilla 1.8/Firefox 1.5, Mozilla 1.8.1/Firefox 2, and Mozilla 1.9/Firefox 3 development.

When we've shored up all the tool support and usage policy for the new version control system, be it Mercurial or Bazaar, then we'll look at the feasibility of moving or merging development of the old branches into the new system.

In the original announcement, I said that we wouldn't leave the room until we had a winner decided. That was mostly tongue-in-cheek, despite the fact that I think we all would've liked to have left the discussion knowing.

But going from a bunch of various version control system options and plans down to two, and having a concrete plan of action for Gecko 1.9 development and Mozilla 2.0 development was still a huge win.

I think this meeting illustrated that there's going to be a lot of work involved in the conversion[3], and because we've been so hand-wavy about which system we're going to use, we haven't thought through a lot of the details. But we've got some concrete options to pursue, and the rubber is starting to hit the road, so I'm encouraged by that.

Having said that, I must admit I'm devastated that my favorite contender, ClearCase, got beat down so early in the running.


Poor ClearCase... we hardly knew ye...[4]

__________________________
[1] Over a year ago.
[2] Admittedly, a fuzzy term.
[3] bsmedberg and I estimated about 1.5 engineers for at least a quarter... probably more like two.
[4] Images and characters chosen entirely randomly; don't read too much into the pairings; I was never an MK player, myself.

Comments

ROTFLMAO.

Best. Blog post. Ever.

Oh, and it was informative, too.

LOL!! I busted a gut laughing!

I never knew you guys considered so many systems.

Awesome post! It was informative too!

Wow, great post.

Only thing I would add is that using more obscure version control (not cvs or subversion) adds another level of confusion for those viewing source without affiliation to the project. IMHO that's a major plus for svn, or just sticking with cvs.

What percentage of the repository user base require distributed features?

"Only thing I would add is that using more obscure version control (not cvs or subversion) adds another level of confusion for those viewing source without affiliation to the project. IMHO that's a major plus for svn, or just sticking with cvs."

That is something we're (pro)actively attempting to address.

We're looking into the possibilities of how these two systems support CVS/SVN users and how they support users who don't need/want the distributed features or don't subscribe to that usage model.

It's up in the air how that will turn out, but we're looking into things like an anonymous, read-only CVS mirror of whatever's in the distributed system, so people can still checkout/create patches like they're used to.

I've been particularly worried about (and am spending time thinking about) how these changes will affect localizers (who aren't necessarily coders, and who may be working with limited computing resources), as well as how the build machinery, which has a very specific use-case, interacts with the new system.

You are correct in that we have some amount of data about usage models for a particular type of usage.

At this point, I'm particularly interested in other contributors' usage models, which may be underrepresented in the data we have.

"What percentage of the repository user base require distributed features?"

I don't know if that's a question I can answer; I'm actually unsure how we'd go about getting an answer.

There are two issues, I think, that would skew the data:

1. A lot of the Mozilla 2.0 work requires merging algorithms that... weren't designed in the 80s. If you buy that assertion, then the percentage of people requiring the distributed part really isn't the relevant question. It's a matter of which tool provides support for complicated merge algorithms, especially those that support the type of operations commonly done during refactoring (renaming, moving directories, etc.)

2. Converting the way you think about where you do your work to a distributed model takes some time. I suspect (but am not sure, and am therefore entirely happy to totally eat crow on this) that once these features are available to developers, and they're familiar with them, they'll find new and exciting uses for those features.

Having said that, as I mentioned to Robert Accettura, I've spent (and continue to spend) a lot of time thinking about how others use the version control system, how we handle the conversion, and what our story is for helping contributors transition to the new system.

There aren't clear answers yet, but... I'm thinking about it (for whatever that's [not] worth. :-)

For what it's worth, as someone who has been involved with version control systems in various capacities (both using them and developing them commercially) for the last 15 years or so, I have yet to encounter a developer who has come up to speed on a Mercurial-style distributed system and wants to go back to a centralized one for anything but tiny hobby projects.

In my opinion it's just a better model in almost every conceivable way with the possible exception of higher disk usage on developer systems -- and even that is largely a myth that often turns out to be the opposite of the actual reality. (For example, the Git repository I'm using at the moment is an imported clone of a Subversion repository, and the Git one is about 30% smaller than a checked-out *client* of the Subversion one, even though the Git repository has upwards of 10,000 revisions in its database and the svn client has no revision history at all.)

There is a learning curve, that's undoubtedly true. But the benefits are well worth it, and once you've gotten used to working with a distributed VCS, you will grit your teeth when you're forced to use Subversion or (ugh) CVS again. At least, that's been my experience.

In terms of development efficiency, you get to do things like share changes with collaborators without publishing them to the world at large. All the distributed systems are much better than most of the centralized systems at keeping track of what has been merged already, so although you do a lot more merging, you actually spend less time worrying about it. You can checkpoint your own work locally without resorting to copying your source directory somewhere, and make full use of the system's merging and history tracking for your own personal development, then package up your changes in a logical order to send upstream. And the list goes on.
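
As a rough illustration of what "keeping track of what has been merged already" means, here is a toy model of a commit graph. The `Repo` class, its method names, and the commit ids are all made up for this sketch; it is not the storage format or API of any real tool. The only idea it shows is that when every commit records its parents, the system can compute what is still unmerged instead of asking you to remember.

```python
class Repo:
    """Toy commit graph: each commit id maps to a tuple of parent ids."""

    def __init__(self):
        self.parents = {}

    def commit(self, cid, *parents):
        # A merge commit is simply a commit with two parents.
        self.parents[cid] = parents
        return cid

    def ancestors(self, cid):
        """Return cid plus every commit reachable through parent links."""
        seen, stack = set(), [cid]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return seen

    def unmerged(self, target, source):
        """Commits in source's history that target has not yet absorbed."""
        return self.ancestors(source) - self.ancestors(target)


repo = Repo()
repo.commit("root")
repo.commit("a1", "root")        # work on the main line
repo.commit("b1", "root")        # work on a branch
repo.commit("b2", "b1")
print(sorted(repo.unmerged("a1", "b2")))   # before merging: b1 and b2

repo.commit("m1", "a1", "b2")    # merge the branch into the main line
repo.commit("b3", "b2")          # the branch keeps going
print(sorted(repo.unmerged("m1", "b3")))   # after merging: only b3
```

The second merge only has to consider `b3`, because the graph already records that `b1` and `b2` came in with `m1`. That bookkeeping is what you end up doing by hand (or in your head) on systems that don't record it.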

At worst, as noted, the project has to keep around a CVS or Subversion clone of some integration repository that reluctant users can fetch from. There are tools to automate that process so it's pretty much a set-it-and-forget-it operation.

To me it's a no-brainer for a big distributed project like Mozilla to switch to a distributed system and I'm very happy to see it happening.

Even I can understand this advantage:
http://sayspy.blogspot.com/2006/11/bazaar-vs-mercurial-unscientific.html

And one way to make branching cheap is to do it locally. The "distributed" part of these tools comes from the fact that they are designed to work offline. This lets you commit changes and such to your local disk without requiring Internet access to hit a central repository online (but you can commit your changes to an online repository later when you are ready to). This is really handy if you develop on a laptop; it really sucks to have to do a bunch of coding remotely and then commit one huge patch instead of committing a bunch of individual patches. You want the atomicity of commits that represent a single piece of semantic change to allow better tracking of how/why your code changed, not some blob of code that changes a bunch of things at once. Plus it allows for easier rollback if you accidentally introduced a bug.

As preed mentioned localizers, I'd like to reiterate that we're making heavy use of partial checkouts all through our project, not just in l10n.

On the other hand, if we manage to sell that right, a distributed VCS may actually be candy for localizers, in particular when trying to improve (or land from scratch) on stable branches.

Xen is using Mercurial and seems pretty happy with it. Features are still being added to Mercurial at a good pace, and I'm sure that Mozilla would end up contributing well. Branching is one area that is still in flux. Also, Tailor looks like the best tool for keeping a CVS repository synchronized.

~Matt

what was wrong with darcs?

too slow?

too haskelly?

It's important to remember that localizers don't only need to view the code; they need to make commits. Tool support may be the answer here (you could even have a two-way gateway to cvs or svn?)

I think you have the fighters backwards in at least the last capture, since it would appear that ClearCase is savaging Hg in a way that the world hasn't seen since Coop and I held court in Ottawa.

(Does Hg give three? A gentleman always does.)

Please change the last screenshot, it is wrong and it ruins the fun :) I want to see darn clearcase lying in a bloodpool not HG.

Please reconsider darcs.