Downplaying the "Distributed" Dogma
Benjamin recently wrote about the current state of our effort to try to import our CVS repository to... something from this century.
His conclusion is spot on, although I think it... minimizes the head-banging he and I have been going through for a couple of weeks. My original characterization didn't turn out to be far from the truth, it seems... except, it's me and my quad-core-P4-with-4-gigs-of-RAM sitting there, bloodied and bruised on the floor, not ClearCase. ;-)
I was somewhat surprised by the number of responses to Benjamin's post that seemingly amounted to "Can't you just use Subversion? Subversion works. And if you want distributed, use SVK."
Well, the first issue with that is cvs2svn¹ doesn't seem to import the Mozilla CVS tree anymore: it's hitting the error that Hg tends to hit², and while completely dying is arguably more correct, bzr and cvs2svn 1.3.0's approach (annotating and ignoring the error, so the import can actually continue) is much more satisfying.
The second issue is that the march towards a distributed version control system really isn't about a distributed version control system; it's about using a tool that supports merging algorithms that weren't invented in the 80s, back when you never did branches anyway, because it was annoyingly difficult with the tools of the time.
During the original discussion, the main issue that limited Subversion's advancement in the race was that it didn't support any better merging functionality or techniques than its predecessor. It requires external tools to record which merges have been performed, and the actual algorithms used are the old ones we all love and/or hate.
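To make the bookkeeping point concrete, here's a minimal sketch of what "recording which merges have been performed" boils down to when the tool doesn't do it for you. Everything here is made up for illustration (the function name, the revision numbers, the hand-kept log); it is not Subversion's API or any real tool's format.

```python
# Hypothetical sketch: merge tracking kept by hand, outside the VCS.
# Nothing here corresponds to a real Subversion command or file format.

def unmerged_revisions(branch_revs, merged_log):
    """Return the branch revisions not yet recorded as merged, in order."""
    already_merged = set(merged_log)
    return [rev for rev in branch_revs if rev not in already_merged]

# Revisions committed on a branch (made-up numbers).
branch_revs = [101, 102, 103, 104, 105]

# The record somebody has to maintain by hand (a wiki page, commit
# messages, a text file) whenever they merge from the branch.
merged_log = [101, 102]

pending = unmerged_revisions(branch_revs, merged_log)
print(pending)  # the revisions still waiting to be merged
```

If the log is wrong or forgotten, the same change gets merged twice or not at all, which is exactly the kind of mistake a tool that tracks merges natively is supposed to prevent.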
Now don't get me wrong: I use Subversion for all my personal stuff and I like it. I think it's a great improvement over CVS (which I used for years and imported from) and in many (most?) cases, I would recommend it.
But when you're going to be doing the kind of "agile"³, disruptive, reconstructive work that Mozilla 2.0 requires, at a minimum, you need a tool that makes branching and merging easy. SVN does work for me (and lots of other people and projects) because I'm not faced with, for example, renaming nsIFrame::GetPresContext, a task where a branch makes a lot of sense, and I'm going to be doing hundreds of renames.
I contend that it's not so much that we require (or necessarily even want) a "distributed" version control system. In fact, as a counterexample, Perforce is a [closed source] centralized VCS that has a lot of great features, including merging primitives that are awesome. Accurev is another (although I've never personally used it).
We just happen to be focused on "distributed" VCSs because those are the only open source offerings that have merging facilities that handle complicated situations and get the merging stuff right. This is likely because a distributed version control system isn't worth anything if you can't merge your work back in easily and [more importantly] reliably.
I'll concede, of course, that once you have things like offline diff/commit and easy patch sharing among peers, all built-in-and-tracked-by the VCS, that's (possibly addictive) icing on the cake.
But it's not about the "distributed" part. It's about the capable-merging part.⁴
Breaking code apart is easy. Putting it back together is hard.
We want and need a tool that intrinsically expects, is designed to handle, and expertly supports the latter.
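A toy sketch of the asymmetry, under loud assumptions: a "tree" here is just a dict mapping a file path to its whole contents, and the names `branch` and `merge3` are invented for this example. No real VCS works at this granularity, but the shape of the problem is the same: branching is a copy, while merging needs the common ancestor and can still end in a conflict.

```python
# Hypothetical sketch: whole-file three-way merge over toy "trees".
# A tree is a dict of {path: contents}; a missing path means "deleted".

def branch(tree):
    """Branching is the easy direction: just copy the tree."""
    return dict(tree)

def merge3(base, ours, theirs):
    """Merge two descendants of `base`; return (merged_tree, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither touched it)
            result = o
        elif o == b:      # only their side changed it
            result = t
        elif t == b:      # only our side changed it
            result = o
        else:             # both changed it differently: a human decides
            conflicts.append(path)
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts

base = {"a.c": "v1", "b.c": "v1", "c.c": "v1"}

ours = branch(base)
ours["a.c"] = "v2-ours"

theirs = branch(base)
theirs["a.c"] = "v2-theirs"   # same file changed on both sides
theirs["b.c"] = "v2"          # changed only on their side
del theirs["c.c"]             # deleted only on their side

merged, conflicts = merge3(base, ours, theirs)
print(merged, conflicts)
```

Note that `branch` is one line and `merge3` is all the rest, and this version doesn't even try to merge within a file or follow a rename.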
_____________________
¹ As of 1.5.0
² Which amounts to deleting files that don't [possibly yet] exist on the branches they're being deleted from.
³ I hate using that [buzz] word.
⁴ Coincidentally, Joel recently blogged about version control systems and large teams, and it seems the Windows team uses a model very similar to that of the 2.6 kernel developers, and possibly similar to what we'll end up using. It seems that easy branching (which is easy) and easy merging (which is hard) is the only real way to scale a development project into the thousands.
Comments
git has every merge algorithm known to man. It seems like a new one is implemented each week.
Git was dismissed because it lacks native Windows tools. If Mozilla were to seriously consider using git and a specific proposal was made about what new features were needed, I believe they would get implemented.
Posted by: Jon Smirl | January 31, 2007 7:41 AM
Have you guys gotten in touch with the SVN developers to see if they have better merging capabilities coming down the pipe, or could have them with a small amount of help? It seems (to me) like making SVN handle merges better would be easier than making Hg or bzr actually work properly. (I certainly wouldn't trust my project's repo to either of them yet.) I guess I wonder how much of the "let's use a distributed system because it handles merges better!" is actually cover for "wow, this thing is shiny and new and cool, I want to try it! SVN? That's old and lame."
Or perhaps I'm just not seeing how moving to an incrementally-better system is a bad thing, especially if you can isolate some of the problems in this transition and others in a future transition (if desired): work out how to make the repo history sane this time, for example, and save the problem of completely changing people's development model for next time.
Best of luck whichever route you go.
Posted by: Peter Kasting | January 31, 2007 10:10 AM
"Breaking code apart is easy. Putting it back together is hard."
Amen. There's a diagram on p.49 of my book "Practical Development Environments" that shows a 3-D space with time on the X axis, file names on the Y axis with points where each file is changed over time. Branches are shown on the Z axis, coming out of the book towards you.
The comment at the top of that page notes that on the Z axis it's very easy to move in one direction (branching) but it's always harder to move in the other direction (merging).
I wish more people had that as a gut instinct.
Posted by: Matt Doar | January 31, 2007 11:03 AM
The impression I got was that the git developers showed no interest in making it work on Windows. Even if they were willing to do so, would they also maintain it? I'm sure Mozilla doesn't want to start maintaining a VCS, even if it's only on one platform.
Posted by: Jeff Walden | January 31, 2007 1:29 PM
git does have a MinGW port these days: http://www.gelato.unsw.edu.au/archives/git/0701/38380.html
Posted by: tor | January 31, 2007 2:03 PM
Re Peter, svn doesn't really have branches to begin with. You can emulate branches with svn's shallow copies, but in its heart, they're not branches. That shows at last in the fact that you don't get info for bonsai's cvs graph from svn directly. One can probably resurrect that information, but it's not at the heart of svn.
And yes, svn developers know, we chatted about that on, I guess, EuroOSCON 05.
Posted by: Axel Hecht | January 31, 2007 3:49 PM
@Everyone suggesting git
As Jeff Walden points out, git-on-Win32 is not (and, given the development community, seemingly unlikely to ever be) a first-class, tier-1-supported platform.
That's a solid requirement for us, and there's very little interest in supporting a VCS tool ourselves, so it makes git a non-starter. Which sucks, because as everyone is pointing out, it has lots of pluses.
@Peter: We have not specifically talked to the SVN developers about this. It's really not an issue of making Hg/bzr "actually work properly"; there's a lot of data to support that they work just fine.
The problem lies in getting our current CVS repo—which is by no means a model of pristine CVS state—into these tools.
Thanks for the luck, though; I'm sure we'll need it. :-)
Posted by: Preed | February 1, 2007 5:11 PM