February 13, 2007
More on distributed VCS performance.
This is a followup to my previous posting about bzr/hg/git performance. Since writing that posting I've continued to use git for my own work, but I recently decided to look more into what it would feel like to use either Bazaar (bzr) or Mercurial (hg). In the meantime bzr released a new version (0.14), and performance did indeed improve somewhat for the task I measured in my last posting (diff). Diffing is obviously not all you do when you're working with a VCS, and even though bzr was notably slower than the alternatives, it still wasn't unusably slow. I could live with that.
So I decided to perform some other tests. This time, since git is unfortunately out of the question here at Mozilla, I chose to compare only Bazaar (bzr) and Mercurial (hg). I started by initializing a repository and populating it with a clean Mozilla trunk checkout (I pulled a tree from CVS, removed all CVS directories and .cvsignore files, and committed the whole thing into the VCS). Once that was done, I performed a set of tasks that Mozilla developers are likely to perform more than occasionally. All times are single samples of wall clock time in minutes and seconds, taken with a hot disk cache.
1. Pull/branch/clone from central repository. That is, create a local working space: copy the revision history from the central repository (on localhost in this test) to a local directory, and populate the local working space with what's in the repository.
- hg: 2:07 (hg clone http://...)
- bzr: 35:25 (bzr branch sftp://...)
2. Local branch/clone. Create a new working space for a large-ish change you're going to be working on. bzr calls this branching, hg calls it cloning.
- hg: 0:59 (hg clone src dest)
- bzr: 33:03 (bzr branch src dest)
3. Merge. Merge in a simple change from another branch you've been working on.
- hg: 0:05 (hg merge ...)
- bzr: 1:19 (bzr merge ...)
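To put the timings above in proportion, here is a minimal sketch that converts each minutes:seconds figure to seconds and divides the bzr time by the hg time. The helper names (to_seconds, slowdown) are made up for this illustration, and the ratios are only as good as the single samples they come from.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '35:25' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown(hg: str, bzr: str) -> float:
    """Return how many times longer the bzr run took than the hg run."""
    return to_seconds(bzr) / to_seconds(hg)


# Timings copied from the three tests above: (hg, bzr).
timings = {
    "remote branch/clone": ("2:07", "35:25"),
    "local branch/clone": ("0:59", "33:03"),
    "merge": ("0:05", "1:19"),
}

for task, (hg, bzr) in timings.items():
    print(f"{task}: bzr took {slowdown(hg, bzr):.1f}x as long as hg")
```

That works out to roughly 17x for the remote case, 34x for the local case, and 16x for the merge.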
Those are not pretty numbers from a bzr point of view. Diffs that take 10 or so seconds I can deal with, and half-hour pulls aren't too bad, since you really wouldn't pull a whole tree from the central repository very often (ideally only once per computer). But half-hour branch times would be really tough to deal with, as I could see myself branching more than once a day sometimes.
And note that these tests were done with virtually no history; things would only get slower as the amount of history in the repository grows.
Oh, and:
sh$ bzr rocks
it sure does!
Yup, I think it does too, but until it gets a fair bit faster, I can't say I'd be excited about using it for a project the size of Mozilla :(
Posted by jst at February 13, 2007 5:17 PM
Comments
You don't really need to use bzr branch when doing a local copy. Instead you can just copy the whole directory, it's the same thing. That should be at least as fast as hg.
Posted by: Johan Dahlin at February 14, 2007 1:24 PM
I wonder if MozFo/Co can just throw money at this problem. If Mercurial is definitely found wanting in terms of features and non-bugginess, maybe some of the amassed Mozilla revenue can go to contracting a dedicated team to improve Bazaar performance. Then there might be hope of finishing that task in the Mozilla 2.0 timeframe.
Posted by: D at February 15, 2007 10:45 AM
Sorry for the poor formatting, but your comment box doesn't allow forcing a newline. See a nicely formatted version here:
http://bzr.arbash-meinel.com/jst_response_1.txt
To start with, your hg/bzr comparison is using 'http' for hg, but 'sftp' for bzr. That isn't a very fair comparison, because sftp has about 3 times the protocol overhead of http. So just doing 'bzr branch http://' would be quite a bit faster than 'bzr branch sftp://'.
You might also consider using a local shared repository, which means that local branches can share history rather than having to copy it for a 'bzr branch' operation.
'hg' uses hard-links and copy-on-write to handle branching.
However, with 'bzr' you can do:
bzr init-repo --trees basedir
cd basedir
# This first branch will download whatever it
# needs from the remote and store it in
# basedir/.bzr/repository
bzr branch http://remote/foo
bzr branch foo bar
The 'branch foo bar' now doesn't have to copy over any of the history files, which should put us a lot closer to hg in clone speed.
One place where we win a lot is when you want to create a new branch from a different remote location.
cd basedir
bzr branch http://other/place
At this point bzr doesn't have to download all of 'other/place's history, only what has changed.
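The idea that a second branch only needs "what has changed" can be sketched with a toy model. This is an illustration only, not bzr's actual storage format or API: the SharedRepo class, its fetch method, and the revision ids r1 to r4 are all hypothetical.

```python
class SharedRepo:
    """Toy stand-in for a shared revision store (not bzr's real format)."""

    def __init__(self):
        self.revisions = set()

    def fetch(self, remote_revisions):
        """Store only revisions not already present; return what was transferred."""
        missing = set(remote_revisions) - self.revisions
        self.revisions |= missing
        return missing


repo = SharedRepo()

# Hypothetical histories: 'other/place' shares r1..r3 with 'foo' and adds r4.
foo = {"r1", "r2", "r3"}
other_place = {"r1", "r2", "r3", "r4"}

first = repo.fetch(foo)            # everything is new to the shared store
second = repo.fetch(other_place)   # only the revision 'foo' lacked is transferred

print(sorted(first), sorted(second))
```

In this model the second fetch moves a single revision, because the other three are already in the shared store.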
I would guess the only way to get that effect with hg is to pre-branch from another location. So you would do:
hg clone foo place
cd place
hg pull -u http://other/place
That way the first clone can use hard links, and the pull breaks them as needed.
It also means that the more you develop, the more duplication happens across 'hg' directories, while with a shared repository, things stay together.
One more reason we chose not to go with hard links is that they aren't supported well on all platforms. On Mac in particular they are supported very poorly on HFS+ (you generally get severe performance degradation if you have a lot of hard links). On Windows, you don't have hard links with FAT32, and we were suspicious about hard links on NTFS. (They are probably okay, but hard-linking isn't a common operation on Windows.)
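For readers unfamiliar with the hard-link approach, here is a minimal sketch of the general technique, assuming a filesystem that supports hard links. It is not Mercurial's code: the function names are hypothetical, and write-then-rename is just one common way to break a link before changing a file.

```python
import os
import tempfile


def clone_with_hardlinks(src_dir: str, dest_dir: str) -> None:
    """Hard-link every file in src_dir into dest_dir (flat directory only)."""
    os.makedirs(dest_dir)
    for name in os.listdir(src_dir):
        os.link(os.path.join(src_dir, name), os.path.join(dest_dir, name))


def rewrite_breaking_link(path: str, data: bytes) -> None:
    """Write a fresh file and rename it over path, leaving other links untouched."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "foo")
        dest = os.path.join(root, "place")
        os.makedirs(src)
        src_file = os.path.join(src, "history")
        dest_file = os.path.join(dest, "history")
        with open(src_file, "wb") as f:
            f.write(b"rev1\n")

        clone_with_hardlinks(src, dest)
        shared_before = os.path.samefile(src_file, dest_file)

        # Simulate a pull that changes the file in the clone.
        rewrite_breaking_link(dest_file, b"rev1\nrev2\n")
        shared_after = os.path.samefile(src_file, dest_file)

        with open(src_file, "rb") as f:
            src_content = f.read()
        return {
            "shared_before": shared_before,
            "shared_after": shared_after,
            "src_content": src_content,
        }


result = demo()
print(result)
```

Right after the clone both paths refer to the same file on disk, so nothing is copied. Once the clone's file is rewritten, the two paths diverge and the original keeps its old content.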
Posted by: John Meinel at February 15, 2007 12:14 PM