November 8, 2007
Mozilla cvs file "moves" considered harmful.
As most of you know, cvs doesn't know how to move or rename a file in a repository: if you have a file in mozilla/foo and you want to move it into mozilla/bar, cvs simply has no way to record that. In a project the size of Mozilla, files inevitably need to be moved and renamed now and then, and cvs' inability to deal with that is one of the reasons Mozilla development is about to switch from cvs to hg. hg (like most other modern version control systems) knows how to record a file or directory move/rename in the version history, so this becomes a non-issue once Mozilla development switches over to hg full time.
Over time people have come up with a couple of ways to deal with file moves in the Mozilla cvs repository. One way that I believe was used at some point way back was to simply move the files on the cvs server (the ,v files). This brings the history along with the files as they're moved, but it also removes the files from their old location, not just for the current version and onward but for old versions too. That's obviously a bad idea, as pulling by a date prior to the move will likely result in an unbuildable tree (the build system is likely to expect the files to exist in the old location, etc). A better alternative is to copy the files on the cvs server and do a cvs remove of the old files. This avoids the above problem, but it also means that pulling by a date in the past will result in the files appearing in the new location as well as in the old location.
The current state of affairs is to use one of two scripts, both of which do more or less the same thing: they replay a file's history in the new location, either using the cvs client or directly on the cvs server. This approach avoids both of the above problems, but it also invents changes to the files' history (the checkin dates, and the order of checkins by date, end up incorrect, etc).
Now add to that the current world, where active development happens in cvs and all of it (or at least the parts relevant to Firefox) is mirrored into our hg repository.
The script that mirrors the cvs tree into hg tracks the changes that go into cvs and figures out which of those changes were part of the same checkin (which is far from trivial, btw), and you get the nice list of change sets you can see in the hg repository. So what happens when we do one of these file moves? You get something like this. Not only is that incorrect history, it's also impossible to get right. The scripts that replay the checkins for the moved files obviously lose the dates of the checkins, and they also replay the history in the wrong order, which in some cases makes it impossible to generate accurate change sets for those checkins. And of course it clutters the history (bringing in old history we've already decided not to clutter the hg repository with) and makes the repository grow unnecessarily.
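The mirroring script's actual logic isn't described here, so what follows is only a minimal sketch of the kind of heuristic cvs-to-changeset tools commonly use (roughly what cvsps does): per-file revisions that share an author and log message and are close together in time get grouped into one changeset. `FileRevision`, `group_into_changesets` and the 5-minute window are hypothetical. The sketch also shows why replayed history hurts: a revision replayed with a new date no longer lands next to the revisions it was originally checked in with.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRevision:
    path: str
    revision: str
    author: str
    message: str
    timestamp: datetime


def group_into_changesets(revisions, window=timedelta(minutes=5)):
    """Group per-file cvs revisions into changesets (hypothetical heuristic).

    A revision joins the most recent changeset with the same author and log
    message if it falls within `window` of that changeset's last revision and
    does not touch a file already in it; otherwise it starts a new changeset.
    """
    changesets = []
    open_sets = {}  # (author, message) -> most recent changeset for that key
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        current = open_sets.get(key)
        starts_new = (
            current is None
            or rev.timestamp - current[-1].timestamp > window
            or any(r.path == rev.path for r in current)
        )
        if starts_new:
            current = []
            changesets.append(current)
            open_sets[key] = current
        current.append(rev)
    # Changesets are created in timestamp order, so the list is already ordered.
    return changesets
```

With correct dates, two files checked in together by the same author with the same comment end up in one changeset; the same revision replayed days later at a new path forms a changeset of its own.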
Given that, I think in general we'd be better off not moving files in cvs any more. Simply checking in the files in the new location with a checkin comment that explicitly states where they were moved from, and cvs removing the files from the old location, should do. No history is lost; it's all still there. Getting to it just takes a few more steps, and bonsai is perfectly capable of showing blame and logs for files that have been cvs removed.
The good news is that we're already in beta for Firefox 3, which means we'll probably need to move/rename fewer and fewer files anyway.
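A minimal sketch of that procedure, written as a Python function that only builds the command lines rather than running them. `plan_cvs_move` is a hypothetical helper, and it assumes the destination directory is already under cvs control (a brand-new directory would need its own `cvs add` first).

```python
import shlex
from typing import List


def plan_cvs_move(old_path: str, new_path: str) -> List[List[str]]:
    """Return the commands for a cvs 'move' that leaves the ,v files alone.

    The file is copied, added and committed at the new location with a
    comment naming the old location, then removed from the old location.
    The old history stays attached to the old (now removed) file.
    """
    return [
        ["cp", old_path, new_path],
        ["cvs", "add", new_path],
        ["cvs", "commit", "-m", f"Moved from {old_path}", new_path],
        # -f deletes the working file before scheduling its removal.
        ["cvs", "remove", "-f", old_path],
        ["cvs", "commit", "-m", f"Moved to {new_path}", old_path],
    ]


if __name__ == "__main__":
    # Print the plan as shell commands for review before running anything.
    for cmd in plan_cvs_move("mozilla/foo/x.cpp", "mozilla/bar/x.cpp"):
        print(shlex.join(cmd))
```

Keeping the plan as data makes it easy to review (or dry-run) before touching the repository.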
Posted by jst at 11:05 PM | Comments (4)
February 16, 2007
bzr and different network protocols.
As "D" kindly pointed out to me in a comment in my previous post, the comparison I did was unfair in that I used sftp for bzr when pulling from a server, but http for hg.
While I realized there's crypto overhead etc. involved in sftp, I didn't imagine the overhead would be noticeable, but to be fair I re-ran the tests. The results are interesting, to say the least. I did two pulls over http with bzr of the same repository I used for my earlier tests (still on localhost, served through apache), and I've yet to see a number showing a significant speedup with http over sftp: one run took 28 minutes (vs 33 with sftp), the other took almost 44. Go figure. I also tested the bzr protocol, and that is faster: a pull using the bzr protocol, with bzr itself serving it, took 15 minutes, about half the time of my initial tests with sftp. So that's definitely a good improvement, but why http is as slow as, or even slower than, sftp is beyond me.
Posted by jst at 4:27 PM | Comments (6)
February 13, 2007
More on distributed VCS performance.
This is a followup to my previous posting about bzr/hg/git performance. Since I wrote that posting I've continued to use git for my own work, but I recently decided to look more into what it would feel like to use either Bazaar (bzr) or Mercurial (hg). In the meantime bzr released a new version (0.14), and performance did indeed improve somewhat for the task I measured last time (diff). Diffing is obviously not all you do with a VCS, and even though bzr was notably slower than the alternatives, it still wasn't unusably slow. I could live with that.
So I decided to run some other tests. This time, since git is unfortunately out of the question here at Mozilla, I compared only Bazaar (bzr) and Mercurial (hg). I started by initializing a repository and populating it with a clean Mozilla trunk checkout (pulled a tree from CVS, removed all CVS directories and .cvsignore files, and committed the whole thing into the VCS). Once that was done, I performed a set of tasks that Mozilla developers are likely to perform more than occasionally. All times are single samples of wall-clock time (minutes:seconds), measured with a hot disk cache.
1. Pull/branch/clone from a central repository. I.e. create a local working space: copy the revision history from the central repository (on localhost in this test) to a local directory, and populate the local working space with what's in the repository.
- hg: 2:07 (hg clone http://...)
- bzr: 35:25 (bzr branch sftp://...)
2. Local branch/clone. Create a new working space for a large-ish change you're going to be working on. bzr calls this branching, hg calls it cloning.
- hg: 0:59 (hg clone src dest)
- bzr: 33:03 (bzr branch src dest)
3. Merge. Merge in a simple change from another branch you've been working on.
- hg: 0:05 (hg merge ...)
- bzr: 1:19 (bzr merge ...)
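To put those numbers side by side, here is a small sketch that converts the reported minutes:seconds times to seconds and computes how many times longer bzr took than hg. The data is exactly what's listed above; since each number is a single sample, the ratios are rough.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '35:25' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (hg, bzr) wall-clock times from the list above.
RESULTS = {
    "pull/clone from central repository": ("2:07", "35:25"),
    "local branch/clone": ("0:59", "33:03"),
    "merge": ("0:05", "1:19"),
}


def slowdowns(results):
    """Return how many times longer bzr took than hg for each task."""
    return {task: to_seconds(bzr) / to_seconds(hg) for task, (hg, bzr) in results.items()}


for task, ratio in slowdowns(RESULTS).items():
    print(f"{task}: bzr took {ratio:.1f}x as long as hg")
```

That works out to roughly 16.7x for the initial pull, 33.6x for a local branch and 15.8x for the merge.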
Now those are not pretty numbers from a bzr point of view. Diffs that take 10 or so seconds I can deal with, and half-hour pulls aren't too bad, as you really wouldn't pull a whole tree from the central repository very often (ideally only once per computer). But half-hour branch times would be really tough to deal with, as I could see myself branching more than once a day sometimes.
And note, these tests are done with virtually no history, things would only get slower as the amount of history in the repository grows larger.
Oh, and:
sh$ bzr rocks
It sure does!
Yup, I think it does too, but until it gets a fair bit faster, I can't say I'd be excited about using it for a project the size of Mozilla :(
Posted by jst at 5:17 PM | Comments (3)
November 30, 2006
bzr/hg/git performance
This posting is related to Paul Reed's recent posting about our investigation into switching away from CVS for our Mozilla 2.0 work. As you can read in Paul's post, the two alternatives on the plate at the moment are Mercurial and Bazaar, both of which are primarily written in Python (Bazaar is completely written in Python, Mercurial has some bottlenecks written in C from what I understand).
When we started talking about this I decided to do some performance tests to see how well these systems keep up with GIT, which I've been using for quite some time now and really like. Unfortunately GIT doesn't work anywhere near well enough on Windows, so using GIT is out of the question.
As Brett found out, Bazaar seems to be on the order of 2 to 3 times slower than Mercurial, which sounds bad but, depending on the absolute numbers, might not really matter. So I decided to do some more unscientific tests to see how the performance of these two systems compares to GIT, which feels really snappy with a repository containing all of Firefox's and Thunderbird's source, including tests etc., and the CVS/ directories.
Brett already tested and compared commit speed. My first (and so far only) test was diff performance, as I tend to look at intermediate changes fairly frequently as I work. Here's what I did with Mercurial (hg) and Bazaar (bzr); I already had a GIT (git) repository set up, so I used that.
- Initialize a fresh repository
- Add the whole Mozilla tree
- Commit
Once that was done, I made a one-line change to the file dom/src/base/nsDOMClassInfo.cpp, ran a set of diff tests, and got the following results (all numbers are best of 3 runs, back to back on the same mostly idle computer):
| Operation | bzr 0.12.0c1 (s) | hg 0.9 (s) | git 1.4.2.4 (s) |
|---|---|---|---|
| diff (top level) | 16.957 | 5.600 | 1.572 |
| diff dom/ | 10.596 | 2.240 | 0.140 |
| diff dom/src/ | 10.504 | 2.212 | 0.124 |
| diff dom/src/base/ | 10.468 | 2.212 | 0.124 |
| diff dom/src/base/nsDOMClassInfo.cpp | 10.472 | 2.084 | 0.116 |
| diff dom/src/base/nsGlobalWindow.cpp | 10.012 | 2.024 | 0.088 |
| diff, run from within dom/ | 16.833 | 5.548 | 0.136 |
| diff, run from within dom/src/base/ | 16.881 | 5.504 | 0.112 |
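The "best of 3 runs, back to back" method above is easy to script. This is a minimal sketch of such a harness, not the setup actually used for these numbers; `best_of` is a hypothetical helper, and the commented example assumes a working copy in ./mozilla with a local change.

```python
import subprocess
import time
from typing import List, Optional


def best_of(cmd: List[str], runs: int = 3, cwd: Optional[str] = None) -> float:
    """Run cmd `runs` times back to back; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Don't treat a non-zero exit status as a failure: diff tools such as
        # plain `diff` use exit status 1 to say "there are differences".
        subprocess.run(cmd, cwd=cwd, stdout=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Example (assumes ./mozilla is a working copy with a local change):
# for tool in ("bzr", "hg", "git"):
#     print(tool, f"{best_of([tool, 'diff'], cwd='mozilla'):.3f}")
```

Taking the minimum rather than the mean filters out runs disturbed by unrelated activity on the machine, which suits "mostly idle" measurements on a hot cache.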
What's interesting in this data is that bzr takes a huge amount of time to do a diff even when you explicitly tell it to check only part of the source tree (either by changing into a subdirectory or by giving it a directory or file name on the command line). Given a path, it takes about 2/3 of the time of a top-level diff, and run from a subdirectory it takes just as long as a top-level diff. hg appears to have partially solved this, but not in all cases. Another interesting note is that explicitly diffing a file that has no changes (nsGlobalWindow.cpp) takes essentially as long as diffing a file with changes.
So what does this all mean? Well, to me personally it means Bazaar is not yet ready for a repository the size of Mozilla. Mercurial I can live with, even if it's not snappy. Git is fast (and yeah, I kind of wish we could use it).
Posted by jst at 1:57 PM | Comments (13)
June 20, 2005
Foxes (no fire).
About a month ago my girlfriend (who's now my fiancée!) and I went on vacation to Hawaii (Kona) for a week. When we got back we had my fiancée's family over for dinner. All of a sudden, in the middle of our dinner, we see a fox sit down right by our window. We've never seen a fox where we live before (out in the boonies, in a really cool canyon in Castro Valley, CA), and now we've got one sitting about 5' from our dining room window. It's got a dead rat in its mouth... it wanders around a bit, then drops the rat by our patio. Seconds later, a baby fox appears from underneath our patio. And then one more, and one more... there are 5 babies total! They come out to feed on what mom just brought home, right in front of us!
Since that day, we had the fortune of getting to watch these little foxes run around in our yard every day for 3 weeks, until they apparently moved elsewhere on Sunday a week ago. I hope they're doing well. They were so darn cute!
In the end it's fortunate that they did leave since we've got a wedding to prepare for, and the patio in our yard (along with lots of other things on our property) is going to be completely redone. The patio that was our little foxes' home is already completely gone.
More photos available here.
Posted by jst at 11:09 PM | Comments (5)
Killing more popups
Popups suck. We all know that. They make the user experience for the site that spews them at you far short of ideal. For a while there the web was a pretty calm place as far as popups go, but site developers (or abusers, really) got smarter and they figured out how to get past the popup blocker in Firefox 1.0 (and some other browsers too). The calmness is slowly but surely fading.
One of the easier and more common approaches for getting around the popup blockers is to use a plugin to open the popup. Solving this problem in the browser across platforms is next to impossible. However, it's sort of solvable on Win32 and the Mac. The real fix needs to be a combined effort between the browsers and the plugins: the browsers need to disable popups from plugin content unless the plugin says it's ok to open popups. To do this, the browser and the plugin need to be able to communicate about this, and that's now possible in recent Deer Park nightlies. But for this new communication to happen the plugin needs to support it, which means the user needs a new plugin, and getting new versions of a plugin out to all Firefox users takes a long time. So we need an interim fix as well. The interim fix is to disable popups from plugins by default (which users have been able to do since Firefox 1.0) and to attempt to permit popups from plugins when the user interacts with the plugin content, where possible (i.e. Win32 and Mac). This leaves Linux users sort of out in the dark, and that's unfortunate, but given the relatively small number of Linux users and the really small number of pages that open valid popups from plugins (I know of exactly 0 such pages), it seems like a reasonable overall approach until updated plugins are available.
That's now all done. If you want to help test this out, get a nightly Deer Park Alpha build and go to all the pages you know of that open popups from plugin content, whether wanted or unwanted ones (i.e. ones that show up just because you load the page, or ones that show up when you click a button in a Flash animation or whatever).
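One way to read the policy described above, written out as a small decision function. This is a hypothetical, simplified model for illustration only, not the actual Gecko code; `PluginPopupRequest`, its fields and `allow_plugin_popup` are made-up names.

```python
from dataclasses import dataclass

# Platforms where, per the description above, the browser can tell that the
# user is interacting with plugin content.
INTERACTION_TRACKING_PLATFORMS = {"win32", "mac"}


@dataclass
class PluginPopupRequest:
    plugin_supports_popup_api: bool  # plugin is new enough to talk to the browser about popups
    plugin_says_popup_ok: bool       # ...and it says this popup is ok
    user_interacting: bool           # user is clicking/typing in the plugin content
    platform: str                    # e.g. "win32", "mac", "linux"


def allow_plugin_popup(req: PluginPopupRequest) -> bool:
    # Long-term fix: an updated plugin tells the browser whether the popup is ok.
    if req.plugin_supports_popup_api:
        return req.plugin_says_popup_ok
    # Interim fix: popups from plugins are blocked by default, except while the
    # user interacts with the plugin content on platforms where that's detectable.
    if req.platform in INTERACTION_TRACKING_PLATFORMS:
        return req.user_interacting
    # Old plugin on a platform without interaction tracking (e.g. Linux): block.
    return False
```

This makes the Linux trade-off explicit: with an old plugin, the last branch blocks every popup, wanted or not.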
Posted by jst at 10:29 PM | Comments (11)
My First Blogpost!
Yeah, I'm new to this. I've been meaning to start a blog since the beginning of time, it just hasn't happened, until now! The next big challenge will be to actually start posting.
Many thanks to Asa for encouraging me to get off my butt and email Jason (who, btw, is a great guy, and not just for setting up this blog for me) to ask him to set up a blog for me. Thanks! :)
Posted by jst at 10:16 PM | Comments (2)