Sometimes, I don't understand VMware at all
pacifica-vm, which most of you probably know as the Firefox 2 Windows nightly build machine, has had an interesting week.
Last week, I took the machine down to back up its virtual disk image, increase the amount of RAM available to it, and, in an attempt to decrease the cycle time, add a second VMware virtual CPU to the VM.
This didn't really have the intended effect. Cycle times for both nightly and depend builds went up by about 15%.
Thinking that maybe the build system wasn't making as efficient use of its shiny new virtual CPU as it could, I upped make's -j value from 3 to 4. This reduced cycle times... to what they were before the memory/CPU "upgrade," but also had the useful side effect that make would hang every few builds, most notably on nightly builds. (That is, incidentally, why nightly builds on Tuesday, Wednesday, and Thursday were all late; make kept hanging overnight with -j4.)
Finally, last night, I removed the second VCPU, but kept the extra memory and higher -j values.
That change not only made the machine start reliably producing nightlies again (or, at least, make stopped hanging), but it took the cycle time down to 40 minutes for a depend cycle, and 2ish hours for a nightly build. (Interestingly enough, that full build value seems to fluctuate anywhere from about 90 minutes to just over a couple of hours; I think this is because the trunk build machine and the 1.8 build machine are on the same VM box, and they're both starting their nightlies at the same time, which slows both of them down a bit.)
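For anyone who wants to repeat this sort of comparison, here is a minimal sketch of how one might time a build and flag a hang with a timeout. It is not the script used on pacifica-vm: the function name `time_command`, the three-hour timeout, and the bare `make -jN` invocation in the comment are all assumptions for illustration.

```python
import subprocess
import time


def time_command(cmd, timeout_s):
    """Run cmd and return (elapsed_seconds, hung).

    hung is True when the command did not finish within timeout_s.
    subprocess.run kills the direct child when the timeout expires;
    any grandchildren (e.g. compilers spawned by make) may need
    separate cleanup, which this sketch does not attempt.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            check=False,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        hung = False
    except subprocess.TimeoutExpired:
        hung = True
    return time.monotonic() - start, hung


# Hypothetical usage, run from the top of a build tree:
#
#   for jobs in (3, 4):
#       elapsed, hung = time_command(["make", f"-j{jobs}"], 3 * 60 * 60)
#       print(jobs, round(elapsed / 60), "minutes", "(hung)" if hung else "")
```

A timeout only says that a build took too long, not why, so a flagged run still needs a look at the machine to tell a real hang from a slow link.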
So, to recap here:
| | Nightly Build | Depend Build |
|---|---|---|
| Before changes | ~2 hours | ~1 hour |
| After memory/CPU "upgrade" | ~2 hours, 20 minutes | ~1 hour, 15 minutes |
| After adding -j4 | ~2 hours; hung often | ~1 hour; hung often |
| After removing one VCPU | ~2 hours; jury still out, though | 40 minutes |
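To put the table in relative terms, the change against the starting point can be worked out as below. This is only a rough sketch: the minute values are the rounded figures from the table, so the resulting percentages are approximate, and the dictionary and function names are made up for the example.

```python
# Approximate cycle times in minutes, copied from the table: (nightly, depend).
# These are rounded figures, so the percentages are rough as well.
times = {
    "Before changes": (120, 60),
    'After memory/CPU "upgrade"': (140, 75),
    "After adding -j4": (120, 60),
    "After removing one VCPU": (120, 40),
}


def percent_change(before, after):
    """Relative change from before to after, in percent (negative = faster)."""
    return (after - before) / before * 100.0


baseline_nightly, baseline_depend = times["Before changes"]
for label, (nightly, depend) in times.items():
    print(
        f"{label}: "
        f"nightly {percent_change(baseline_nightly, nightly):+.0f}%, "
        f"depend {percent_change(baseline_depend, depend):+.0f}%"
    )
```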
- Linking Firefox, especially on Win32, takes memory. A lot of memory. In the couple of trials I paid attention to, it took around 700 megs. Seeing as the VM had 700 megs, a large part of the problem seemed to be the machine descending into swapping thrash-hell when trying to do the final link.
- Win32 SMP = Teh suck. I had actually learned this from previous experiences in previous lives, but... a reminder is always good.
- When you actually pay attention to VMs, and spend some time "tuning" them (which in this case amounted to creating a better match between the memory profile for the machine's task and the virtual hardware), VMs don't perform all that badly relative to physical hardware. gaius-vm, for instance, has horrible cycle times compared to gaius, but it's not "just because it's a -vm." It's because no one's paid enough attention to it after migrating it to tune it. (No, I haven't taken what I've learned here and applied it to gaius.)
- Once again, sometimes... VMware['s performance] continues to confuse the hell out of me.
Comments
At Flock, we've run into the mysterious make hangs too. It was happening on one of our non-VMware Win32 build servers, a dual-proc machine. Removing a CPU made the problem go away, and actually didn't impact build time negatively.
My hunch (and only a hunch, I've not spent any time verifying this) is that cygwin simply isn't SMP safe.
While on the subject of Win32 build optimizations, we've also found out that NTFS is significantly slower than FAT, and that FAT on a real disk versus FAT on a ramdisk gives a negligible difference in build times.
Posted by: Manish Singh | September 30, 2006 12:11 AM
Important tips when building with cygwin.
#1 Never use more than 1 CPU. Most of the time you will not get the race-condition error message and a complete failure; you will just get a slowdown, and that slowdown gets worse with time if you don't reboot. On average, within 3 days builds are 2 orders of magnitude slower than the first build.
#2 The VFAT thing is a huge time saver, for several reasons:
1 - no access time recording
2 - no complicated ACL checks
3 - optimized for smaller file size (larger files will get fragmented) making seeks faster
In VMware, make sure you produce a pre-allocated disk; the dynamic ones have noticeably worse performance.
I was seeing a 30-50% reduction in build times under VMware from switching to FAT over NTFS, and that is after already having write-caching turned on.
We see more speed benefits from VFAT than Mozilla may for incremental builds because we use Subversion, which has far more small-file I/O.
#3 RAM
Linking Flock takes around 512MB of RAM. Generally I give a build instance a little over a gig. 3GB gives no advantage for speed.
We are getting about 120 minutes on VMware for a clean build, including checkout time and such.
About 35 minutes for an incremental build on VMware.
Linux 2.6 host OS, with 4GB RAM, 2x2.54GHz CPU, running two VMware instances.
Posted by: Robin * Slomkowski | September 30, 2006 7:35 AM
I also use VMware to compile on Windows, and it also takes over 2 hours (3GHz, 4GB RAM) to compile, but for me linking consumes at most 862MB (700MB is only the average for me).
Posted by: Marcus | October 5, 2006 5:58 AM