Bug 259708: A human perspective

I've filed good bugs before, innovative bugs, bad bugs, serious bugs, bugs I fix, just-plain-silly bugs, bugs I intend to fix but never get around to... all kinds of bugs. But I had never filed a bug that implied a potential security hole.

In this particular bug, there were mistakes made, certainly by me, possibly by others and the community-at-large. I wanted to take a moment to tell the story behind this particular bug, the part that the bug itself probably won't tell you. I will say this: Bug 259708 is rare among bugs filed at bugzilla.mozilla.org in that it had no truly meaningless comments. So, before I roast anyone, I want to say this one was largely done right.

On to the story.


I was about to post a new weblog entry unveiling a weblog tool I'd created. It wasn't anything revolutionary, but it was very useful. In Mozilla on Linux, it worked nicely. In Mozilla on Windows, it worked nicely. It generated no strict warnings and no errors, just a direct file download, as it should. I was happy as a clam. Then I decided, just for the heck of it, to try it out in Firefox 1.0 Preview Release.

*blink* The download didn't happen. It was stuck. So I clicked Cancel. In the background, I had the target directory open.

Huh? The directory looked a lot lighter than it did a moment ago...

*PANIC*

In a matter of seconds, my entire Downloads directory, save for one file within it (an Amaya zip file), was destroyed. I knew that was not normal behavior, not by any means. I went to my Recycle Bin to see if the files had gone there. Nope (though it would be nice for Firefox to move files to the Bin instead of deleting them outright). I ran a quick chkdsk to see if they could be recovered as lost file fragments. Nope. I scrambled to think of what else I might do to recover those files. Nothing came to mind (I know there are undelete utilities out there, but they didn't occur to me at the time). The files were definitely gone.

Worse than that, it happened through something so simple: a little-known but solid protocol that had been implemented in Mozilla a long time ago. There was no way I could have expected that.

The next thing I did was create a couple of dummy files in the target directory and try to reproduce the bug. (Note that I was still in panic mode. This was all happening over the course of a couple of minutes.) In moments, there was no doubt: Mozilla Firefox was trashing my files.

Once I had a reproducible testcase, I immediately reported it as a critical bug with the keywords "dataloss, testcase". Because I've been filing legitimate bugs for a long time, I didn't have to wait for someone to confirm the bug: it was immediately marked NEW and assigned to Ben Goodger, who heads the Firefox development team. I could not attach my testcase to the bug while filing it, nor could I request a Firefox 1.0 (final) blocking flag. So, once the bug was assigned a number (259708), I immediately attached the testcase, which was the weblog tool itself.

I specifically laid out steps to reproduce that would not endanger any of the tester's important files if he or she followed them. What happened to me was traumatic, to say the least, but not fatal. That is largely thanks to my experience with computers: I always create a special directory for downloads, and I always make sure my downloads go there, isolated from other files. Had I left the downloads directory at the default, I would have kissed my Windows desktop goodbye. I wanted to make sure anyone trying to reproduce the bug was ready for it and wouldn't lose their desktop.

I asked for blocking-1.0-aviary, citing it as "traumatic dataloss. Yes, it's a rare scenario, but it's very evil." I then watched for reaction on the moznet IRC network, praying someone would notice and agree with me. Blake Ross (one of the Firefox drivers, I believe) agreed with me very quickly and gave me a + on the blocking attribute.

I left then to grab a bite to eat, not only because I was hungry but because I was still absorbing the implications of this bug. Here's where I made my first mistake. While I was out, I started wondering if the bug was itself a security bug. The default downloads directory for Mozilla Firefox on Windows is the Windows desktop. I can only assume it's similar for the Linux and Macintosh platforms. So potentially, you could lose your entire desktop to one link. Or worse, someone could craft a webpage telling you to save such-and-such link to your C:\ directory, and then run it. I thought about it, hard. But then I dismissed the idea. Losing data is horrendous, yes, but not as bad as losing it to someone else. That just wasn't happening here. So I decided not to ask for a security group review. That was my first mistake.

Lesson Number One: The very instant you start to wonder if a bug might cause a security concern, stop wondering and ask the security group to review. Don't try to do the security group's job by trying to decide if it really is one or not.

The worst they could do is say nothing, and that's not very bloody likely. If it were, there'd be no point in having a security group. But they do take their duties very seriously, as I will note later on. Had I asked, I would have received an answer, and I might not have done other things that seemed like a good idea at the time.

Incidentally, some within the security group feel they are already swamped with bogus reports, and that encouraging others to add reports because they might be security-sensitive is not conducive to the process. My answer is that it's better to be cautious and report a bug as security-sensitive than to do what I did and have it recognized as a security problem long after the fact. Security isn't the only area swamped with bad bugs; it's the same problem mozilla.org has with bug triage generally.

Also, I want to encourage people who do file bug reports but are not experienced at it to please use the Bug Helper. Filling in the blanks it offers can make a tremendous difference in how the bug is handled. There are also other important steps, like coming up with reproducible testcases. If it can't be reproduced, it will be nearly impossible for anyone to investigate it.

If you are new to bug filing and think you might have a security issue, ask one of the security group members to chat with you privately over IRC. Not only can they give you a good opinion on whether it's a security-sensitive issue, they can also tell you whether it's a valid bug at all. Of course, asking on IRC about a bug's validity before filing it is a good idea for new bug-filers, whether or not it's security-sensitive. Half the bugs in Bugzilla are invalid or duplicates, so a few quick questions on IRC can save everyone a little heartache.


Once it was clear that bug 259708 was a legitimate Firefox 1.0 blocker (though still filed as critical, because it wasn't hindering Firefox development), I relaxed a little. I assumed the problem would get resolved very quickly. Particularly if I helped.

I was wrong.

I started looking around the source code, trying to track down where it went bad. My first guess was nsExternalHelperAppService, an obvious target because, well, it's the first thing that matters when saving an unrecognized file. I looked and looked, and found no smoking gun.

Well, this was code I wasn't familiar with, so I started crying for help. I did everything I could to get eyeballs on the bug and on the source code. Every chance I got, I asked people on #developers and #firefox to look at the bug and see if they could figure out what was going on. Everyone seemed to agree that it was a bad bug, but very few people actually started looking into it. I started wondering why exactly one file had survived in the doomed directory, and I had a nasty thought: that file had come from a CD-ROM, and other files I'd copied from the CD-ROM had the read-only attribute set. I cleared the read-only attribute, retested the bug, and the whole directory disappeared. It was a little worse than I'd originally thought, and I updated the bug summary to note this.

I took a step back, and noticed that the filename which Firefox was giving the data: pseudo-file was actually the directory name. That was highly unusual, and I began to suspect the problems lay a little bit earlier in the code execution... like, in the code that actually opened up the Download Manager.

I started digging around in there, but got nowhere. I just couldn't track it down by reading the code, so I tried to find people who knew it. I blamed nsExternalHelperAppService and nsDownloadManager, and Christian Biesinger answered, asking for details (which, unfortunately, I was unable to provide). Concerned that I wasn't seeing any progress at all, I posted a weblog entry calling attention to the bug, and for a short time that worked: Darin Fisher responded by pointing out a patch that had landed on the Mozilla trunk as a safety measure. He suggested the patch itself might fix the bug.

Chris Thomas (bless his soul) took the time to look into the code, comparing it against the corresponding files in the Mozilla source. He noticed that the patch had not been checked into Mozilla Firefox or the stable 1.7.x branch, and that it was incompatible with both the Firefox code and the stable branch, so he took it upon himself to write a new patch that did effectively the same thing. After some confusion on IRC and in the bug, the patch was quickly checked in, and I smiled. I thought, "Okay, my problems are probably over, and I can note in my blog entry that the patch is baking, with good prospects."

Around the same time, I decided I couldn't wait for the next day's nightly to appear. There had to be some way to get a build without waiting for the nightly to be certified; this bug was too serious. So I looked at the Aviary-1.0 tinderbox and found "sweetlou" building Firefox 1.0 builds. I thought, "Okay, that's a likely source for nightlies." I dug around a little in the mozilla.org FTP mirrors and quickly found a URL that led to a build copied directly from sweetlou.

The file sweetlou had copied to the FTP server had a timestamp about four hours later than "now". I found this amusing and thought nothing more of it. I went to test the bug, expecting it to be fixed.

I was wrong again. The patch, though correct for Mozilla Firefox and the trunk, had not fixed the bug. I quickly reported this, and then began to have my doubts even as I noted the bug was "still busted!" on my blog entry.

Because the FTP server showed a funky timestamp, my confidence in the build was undermined. Had I downloaded a build from before the patch, or after? With some help from IRC (I forget who it was, and my apologies for that), I opened the compatibility.ini file Firefox generated at startup, and sure enough, it showed the timestamp of a build that started after the patch had landed. Just to be safe, twelve hours later I downloaded another sweetlou build that could not possibly predate the patch. Much to my disappointment, the bug was still there.


There followed a four-day lapse in activity. Not for lack of trying on my part! I begged people to look at the bug and the code that might be causing it. I begged people to test the bug, because until then, no one had tested it on any platform except Windows. Because each operating system has its own version of nsLocalFile.cpp, this was a significant concern for me. If it was a Windows-only bug, Linux users wouldn't be affected; ditto Macintosh users. Knowing whether it was specific to Windows would also tell us whether the cross-platform files could be ruled out, narrowing the search significantly.

No one heeded my pleas to test. Chris Thomas was sympathetic, but he didn't have Firefox on any operating system other than Windows. Here we had a critical-severity bug, two weeks old at that point, and no one had bothered to find out whether it was isolated to one operating system or affected everybody. I'm not that experienced with debuggers, and I didn't have a Firefox tree. Out of sheer frustration, I spent several late-night hours checking out the Aviary-1.0 branch specifically to build Firefox and test this bug. I discovered (without much surprise at this point; I must have been getting numb to having to do all this) that the bug did indeed affect Linux.

Lesson Number Two: We need more people building Firefox themselves, instead of getting firefox-win32-installer.exe. We also need more people on operating systems other than Windows on-hand to do QA efforts.

It's ironic. The Mozilla Foundation had just seen its "Spread Firefox" campaign succeed beyond its wildest dreams, bringing in more than twice its stated goal in the allotted time. This is an example of success coming back to bite us. Firefox, Firefox everywhere, and not an untested build to check.


Four days later, I got another bombshell. At the library, I decided to check on the status of the bug. Now, the library insists on using Internet Explorer (evangelization seems to be in order), and IE doesn't work with GMail, where I had switched my Bugzilla e-mail a few weeks earlier. So I pulled up the bug directly, without logging in. Imagine my surprise when bug 259708 suddenly said "You do not have appropriate permissions to access this bug," in big black letters on a red background. This was a bug I had filed, in the world's best-known bug database, run by a foundation that prides itself on hiding very little?

I logged in to Bugzilla, and found that I could access the bug. So I relaxed marginally. Then I saw Ben Goodger's comment, immediately after my last one, and I quote:

Asa suggested this become a security-sensitive bug due to possible side-effects.

Oops. My first mistake had come back to bite me, and I was about to compound it with my reaction.

I should make it clear that I generally support mozilla.org's security policies and practices. The Mozilla Foundation classifies security-sensitive bugs largely to protect the user community from black-hat types who are savvy enough to create exploits. On the other hand, once a bug has been announced publicly to the community-at-large, the security group generally acknowledges it and opens it to the public again.

Up to that point, I had not only announced it publicly to the community-at-large, I had done everything I could to attract attention to the bug. This was in keeping with the idea of open-source development, and of course with the severity of the bug. Now, I have been known to get a little zealous about a bug once I file it, for a short while at least, but this one I just didn't dare let up on. The consequences were too horrifying to let it go. And yet, I had dismissed it as a security concern. Asa apparently had other thoughts.

Good for you, Asa!

But now what was I supposed to do?

I had two choices. I could raise holy hell about how everybody already knew about this bug and/or should know about this bug, pointing out all my efforts to spread the word and get the bug fixed. Or I could shut up and retract the weblog entry I'd already made about the bug, and stop all my IRC conversations (aside from private ones to security group members) about the bug.

I chose to shut up. I became as quiet as a mouse (the one sitting next to your keyboard), and hoped for the best, while quietly tracking progress on the bug and getting an Aviary tree.

In retrospect, either decision would have been a good one. Maybe not the "holy hell" bit, but a polite request to re-expose the bug would have been in order. However, I posted a public apology for retracting the blog entry, without saying which entry I had retracted. In a sense, that was my second mistake.

Lesson Number Three: When it comes to security, you have to decide on all or nothing. No matter how trivial it is.

By this, I mean that posting an apology was neither all nor nothing: although I felt it was polite, it didn't help matters much...

The community-at-large is not made up entirely of fools, despite what many of us more experienced Bugzilla users may think. People do pay attention to us, and to planet.mozilla.org, which is my number one news source for Mozilla-related development. Well, some smarty-pants decided to repost my entire blog entry about bug 259708 as a comment on one of my entries, with an e-mail address of "fulldisclosure@netsys.com". Word for word, with no changes and no commentary.

This annoyed the hell out of me. On the one hand, I could see this anonymous poster's point: the bug was already public when it very suddenly disappeared. On the other hand, the Mozilla Security Group has very good reasons for classifying a bug.

I have my reasons for doing what I did. I had never before retracted a blog entry after it had stayed up for so long, and I never want to do so again. I strongly believe in supporting freedom of speech, as much as I believe in supporting good software. The only comments I delete, per my own policy, are spam.

I'm not a fool either. Because of blog spam, I had set my blog to display only comments I have personally approved. I deleted the comment, and it never saw the light of day. At the same time, I searched the Full Disclosure mailing list for any indication that the blog entry or bug report had appeared there. I never found one. So, in my eyes, I never had any justification to say to the security group: "Look, people already know about this, and hiding it doesn't do us any good. Open this up."

Whoever it was who threw my words back in my face: please, give me a break. By the time you read this, the blog entry I retracted will be restored in its entirety to the weblog (with the summary altered to reflect the bug being fixed). I had my reasons for retracting it, and I will not dispute the Mozilla Foundation's security policy without a better reason than an anonymous person showing off how smart they are or how stupid I look. Also, I would have greatly appreciated having a link to find that particular "disclosure" within your mailing list, so that if it was there, I could use it to do what you would have wanted me to do: plead for the bug to be declassified. I would also have appreciated having your e-mail address to contact you, instead of the mailing list address. Without either of these, I was left with no choice but to delete your comment and ultimately ignore it, except for this section here. I don't take criticism of my actions personally, unless I can't respond. Your criticism was legitimate, but you didn't give me any room to do anything about it, one way or the other.

When it comes to developing projects or software, my ego takes a back seat to what's right. When I'm wrong, I'm wrong, and I'll admit it. But give me some way of getting proof that I'm wrong. Ambiguity drives me crazy.


I also should note that the Mozilla Foundation's own documents were not very helpful to me in this little ethical crisis.

Lesson Number Four: The Mozilla Foundation should clarify what should happen if a bug is considered eligible for security classification long after it is publicly exposed.

I think I did what was right, but nowhere in the security policy did I find anything telling me what I should have done. Per the policy, the reporter of a bug has the authority to request that the bug be exposed. But I didn't have much of an idea what was best for the community. As it was, I felt that by shutting up and retracting the blog entry, I was cooperating with the intent of the security policy to the best of my ability.

But the simple fact that the bug was classified so long after it was filed is not a good sign. In my opinion, it's very unfortunate and looks bad for the Foundation. The classification was made with the best of intentions, but it left me, as the reporter and a very concerned developer, in the lurch. Moreover, there had already been plenty of public talk about the bug, and no one seemed to consider this when they classified it.

I suspect this is a situation which has rarely arisen before, if ever.


Very shortly after the bug was classified for security reasons, a patch was made available. Less than half an hour, in fact. (On examining the patch, I found out I'd been looking in the wrong place.)

This is good for the security group, but bad for mozilla.org as a whole. This bug was so serious that it was retroactively identified as a potential security hole. Several developers I talked to on IRC agreed with me on just how serious it was. It was a dataloss bug, one of the three kinds of bug that automatically qualify as critical. (The other two are crash bugs and hang bugs.)

If a Mozilla product can cause a dataloss accidentally, it can cause a dataloss deliberately at the hands of a malicious programmer or website developer. Whether this particular bug was legitimately a security issue or not I will leave up to the security group and the Mozilla Foundation staff to determine.

If you, as a QA person, see a bug filed with the dataloss keyword, take a moment to look through the bug's activity log and comments to see whether anyone has even thought about the security implications. If not, raise the issue with a security group member immediately (in private!). Get someone from the security group to examine it for a potential hole and comment on the bug, whether or not they see a hole in it. If they don't see a hole, they should say so. If they do, they should classify the bug and say so. Either way, they should consider it and comment on it.


Conclusion

I never intended to go hunting for security bugs. I never expected to actually find one. That said, when I might have found one, I reacted incorrectly but with the best of intentions. Once you acknowledge that I incorrectly handled the security concerns, you can see that everything else I did was in good faith and what any responsible developer would have done.

Other minor scuffles caused this bug to reach a conclusion much more slowly than would have been ideal, particularly given that it was so serious and was never disputed as a legitimate bug. I believe that the community, even those of us who recognized the threat, failed to act on it properly in several respects. Security concerns aside, the sheer nastiness of the bug should have prompted more people to investigate the code causing it. Ben Goodger knew exactly where to go to fix this, and for that I respect him. (I should also note he was on vacation for much of the time this bug lurked.) Not everyone else has his experience working with the Firefox code, but a bug like this is a great opportunity to learn. Too many developers passed on it, acknowledging the seriousness of the issue but contributing little toward potential solutions. Chris Thomas did step up, and it's not his fault the patch he applied didn't fix the bug. Christian Biesinger stepped up to defend his code, but I believe he did so only after considering what I was suggesting. All three of these people did their part. I'm not sure who else did.

At the same time, I have to acknowledge that other developers do have their own lives to lead and their own code to work on. All I'm saying here is that simply agreeing with me on the severity of a bug and doing nothing to help fix it is almost as bad as saying nothing at all. I don't blame any one person for failing to contribute. I simply say to the community-at-large: Hey, we can do better than this.

If you have any critical-severity bugs you think are not getting enough attention, and you are a regular contributor to the community, feel free to ask me about them. I may or may not be able to do anything about those bugs, but if you ask me nicely to try tracking them down, I will usually say yes. (By "regular contributor," I mean someone who files good bug reports and typically doesn't file UNCONFIRMED bugs. I want to spend less time on triage than on tracking down bugs. I'm still learning C++, and QA is a humongous undertaking.)

Alexander J. Vincent