« Holiday Links and Trees | Main

February 09, 2006

Fresh XPCOM thinking

Everyone who gets far enough into Mozilla code has that "wow, this is chatty... verbose... inefficient even" reaction to XPCOM -- or so I thought. Having played Cassandra once in the dark days before Netscape 6, lived to witness deCOMtamination, and watched the next generation of core hackers grow up wise from birth, I foolishly believed that we were well past worrying about XPCOM abuse.

But we aren't. Perhaps SVG-the-standard is to blame, but not exclusively. We have to do better, and not just with peer pressure, but I'll spend a paragraph on exhortation too:

Please don't imitate COMtaminated code. Don't use XPCOM in core rendering and layout data structures. Do use XPCOM where it wins, as high-level inter-library and inter-programming-language glue, where the QueryInterface and virtual method call costs are noise, and the benefits for programming in the large are obvious.

More than warning again, I believe it is time to change XPCOM in deeper ways, to fix several problems:

  1. The inability to free cyclic structures automatically.
  2. The code bloat and cycle costs of nsresult return type.
  3. The lack of nsIClassInfo implementation.

Probem 1 means memory leaks, forever, because we'll never cover all cases, and new cases are being added constantly (sometimes just with JS completing the cycle, no C++ changes -- so via a Firefox extension, for example).

Problem 2 makes our source code overlarge and hard to read, never mind the terrible effects on register pressure, instruction cache utilization, memory bandwidth wasted on the out param that wants to be the return value, etc.

Problem 3 means we can't readily reflect on instance-of relationships in JS except for DOM objects. Having DOM class information (what interfaces does this object implement? is this object thread-safe?) is cool, but we still don't take advantage of nsIClassInfo (e.g. for the solution to problem 1 suggested below) for want of ubiquity. (Another example: we believe we could use it to avoid all locking costs in XPConnect.)

These problems can be solved:

  1. I pointed dbaron and graydon at work by David Bacon et al. on reference-count cycle collecting that we could infuse into XPCOM, curing all cycle-induced leaks. We may hope to have fixed the last such leak, but Murphy was an optimist. XPCOM without some kind of cycle collector is like a car without a brake.
  2. I'm looking for volunteers to help run some experiments testing the state of C++ exception support on the compilers we care about (mainly GCC and MSVC). What is the code size increase from turning on exceptions (rtti too if required)? That's the first experiment to run. Actually converting our mega-lines of C++ code to use exception handling instead of nsresult returning and testing will be real work, and lots of it. I have some thoughts for how to semi- or fully-automate that work that I'll blog about soon.
  3. Right now the way you implement nsIClassInfo involves painful C macros, so no one does it. With some C++ sugar, and with the right defaults, defining class information for each concrete XPCOM class should be simple, so we can actually count on it being there.
These problems should be solved, and I'm committed to working to solve them. Who is with me?

Posted by brendan at February 9, 2006 02:57 PM

Comments

I'm all for pushing XPCOM into the 21st century!

Some numbers: VC8 official build, zipped (not 7z'd):

Without exceptions: 7,038,820
With exceptions: 7,768,638

10% extra, zip'd size. Not sure if that would go down with 7z'ing, and I haven't tested the performance of the two builds.. I'll probably fire up Trender in the next day or so and see what I get.

Posted by: vladimir [TypeKey Profile Page] at February 11, 2006 02:11 AM

"Problem 2 makes our source code overlarge and hard to read, never mind the terrible effects on register pressure, instruction cache utilization, memory bandwidth wasted on the out param that wants to be the return value, etc."

Well I can understand the value in readability, but won’t the use of exceptions be even more disastrous on performance?

Posted by: alexr [TypeKey Profile Page] at February 11, 2006 12:38 PM

Vlad, is that with rtti as well as exceptions? Is rtti truly required for exceptions?

/be

Posted by: Brendan Eich [TypeKey Profile Page] at February 11, 2006 01:45 PM


It looks like that's with RTTI; however, so is the without exceptions number. /GR (enable RTTI) seems to be the default now with VC8, and we don't explicitly turn it off. Rerunning with /GR-. (I need to get these patches into the tree; the --enable-cpp-exceptions and rtti configure options only work with gcc right now.)

Posted by: vladimir [TypeKey Profile Page] at February 11, 2006 02:57 PM

Ok, new numbers:

VC8, size of .zip (just "make" in browser/installer):

no exc, no rtti: 6,891,769
no exc, with rtti: 7,038,820 (+147,051 +2.1%)
with exc, no rtti: 7,622,794 (+731,025 +10.6%)
with exc, with rtti: 7,768,638 (+876,869 +12.7%)

So RTTI doesn't seem to be required, but is a minor hit compared to exception handling.

Posted by: vladimir [TypeKey Profile Page] at February 11, 2006 04:25 PM

Alex: no, exceptions would not obviously make performance worse. The most common "handler" for a failure nsresult is:

if (NS_FAILED(rv)) return rv;

or similar. With the right rewriting rules, this pandemic pattern *goes away*. Failure is rare, and should be rarer once we get rid of overloaded "COMFALSE" and similar nsresults that should be plain return types.

/be

Posted by: Brendan Eich [TypeKey Profile Page] at February 11, 2006 05:52 PM

Two notes on the codesize numbers:

1) The change in mZ is a lot more important than the zip size to me (since mZ is what actually affects things like startup time, i-cache pressure, etc).
2) I'd be interested in seeing the following 4 numbers compared (independent of RTTI):

1) Vanilla build
2) Build with NS_ENSURE_SUCCESS defined to be nothing.
3) Build with exceptions.
4) Build with exceptions and the NS_ENSURE_SUCCESS change.

I realize that not all of our code uses NS_ENSURE_SUCCESS, but that would at least give us some order-of-magnitude ideas about how replacing it with exceptions would work...

As a particular metric, I would be interested in the size of gklayout.dll/.so.

Posted by: Boris [TypeKey Profile Page] at February 11, 2006 09:51 PM


gklayout.lib size:

53996044 firefox-official-ex-nortti-noensure
54026988 firefox-official-ex-nortti
59480644 firefox-official-ex-rtti
52476526 firefox-official-noex-rtti
47022886 firefox-official-noex-nortti

I don't have numbers for a vanilla build with NS_ENSURE_SUCCESS defined to nothing, but it doesn't look like it'll make that big of a difference.

Posted by: vladimir [TypeKey Profile Page] at February 13, 2006 12:06 AM

I couldn't disagree more WRT exceptions. Please see http://benjamin.smedbergs.us/blog/2006-02-18/exceptions-dont-solve-the-problem/

Posted by: bsmedberg [TypeKey Profile Page] at February 18, 2006 06:15 AM

Please don't change the XPCOM ABI! :-)

I'm skeptical. Do we have proof that checking nsresults is actually hurting performance and that exceptions would make it better? Changing our entire codebase over to use exceptions would be a monumental challenge. You cannot just compile away NS_ENSURE macros. It would be a huge undertaking to revise all of our error handling. I think there are far better uses of our time.

As for nsIClassInfo, I think we should make the macros better and encourage their use. For example, I'm all for adding nsIClassInfo to Necko classes. I think we should blackflag every explicit QueryInterface in our JS code and figure out if we can't replace it with nsIClassInfo. That will make our toolkit easier to use.

As for reference cycles, they are generally introduced through observer patterns. I think we should harden our observees to ensure that they only introduce reference cycles that they are guaranteed to break. (e.g., nsIChannel implementations are required to release their listener in OnStopRequest.) That will make our APIs more robust to abuse from poorly written extensions. Of course, there are other sources of reference cycles, but we can get a lot of mileage out of our existing architecture and some well placed effort.

I guess I'm just not convinced that we need to rip-n-replace the architecture that has gotten us this far.

Posted by: Darin Fisher [TypeKey Profile Page] at February 18, 2006 09:42 AM

Benjamin, your write-up didn't do more than make assertions. I am provoking, and attempting as time allows, actual measurement as proof of improvement.

Of course doing what bz proposed was just a first step to measure easy wins involving NS_ENSURE_* macros. No one said otherwise, and it's not worth arguing about. The proposal I've made stands or falls on code size once the source has been freed of most nsresult testing and consequent early returns, and results have moved back to the return value from the last out parameter.

Darin, none of these changes involves manual "rip and replace" of the entire architecture. As you note, class info is additive. A cycle collector requires object layout to be specified, but that is also additive -- no ripping or replacing.

The exception experiment is r&r, but here's the deal. To be compelling, it must involve changing a large body of code, proving both significant savings and correctness. It can't be done by hand, nor do I propose such a waste of time. The idea is to use a sound static analysis framework to assist in the rewrite. The goal is for most cases to be handled automatically. More on this in a bit.

/be

Posted by: Brendan Eich [TypeKey Profile Page] at February 18, 2006 04:08 PM

Sorry, I forgot to add in reply to Benjamin's post that there are always ways to bridge an exception-based XPCOM C++ binding to the current one, in both call directions. I don't want to break XPCOM, I want to fix it compatibly. If this isn't possible, then I want proof that its nsresult design, which we know makes our source code significantly bigger than, e.g., WebKit, is not a source of significant code size and cycle costs.

If we can prove all that, I'll gladly conceded that we shouldn't change XPCOM. Otherwise, all the MSCOM compatibility in the world (which we can preserve with a bit of bridging work) doesn't cut it for me.

/be

Posted by: Brendan Eich [TypeKey Profile Page] at February 18, 2006 04:10 PM

One more point: the grass may not be much greener, but it is not obvious that making every single last call site worry about nsresult testing beats making far fewer call sites worry about using try blocks. To repeat a point for the third time, all the

if (NS_FAILED(rv)) return rv;

statements go away with exceptions. Those blocks are not virtuous in my eyes, they are hazardous. We know of many bugs where the wrong rv was returned, or a failing rv was dropped on the floor (bad when there is an exception left pending in JS). Such lines are too often mindless bloat, salt thrown over one's XPCOM shoulder. Enough with superstition.

/be

Posted by: Brendan Eich [TypeKey Profile Page] at February 18, 2006 04:15 PM

> The exception experiment is r&r, but here's the deal. To be compelling, it must involve changing a
> large body of code, proving both significant savings and correctness. It can't be done by hand,
> nor do I propose such a waste of time. The idea is to use a sound static analysis framework to assist
> in the rewrite. The goal is for most cases to be handled automatically. More on this in a bit.

OK, I'm curious. In my opinion, solid error handling is paramount, and I am skeptical (hopefully, understandably) of any automatic tool that changes error handling code paths.

In Necko-land, I've worked hard to ensure that all errors are handled. It's true that people often do not take such care. (Indeed, there is a reason why I had to do so much work on Necko to make it handle error conditions properly, and there's still more work to be done.) At any rate, I'm not convinced that exception handling results in better error handling in practice for many of the reasons bsmedberg enumerated. There's simply no substitute for good programming ;-)

Posted by: Darin Fisher [TypeKey Profile Page] at February 18, 2006 05:00 PM

> mZ is what actually affects things like startup
> time, i-cache pressure, etc

The mZ effects of exception handling are hard to interpret. The unwind tables, if implemented properly, should reside in their own pages, never to be touched until an exception actually occurs (which should be rare if exceptions are being used properly), and therefore don't affect i-cache and should have a minimal effect on startup time. I sure hope exception unwind tables don't require dynamic linker relocations.

Posted by: Robert O'Callahan [TypeKey Profile Page] at February 19, 2006 02:15 AM

Darin: amen to no substitute for good programming. We are arguing about trade-offs among exceptional case handling techniques. Handling all cases correctly requires care, as ever. Different programming languages can help or hinder. Modern type systems in particular can help quite a bit, but C++ and C are not modern in their type systems.

SpiderMonkey (in C, so no chance even for C++ exception handling) uses explicit return testing, overloading null for error on pointer return type, using JSBool with false for error otherwise (mostly).

Cairo's approach keeps error state in each object, and makes objects go safely dead after an error. So you can generally avoid querying error state after each operation. Committing unwanted effects instead of stopping short on error becomes the new hazard. This may work better for graphics code than for other kinds of code.

There's no panacea, but the way nsresults are used today is pessimistic. Most of those if-return statements are properly predicted not-taken by compilers and hardware branch predictors. That suggests that there are better optimizations than writing them unalterably into the source code, and fetching them (when short) in the instruction stream.

/be

Posted by: Brendan Eich [TypeKey Profile Page] at February 19, 2006 03:19 PM

roc wrote:

"The unwind tables, if implemented properly, should reside in their own pages, never to be touched until an exception actually occurs (which should be rare if exceptions are being used properly), and therefore don't affect i-cache and should have a minimal effect on startup time."

This implies that some cases (I've alluded to NS_COMFALSE, but there are other non-NS_OK nsresults used frequently to indicate cases that may not be exceptional), a method will want to use a boolean return type, or some similar way of indicating that a less-frequent, but still common, case arose.

Using a magic nsresult (NS_ERROR_NOT_AVAILABLE, e.g., in the XUL FastLoad code) made sense when the return value was preempted by nsresult to match XPCOM's pattern. Freed of that pattern, such code may not want to throw such an nsresult code if the cost of throwing is high compared to using an "in-band" return code.

/be

Posted by: Brendan Eich [TypeKey Profile Page] at February 19, 2006 03:25 PM

Post a comment

Thanks for signing in, . Now you can comment. (sign out)

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)


Remember me?