Surfin' Safari


January 24, 2004


CSS @namespace support

Posted at 8:24 PM

I just finished implementing support for the @namespace directive in stylesheets. The exercise got me thinking about clever ways to represent namespaces, element names, and attribute names efficiently.

Right now KHTML stores the namespace + the element/attribute name as two 16-bit quantities jammed into a single 32-bit value. The high 16 bits represent the namespace, and the low 16 bits represent the element/attribute name.
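The packing described above can be sketched roughly like this. The type and function names are made up for illustration and are not KHTML's actual code; only the bit layout (namespace in the high 16 bits, name in the low 16 bits) comes from the description.

```cpp
#include <cstdint>

// Hypothetical names; only the 16/16 bit layout comes from the text above.
using PackedId = std::uint32_t;

// Namespace id goes in the high 16 bits, element/attribute name id in the low 16 bits.
constexpr PackedId makePackedId(std::uint16_t namespaceId, std::uint16_t localNameId) {
    return (static_cast<PackedId>(namespaceId) << 16) | localNameId;
}

// Recover the namespace half.
constexpr std::uint16_t namespacePart(PackedId id) {
    return static_cast<std::uint16_t>(id >> 16);
}

// Recover the element/attribute name half.
constexpr std::uint16_t localNamePart(PackedId id) {
    return static_cast<std::uint16_t>(id & 0xFFFFu);
}

// With a namespace id of 0, the packed value is just the name id.
static_assert(makePackedId(0, 42) == 42, "namespace 0 leaves the name id unchanged");
```

One consequence of this layout is that comparing two names, namespace included, is a single 32-bit integer comparison.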

For HTML elements, this 32-bit value consumes no space at all, since every HTML element subclass implements a virtual id() method that returns the appropriate element id (the upper 16 bits, where the namespace would be, are just 0).
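Here is a minimal sketch of that arrangement, assuming a simplified class hierarchy. ElementImpl, HTMLDivElementImpl, and ID_DIV are illustrative stand-ins rather than the real KHTML declarations, and the XML class is reduced to the one member that matters here.

```cpp
#include <cstdint>

using PackedId = std::uint32_t;

// Hypothetical element id constant; the real values are not given in the post.
enum : PackedId { ID_DIV = 1 };

class ElementImpl {
public:
    virtual ~ElementImpl() = default;
    // Every element can report its packed namespace + name id.
    virtual PackedId id() const = 0;
};

// An HTML subclass stores nothing per instance: the override returns a constant
// whose upper 16 bits (the namespace half) are 0.
class HTMLDivElementImpl : public ElementImpl {
public:
    PackedId id() const override { return ID_DIV; }
};

// An XML element has to carry its packed id as a member variable.
class XMLElementImpl : public ElementImpl {
public:
    explicit XMLElementImpl(PackedId id) : m_id(id) {}
    PackedId id() const override { return m_id; }

private:
    PackedId m_id;
};
```

Callers go through id() in both cases, so the storage difference stays hidden behind the virtual call.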

For XML elements, the element name is a member variable of the XMLElementImpl object. As elements in a document are constructed and new namespace URIs, element names, and attribute names are encountered, the strings get registered and corresponding ids get handed out.
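The register-and-hand-out-ids step might look something like the following. NameRegistry is a hypothetical class invented for this sketch, and the choice to start ids at 1 (keeping 0 free to mean "no namespace") is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-document table mapping strings to 16-bit ids.
class NameRegistry {
public:
    // Returns the id already assigned to this string, or registers the string
    // and hands out the next unused id.
    std::uint16_t idFor(const std::string& name) {
        auto it = m_ids.find(name);
        if (it != m_ids.end())
            return it->second;
        assert(m_names.size() < 0xFFFF && "ran out of 16-bit ids");
        m_names.push_back(name);
        // Assumption: ids start at 1 so that 0 can mean "none".
        auto id = static_cast<std::uint16_t>(m_names.size());
        m_ids.emplace(name, id);
        return id;
    }

    // Reverse lookup from id back to the registered string.
    // Throws std::out_of_range for an id that was never handed out.
    const std::string& nameFor(std::uint16_t id) const {
        return m_names.at(id - 1);
    }

private:
    std::unordered_map<std::string, std::uint16_t> m_ids;
    std::vector<std::string> m_names;
};
```

Because the ids depend on the order in which a particular document encounters its names, the same string can map to different ids in two different registries.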

There are several drawbacks to this approach. The first is that the qualified name (including the prefix) gets lost. Technically when you ask for the qualified name via the DOM you should get back the original prefix that was specified.

Second, this element/attribute/namespace URI cache is per-document, which creates a dependency between a stylesheet and the document it's found in, thus defeating the ability to cache a stylesheet in memory for use across multiple documents.

Basically I'm trying to come up with a space- and time-efficient solution that doesn't degrade the current performance of HTML elements, but that still performs well for XML elements. Ideally I would not have to have special methods for getting an HTML element name vs. getting an XML element name.


XML Error Reporting III

Posted at 10:49 AM

Thanks to those of you who answered my question regarding how much of an invalid page should be rendered. It turns out that the XML spec is clear on this issue, and that I must stop building up the page DOM after the first fatal error is encountered.

With that in mind I now tell libxml to continue the processing, but I start ignoring all of the callbacks. That way I get a list of all the errors, but properly stop the DOM tree buildup after the first error.
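The idea of letting the parser continue while ignoring its callbacks can be sketched like this. TreeBuilder and its methods are hypothetical stand-ins for the parser callbacks and are not the libxml API; the "tree" is reduced to a list of element names to keep the example short.

```cpp
#include <string>
#include <vector>

// Hypothetical sink for parser callbacks; this is not the libxml API.
class TreeBuilder {
public:
    // Called by the parser for each start tag.
    void startElement(const std::string& name) {
        if (m_sawFatalError)
            return; // parsing continues, but the tree stops growing
        m_builtElements.push_back(name);
    }

    // Called by the parser for each error it reports.
    void error(const std::string& message, bool fatal) {
        m_errors.push_back(message); // every error is recorded for the report
        if (fatal)
            m_sawFatalError = true;
    }

    const std::vector<std::string>& builtElements() const { return m_builtElements; }
    const std::vector<std::string>& errors() const { return m_errors; }

private:
    bool m_sawFatalError = false;
    std::vector<std::string> m_builtElements;
    std::vector<std::string> m_errors;
};
```

The error list keeps filling after the first fatal error, while the element list is frozen at whatever had been built by then.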

For those of you who suggested that WebKit needs some sort of error reporting API, I agree, and if it had one, these errors would obviously be reported to it. However, these errors still have to be reported aggressively so that WebKit clients can't mask these mistakes.

I don't believe in showing a sheet or a dialog as an intermediate step prior to displaying a rendering of the page. The reason I dislike this idea is that this error reporting is primarily a Web developer feature, and they're just going to want to load the page, see the errors, maybe correct some CSS at the same time, and then reload with changes until the error report has been eliminated.

The end user isn't ever going to see this report, since anyone who makes an invalid XML file right now ends up with something that won't display in any browser. Thus it seems to me that the report should be easy to access (in terms of # of clicks), always visible, and included with the page rendering.

I have polished the look of the report a bit based on your suggestions. Here's another screenshot.


January 22, 2004


XML Error Reporting II

Posted at 11:27 AM

Responding to comments in the previous blog entry:

(1) Some people thought this was a hacked expat. Darin actually switched Safari over to libxml2, so the error messages you're seeing (as well as the ability to continue parsing) are all built in to libxml2.

(2) Do you think it's better to show the page only up to the first error or to try to display the entire page (with the understanding that what follows the first error could be very badly mangled)?

(3) Often there are a lot of meaningless errors after the first. I could put a cap on the number of displayed errors to deal with this problem or just not worry about it. What do people think?

(4) Those of you who suggested drawers for errors, remember a drawer is a UI element in Safari and not WebKit. This feature should just work out of the box for WebKit clients, so I'm inclined not to use drawers or sheets, but to just display the errors at the top of the page.


Obtrusive XML Error Reporting

Posted at 12:53 AM

I spent some time tinkering with XML today and decided to try out a non-draconian approach to XML error recovery. Point the browser of your choice at the following XHTML URL:

http://www.faireal.net/soft/browser/XHTML-Invalidator?Content-Type=application/xhtml+xml

If you try this in Mozilla, you should get something like this:

Screenshot

In current versions of Safari, you get something even worse, since you don't even get any line/col information.

What I implemented in my build (it's still just at the tinkering stage remember, so be gentle) is error recovery for non-fatal errors, i.e., the XML parser continues and attempts to recover from the error, and then I still build the DOM for the XML.

Once the parser is finished, I then display the Web page, but with a badge of shame, namely an error report at the top that lists all of the discovered errors. This is not a halt-at-first-error system, which is cool, since it means you'll see *all* the errors in your page and not just one.

Here's a screenshot of what I have so far. The error report is just XHTML as well (shoehorned in at the top using DOM calls), so if you have any ideas of how I could style it to make it look really cool, show me your screenshots.

Let me know what you think of this idea. Do you like it better than draconian error handling? If you dislike it, let me know why!


January 20, 2004


More on XML Error Handling

Posted at 11:37 AM

I thought I'd respond to a few of the comments I received:

(1) Many people suggested that there be a built-in validator in the browser that could show the errors to the developer. The validators basically break down into two types: obtrusive validators and unobtrusive validators.

If the validator is unobtrusive, then I would argue that it won't receive sufficient usage to make a difference. If the browser doesn't impose a penalty of some kind, then there will be no incentive for the author to correct mistakes.

I can see the value of an obtrusive validator, as long as the obtrusive part only checks well-formedness (i.e., really basic mistakes).

(2) Some people pointed out that my own blog was not valid. I have two responses to that:

(a) I am not arguing for perfectly valid XML documents. I am arguing for well-formed XML documents. There is a difference. I think asking that the page be well-formed is setting the bar fairly low. For example, one of the current errors on this blog is that I have two elements with the same id. While this makes the blog invalid, it does not have any effect on the blog being well-formed. At least I don't think it does. :)

(b) I'm illustrating a point, namely that I have no reason to make the blog valid, given that browsers will display the blog anyway.

(3) People complained that I wasn't serving up XHTML. I can't actually serve up XHTML if I want the blog to be displayable in all browsers, including Safari, which still has sufficient issues with XHTML that I can't make that switch yet.

(4) My comments on HTML error handling were largely misinterpreted.

Some people thought I was attacking WinIE for its permissive handling of HTML. I was not, and I'm glad others appreciated that fact. Back in the 90s WinIE had to emulate the permissive error handling of the then-dominant browser Netscape. They had no choice if they wanted Web sites to be viewable as the designer intended. They were in the same position then that Safari is in now.

Nor am I suggesting that WinIE should become less tolerant of malformed HTML, or that they are at fault for not doing so. That is simply not a logical conclusion to have drawn from my previous comments. You can't take a Web site (even a malformed one) that works a certain way and suddenly refuse to render it or even render it radically differently than before.

For HTML, this issue was resolved long ago in favor of permissive error handling and recovery, and no modern browser is to blame for that situation.

Others said a browser that handles malformed HTML is better than one that does not, and if Safari doesn't handle all this malformed HTML, then it's simply not as capable a browser.

What amused me about this comment is that there is no definition of what it means to handle malformed HTML. As long as a browser shows you something and doesn't crash, it has handled the malformed HTML. What people don't understand is that you don't simply have to handle the malformed HTML. You have to handle it in exactly the same way as the Web browser that the site author designed for.

If you do not, you'll end up with different renderings of the same page, which, as I said before, constitute the largest set of rendering differences between Web browsers. Perfect emulation is what makes error recovery so difficult. If you allow grossly malformed pages, then most XML on the Web will end up being grossly malformed (as is the case with HTML today).

Once you have a Web full of grossly malformed XML, there will be one dominant browser that designers will check to see if the site looks ok. They will then make assumptions that other browsers will recover from the malformation errors in precisely the same way and will simply assume that it is the fault of the other browsers if they don't.

Right now it is the responsibility of alternate browsers to emulate the dominant browser's error recovery strategies, but there's simply no reason to do that for XML as well.


January 19, 2004


XML, Not HTML

Posted at 10:27 PM

Because enough people seem to be getting confused by my previous blog entry, let me clarify that I am talking about XML error handling and not HTML error handling. Obviously given the current state of the Web, a browser must be extremely good at handling malformed HTML and emulate WinIE as much as possible. We of course are actively working on that with Safari.

The reason I brought up HTML error handling while talking about XML error handling was to point out how much time and effort it costs developers simply to handle malformed content. Also for those who mistakenly interpreted this as an attack on WinIE, of course it isn't. WinIE had to emulate Netscape's error handling, so if you want to blame anyone, blame Netscape.

Right now the browsers that handle XML *are* draconian in their handling, and I see no reason why that would change in the future (unless WinIE weighs in with a tolerant XML parser). In effect as far as Web browsers are concerned we draconians have already won. :)


January 18, 2004


XML Error Handling in Web Browsers

Posted at 12:04 PM

I've been following the topic of XML error handling on Mark Pilgrim's blog with great interest. Go read this blog entry. Done? Good. Now go read this blog entry.

Safari has draconian XML error handling. If the file isn't well-formed, Safari won't display it. Mozilla does the same, which should come as no surprise, since the two browsers use the same open-source XML parser (expat).

I fall squarely into the draconian camp and agree with Tim Bray. Fully half of the bugs I receive in WebCore are not bugs at all, but are essentially differences in error handling and error recovery between Safari and the dominant Web browser, WinIE. None of these issues occur with XML.

If we lived in a world where browsers could refuse to display malformed content (with useful error notification of course so that authors could easily repair their content), then all of these "bugs" would simply disappear. I could focus my efforts on real DOM and CSS bugs, and not have to waste my time emulating the behavior of WinIE.

Relaxing restrictions on well-formedness is a slippery slope, and where does it end? Consider all the "helpful" rules that exist in HTML today thanks to early versions of Netscape and WinIE. Did you know that any h1-h6 tag can close any other h1-h6 tag? Try it. Open an h1, type some text and then put in a close h2. It will close up the h1 in WinIE and Mozilla. (I haven't yet fixed this "bug" in Safari.) Try specifying a close tag for a paragraph by itself. You'll get an empty paragraph in Safari, Mozilla, and WinIE.
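To make the heading rule concrete, here is a rough sketch of the kind of special case a tolerant HTML parser ends up carrying. The function names are invented for illustration, tag names are assumed to be already lowercased, and this is not taken from any browser's source.

```cpp
#include <string>

// True for "h1" through "h6" (assumes the tag name is already lowercase).
bool isHeadingTag(const std::string& tag) {
    return tag.size() == 2 && tag[0] == 'h' && tag[1] >= '1' && tag[1] <= '6';
}

// Lenient matching: a close tag closes the open element if the names are equal,
// or if both are headings of any level.
bool closeTagMatches(const std::string& openTag, const std::string& closeTag) {
    if (openTag == closeTag)
        return true;
    return isHeadingTag(openTag) && isHeadingTag(closeTag);
}
```

Under this rule closeTagMatches("h1", "h2") is true, which is exactly the behavior a strict parser would reject as a mismatched tag.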

Of course the most complicated error recovery problem is residual style, which I have blogged about at length. This "helpful feature" (note the sarcasm) allows you to accidentally mis-nest style tags like the italic and bold tags and basically treat HTML more like a stream of "on/off" states than an actual tree structure. This feature is more a by-product of primitive browsers from the 90s that didn't have true DOMs than an actual intended error recovery system.

There's also the missing quotes problem, e.g., leaving a close quote off a link href. Browsers employ complicated heuristics to try to match up unclosed quotes that depend on the number of quotes in the document, their positions, and other factors. Safari doesn't really handle this problem that well yet, and it shouldn't have to.

The whole reason nearly all Web pages on the Internet are malformed is that browsers let Web page authors get away with it. As long as browsers are permissive in their error handling and recovery, Web authors will continue to produce invalid Web pages, because they won't even have any idea the pages they are authoring are invalid!

People in the error recovery camp then suggest ideas like icons in the status bar, or error messages dumped to some obscure console, but the average Web designer isn't going to know or care about validation as long as WinIE displays the Web site adequately. The only way you can make the average Web designer care is to get in his face with the obvious errors. The browser has to make a face and refuse to eat the swill that is being force-fed to it, or the average designer is simply going to shrug and say, "Well, close enough."

The crux of the problem with implementing true error recovery is that it must be unambiguous. Every Web browser has to recover from malformed content in precisely the same way. This means that in order for browsers to be tolerant of malformed content, there would have to be a specification regarding how to handle all possible malformations. This is virtually impossible to specify, so why waste time and energy on it when creating well-formed XML files is so ridiculously simple?

I think people who don't work on Web browsers for a living have no concept of just how malformed the Web really is, so let me state this as clearly as I can:

The #1 reason that HTML pages render incorrectly in alternate browsers is differences in error handling and recovery.


January 9, 2004


Redesign *Still* in Progress

Posted at 2:50 PM

Yes, yes, I'm still working on it. A few of the designs aren't uploaded yet, but you will find that the default "Clean" look is very similar to the previous Safari design (for those of you who objected to the other designs).


Copyright © Dave Hyatt 2003, Design by Stéphane Curzi/ProjetsUrbain.com