I'm glad I haven't released an 0.1 of Abacus yet, because I had to go back and damage what I've already got in order to get another feature working.
To explain: I'm working on figuring out a mechanism for creating a detailed presentation "branch" (as opposed to simple ones you create in the Template Editor), based on a content MathML "branch" of the semantics element. The first step is of course to figure out which content MathML templates from the template editor could be used to recreate the current MathML fragment. (In English, that means figuring out how the user wrote it before.) From there, I can then directly translate to the appropriate templates in presentation MathML, XHTML, what-have-you.
I think I've solved the first half of it, the search portion. The only complication I see in the second half comes from the fact that I permit, and frequently recommend, a "fitb-set" element. It's basically a for-loop in XML, so I need to code for that appropriately.
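One way to picture the search half is as tree matching: each template is a content MathML pattern with blank slots, and a fragment "matches" a template when every non-blank node lines up. This is only an illustration under assumed data shapes (plain objects with hypothetical `name`, `text`, `children`, and `blank` fields rather than DOM nodes). It is not Abacus's actual search code, and it ignores repetition such as fitb-set.

```javascript
// Hypothetical node shape: { name: "apply", text: "x", children: [...] }.
// A template node of the form { blank: "a" } matches any subtree and records it.
// Assumes each blank name appears at most once per template.
function matchTemplate(template, node, bindings = {}) {
  if (template.blank !== undefined) {
    return { ...bindings, [template.blank]: node };
  }
  if (template.name !== node.name || template.text !== node.text) {
    return null;
  }
  const tKids = template.children || [];
  const nKids = node.children || [];
  if (tKids.length !== nKids.length) {
    return null;
  }
  let current = bindings;
  for (let i = 0; i < tKids.length; i++) {
    current = matchTemplate(tKids[i], nKids[i], current);
    if (current === null) {
      return null;
    }
  }
  return current;
}

// Return every template that could have produced the fragment, with its bindings.
function findMatchingTemplates(templates, fragment) {
  const results = [];
  for (const t of templates) {
    const bindings = matchTemplate(t.pattern, fragment);
    if (bindings !== null) {
      results.push({ id: t.id, bindings });
    }
  }
  return results;
}

// Example: x + 1, i.e. <apply><plus/><ci>x</ci><cn>1</cn></apply>.
const templates = [
  { id: "sum", pattern: { name: "apply", children: [{ name: "plus" }, { blank: "a" }, { blank: "b" }] } },
  { id: "product", pattern: { name: "apply", children: [{ name: "times" }, { blank: "a" }, { blank: "b" }] } }
];
const fragment = {
  name: "apply",
  children: [{ name: "plus" }, { name: "ci", text: "x" }, { name: "cn", text: "1" }]
};
const matches = findMatchingTemplates(templates, fragment);
console.log(matches.map(m => m.id)); // [ 'sum' ]
```

Returning every match rather than the first leaves room for more than one template to explain the same markup.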
... when you finish basic testing of a feature that lets your pet project write good markup, and then turn around and read the markup back in.
I know I'm good, but I can't help thinking that there was some serious divine intervention in making it work so quickly. Gervase Markham isn't the only one hacking for the good Lord. 8-)
I contemplated putting in an easter egg about that sort of thing, but I probably won't include it. Given that I intend this project to eventually be used in public education, and given this country's First Amendment guarantees, it would cause a scandal once schools found the easter egg.
Incidentally, this happens to be the 75th entry in my weblog (not that it matters).
Over the weekend, I wrote out code for exporting MathML content from the Abacus MathML Editor into the application that calls on it. It was remarkably painless.
Just as important, though, is figuring out how to import MathML content from the caller app into the editor.
That has so far meant two days of straight JavaScript code writing, and will probably take another day of testing and debugging to get it right (hence the title of this entry).
Already for the 0.1 release, I'm forced to make some sacrifices which probably will not endear me to those who want a MathML editor in Mozilla.
For one, I'm currently working on making sure Abacus can read what it writes. This is one aspect of "dogfood" as I see it. (Remember, Abacus is oriented towards content MathML, with presentation MathML as annotations.) Once that's done, I will work on making it read MathML markup which is exclusively content MathML.
Yes, that means presentation MathML gets left in the dust.
It's too bad, really. Amaya (which as I've said before was a big inspiration for Abacus) only supports presentation MathML. But the problem for Abacus lies in the fact that presentation MathML does not convey exact definitions. That's why we have content MathML. Trying to deduce content MathML from presentation MathML is as bad as trying to deduce what the browser should do from a "tag soup" of non-standard HTML. It's a classic apples and oranges situation.
This breakage -- the emphasis on content MathML at the expense of supporting pure presentation MathML -- is a very undesirable side effect. How many MathML editors output presentation MathML only? How many of them output content MathML as well?
I'm trying to figure out how to create a UI that will let the user guide the MathML Editor through the process, given the templates that Abacus makes available. But for the moment, I'm completely stumped.
By the way, a lot of code can be written in two days... for this one feature, preparing a MathML fragment for Abacus, I'm already up to at least 600 lines of pure JavaScript...
Specifically, the key feature, overlaying one MathML template onto another, works perfectly.
I'm working on some other features for the editor, some of which are pretty important (being able to import a MathML <math> element that Abacus didn't create, for one big example). But all in all, what I've got now looks pretty slick, I have to say...
By the way, I'm looking for ideas on bug 247849.
Give it up. I turned on e-mail notification of all the comments to my blog last week. When you place a link to advertise a product that has nothing to do with what I write about, I'm going to know and I'm going to delete it.
Have a nice day. :)
XML document identification, and also getting the human language for an element
In Mozilla, figuring out if a document is an HTML document is easy. You simply check (aDocument instanceof HTMLDocument).
If you're handed a vanilla XML document, it's also easy. (aDocument instanceof XMLDocument).
XUL docs are handled a little differently; you check to see if it's an instance of XULDocument.
But what about XHTML documents?
Thanks to a bug which I believe jst fixed, Mozilla reports (aDocument instanceof HTMLDocument) as true for XHTML documents (content-type: application/xhtml+xml). This is good. Unfortunately, (aDocument instanceof XMLDocument) returns false for the same document.
To my knowledge, there are precisely two content-types for which a document will be considered an HTMLDocument object. One is the text/html content type, which Mozilla parses as "tag soup". The other is application/xhtml+xml, which must go through the XML parser.
Because I never know when someone might come up with another content-type for HTML, I rely on just one rule: any time we get a content-type of text/html, Mozilla will not treat the document as XML in any way.
So, I wrote a little function called isXMLDocument, and it's included via the link at the top of the page.
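The original function is only available through that link, so what follows is a minimal reconstruction of the rule described above, not the actual code. It assumes the document exposes its MIME type as `document.contentType` (current DOM implementations do) and, like the entry, treats every content type other than text/html as XML.

```javascript
// Sketch of the rule above; not the original isXMLDocument from the linked file.
// Assumes aDocument.contentType holds the MIME type the document was served with.
function isXMLDocument(aDocument) {
  // text/html goes through the tag-soup parser, so nothing XML-specific applies.
  if (aDocument.contentType === "text/html") {
    return false;
  }
  // Assumption: anything else (application/xhtml+xml, application/xml, XUL, ...)
  // was built by the XML parser, even when it is also an HTMLDocument.
  return true;
}

// Plain objects stand in for real documents here.
console.log(isXMLDocument({ contentType: "text/html" }));             // false
console.log(isXMLDocument({ contentType: "application/xhtml+xml" })); // true
```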
Why does this matter? It matters a LOT. Applications processing XHTML documents, when dealing with features which are XML-specific or HTML-specific, must know what's allowed and what's not. For instance, XML namespaces are useless in HTML (text/html), but are important in XHTML (application/xhtml+xml). So an application processing a document must know if it should use DOM Level 1 methods (Element.getAttribute, for example) or DOM Level 2 methods (Element.getAttributeNS).
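Here is one way an application might act on that distinction. `readAttribute` is a hypothetical helper, not part of any Mozilla API, and its `aIsXML` flag would come from a check like the one described above. In an XML document it uses the namespace-aware DOM Level 2 call; in a text/html document it falls back to the DOM Level 1 call, where a name like `xml:lang` is just an attribute name that happens to contain a colon.

```javascript
// Hypothetical helper: aIsXML says whether the document went through the XML parser.
// aQualifiedName is the name as written in markup, e.g. "xml:lang".
function readAttribute(aElement, aIsXML, aNamespace, aQualifiedName) {
  if (aIsXML) {
    // DOM Level 2: look up by namespace URI plus local name ("lang").
    const localName = aQualifiedName.slice(aQualifiedName.indexOf(":") + 1);
    return aElement.getAttributeNS(aNamespace, localName);
  }
  // DOM Level 1: text/html has no namespaces, so use the literal name.
  return aElement.getAttribute(aQualifiedName);
}

// Usage with a real element:
// readAttribute(el, isXMLDocument(el.ownerDocument),
//               "http://www.w3.org/XML/1998/namespace", "xml:lang");
```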
Tag soup for thought.
(There's a bug I just realized in the filter for getLanguage; the filter will accept
This is a bit of a rant session, so if you're not interested in hearing someone complain, skip this entry.
(1) My internet connection @ home is down. So I'm stuck playing with Mozilla 1.7 RC1 and Nvu 0.2. Nor can I get recent source code from the trunk.
(2) The neighborhood I live in does not have DSL support yet.
(3) SBC won't give me a friggin' clue when the neighborhood I live in might have DSL.
(4) I don't think I can afford cable broadband (at $7.50/hr, 32 hrs/wk, the budget is a bit tight)
(5) What about dial-up? Only the desperate would consider doing CVS updates via 56K modems... of course, I did consider it...
(6) ... until I found out the computer I work with at home doesn't have a 56K modem.
(7) Laptop computer? I'm actually saving up for one, and I can probably purchase one at the end of the year... bear in mind this comes from a projected surplus of $150.00/month, after the first three months (during which I'm really thinking about OSCON and Abacus).
(8) Of course, when I do have the money saved up for a laptop, it probably won't be top-of-the-line, and I'll still have to go to Kinko's to get my CVS updates and recent software...
(9) My efforts to convince the local library to install Mozilla anything have gone completely unanswered. *sigh*
On another note: Firefox 0.9 is out, eh? I'm surprised to see both mozilla.org and mozillazine.org responding well today. I'd think the slashdot effect would make people cringe. (Thank goodness they didn't release Mozilla 1.7 final at the same time...)
I was asking myself how to identify the default language Abacus should use, and I realized there's really no DOM-defined procedure for getting the language of an element.
This is rather important to me.
I've started writing out a little JS to figure out how to get an element's default language, which is inherited per the HTML 4.01 specification (and per XML 1.0 as well, for xml:lang):
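```javascript
// Namespace bound to the xml: prefix (fixed by the Namespaces in XML spec).
var XML_NS = "http://www.w3.org/XML/1998/namespace";

// Accept elements that carry xml:lang or lang; skip (but walk through) the rest.
var hasLangAttrFilter = {
  acceptNode: function(aNode) {
    if (aNode.hasAttributeNS(XML_NS, "lang") || aNode.hasAttribute("lang")) {
      return NodeFilter.FILTER_ACCEPT;
    }
    return NodeFilter.FILTER_SKIP;
  }
};

// Return the language inherited by aElement, or null if none is declared.
function getLanguage(aElement) {
  var doc = aElement.ownerDocument;
  var walker = doc.createTreeWalker(doc, NodeFilter.SHOW_ELEMENT, hasLangAttrFilter, true);
  var aNode = null;
  if (hasLangAttrFilter.acceptNode(aElement) == NodeFilter.FILTER_ACCEPT) {
    aNode = aElement;
  } else {
    // parentNode() climbs to the nearest ancestor the filter accepts.
    walker.currentNode = aElement;
    aNode = walker.parentNode();
  }
  if (!aNode) {
    return null;
  }
  // xml:lang takes precedence over the HTML lang attribute.
  return aNode.getAttributeNS(XML_NS, "lang") || aNode.getAttribute("lang");
}
```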
I'm a little surprised by how much JS code it took to handle editing XML files directly in a specific manner. But I can now claim that my template editor for the Abacus project has reached a point where I don't need to edit the files with a text editor.
This means that one-third of the Abacus MathML editor project is stable (if not quite complete). The next part, the actual MathML editor, I don't anticipate a lot of problems with. Though when it comes to reading in a MathML fragment that's already there and not Abacus-compliant yet, I may have some trouble. I haven't yet figured out how to take in a MathML fragment that's all presentation markup or all content markup, and sooner or later I'll have to...
I really wish I could release an 0.1 of Abacus based on what I have right now, but without a working MathML editor to demonstrate just what I'm doing, it would be counterproductive in the extreme.
One of the questions I've not yet answered for my own purposes is how to handle multiple human languages for documents containing MathML. I'm beginning to think my initial approach was a bit naive.
Initially, I had planned on a parallel markup scheme:
<semantics encoding="MathML-Content">
<!-- ... -->
<annotation-xml xml:lang="en-US" encoding="MathML-Presentation">
<!-- ... -->
</annotation-xml>
<annotation-xml xml:lang="fr-FR" encoding="MathML-Presentation">
<!-- ... -->
</annotation-xml>
<!-- ... -->
</semantics>
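Under this scheme, whatever renders the expression has to choose one annotation-xml per reader. A minimal sketch of that choice, assuming the renderer has already collected each annotation's xml:lang value into an array. `pickAnnotationLanguage` and its fallback order (exact tag, then primary subtag, then first available) are illustrative choices, not anything MathML specifies.

```javascript
// Hypothetical helper: choose which annotation-xml to show for a reader's language.
// available: xml:lang values of the annotations, e.g. ["en-US", "fr-FR"].
function pickAnnotationLanguage(available, preferred) {
  const wanted = preferred.toLowerCase();
  // 1. Exact tag match (language tags are case-insensitive).
  for (const tag of available) {
    if (tag.toLowerCase() === wanted) return tag;
  }
  // 2. Same primary subtag, e.g. a fr-CA reader gets the fr-FR annotation.
  const primary = wanted.split("-")[0];
  for (const tag of available) {
    if (tag.toLowerCase().split("-")[0] === primary) return tag;
  }
  // 3. Nothing suitable: fall back to the first annotation, or null if none exist.
  return available.length > 0 ? available[0] : null;
}

console.log(pickAnnotationLanguage(["en-US", "fr-FR"], "fr-CA")); // fr-FR
```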
However, if the parent document itself is a multilanguage document, this may not work very efficiently.
For example, if the document follows the conventions the W3C currently suggests, then a CSS stylesheet which hides the user's non-native languages:
```css
@namespace xml url(http://www.w3.org/XML/1998/namespace);

*:not([xml|lang="en-US"]) {
  display: none;
}
```
would lose that whole fragment if it were in a section of the document designated fr-FR.
The solutions to this are, at the moment, non-obvious. Certainly I haven't figured out a universal solution. (I am proceeding on the assumption that the browser receives the document after server-side processing, with multiple languages in the document itself.) For all I know, Mozilla may already hide content that's not in the user's native language. Again, this is bad if the entire MathML expression resides in one fragment.
I may have a partial solution. Instead of parallel markup contained in a single MathML expression, I might be able to spread it out:
<html:div style="display:none;">
<semantics encoding="MathML-Content">
<!-- ... -->
</semantics>
</html:div>
<!-- later -->
<html:div xml:lang="en-US">
<semantics encoding="MathML-Presentation">
<!-- ... -->
</semantics>
</html:div>
<!-- later -->
<html:div xml:lang="fr-FR">
<semantics encoding="MathML-Presentation">
<!-- ... -->
</semantics>
</html:div>
I'm looking for additional ideas on making XML/HTML documents truly international... and possibly refinements of this idea.
For most XML documents, one or two namespaces are enough. XUL applications often require three or four (XUL, XBL, RDF, and on occasion XHTML). For Abacus, I had enough namespaces that a common DTD actually made a little bit of sense...
The best part about it is I also have a script in there which I can include by &namespaces.script; entity.
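The actual namespaces script isn't shown here, so this is only a guess at the kind of thing such a shared file might hold: one table of namespace URIs plus a prefix resolver. The `NS` table and `nsResolver` names are hypothetical; the URIs are the standard ones for each language. Current browsers accept a plain function like `nsResolver` as the resolver argument of `document.evaluate()`.

```javascript
// Hypothetical shared namespace table; the real Abacus namespaces script is not shown.
const NS = {
  xul: "http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul",
  xbl: "http://www.mozilla.org/xbl",
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  html: "http://www.w3.org/1999/xhtml",
  mml: "http://www.w3.org/1998/Math/MathML",
  xml: "http://www.w3.org/XML/1998/namespace"
};

// Map a prefix to its URI, or null for unknown prefixes (what XPath expects).
function nsResolver(aPrefix) {
  return Object.prototype.hasOwnProperty.call(NS, aPrefix) ? NS[aPrefix] : null;
}

// In a browser, e.g. to find every MathML <math> element:
// document.evaluate("//mml:math", document, nsResolver,
//                   XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
```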
The template editor is nearly done! I finally took the time to make the editor save its own templates. So now, things are looking very sharp indeed.
Just as a quick sample, I'm including current XML and DTD files for how Abacus template files should look. (This is not frozen, however! Depending on my experience in editing, these files may change.)
templates.dtd -- templates XML language
summary.xml -- pseudo-documentation of how things should be organized in a templates file
A sample template file under construction
A base templates file (wrappers for each language)
Yes, I know they have Windows newline characters in them (\r\n). They will go away.
I'm beginning to really hate how the editor element is used in Mozilla Composer and Nvu, because the way both projects use it is a nightmare. Worse, it has worked this way for ages, so I may have a hard time convincing anyone that its diapers need changing.
In Mozilla, the chrome://editor/content/editor.xul file has this excerpt:
<deck id="ContentWindowDeck" selectedIndex="0" flex="1">
<stack>
<editor editortype="html" type="content-primary" id="content-frame"
context="editorContentContext" flex="1" tooltip="aHTMLTooltip"/>
</stack>
<vbox>
<label id="doctype-text" crop="right"/>
<editor type="content" id="content-source" context="editorSourceContext" flex="1"/>
</vbox>
</deck>
This two-editor scheme causes problems for Composer's extensibility. A long time ago, someone filed bug 109682 to add DOM Inspector to Mozilla Composer. I wholeheartedly support that idea and plan on implementing it when I feel I can do so reliably.
Alas, transactions are broken between edit modes in Mozilla Composer. Just try this:
(1) Open a document in Mozilla Composer.
(2) Add a table in the Normal view.
(3) Switch to HTML Source view.
(4) In one of the cells, write something (Hi Mom).
(5) Switch to Normal view.
(6) Switch to HTML Source view.
(7) Check the Edit menu for the Undo command.
Expected results: Undo is enabled.
Actual results: Undo is disabled.
To the user, this is inexplicable. The user thinks he's editing one document when he's really editing three. The first is the baseline document under the Normal view. The second is the document he creates when he switches to Source view the first time. When he switches back to Normal view, all his transactions in Source view become one transaction for the normal editor element -- and surprise, the transactions in the Source view get wiped out. So when he goes back to editing in Source view, it's like starting with a whole new document.
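A toy model may make the lost history easier to see. This is not Composer's code; `TwoEditorModel` is hypothetical and only mirrors the behavior described above: each view keeps its own undo history, and returning from Source view folds all the source edits into a single transaction on the Normal side while the Source history is thrown away.

```javascript
// Toy model of the two-editor scheme; not Composer's actual implementation.
class UndoStack {
  constructor() { this.items = []; }
  push(txn) { this.items.push(txn); }
  canUndo() { return this.items.length > 0; }
  clear() { this.items = []; }
}

class TwoEditorModel {
  constructor() {
    this.normal = new UndoStack();
    this.source = new UndoStack();
    this.mode = "normal";
  }
  current() {
    return this.mode === "normal" ? this.normal : this.source;
  }
  edit(description) {
    this.current().push(description);
  }
  switchTo(mode) {
    if (this.mode === "source" && mode === "normal" && this.source.canUndo()) {
      // Every source-view edit collapses into one "rebuild" transaction...
      this.normal.push("rebuild document from source");
      // ...and the source view's own history is discarded.
      this.source.clear();
    }
    this.mode = mode;
  }
  canUndo() {
    return this.current().canUndo();
  }
}

// The reproduction steps, replayed against the model:
const m = new TwoEditorModel();
m.edit("insert table");   // (2) Normal view
m.switchTo("source");     // (3)
m.edit("type Hi Mom");    // (4)
m.switchTo("normal");     // (5)
m.switchTo("source");     // (6)
console.log(m.canUndo()); // false: (7) Undo is disabled
```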
Unfortunately, this particular bug is a WONTFIX per comments in editor.js:
else if (previousMode == kDisplayModeSource)
{
// Only rebuild document if a change was made in source window
if (IsHTMLSourceChanged())
{
// Reduce the undo count so we don't use too much memory
// during multiple uses of source window
// (reinserting entire doc caches all nodes)
try {
editor.transactionManager.maxTransactionCount = 1;
} catch (e) {
To implement DOM Inspector safely, I would first have to force the primary editor to update whenever I leave HTML source mode. Even so, this doesn't really fix the undo/redo scheme.
Nvu is even worse in regards to switching edit modes. It uses a tabeditor to let you edit multiple documents. But thanks to the way it's used as of Nvu 0.2 (namely, to replace the first editor of two in editor.xul and not both of them), when you switch to HTML Source editing mode, you lose the ability to switch documents! This is understandable considering it's 0.2, but even so, it's sloppy. The underlying bug, using two editor elements in the first place, forced this implementation. I'd call the loss of tab editing in Source view a major bug.
http://feedhouse.mozillazine.org doesn't like me...
Also, someone's been spamming the blog replies again. I cleaned it up. (I do value your comments, so I leave the replies feature on.)
(Thanks for the correction on the URL.)
I noted with some interest Daniel Glazman's weblog entry about a contract to develop an XML editor possibly based on Mozilla code. I think it's a great idea, and I have an idea which (at least to me) makes sense for it.
Every XML language is different. XHTML, MathML, SVG, RDF, XUL, XBL... the only thing they have in common is that they are XML languages. Mozilla Composer has a XUL interface for editing HTML. I'm developing a XUL interface for editing MathML. Several people have attempted, with apparently limited acclaim, to develop XUL interfaces for editing XUL.
There's a trend here: one XUL interface for editing one XML language.
I think this trend is a very good one. My approach of using a dialog to edit MathML may not work out all that well in a multi-language editor, but my point is we should probably separate the language-specific user interfaces (Table, Anchor, Link, Image, etc.) from the language-independent UI (New, Open, Save, Spell check, etc.). This would allow such a project to utterly shed the dependence on (X)HTML that we currently have.
I have big problems with the HTML bias in Composer. The HTML Tags view needs a face lift if it's going to work in this new project. It uses images to place the yellow HTML tags before elements. That's all well and good, when you have a limited set of tags to deal with. When you include a larger set of elements, it becomes a burden on the developer writing the language UI to come up with appropriate tags. (For Abacus, I'm not even going to bother; I'm going to have it add one image, probably in green, for math elements, and have it wipe out all descendant images for tags.)
I've already griped earlier about other weaknesses for Composer's extensibility. I hope the Disruptive Innovations team takes me seriously...
(Though I suspect that until I release Abacus 0.1, no one in Mozilla development will take me seriously... and with good reason...)
A few days ago I wrongly claimed that Nvu and Mozilla Composer would wipe out a document's native doctype. My tests aren't showing that now, which means I was likely thinking more about the spaghetti-sauce-of-the-day than about actual coding at the time.
I wanted to make this apology public for that foul-up.
I found, in the Preferences for Nvu 0.2, an installer for extensions. This discovery makes me a very happy man; I don't have to worry about command-line installation of Abacus anymore. (In theory.) I'd guess this is the much-ballyhooed extension manager.
I'm thinking one extension that would make a "killer ext" for Mozilla would be a little PNG editor. People who develop templates for MathML editing probably would like to create icons for the menu items they'll create. Of course, Mozilla wasn't really intended to edit graphics, but one can dream...
I noticed, absently, that Nvu 0.2 doesn't like XHTML. (Yes, I know, after what I said about doctypes, I hear a "here we go again" groan...) It likes HTML just fine. With a little tinkering, it will like HTML + MathML as well. But when I have a link tag:
<link href='chrome://gre/res/mathml.css' type='text/css' rel='stylesheet'/>
Nvu will record that as:
<link href='chrome://gre/res/mathml.css' type='text/css' rel='stylesheet'>
I can't get mad at Nvu for doing that; it's a perfectly natural cleanup for HTML, and Nvu has no way of knowing that I want XHTML. Same problem in Mozilla Composer; it's not really a bug.
This makes me wonder: what am I supposed to do about it? Per the W3C's XHTML Media Types note, only XHTML 1.0 documents that follow the HTML compatibility guidelines should be served as text/html. In Section 3.1, the document specifically points out XHTML + MathML as "NOT suitable" (their capitalization).
I thought I saw a doctype for HTML + MathML a long time ago... can't find it in MathML 2.0 1st Ed. (I'm about to grab MathML 2.0 2e.)
In terms of Abacus progress: the assert() function is proving very helpful indeed in forcing me to do things right. I'm just about at the point where I can save templates, but I hit a snag first: every template I create needs a file to save the template to. I placed an assert to throw an exception when I reached that point in my code, and forgot about it.
When I ran the code after fixing a couple other bugs, I noted with some concern that nothing seemed to happen after I okayed a template... and then my JS Console provided a very stern reminder. As I can't really do any more work on Abacus without fixing that bug, I think I'd better write some code to identify a file for a new template, or create one if necessary...
After that comes building a UI that template writers can use to make UIs for their templates... that, fortunately, will be the last thing required for actually editing MathML templates, and then I can finally get on to editing MathML with the templates. That should be easy in comparison.