June 9, 2004

Internationalization and Content MathML

One of the questions I've not yet answered for my own purposes is how to handle multiple human languages for documents containing MathML. I'm beginning to think my initial approach was a bit naive.

Initially, I had planned on a parallel markup scheme:

<semantics encoding="MathML-Content">
<!-- ... -->
<annotation-xml xml:lang="en-US" encoding="MathML-Presentation">
<!-- ... -->
</annotation-xml>
<annotation-xml xml:lang="fr-FR" encoding="MathML-Presentation">
<!-- ... -->
</annotation-xml>
<!-- ... -->
</semantics>

However, if the parent document itself is a multilanguage document, this may not work very efficiently.

For example, if the document follows the conventions the W3C currently suggests, then a CSS stylesheet which hides the user's non-native languages:

*:not([xml|lang="en-US"]) {
display: none;
}

would lose that whole fragment if it were in a section of the document designated fr-FR.
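
To make the failure concrete, here is a minimal Python sketch of that hiding behaviour. It is an approximation under stated assumptions: it models the intent of the rule (hide any element that declares a different xml:lang, along with its whole subtree) rather than the literal selector, and the sample document and the visible_elements helper are hypothetical names made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: a French section holding parallel markup.
DOC = """
<body xmlns:m="http://www.w3.org/1998/Math/MathML">
  <div xml:lang="fr-FR">
    <m:semantics>
      <m:annotation-xml xml:lang="en-US" encoding="MathML-Presentation"/>
      <m:annotation-xml xml:lang="fr-FR" encoding="MathML-Presentation"/>
    </m:semantics>
  </div>
</body>
""".strip()


def visible_elements(root, wanted):
    """Return the elements still shown after hiding every element that
    declares an xml:lang other than `wanted` (assumed rule, see above)."""
    shown = []

    def walk(elem):
        lang = elem.get(XML_LANG)
        if lang is not None and lang != wanted:
            return  # display: none takes the whole subtree with it
        shown.append(elem)
        for child in elem:
            walk(child)

    walk(root)
    return shown


tags = [elem.tag for elem in visible_elements(ET.fromstring(DOC), "en-US")]
print(tags)  # → ['body']
```

Only the body survives: the en-US annotation is gone because its fr-FR ancestor was hidden first.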

The solutions to this are at the moment non-obvious. Certainly I haven't figured out a universal solution. (I am proceeding on the assumption that the browser is receiving the document post-processing from the server, with multiple languages in the document itself.) For all I know, Mozilla may already hide content that's not in the user's native language. Again, this is bad if the entirety of the MathML expression resides in one fragment.

I may have a partial solution. Instead of parallel markup contained in a single MathML expression, I might be able to spread it out:

<html:div style="display:none;">
<semantics encoding="MathML-Content">
<!-- ... -->
</semantics>
</html:div>

<!-- later -->

<html:div xml:lang="en-US">
<semantics encoding="MathML-Presentation">
<!-- ... -->
</semantics>
</html:div>

<!-- later -->

<html:div xml:lang="fr-FR">
<semantics encoding="MathML-Presentation">
<!-- ... -->
</semantics>
</html:div>

I'm looking for additional ideas on making XML/HTML documents truly international... and possibly refinements of this idea.

Posted by WeirdAl at June 9, 2004 2:51 PM
Comments

Hi, I don't know if it's nice to send an "international" XML document to the user. If the document is an article, I, as a user, don't want to download an i18n document just to read one language. If you have a document that has French, English, Spanish, and German versions, then you have to download a document of 4n bytes to read just n bytes.
On the author side I think it is a good idea to have i18n documents, but I don't think it is bad for Mozilla to hide a tag that is inside a tag with a lang different from the one you are viewing right now. That is because I can think of (in LaTeX):
"Let $a$ be a number, $f:\Reals\rightarrow\Reals$ a function"
"Sea $a$ un número, $f:\Reals\rightarrow\Reals$ una función"
So if you have the first in one tag with lang="us" and the other in a tag with lang="es", then it's good that the rendering engine hides the math. It's a bit diffuse, but I hope you get the idea.

Posted by: Warwato at June 10, 2004 2:30 PM

What about entity replacement, using server-side content variation to hand the client the correct set of entities? (Of course, this doesn't work with a document saved to disk, but will people really care?)

Posted by: Eric Hodel at June 11, 2004 8:24 AM

Good points; I rather favor the point Warwato made. Namely, why would the server send a multilanguage document down in the first place?

Server-side processing of the HTTP Accept-Language header makes a lot of sense, though. Is there a standard PHP or Perl procedure for filtering a document by language? (Or do I get to write one when I feel like it? ;) )
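
As a rough sketch of what such a filter could do (written in Python rather than PHP or Perl, and not a standard procedure): rank the tags in the Accept-Language header by their q-values, pick the best language the document actually offers, then prune every subtree tagged with a different xml:lang. All three function names are hypothetical, and the matching is a simplified prefix match, not a full implementation of HTTP language negotiation.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        quality = 1.0  # q defaults to 1 when omitted
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        if quality > 0:
            ranked.append((-quality, position, tag))
    # Highest q first; ties keep the order given in the header.
    return [tag for _, _, tag in sorted(ranked)]


def pick_language(header, available, default):
    """Pick the first available tag that matches the client's preferences."""
    for wanted in parse_accept_language(header):
        low = wanted.lower()
        for tag in available:
            if low == "*" or tag.lower() == low or tag.lower().startswith(low + "-"):
                return tag
    return default


def filter_by_language(root, keep):
    """Remove, in place, every subtree whose own xml:lang differs from `keep`."""
    for parent in list(root.iter()):
        for child in list(parent):
            lang = child.get(XML_LANG)
            if lang is not None and lang != keep:
                parent.remove(child)
    return root
```

A caller would do something like filter_by_language(tree, pick_language(header, ["en-US", "fr-FR"], "en-US")) before serializing the response. Note that this prunes by each element's own xml:lang, so an en-US annotation nested inside an fr-FR section is still dropped along with its ancestor.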

Posted by: WeirdAl (Alex Vincent) at June 11, 2004 2:30 PM