I notice that the wmlbrowser extension for Firefox has a problem; some WML sites render better if wmlbrowser has access to the WML DTD, but wmlbrowser can't ship it for licensing reasons.
That got me thinking: surely it's possible, particularly for XML-based languages where conformance to the schema is a requirement, to reverse engineer the contents of the schema if you have enough documents which conform to it? Or, at least, you could make a good guess.
For example, if the root element is always <wml>, you could guess that as the root. Likewise, you could note that it only ever contains elements from a given list, that a particular element only ever appears once, and so on. Is this feasible? If so, has anyone already written "guess-schema"?
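To make the idea concrete, here is a rough sketch of what a naive "guess-schema" might do, written in Python with only the standard library. The function name `guess_schema` and the shape of its return values are made up for illustration; it only records which root elements, child elements and per-parent repeat counts were observed, and it makes no attempt to infer ordering, attributes or a real DTD content model.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def guess_schema(documents):
    """Summarise the structure seen in a list of XML document strings.

    Returns (roots, children, max_count):
      roots     - set of root element names seen
      children  - dict: parent tag -> set of child tags seen under it
      max_count - dict: (parent, child) -> most times child appeared
                  under a single parent element
    """
    roots = set()
    children = defaultdict(set)
    max_count = defaultdict(int)

    for text in documents:
        root = ET.fromstring(text)
        roots.add(root.tag)
        for parent in root.iter():
            children[parent.tag]  # make sure leaf elements get an entry too
            counts = defaultdict(int)
            for child in parent:
                counts[child.tag] += 1
            for tag, n in counts.items():
                children[parent.tag].add(tag)
                key = (parent.tag, tag)
                max_count[key] = max(max_count[key], n)

    return roots, dict(children), dict(max_count)


if __name__ == "__main__":
    # Hypothetical sample documents, not real WML pages.
    samples = [
        "<wml><card><p>a</p><p>b</p></card></wml>",
        "<wml><head/><card><p>c</p></card></wml>",
    ]
    roots, children, max_count = guess_schema(samples)
    print("roots:", sorted(roots))
    for parent in sorted(children):
        for child in sorted(children[parent]):
            repeat = "once" if max_count[(parent, child)] == 1 else "repeatable"
            print(parent, "->", child, "(%s)" % repeat)
```

The guess is only as good as the sample set: an element that happens to appear once in every sample would be reported as "once" even if the real schema allows repeats.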
Posted by gerv at February 3, 2006 12:42 PM
I know Trang can infer a RELAX-NG schema given a set of conforming documents. I think it can output DTDs as well.
Posted by: Ted Mielczarek at February 3, 2006 2:05 PM
Why couldn't the extension just load the DTD off the web? It is on the web somewhere, isn't it?
Posted by: Benjamin Smedberg at February 3, 2006 2:26 PM
The XML editor in Eclipse's WTP project has an option to infer the schema from the current document to provide content assist. I have only briefly used it but it seems to work.
bsmedberg: I believe it's the other side of a click-to-accept licence agreement. So wmlbrowser has an option to take you to the site to agree to the terms. But it's all a bit obnoxious.
You'd need to see the site for exact details.
Posted by: Gerv at February 3, 2006 2:48 PM
The extension can load the DTD off the web. I made it so that you have to tick a box saying that you accept the terms and conditions, which is a pain (you have to open the options window first), but I think that covers the bases legally.
Technically I only need the DTDs for the entity declarations (&nbsp; etc.); it's not the schema I care about at all. So perhaps I should just ship with a "fake" DTD containing only the entities.
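As a rough illustration of what an entities-only DTD could look like, the sketch below declares two entities in an internal subset and parses a small document with Python's standard library. The entity list (nbsp and shy) and the helper name `parse_with_entities` are assumptions for the example; the real DTD should be checked for the full list, and this says nothing about how Mozilla's own parser locates DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical entities-only declarations; the list is an assumption.
FAKE_DTD = (
    '<!ENTITY nbsp "&#160;">\n'
    '<!ENTITY shy "&#173;">'
)


def parse_with_entities(body):
    """Parse an XML body after prepending a DOCTYPE that carries only
    the entity declarations, as an internal subset."""
    doc = '<?xml version="1.0"?>\n<!DOCTYPE wml [\n%s\n]>\n%s' % (FAKE_DTD, body)
    return ET.fromstring(doc)


if __name__ == "__main__":
    root = parse_with_entities("<wml><card><p>a&nbsp;b</p></card></wml>")
    # repr() makes the non-breaking space visible as an escape.
    print(repr(root.find("card/p").text))
```

Without the declarations, the same document fails to parse because the entity is undefined, which is the problem the "fake" DTD would work around.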
The other obstacle is that DTDs have to be stored in browser chrome, not in user profiles. I guess I should raise a bug on this (and maybe even try to fix it).
Matthew (wmlbrowser author)
Posted by: Matthew Wilson at February 3, 2006 4:04 PM
Trang is the only tool I have found that will do it all; unfortunately (or fortunately), it's in Java:
http://thaiopensource.com/relaxng/trang.html
Ah, look at that, if only I paid closer attention to the first post :)
Posted by: Shane Caraveo at February 3, 2006 7:06 PM
"The extension can load the DTD off the web. I made it so that you have to tick a box saying that you accept the terms and conditions, which is a pain (you have to open the options window first), but I think that covers the bases legally."
That is strange: if a DTD reference is provided, any validating XML parser will automatically retrieve it! So how do they suppose that would work?
~Grauw
Why does the WML browser need the DTD? As far as I can tell, the only things in the DTD that could make the lack of the DTD a problem are the entity definitions for nbsp and shy. Creating a pseudo-DTD for two entity definitions is not difficult. (However, I consider DTD-based entities harmful in the Web context. When Mozilla gave in to XHTML entities, other browsers had to follow, too, which runs against the idea that interactive browsers with non-validating parsers were supposed to be relieved of processing DTDs. If Mozilla had firmly rejected the entities, perhaps the XML DTD-based character entities could have been eradicated on the Web.)
Posted by: Henri Sivonen at February 5, 2006 9:49 AM