I notice that the wmlbrowser extension for Firefox has a problem; some WML sites render better if wmlbrowser has access to the WML DTD, but wmlbrowser can’t ship it for licensing reasons.
That got me thinking: surely it’s possible, particularly for XML-based languages where conformance to the schema is a requirement, to reverse engineer the contents of the schema if you have enough documents which conform to it? Or, at least, you could make a good guess.
For example, if the root element is always <wml>, you could guess that as the root. Likewise, if it only ever contained elements from a given list, or if a particular element only ever appeared once, you could guess those constraints too. Is this feasible? If so, has anyone already written “guess-schema”?
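To make the idea concrete, here is a rough sketch in Python of what a “guess-schema” could look like. It is only an illustration under some loud assumptions: the names guess_schema and to_dtd and the sample documents are made up, it ignores namespaces and child ordering, and it emits a deliberately loose DTD (any observed child, in any order, any number of times). The repeat counts are gathered as hints but not turned into occurrence constraints.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def guess_schema(documents):
    """Record which roots, children and attributes occur in the given XML strings."""
    roots = set()
    children = defaultdict(set)    # element name -> child element names seen
    attributes = defaultdict(set)  # element name -> attribute names seen
    max_repeat = defaultdict(int)  # (parent, child) -> most occurrences under one parent

    for text in documents:
        root = ET.fromstring(text)
        roots.add(root.tag)
        for parent in root.iter():
            attributes[parent.tag].update(parent.attrib)
            # update() also creates an empty entry for leaf elements
            children[parent.tag].update(child.tag for child in parent)
            for tag, n in Counter(child.tag for child in parent).items():
                key = (parent.tag, tag)
                max_repeat[key] = max(max_repeat[key], n)
    return roots, children, attributes, max_repeat


def to_dtd(children, attributes):
    """Render the observations as loose DTD declarations (child order is not inferred)."""
    lines = []
    for name in sorted(children):
        kids = sorted(children[name])
        if kids:
            # Mixed content: text plus any observed child, in any order
            model = "(#PCDATA | " + " | ".join(kids) + ")*"
        else:
            model = "(#PCDATA)"
        lines.append(f"<!ELEMENT {name} {model}>")
        for attr in sorted(attributes[name]):
            lines.append(f"<!ATTLIST {name} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)


# Hypothetical sample documents, not taken from any real WML site
samples = [
    '<wml><card id="a"><p>Hello</p><p>World</p></card></wml>',
    '<wml><card id="b" title="B"><p>Hi <b>there</b></p></card><card id="c"/></wml>',
]
roots, children, attributes, max_repeat = guess_schema(samples)
dtd = to_dtd(children, attributes)
print(roots)
print(dtd)
```

The more conforming documents you feed in, the better the guess; with only a handful, the result will be too strict in some places (elements never seen are not allowed) and too loose in others.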
I know Trang can infer a RELAX-NG schema given a set of conforming documents. I think it can output DTDs as well.
Why couldn’t the extension just load the DTD off the web? It is on the web somewhere, isn’t it?
The XML editor in Eclipse’s WTP project has an option to infer the schema from the current document to provide content assist. I have only briefly used it but it seems to work.
bsmedberg: I believe it’s the other side of a click-to-accept licence agreement. So wmlbrowser has an option to take you to the site to agree to the terms. But it’s all a bit obnoxious.
You’d need to see the site for exact details.
The extension can load the DTD off the web. I made it so that you have to tick a box saying that you accept the terms and conditions, which is a pain (you have to open the options window first), but I think that covers the bases legally.
Technically I only need the DTDs for the entity declarations (&nbsp; etc.); it’s not the schema I care about at all. So perhaps I should just ship with a “fake” DTD containing only the entities.
The other obstacle is that DTDs have to be stored in browser chrome, not in user profiles. I guess I should raise a bug on this (and maybe even try to fix it).
Matthew (wmlbrowser author)
Trang is the only tool I have found that will do it all. Unfortunately (or fortunately), it’s in Java:
http://thaiopensource.com/relaxng/trang.html
Ah, look at that. If only I had paid closer attention to the first post :)
“The extension can load the DTD off the web. I made it so that you have to tick a box saying that you accept the terms and conditions, which is a pain (you have to open the options window first), but I think that covers the bases legally.”
That is strange: if a DTD reference is provided, any validating XML parser will automatically retrieve it! So how do they suppose that would work?
~Grauw
Why does the WML browser need the DTD? As far as I can tell, the only things in the DTD that could make the lack of the DTD a problem are the entity definitions for nbsp and shy. Creating a pseudo-DTD for two entity definitions is not difficult. (However, I consider DTD-based entities harmful in the Web context. When Mozilla gave in to XHTML entities, other browsers had to follow, too, which runs against the idea that interactive browsers that have non-validating parsers were supposed to be relieved from processing DTDs. If Mozilla had firmly rejected the entities, perhaps the XML DTD-based character entities could have been eradicated on the Web.)
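As a rough illustration of how small such a pseudo-DTD can be, the sketch below declares only nbsp and shy in an internal DTD subset and parses a page that uses them. It uses Python’s standard-library parser purely to show the idea. It says nothing about how wmlbrowser or Mozilla actually load DTDs, and the sample page and the name ENTITY_ONLY_DOCTYPE are made up.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset that declares the two character entities and nothing else:
# no element or attribute declarations, so it carries none of the schema.
ENTITY_ONLY_DOCTYPE = (
    "<!DOCTYPE wml [\n"
    '  <!ENTITY nbsp "&#160;">\n'
    '  <!ENTITY shy "&#173;">\n'
    "]>\n"
)

# Hypothetical page body that relies on both entities
page = ENTITY_ONLY_DOCTYPE + "<wml><card><p>10&nbsp;km, hy&shy;phen</p></card></wml>"

root = ET.fromstring(page)
text = root.find("card/p").text
print(repr(text))
```

Without the two declarations, the same parser would reject the page because of the undefined entities, which is the whole reason the DTD matters here.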