DocBookDoclet: HTML/Javadoc to DocBook XML
09:51, 6 Sep 2002 UTC | Michael Smith

Michael Fuchs has released version 0.29 of DocBookDoclet, a Java application for converting HTML files and Java source documentation to DocBook XML. This release adds internationalization support.

The release is available for download in several formats: RPM, tar/gz, tar/bz2, zip. A changelog is also available.

Along with supporting conversion of the most commonly used Javadoc tags (@param, @throws, etc.), DocBookDoclet supports conversion of most structural/logical HTML markup (though, for some reason, not span or cite -- which might be converted to, say, phrase and emphasis remap="cite"). And it supports conversion of some, but not all, presentational HTML markup; for example, it currently ignores the big, small, and strike elements, though it seems like these elements could all be converted to phrase with a corresponding value for the remap attribute.
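To make the suggested remap approach concrete, here is a minimal sketch in Python. It is not DocBookDoclet code, which is written in Java. The mapping table, the function names, and the choice to drop HTML attributes such as class are all assumptions made for illustration, and the input is assumed to be a well-formed XML fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping (not taken from DocBookDoclet):
# HTML tag -> (DocBook element, value for the remap attribute or None).
REMAP = {
    "span": ("phrase", None),
    "cite": ("emphasis", "cite"),
    "big": ("phrase", "big"),
    "small": ("phrase", "small"),
    "strike": ("phrase", "strike"),
}


def remap_inline(root):
    """Rename mapped HTML inline elements in place.

    The original HTML tag name is recorded in the remap attribute
    whenever the mapping table supplies a value for it.
    """
    for el in root.iter():
        target = REMAP.get(el.tag)
        if target is None:
            continue  # not one of the elements this sketch handles
        new_tag, remap = target
        el.tag = new_tag
        el.attrib.clear()  # assumption: HTML attributes are simply dropped
        if remap is not None:
            el.set("remap", remap)
    return root


def convert(fragment):
    """Parse a well-formed fragment, remap it, and serialize it again."""
    return ET.tostring(remap_inline(ET.fromstring(fragment)), encoding="unicode")


if __name__ == "__main__":
    print(convert("<p>See <cite>the guide</cite> for <big>details</big>.</p>"))
```

Recording the source element in remap keeps the conversion lossless enough that a later stylesheet could still treat former big or strike text specially.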

Also, though it always seems to generate clean, well-formed XML -- nicely indented even -- it does sometimes produce DocBook instances that require manual cleanup in order to be made valid (even if the HTML source is valid). It seems for the most part to do a one-to-one conversion of HTML elements to DocBook elements, so markup instances that are HTML-valid even though they lack certain HTML elements (for example, a dl definition list that lacks a dd description element) can get converted to DocBook instances that are invalid because they lack the corresponding elements (for example, a "missing dd" definition list gets converted to a variablelist that lacks a required listitem element).

Given the limitations in the structure that HTML can be used to model, conversion of certain HTML markup instances may continue to present a challenge. Still, it would be interesting to see if some logic could be added to DocBookDoclet to detect and automatically correct certain validity errors, so that they don't need to be corrected manually.
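As an illustration of what such automatic correction might look like for the "missing dd" case, here is a minimal post-processing sketch in Python. It is not part of DocBookDoclet. The function name is hypothetical, and inserting a listitem that holds an empty para is only one possible repair, chosen because it is simple.

```python
import xml.etree.ElementTree as ET


def add_missing_listitems(root):
    """Give every varlistentry that has no listitem a placeholder one.

    Returns the number of entries that were repaired.
    """
    fixed = 0
    # Materialize the matches first so the tree is not modified
    # while it is still being traversed.
    for entry in list(root.iter("varlistentry")):
        if entry.find("listitem") is None:
            item = ET.SubElement(entry, "listitem")
            ET.SubElement(item, "para")  # placeholder: an empty para
            fixed += 1
    return fixed


if __name__ == "__main__":
    doc = ET.fromstring(
        "<variablelist>"
        "<varlistentry><term>A</term></varlistentry>"
        "<varlistentry><term>B</term>"
        "<listitem><para>b</para></listitem></varlistentry>"
        "</variablelist>"
    )
    print(add_missing_listitems(doc), "entry repaired")
```

A real fix would probably belong inside the converter itself, where the original dl is still at hand, but a separate pass like this shows that the detection step is cheap.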

Overall, though, even in its current (alpha) incarnation, it's a very useful tool, and certainly further along in terms of development than the only other open-source alternative (Jeff Beal's Html2DocBook, which, though currently more limited, is a potentially very appealing XSLT-only solution).

xmlhack: developer news from the XML community
