XML 2003 session report: Combining multiple vocabularies without tears
23:24, 20 Dec 2003 UTC | Uche Ogbuji

10 December 2003 at XML 2003 in Philadelphia...

Murata Makoto spoke in a block of sessions on ISO Document Schema Definition Languages (DSDL) (ISO/IEC JTC 1 SC 34 WG 1); see "XML 2003 session report: News from the world of DSDL" for coverage of the first session in that series. He covered DSDL Part 4: "Selection of validation candidates".

The problem addressed by this effort is that a single document may be written in multiple vocabularies, which together compose a single, non-monolithic vocabulary. For example, a document may be in XHTML with embedded MathML and RDF, and this combination makes up one non-monolithic vocabulary. DSDL Part 4 is an effort to define a schema language that can effectively handle non-monolithic vocabularies.

As an illustration of the fact that such documents are already common enough to be supported by mainstream user agents, Murata showed a MathML matrix within an XHTML document displayed naturally in Mozilla Firebird. He then displayed an XHTML 2.0 document with embedded MathML, SVG, HLink, RDF, Ruby annotations and more. He also pointed out how specifications such as XForms naturally make for combinations of vocabularies.

To illustrate why non-monolithic vocabularies pose a problem for schema designers, Murata asked the audience whether anyone had read all the XHTML, RDF, SVG and MathML specifications. One person out of 40 or more raised his hand. Murata said that even most XML experts know only one or two vocabularies intimately, which means that it is hard to create a single super-schema that incorporates multiple vocabularies.

The goal of XML namespaces is to clearly partition different vocabularies within one document, but Murata said that XML namespaces have actually made validation of non-monolithic vocabularies harder in practice, and that neither W3C XML Schema (WXS) nor RELAX NG makes it easier. The problem remains that you have an entire document to validate against multiple separate schemata, and if any of these schemata has not been designed for your specific non-monolithic vocabulary, you will usually end up having to modify it so that it doesn't choke on elements in foreign namespaces.

DSDL Part 4 came from Murata's RELAX Namespace proposal, which was later augmented by James Clark to create Modular Namespaces (MNS) and by Rick Jelliffe to create Namespace Switchboard. Clark used all of these as inputs to a second effort, Namespace Routing Language (NRL), which has become the main input to DSDL Part 4. The language to be defined by DSDL Part 4 was named Namespace-based Validation Dispatching Language (NVDL) at the ISO DSDL meeting just before the conference.

The idea behind NVDL is to allow you to author a schema that is a combination of multiple sub-schemata, which can be developed separately and independently of the eventual combination. An NVDL processor would then validate a non-monolithic document by dividing it into different chunks, known as "validation candidates". The validation candidates are extracted by partitioning the document according to the different namespaces within it, which represent the different vocabularies. An NVDL processor would then use a validator component to check each validation candidate against the corresponding sub-schema. An important feature of NVDL is that these validator components can process diverse schema languages. One can create an NVDL schema that uses WXS to validate some candidates, RELAX NG for others, and Schematron for yet others.
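
The partitioning step can be made concrete with a small sketch. The Python below is not NVDL and does not follow the DSDL Part 4 draft; it only illustrates the general idea of cutting a tree into per-namespace candidates and handing each one to a validator chosen by namespace. The sample document, the helper names (`split_candidates`, `shallow_copy`) and the two validator functions are all hypothetical, and the validators are trivial stand-ins for real WXS, RELAX NG or Schematron components. Attributes in foreign namespaces are ignored for simplicity.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical sample: XHTML with an embedded MathML island.
DOC = f"""<html xmlns="{XHTML}">
  <body>
    <p>A matrix:</p>
    <math xmlns="{MATHML}"><mrow><mi>x</mi></mrow></math>
  </body>
</html>"""


def namespace_of(elem):
    """Return the namespace URI of an element ('' if it has none)."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def shallow_copy(elem):
    """Copy an element's tag, attributes and text, but not its children."""
    clone = ET.Element(elem.tag, dict(elem.attrib))
    clone.text = elem.text
    return clone


def split_candidates(root):
    """Partition a tree into (namespace, subtree) validation candidates."""
    candidates = []

    def start_candidate(elem):
        clone = shallow_copy(elem)
        candidates.append((namespace_of(elem), clone))
        fill(elem, clone)

    def fill(src, dst):
        for child in src:
            if namespace_of(child) == namespace_of(src):
                child_clone = shallow_copy(child)
                dst.append(child_clone)
                fill(child, child_clone)
            else:
                # Foreign namespace: cut the subtree out as a new candidate.
                start_candidate(child)

    start_candidate(root)
    return candidates


# Hypothetical validators; real ones would invoke a schema processor.
def check_xhtml(elem):
    return elem.find(f"{{{XHTML}}}body") is not None


def check_mathml(elem):
    return elem.tag == f"{{{MATHML}}}math"


VALIDATORS = {XHTML: check_xhtml, MATHML: check_mathml}

candidates = split_candidates(ET.fromstring(DOC))
results = [(ns, VALIDATORS[ns](elem)) for ns, elem in candidates]
for ns, ok in results:
    print(ns, "valid" if ok else "invalid")
```

Because each candidate contains elements from one namespace only, the XHTML check never sees the `math` element, which is the property that lets each sub-schema stay unaware of the others.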

NVDL is designed so that the overall document is parsed into a series of events, which trigger the validator components. This design is intended to maximize performance.
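
As a rough illustration of event-driven dispatch, the following Python sketch uses the standard library's SAX parser to route element events to a per-namespace component without building a tree. It is an assumption-laden toy rather than an NVDL implementation: `NamespaceDispatcher` and `RecordingValidator` are hypothetical names, and the "validator" merely records the events it receives where a real component would check them against a schema.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical sample document.
DOC = (
    f'<html xmlns="{XHTML}"><body><p>A matrix:</p>'
    f'<math xmlns="{MATHML}"><mi>x</mi></math>'
    f"</body></html>"
)


class RecordingValidator:
    """Hypothetical stand-in for a streaming validator component."""

    def __init__(self):
        self.events = []

    def start(self, local_name):
        self.events.append(("start", local_name))

    def end(self, local_name):
        self.events.append(("end", local_name))


class NamespaceDispatcher(ContentHandler):
    """Forward each element event to the validator for its namespace."""

    def __init__(self, validators):
        super().__init__()
        self.validators = validators  # namespace URI -> validator

    def startElementNS(self, name, qname, attrs):
        uri, local_name = name
        validator = self.validators.get(uri)
        if validator is not None:
            validator.start(local_name)

    def endElementNS(self, name, qname):
        uri, local_name = name
        validator = self.validators.get(uri)
        if validator is not None:
            validator.end(local_name)


validators = {XHTML: RecordingValidator(), MATHML: RecordingValidator()}

parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # deliver (uri, localname) names
parser.setContentHandler(NamespaceDispatcher(validators))
parser.parse(io.BytesIO(DOC.encode("utf-8")))

for uri, validator in validators.items():
    print(uri, validator.events)
```

The document is read once, and each component sees only the events in its own namespace, so no intermediate copy of the document is needed.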

DSDL Part 4 is still in development, but may become an international standard by late 2004 if all goes well. Setting forth the promise of ISO DSDL Part 4, Murata said, "For the non-monolithic World Wide Web, namespaces are the first step. NVDL is the second step."

xmlhack: developer news from the XML community
