xmlhack: XML 2003 session report: Namespace Routing Language

10 December 2003 at XML 2003 in Philadelphia...

James Clark followed a block of sessions on ISO Document Schema Definition Languages (DSDL) (ISO/IEC JTC 1 SC 34 WG 1) with a presentation on Namespace Routing Language (NRL), which is a key contribution to DSDL Part 4: "Selection of validation candidates".

Clark began his presentation by framing the issues that led to NRL: namespaces carry a significant cost, and it is very important to have a diversity of XML schema languages available. He said that NRL tries to redeem some of the cost of namespaces by using them to divide and conquer schema problems, applying the best independent schema in the best schema language to each sub-problem.

NRL identifies groups of elements and attributes based on namespaces. The developer specifies a schema for validating each group. The data model for the entire XML document to be processed is a tree of trees. The big tree is divided into "sections", each of which must be a subtree. This division uses a simple set of rules that compare the namespace of each element with that of its parent. Attributes can also be placed in sections according to whether they have the same namespace as their owner element, allowing for processing of what some call "global attributes".
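The sectioning idea can be illustrated with a small Python sketch. It is a simplification for illustration only, not the NRL algorithm itself: it assumes that an element starts a new section whenever its namespace differs from its parent's, and it ignores attribute sections. The function names and the sample document are made up for this example.

```python
import xml.etree.ElementTree as ET


def namespace_of(tag):
    """Return the namespace URI from an ElementTree tag such as '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def section_roots(root):
    """Return (namespace, local name) for each element that starts a section.

    Assumption: a section starts wherever an element's namespace differs
    from that of its parent (the document root always starts one).
    """
    found = []

    def visit(element, parent_ns):
        ns = namespace_of(element.tag)
        if ns != parent_ns:
            found.append((ns, element.tag.split("}", 1)[-1]))
        for child in element:
            visit(child, ns)

    visit(root, None)
    return found


DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description/>
    </rdf:RDF>
  </head>
  <body><p>text</p></body>
</html>"""

sections = section_roots(ET.fromstring(DOC))
for ns, name in sections:
    print(name, ns)
```

Here the XHTML elements form one section rooted at html, and the embedded RDF forms a second section rooted at RDF.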

The NRL schema language defines a set of rules for sectioning documents and instructions for executing validation on each section. Rules can invoke validation against multiple schemata in multiple languages, and they can be constructed to handle otherwise unspecified namespaces, say for extremely lax or extremely strict processing.
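As a rough illustration of this dispatching, the Python sketch below maps each namespace to a list of validators and lets the caller choose strict or lax handling of namespaces that have no rule. Everything here is hypothetical: the validators are toy functions standing in for real schema processors, and a section is reduced to the name of its root element. NRL itself expresses such rules declaratively in XML.

```python
XHTML = "http://www.w3.org/1999/xhtml"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def require_root(expected):
    """Build a toy validator: the section root must have the expected name."""
    def check(root_name):
        if root_name == expected:
            return []
        return [f"expected root {expected!r}, found {root_name!r}"]
    return check


def forbid_uppercase(root_name):
    """Second toy validator, showing several 'schemata' applied to one section."""
    if root_name == root_name.lower():
        return []
    return [f"{root_name!r} contains uppercase letters"]


def validate_sections(sections, rules, unknown="reject"):
    """Run every validator registered for each section's namespace.

    unknown="reject" reports namespaces without a rule (strict processing);
    unknown="allow" silently accepts them (lax processing).
    """
    errors = []
    for namespace, root_name in sections:
        validators = rules.get(namespace)
        if validators is None:
            if unknown == "reject":
                errors.append(f"no rule for namespace {namespace!r}")
            continue
        for validator in validators:
            errors.extend(validator(root_name))
    return errors


RULES = {
    XHTML: [require_root("html"), forbid_uppercase],
    RDF: [require_root("RDF")],
}
SECTIONS = [(XHTML, "html"), (RDF, "RDF"), ("urn:example:unknown", "widget")]

strict_errors = validate_sections(SECTIONS, RULES, unknown="reject")
lax_errors = validate_sections(SECTIONS, RULES, unknown="allow")
print(strict_errors)
print(lax_errors)
```

The only output of the sketch is a list of error messages, which matches the view that validation produces error information and nothing else.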

NRL supports modes similar to those in XSLT (in fact, the overall processing model is much like that of XSLT). Actions can specify modes to be used for processing children of the context element. NRL also supports explicit setting of context, which allows for processing patterns that can't be expressed with modes alone. For example, one could specify a rule for processing any RDF/XML only if it is contained within an XHTML head element.
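The RDF-in-head example can be approximated in Python as a check on the element that contains each RDF section. This is only a sketch of the idea of context-dependent rules, assuming that "contained within" means a direct child of head; it does not reproduce NRL syntax or its mode mechanism, and the function name is invented for this example.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def local_name(tag):
    """Strip the '{uri}' prefix that ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def rdf_sections_with_context(root):
    """Return (path, allowed) for every rdf:RDF element.

    Assumption: a section is allowed only when its parent is an XHTML head.
    """
    results = []

    def visit(element, path):
        for child in element:
            child_path = path + [local_name(child.tag)]
            if child.tag == f"{{{RDF}}}RDF":
                allowed = element.tag == f"{{{XHTML}}}head"
                results.append(("/".join(child_path), allowed))
            else:
                visit(child, child_path)

    visit(root, [local_name(root.tag)])
    return results


DOC = f"""<html xmlns="{XHTML}" xmlns:rdf="{RDF}">
  <head><rdf:RDF/></head>
  <body><rdf:RDF/></body>
</html>"""

contexts = rdf_sections_with_context(ET.fromstring(DOC))
for path, allowed in contexts:
    print(path, "validate" if allowed else "reject")
```

Because head and body share the XHTML namespace, a rule keyed on namespace alone cannot tell the two RDF sections apart, which is why some notion of context is needed.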

NRL is designed for streaming implementation, though a subschema language might force a subtree to be built in memory. SAX is the basis of the implementation of NRL in the open-source RELAX NG processor Jing.
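To show why this kind of sectioning suits streaming, here is a minimal Python sketch using the standard xml.sax module. It is not Jing's implementation, which is written in Java. It assumes the simplified rule that a section starts wherever an element's namespace differs from its parent's, and it keeps only a stack of namespaces rather than a tree. The class name is hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

_NO_PARENT = object()  # sentinel so that the document root always starts a section


class SectionStartHandler(ContentHandler):
    """Record where sections start while the document streams past."""

    def __init__(self):
        super().__init__()
        self.namespace_stack = []  # namespaces of the currently open elements
        self.section_starts = []   # (namespace, local name) of each section root

    def startElementNS(self, name, qname, attrs):
        namespace, local = name
        parent = self.namespace_stack[-1] if self.namespace_stack else _NO_PARENT
        if namespace != parent:
            self.section_starts.append((namespace, local))
        self.namespace_stack.append(namespace)

    def endElementNS(self, name, qname):
        self.namespace_stack.pop()


DOC = b"""<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description/>
    </rdf:RDF>
  </head>
  <body><p>text</p></body>
</html>"""

handler = SectionStartHandler()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # deliver startElementNS events
parser.setContentHandler(handler)
parser.parse(io.BytesIO(DOC))
print(handler.section_starts)
```

Memory use is bounded by the depth of the document, since the handler never holds more than the namespaces of the open elements.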

Clark said that NRL allows developers to extend schemata without forcing them to insert thickets of wildcard expressions at each turn in order to support expansion and modularity. The developer can concentrate on what each subschema needs to know about, and use NRL to express its union with other schemata. He suggested that NRL could be a seed not just for validating multi-namespace documents but also for processing them.

NRL is the main input to Namespace-based Validation Dispatching Language (NVDL), the language being developed in ISO DSDL Part 4. In response to a question, Clark said that NVDL adds minor features such as anonymous modes to NRL, and is pretty much a superset of NRL. Another question from the audience asked how the Post-Schema-Validation Infoset (PSVI) is handled when W3C XML Schema (WXS) is used to validate subtrees in NRL. Clark responded: "In the DSDL world view, there is no PSVI. The output of validation is purely error information."