A well-known issue for SGML developers, overlapping structures have been
brought back to light in XML terms by Patrick Durusau at several conferences,
including XML Europe 2002 in May, and have given rise to several proposals at
eXtreme Markup 2002 in August. Two of these complementary approaches are
detailed in the papers announced by Patrick Durusau and Jeni Tennison.
Just-In-Time-Trees (JITTs), developed by Patrick Durusau and Matthew Brook
O'Donnell, now come with a revised version of the presentation made at
eXtreme 2002 and a first XSLT-based implementation. Durusau mentions that this
proposal supports formats other than XML (including LMNL):
Note that we do not rely exclusively upon XML markup (you can simply record
overlapping hierarchies in standard XML markup and then separate the trees
into layers for processing) but the technique should extend to traditional
SGML and concur files under SGML, LMNL, milestone/fragmentation/join,
MECS/TexMECS, as well as other file formats.
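To make the layering idea in the quote concrete, here is a minimal Python sketch. It is not the JITT markup or its XSLT implementation: it assumes a hypothetical milestone vocabulary (empty `start`/`end` elements carrying `layer` and `name` attributes), and `extract_layer` is an illustrative name. Each layer nests properly on its own, so it can be rebuilt as an ordinary tree while the milestones of every other layer are skipped and only their surrounding text is kept.

```python
import xml.etree.ElementTree as ET

# Hypothetical milestone vocabulary (an assumption, not the JITT format):
#   <start layer="L" name="N"/> opens a range called N in layer L
#   <end layer="L"/>            closes the innermost open range in layer L
# The "bible" verses below cross the "logical" paragraph boundary.
SOURCE = (
    '<doc>'
    '<start layer="logical" name="para"/>'
    '<start layer="bible" name="verse"/>In the beginning '
    '<end layer="bible"/>'
    '<start layer="bible" name="verse"/>God said'
    '<end layer="logical"/>'
    '<start layer="logical" name="para"/>, Let there be light.'
    '<end layer="bible"/>'
    '<end layer="logical"/>'
    '</doc>'
)


def extract_layer(root: ET.Element, layer: str) -> ET.Element:
    """Rebuild one layer as a proper tree, ignoring all other layers."""
    out = ET.Element(root.tag)
    stack = [out]  # elements of this layer that are currently open

    def append_text(s):
        # Text belongs in the current element's .text, or in the .tail of
        # its last child if a child has already been closed inside it.
        if not s:
            return
        cur = stack[-1]
        if len(cur):
            cur[-1].tail = (cur[-1].tail or "") + s
        else:
            cur.text = (cur.text or "") + s

    append_text(root.text)
    for ms in root:  # milestones are flat children of the root
        if ms.get("layer") == layer:
            if ms.tag == "start":
                stack.append(ET.SubElement(stack[-1], ms.get("name")))
            elif ms.tag == "end":
                if len(stack) == 1:
                    raise ValueError(f"unbalanced end in layer {layer!r}")
                stack.pop()
        append_text(ms.tail)  # text after a milestone, whatever its layer
    if len(stack) != 1:
        raise ValueError(f"unclosed range in layer {layer!r}")
    return out


if __name__ == "__main__":
    doc = ET.fromstring(SOURCE)
    for name in ("logical", "bible"):
        print(name, ET.tostring(extract_layer(doc, name), encoding="unicode"))
```

Extracting the `bible` layer yields two `verse` elements, the second of which contains text from both paragraphs of the `logical` layer; neither tree on its own could hold both structures.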
Although JITTs were originally intended to solve the issue of overlapping
structures by extracting a structure at runtime, Elliotte Rusty Harold notes
that the idea of building trees dynamically over a document is similar to the
MOE (Markup Object Events) API proposed by Simon St.Laurent, and believes that
it can benefit general-purpose XML processing:
Just-In-Time-Trees have the potential to be as easy to use as a tree-based
API like JDOM or DOM while as fast and efficient as a streaming API like
SAX or XMLPULL. I'm still trying to figure out exactly what the API for
such a thing should look like before I work on the
implementation.
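Harold had not settled on an API, so the following is neither his design nor MOE. As a rough illustration of the trade-off he describes, Python's standard `xml.etree.ElementTree.iterparse` can stream a document and hand out complete subtrees one at a time: the caller gets tree-style access to each record while the parser keeps moving forward. The helper name `jit_subtrees` is hypothetical, and the sketch assumes the target elements do not nest inside one another.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def jit_subtrees(source: BinaryIO, tag: str) -> Iterator[ET.Element]:
    """Stream `source`, yielding each `tag` element as a complete subtree.

    Assumes `tag` elements do not nest inside one another. Each subtree is
    cleared once the caller moves on, so roughly one record at a time is
    fully held in memory (the cleared, empty shells stay attached to the root).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem    # tree-style access: find(), get(), iter(), ...
            elem.clear()  # runs when the caller asks for the next subtree


DOC = (b"<catalog>"
       b"<book id='1'><title>XML</title></book>"
       b"<book id='2'><title>SGML</title></book>"
       b"</catalog>")

if __name__ == "__main__":
    for book in jit_subtrees(io.BytesIO(DOC), "book"):
        print(book.get("id"), book.findtext("title"))
```

Because each subtree is cleared as soon as the next one is requested, callers should copy out whatever they need before advancing rather than keep references to earlier elements.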
Layered Markup and anNotation Language (LMNL), developed by Jeni Tennison,
Gavin Thomas Nicol and Wendell Piez, proposes a new, non-XML markup language
that is defined not in terms of elements but in terms of "ranges", which may
overlap:
Enabling ranges to overlap is incredibly useful. It's often very hard to
squeeze a document's structure into a neat tree, for example if you're
including comments, marking up insertions and deletions or marking up text
that has multiple structures such as the Bible (chapters and verses vs.
sections and paragraphs). This isn't to say that tree structures are
useless -- of course they're incredibly useful, not least because they're
easy to process -- but they don't meet everyone's requirements.
The project, which envisions a RELAX NG-like schema language and an XPath-like
query language among its future directions, is currently finalizing its
specifications before starting to develop supporting software:
The software hasn't quite caught up with the specification; we think it's
important to get the specs right first.