XML for document preservation
Isn't it surprising to find a recommendation that is not yet three years old, and has not been approved by any official body, praised by engineers, lawyers and archivists invited by their government to debate the long-term preservation of digital documents?
Yet this was exactly the case at a meeting organized by the MTIC [1], on behalf of the French Prime Minister, to present a "guide for the preservation of digital documents."
It included presentations by Alain Bensoussan, a lawyer specializing in the legal issues of digital documents, and Catherine Dhérent, representing the "Archives Nationales."
Despite their virtual nature, digital documents are threatened by the lack of long-term stability of their media. The French standard NF Z42-013 and the law on the validity of digital documents as formal proof require that documents be written to non-rewritable media, which are guaranteed for only ten years: a very brief period from an archivist's point of view.
This physical deterioration is aggravated by the short life cycle of the logical formats used to represent documents.
The long-term preservation of digital documents thus requires setting up a dynamic process to schedule, run and audit the physical and logical migrations needed to keep documents alive.
In this context, XML can serve several purposes:
- XML is a format that meets the requirements defined by the MTIC: it is an open recommendation, easy to transform, and should be easy to migrate.
- XML allows content to be separated from presentation, so that the two can be stored separately.
- The guide recommends defining an XML envelope for documents, containing the description of the document, its preservation requirements, its access control and the history of its migrations.
- XML is a good candidate for describing the metadata associated with the document, possibly as part of its envelope. The MTIC will set up a specific working group on this issue.
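To make the envelope idea more concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`envelope`, `description`, `preservation`, `access`, `history`, `migration`) and the helpers `build_envelope` and `record_migration` are hypothetical illustrations, not the format defined by the MTIC guide; they only mirror the four parts the guide lists (description, preservation requirements, access control, migration history).

```python
import xml.etree.ElementTree as ET


def build_envelope(title, content_href, content_format, retention_years, readers):
    """Build a hypothetical preservation envelope around a reference to a document."""
    env = ET.Element("envelope")

    # Description of the wrapped document; the content itself is stored separately.
    desc = ET.SubElement(env, "description")
    ET.SubElement(desc, "title").text = title
    ET.SubElement(desc, "content", {"href": content_href, "format": content_format})

    # Preservation requirements, here only a retention period in years (assumed field).
    pres = ET.SubElement(env, "preservation")
    ET.SubElement(pres, "retention", {"years": str(retention_years)})

    # Access control: one <reader> element per authorized reader.
    access = ET.SubElement(env, "access")
    for reader in readers:
        ET.SubElement(access, "reader").text = reader

    # Migration history starts empty and is filled in by record_migration().
    ET.SubElement(env, "history")
    return env


def record_migration(env, date, old_format, new_format):
    """Append one logical migration to the history and update the current format."""
    ET.SubElement(env.find("history"), "migration",
                  {"date": date, "from": old_format, "to": new_format})
    env.find("description/content").set("format", new_format)


if __name__ == "__main__":
    # Sample data only: a contract stored as HTML, later migrated to XHTML.
    env = build_envelope("Sample contract", "contract.html", "text/html", 30, ["archivist"])
    record_migration(env, "2010-06-01", "text/html", "application/xhtml+xml")
    print(ET.tostring(env, encoding="unicode"))
```

Keeping the migration history inside the envelope means that each step of the dynamic migration process leaves an auditable trace next to the document it applies to.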
Alain Bensoussan's presentation focused on the legal issues. He showed that presentation also carries semantic value that courts may need, and that documents should be kept together with all the "drivers" needed to display them.
This point could be controversial, since the configuration used by the author of a document usually differs from those used by its readers.
A dispute over a contract written in (X)HTML with tools from supplier A, and displayed with missing text by a customer's browser B, would probably be difficult to judge.
Any webmaster knows that such situations are all too easy to reproduce, and this example sheds new light on the legal implications of tools that do not conform to standards.
Copies of the presentations and of the guide should be available online soon.
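[1] Mission interministérielle de soutien technique pour le développement des technologies de l'information et de la communication dans l'administration (Interministerial mission for technical support to the development of information and communication technologies in the administration)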
Re: XML for document preservation (Maxime Coulon - 10:29, 29 May 2003)
Dear Sir or Madam,
I'm a student at the institute for information in Amsterdam.
I'm doing research on XML for a printing company.
Is it possible to store digital books in a logical XML structure?
Thank you in advance.
Re: XML for document preservation (Eric van der Vlist - 10:45, 28 Jan 2001)
My headlines are not always free from the rubbish teasing of an author who'd like to attract more readers to his stories ;=) ...
In this specific case, though, the introduction was based on a table (which I hope they will publish soon) in which XML was in the left column, headed "recommended formats", while SGML was in the right-hand columns, headed "other formats", together with older or proprietary formats.
This suggests that they distinguish between the two formats not on the basis of their history or technical lineage, but on how easy they are to read, process and transform, which is key to preserving documents.
Since an XML document can be read not only by all SGML tools but also by all the "XML-only" tools that are flourishing, I think this aspect does need to be taken into account. Going further down this road, they could even recommend a "safe" subset of XML, though I don't know whether they are willing to do so.
Re: XML for document preservation (Rick Jelliffe - 04:41, 27 Jan 2001)
Rubbish (qualified)! XML is a profile of ISO 8879 (WebSGML). An example of how to describe it is given by ISO 8879 Annex L. SGML was created, in part, to allow archiving. Any government that wants to mandate XML for archiving needs only to specify ISO 8879 (WebSGML) as the base specification, with the appropriate SGML Declaration and Additional Requirements document (for which they can turn to James Clark's note).
However, full SGML is better than XML at modeling compound document sets: for example, one can add attributes to entities.
No matter what kind of SGML is used, there will always be a need for additional requirements: which graphics formats can be used (and exactly which versions of those formats), which naming conventions, which compression, which stylesheets, which schemas, and which hyperlinking and locating mechanisms. SGML/XML enables this kind of superstructure to be built.
But the qualification is this: if "XML" is being used loosely to mean "everything being done at the W3C", then Eric's point is certainly fair. There are lots of layers (layers above parsing) that do not come from international standards.