xmlformat is a REX-based script (take your pick of the Perl or Ruby
versions) for consistently reformatting XML files; that is,
"canonizing" and normalizing whitespace, indenting,
line-wrapping, and placement of line breaks. It works as
advertised, handling mixed content and "verbatim" content
correctly.
xmlformat (developed by MySQL guru
Paul DuBois[1]) is a tool to use when you want an
off-the-shelf solution for "pretty printing" XML files for better
readability--or, more importantly, when you want to ensure that,
before your files are committed to a revision-control system and/or
have diffs run against them, their whitespace, indenting,
line-wrapping, and placement of line breaks have been put
into a standard, consistent format.
A tool like xmlformat is especially
useful in environments where you have multiple people working on
the same set of XML files--people who may be using a variety of
editing applications to edit the files. As DuBois puts it in his
intro to the xmlformat documentation:
"XML editors typically impose
their own style conventions on files. The application of
different style conventions to successive document revisions
can result in large version diffs where most of the bulk is
related only to changes in format rather than content. This
can be a problem if, for example, the version control system
automatically sends the diffs to a committer's mailing list
that people read. If documents are rewritten to a common
format before they are committed, these diffs become smaller.
They better reflect content changes and are easier for people
to scan and understand."
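The effect described there is easy to reproduce. What follows is only a rough sketch in Python, not xmlformat itself: the normalize() function is a hypothetical, deliberately crude stand-in for a real formatter (it puts each tag and each text run on its own line and collapses whitespace), and the two revisions are invented sample data. It compares how many lines a diff touches before and after both revisions are put into a common format.

```python
import difflib
import re


def normalize(xml_text):
    """Crude stand-in for a formatter (NOT xmlformat's algorithm).

    Puts each tag and each text run on its own line, drops
    whitespace-only runs, and collapses internal whitespace.
    """
    pieces = re.findall(r"<[^>]+>|[^<]+", xml_text)
    cleaned = (" ".join(piece.split()) for piece in pieces)
    return [piece for piece in cleaned if piece]


def changed_lines(old, new):
    """Count added and removed lines in a unified diff of two line lists."""
    diff = list(difflib.unified_diff(old, new, lineterm=""))
    # The first two diff lines are the "---"/"+++" file headers; skip them.
    return sum(1 for line in diff[2:] if line.startswith(("+", "-")))


# Revision 1, as saved by one (hypothetical) editor using two-space indents.
rev1 = "<doc>\n  <title>Notes</title>\n  <para>First draft.</para>\n</doc>\n"
# Revision 2: another editor re-indents with tabs; only the para text changed.
rev2 = "<doc>\n\t<title>Notes</title>\n\t<para>Second draft.</para>\n</doc>\n"

raw = changed_lines(rev1.splitlines(), rev2.splitlines())
formatted = changed_lines(normalize(rev1), normalize(rev2))

print("lines touched, raw diff:       ", raw)
print("lines touched, normalized diff:", formatted)
```

In the raw comparison the re-indented title line shows up as a change even though its content is identical; after both revisions go through the same normalization, only the paragraph text that was actually edited remains in the diff.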
The good news about xmlformat
specifically (as opposed to some other XML "pretty printing"
tools) is that it seems to reliably do what you'd expect
it to do. It needs some initial configuration to teach it
about your content: which elements in your XML files are
inline elements and which are block elements, which need to be
handled as "verbatim" elements, and which you want it to
whitespace-normalize. Once that is done, I think you'll find
that it reformats your content the way you want it, without
unexpectedly removing or introducing any whitespace, and that
it handles mixed content correctly.[2]
Clear, well-written documentation on how to configure
xmlformat is provided both in the
xmlformat distribution and online. The documentation also includes details
about how it works[3]. The xmlformat Perl and Ruby scripts themselves are
also extensively commented and make for an interesting read.