Encodings
Gorille supports character testing for XML 1.0, 1.1
00:28, 29 Jan 2002 UTC | Uche Ogbuji

Recent threads on XML-DEV and elsewhere have once again illustrated the complexity behind the basic atoms of XML documents: Unicode characters. Simon St. Laurent announced the 0.4 release of Gorille, a tool that helps developers of Java XML processors, and of applications that emit XML, test how their code handles characters.

Gorille uses an XML format to specify lists of characters according to the productions in the XML specifications. Gorille 0.4 supports XML 1.0 and the controversial first working draft of XML 1.1 (also known as "Blueberry"). It can also handle subsets of the characters XML allows, such as ASCII only. From the Gorille home page:
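As a rough illustration of the kind of test involved, the Java class below checks a single code point against the Char production of XML 1.0, plus a narrower ASCII-only subset. This is a sketch under stated assumptions, not Gorille's XML format or API: the names `XmlChars`, `isXml10Char` and `isAsciiXmlChar` are hypothetical, and the rules of the XML 1.1 draft are not modeled.

```java
// Hypothetical helper, not part of Gorille: tests code points against
// the Char production of XML 1.0 and against an ASCII-only subset.
final class XmlChars {

    private XmlChars() {}

    /** True if the code point is allowed by XML 1.0's Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** A narrower subset: XML 1.0 characters that are also 7-bit ASCII. */
    static boolean isAsciiXmlChar(int cp) {
        return cp < 0x80 && isXml10Char(cp);
    }

    public static void main(String[] args) {
        // Tab, a C0 control, 'A', e-acute, a noncharacter, and a character above U+FFFF.
        int[] samples = {0x9, 0x1F, 0x41, 0xE9, 0xFFFE, 0x1D11E};
        for (int cp : samples) {
            System.out.printf("U+%04X xml10=%b ascii=%b%n",
                cp, isXml10Char(cp), isAsciiXmlChar(cp));
        }
    }
}
```

Running `main` prints one line per sample: U+001F and U+FFFE fail the XML 1.0 test, while U+1D11E, which lies above the Basic Multilingual Plane, passes it but falls outside the ASCII subset.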

"Gorille relies completely on Java's built-in support for Unicode strings and characters, though it doesn't use any of the Unicode property information Java provides (in java.lang.Character and java.lang.Character.UnicodeBlock). Starting in version 0.3, Gorille provides support for the Surrogates Area (13.4) of Unicode (U+D800-U+DFFF) and for characters above 10000 represented by surrogate pairs (3.7). Java itself doesn't recognize these characters as such, but does permit their inclusion in strings as UTF-16 code points."
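Because a Java `char` is a single UTF-16 code unit, a character above U+FFFF arrives as two `char` values: a high surrogate followed by a low surrogate. The hypothetical `SurrogateScan` class below sketches how a tester might join such pairs by hand, as Java code of this period had to, and report the first character that XML 1.0 rejects, including an unpaired surrogate. Java 5 later added methods such as `Character.toCodePoint` and `String.codePointAt` that do the pairing; the sketch deliberately avoids them, and it is not Gorille's implementation. It repeats the XML 1.0 Char test so that it stands alone.

```java
// Hypothetical sketch, not Gorille's code: walks a Java String by UTF-16
// code units, joins surrogate pairs into code points by hand, and
// reports the index of the first character that is not legal XML 1.0.
final class SurrogateScan {

    private SurrogateScan() {}

    static boolean isHigh(char c) { return c >= 0xD800 && c <= 0xDBFF; }
    static boolean isLow(char c)  { return c >= 0xDC00 && c <= 0xDFFF; }

    /** Combine a high/low surrogate pair into a supplementary code point. */
    static int combine(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    /** Same ranges as XML 1.0's Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index (in chars) of the first illegal character, or -1 if all are legal. */
    static int firstIllegal(String s) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int cp;
            int width;
            if (isHigh(c) && i + 1 < s.length() && isLow(s.charAt(i + 1))) {
                cp = combine(c, s.charAt(i + 1));
                width = 2;
            } else {
                // A lone surrogate keeps its D800-DFFF value and fails the test below.
                cp = c;
                width = 1;
            }
            if (!isXml10Char(cp)) {
                return i;
            }
            i += width;
        }
        return -1;
    }

    public static void main(String[] args) {
        String ok = "clef \uD834\uDD1E";   // ends with U+1D11E as a surrogate pair
        String lone = "bad \uD834 end";    // unpaired high surrogate at index 4
        String ctrl = "bell \u0007";       // C0 control at index 5
        System.out.println(firstIllegal(ok));
        System.out.println(firstIllegal(lone));
        System.out.println(firstIllegal(ctrl));
    }
}
```

For the three sample strings, `main` prints -1 (all legal), 4 (the unpaired surrogate) and 5 (the control character).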

From the 0.4 announcement:

"I believe that Gorille's functionality is complete at this point, though the code could certainly use more testing and documentation. Future releases will likely focus on added testing, documentation, and improvements in command-line interfaces."

Gorille is distributed under the Mozilla Public License and hosted on SourceForge.

  
xmlhack: developer news from the XML community


Related categories
Encodings
Java