Perl 5.6 moves to UTF-8 by default
18:28, 20 Apr 2000 UTC | Simon St.Laurent

In "What's New in Perl 5.6.0?", Simon Cozens tops the list with UTF-8 support, a feature that should simplify processing and generating XML in Perl.

Cozens reports that:

"By default, Perl now thinks in terms of Unicode characters instead of simple bytes; a character can, as the CJK people already know extremely well, span several bytes. All the relevant built-in functions (length, reverse, and so on) now work on a character-by-character basis instead of byte-by-byte, and strings are represented internally in Unicode."
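The character-versus-byte distinction Cozens describes can be illustrated with a minimal sketch. It is not taken from his article; the sample string, the variable names, and the use of the bytes pragma to expose the byte count are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Illustrative two-character string (U+65E5, U+672C); each character
# occupies three bytes when encoded as UTF-8.
my $s = "\x{65E5}\x{672C}";

# length() counts characters by default.
my $char_len = length($s);

# Under the lexically scoped bytes pragma, length() counts the bytes
# of Perl's internal UTF-8 representation instead.
my $byte_len = do { use bytes; length($s) };

# reverse() in scalar context also works character by character.
my $reversed = scalar reverse($s);

print "characters: $char_len\n";
print "bytes: $byte_len\n";
```

Here the character count is 2 and the byte count is 6, and the reversed string is the same two characters in the opposite order rather than a scrambled byte sequence.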

Clark Cooper, maintainer of XML::Parser, told xmlhack that "5.6 will now make it at least practical to work with UTF-8 strings *as* UTF-8 strings" rather than treating the results of XML::Parser as ASCII or remapping them.

The improved support should also make it easier to create UTF-8 output, though UTF-8 input that arrives by ordinary means (that is, not through XML::Parser) will still require translation.

xmlhack: developer news from the XML community
