[Openmcl-devel] Plans for Unicode support within OpenMCL?
dvd at davidashen.net
Wed Mar 22 05:51:12 PST 2006
On 22/07/5766, at 17:42, Tom Emerson wrote:
> "Real world" applications that were retrofitted to use Unicode use
> UTF-8 as the internal encoding, because the C runtime string
> functions "just work" (for various meanings of work) with it, i.e.,
> strlen() of a UTF-8 string gives a valid number (the number of
> bytes in the string). strlen() of a UTF-16 string usually gives 0
> since the first byte of the UTF-16 character is often 0.
UTF-8 is a convenient character encoding of UCS. It is usable for
hashing, sorting (when a linguistically correct collation order is
not required) and string manipulation. It is also handy for viewing
the result in an editor/viewer -- terminals and editors handle UTF-8
natively.
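To illustrate the hashing and sorting point, here is a minimal C sketch (not from the original message). The helper names utf8_count and utf8_compare are made up for illustration, and the input is assumed to be well-formed, NUL-terminated UTF-8. The sketch relies on two properties of the encoding: continuation bytes always have the form 10xxxxxx, and byte-wise comparison orders well-formed UTF-8 strings by code point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of code points in a NUL-terminated UTF-8
   string. Every byte that is not a continuation byte (10xxxxxx) starts
   a new code point. Assumes well-formed input. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: strcmp() compares bytes as unsigned char, and for
   well-formed UTF-8 that byte order is the same as code-point order.
   It is not a locale-aware collation. */
int utf8_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b);
}
```

For the five-byte string "caf\xC3\xA9" (café), strlen() reports 5 bytes while utf8_count() reports 4 code points. The string can still be hashed, copied, and compared as plain bytes without decoding it.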
> Applications that are written with Unicode in mind rarely, if ever,
> use UTF-8 as an internal encoding. While there is a space savings
> in many cases, other manipulations are much more difficult because
> of the multi-byte character representation.
That's not true. Applications written with Unicode in mind do use
UTF-8 as an internal encoding. The best example is Plan 9 itself,
written from scratch with Unicode in mind. It uses UTF-8 because the
representation is convenient and efficient; memory footprint has
nothing to do with it.
They just don't use the UTF-8 representation when random access to
individual characters is required; for that they provide decoders
and encoders. When you need strings as integral objects, you keep
them in UTF-8; when you want to access individual characters, you do
something like
(with-runes (runes length) string
  ...)
or something similar.
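As a rough illustration of the decode step, here is a C sketch of what a with-runes style helper might do underneath: expand the UTF-8 string into an array of fixed-width code points, after which indexing is O(1). The names rune, decode_rune and runes_from_utf8 are hypothetical and are not taken from Plan 9 or OpenMCL. Malformed lead bytes and truncated sequences are replaced with U+FFFD; overlong forms and surrogates are not checked, so treat this as a sketch rather than a validating decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t rune;   /* hypothetical name for one code point */

/* Decode one code point starting at s and store it in *out.
   Returns the number of bytes consumed (always at least 1). */
size_t decode_rune(const unsigned char *s, rune *out)
{
    size_t len, i;
    rune r;

    if (s[0] < 0x80)                { *out = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; r = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; r = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; r = s[0] & 0x07; }
    else                            { *out = 0xFFFD; return 1; }

    for (i = 1; i < len; i++) {
        /* A non-continuation byte (including the NUL terminator) ends
           the sequence early; it is left for the next call. */
        if ((s[i] & 0xC0) != 0x80) { *out = 0xFFFD; return i; }
        r = (r << 6) | (s[i] & 0x3F);
    }
    *out = r;
    return len;
}

/* Expand a NUL-terminated UTF-8 string into a malloc'ed array of runes.
   Stores the rune count in *length. The caller frees the result.
   Returns NULL if allocation fails. */
rune *runes_from_utf8(const char *s, size_t *length)
{
    const unsigned char *p = (const unsigned char *)s;
    rune *runes;
    size_t n = 0;

    assert(s != NULL && length != NULL);
    /* b bytes hold at most b runes; +1 avoids a zero-size allocation. */
    runes = malloc((strlen(s) + 1) * sizeof *runes);
    if (runes == NULL)
        return NULL;
    while (*p != '\0')
        p += decode_rune(p, &runes[n++]);
    *length = n;
    return runes;
}
```

The string itself stays in UTF-8; the rune array is a temporary view that exists only while random access is needed, and an encoder would go the other way when the result has to be stored again.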