[Openmcl-devel] Naming convention for character-encodings (UTF) or :utf32 vs. :utf-32

Gary Byers gb at clozure.com
Sat Sep 29 12:08:51 PDT 2007


I think that I tried to be careful with this and used the
preferred names from <http://www.iana.org/assignments/character-sets>
(well, keywords based on those preferred names).  See what happens
when you're careful!  (I should have known better.)

You're right that "big endian utf 32" should be called "utf-32be"
and likewise for the little-endian variant.  I'll change that.

Trying to strictly adhere to a standard is good, but there are
lots of standards.  What ICU does when trying to match a provided
character encoding name with one of those that it knows about
is to ignore case and ignore dashes and underscores, so :utf-8
and :utf8 are treated identically.  This seems like a good approach;
it's unlikely that someone who says :utf8 is referring to something
other than "8-bit Unicode Transformation Format", and it's not
all that useful to claim to have never heard of :utf8.
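
As a rough illustration of that kind of loose matching, here is a
minimal sketch.  It is not ICU's or OpenMCL's actual code: the function
name NORMALIZE-ENCODING-NAME is made up for this example, and it assumes
the default (upcasing) readtable.  The idea is just to drop dashes and
underscores and fold case before looking a name up:

```lisp
;; Hypothetical helper for illustration; not part of OpenMCL or ICU.
(defun normalize-encoding-name (name)
  "Return NAME (a string or symbol) as a keyword with dashes and
underscores removed and case folded."
  (let ((stripped (remove-if (lambda (ch) (find ch "-_"))
                             (string name))))
    (intern (string-upcase stripped) :keyword)))

;; :utf-8 and :utf8 name the same encoding after normalization.
(assert (eq (normalize-encoding-name :utf-8)
            (normalize-encoding-name :utf8)))
```

With something like this, :utf32-be, :utf-32be and "UTF_32BE" would all
normalize to the same keyword, so a lookup keyed on the normalized name
would accept any of them.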

On Sat, 29 Sep 2007, Ralf Stoye wrote:

> Hello,
>
> Trying to use utf-8 with portable-aserve, I got errors about unknown
> character encodings.
> I found out that most encodings are named :utf-xx but the :utf-32be
> and :utf-32le versions are mixed up.
> I vote for a consistent use of the :utf-xx[xx] version.
>
> level-1/l1-streams.lisp
> (defmethod stream-external-format ((s character-stream))
>   (make-external-format :character-encoding
>                         #+big-endian-target :utf32-be
>                         #+little-endian-target :utf32-le
>                         :line-termination :unix))
>
>
> level-1/l1-unicode.lisp
> (define-character-encoding
>     #+big-endian-target :utf-32be
>     #-big-endian-target :utf32-le .....
>
> Regards,
> Ralf Stoye


