[Openmcl-devel] *default-character-encoding* should be :utf-8

Gary Byers gb at clozure.com
Tue Mar 6 01:14:23 UTC 2012

On Mon, 5 Mar 2012, Ron Garret wrote:

> On Mar 4, 2012, at 5:53 PM, Gary Byers wrote:
>> If your sources are in some legacy encoding - MacRoman is an example
>> that still comes up from time to time - then you obviously need to
>> process them with that encoding in effect or you'll lose information.
> If you're using such legacy sources, your first step should be to
> convert them to UTF-8 and then never touch the original again.
> (The same goes for latin-1, except that latin-1 is not a legacy
> encoding.  It's in common use today, which is the main reason this
> is a real problem.)

I agree, but the people who have these legacy-encoded sources that really
should have been converted to utf-8 long ago have all kinds of flimsy excuses
for not wanting to do so.  "It costs time", "it costs money", "it requires
expertise", "it breaks backward compatibility"  ...  Sheesh.  It's almost
as if these people live in the real world or something.

At some point, people with legacy code do need to invest in its viability
(and in many cases that point was probably "years ago").  It doesn't always
happen, and this so-called "real world" thing that I keep hearing about seems
to have something to do with that.  Given that situation (and the general lack
of awareness of encoding issues that sometimes accompanies it), a default
encoding that loses less information (ISO-8859-1, which maps every possible
byte to some character) has more practical value than one that can lose as
much information as UTF-8 can.  That's one of those real-world considerations:
debugging reported problems that stem from that lack of awareness is part of
the real world of anyone who has to do it.  I don't know whether I'm
overstating the importance of that or whether other people are unaware of just
how intrusive this so-called "real world" can be.  (There isn't much else to
like about ISO-8859-1.)
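A small sketch of the information-loss point above, assuming Python's built-in codecs (`latin-1`, `utf-8`, `mac_roman`) are reasonable stand-ins for the corresponding Lisp external formats; the helper names `roundtrips` and `lossy_utf8` are hypothetical. Reading MacRoman bytes as ISO-8859-1 yields the wrong characters but keeps every byte recoverable; reading them as UTF-8 either fails outright or replaces the offending byte with U+FFFD, after which the original byte is gone.

```python
# Compare what happens when bytes in a legacy encoding (MacRoman here)
# are read with the wrong default encoding.

def roundtrips(data: bytes, encoding: str) -> bool:
    """True if decoding then re-encoding with `encoding` reproduces `data`."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return text.encode(encoding) == data

def lossy_utf8(data: bytes) -> bytes:
    """Decode as UTF-8, replacing invalid bytes with U+FFFD, then re-encode."""
    return data.decode("utf-8", errors="replace").encode("utf-8")

legacy = "café".encode("mac_roman")   # b'caf\x8e' (0x8E is e-acute in MacRoman)

# ISO-8859-1 assigns a character to every byte value 0-255, so decoding
# never fails and the original bytes can always be recovered.
print(roundtrips(legacy, "latin-1"))   # True
# A lone 0x8E is not valid UTF-8, so strict decoding fails...
print(roundtrips(legacy, "utf-8"))     # False
# ...and lenient decoding replaces it, discarding the original byte.
print(lossy_utf8(legacy))              # b'caf\xef\xbf\xbd'
# Converting once, with the correct source encoding, loses nothing.
print(legacy.decode("mac_roman").encode("utf-8"))  # b'caf\xc3\xa9'
```

This is the sense in which ISO-8859-1 "loses less": the text may be misread, but the bytes survive, so a later, correct decoding is still possible.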

So, let's see.  There doesn't seem to be as much of a performance hit
for repeatedly doing READ-CHAR on utf-8 encoded files whose contents are
all STANDARD-CHAR/ASCII as I'd remembered, so changing the default terminal
and file encodings to utf-8 (in the trunk) seems like a worthwhile experiment.
It may be easier to evaluate some of these things with those changes in
effect, and it's entirely possible that the change will turn out to be
neither a particularly good nor a particularly bad idea.
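One plausible reason ASCII-only content is cheap to read as UTF-8: every byte below 0x80 is a complete UTF-8 character, so a character-at-a-time decoder can handle it with a single comparison. The sketch below is a hypothetical READ-CHAR analogue in Python (`utf8_read_char` and `read_all` are illustrative names, not Clozure CL's implementation); on ASCII input it produces exactly what an ISO-8859-1 reader would.

```python
import io
from typing import BinaryIO, Optional

def utf8_read_char(stream: BinaryIO) -> Optional[str]:
    """Decode one character from a UTF-8 byte stream; None at end of file.
    Raises UnicodeDecodeError on malformed or truncated input."""
    first = stream.read(1)
    if not first:
        return None
    lead = first[0]
    if lead < 0x80:
        # ASCII fast path: the single byte is the whole character.
        return chr(lead)
    # Number of continuation bytes implied by the lead byte.
    if 0xC2 <= lead <= 0xDF:
        extra = 1
    elif 0xE0 <= lead <= 0xEF:
        extra = 2
    elif 0xF0 <= lead <= 0xF4:
        extra = 3
    else:
        # 0x80-0xC1 and 0xF5-0xFF can never start a valid sequence.
        raise UnicodeDecodeError("utf-8", first, 0, 1, "invalid start byte")
    # Let Python's codec validate the complete multi-byte sequence.
    return (first + stream.read(extra)).decode("utf-8")

def read_all(data: bytes) -> str:
    """Read characters one at a time until end of file."""
    stream = io.BytesIO(data)
    chars = []
    while (ch := utf8_read_char(stream)) is not None:
        chars.append(ch)
    return "".join(chars)

ascii_source = b'(format t "Hello, world~%")\n'
# Every byte takes the fast path, and the result matches ISO-8859-1.
assert read_all(ascii_source) == ascii_source.decode("latin-1")
print(read_all("café".encode("utf-8")))  # café
```

For files that really are all ASCII, the two defaults agree character for character; the choice only matters once a byte at or above 0x80 appears.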
