[Openmcl-devel] *default-character-encoding* should be :utf-8

Tim Bradshaw tfb at tfeb.org
Mon Mar 5 04:08:01 PST 2012


On 5 Mar 2012, at 01:53, Gary Byers wrote:

> If you're writing code in a vacuum, UTF-8 is a good choice.

I think this is an important point.  Also important is that it's not just code: the amount of Lisp source an application reads and writes is generally somewhere between negligible and zero.  So the important thing to worry about is data files, of whatever kind.

The second important thing is that an incompatible change (for instance from latin-1 to utf-8) can hurt naïve users, because they will not understand encodings and external formats and will not be specifying them when they do I/O.  So suddenly all the currency symbols in their data files, for instance, will mysteriously change, and they won't know why and probably won't realise they have until their system is live and the roof falls in on them.
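To make the currency-symbol failure concrete, here is a minimal sketch of what happens to the same bytes when the default encoding changes underneath code that never names one.  It is in Python only because the effect is not Lisp-specific; the sample value and variable names are made up for illustration.

```python
# Hypothetical data: a price field written by an older program whose
# default encoding was latin-1.  The pound sign is the single byte 0xA3.
legacy_bytes = "£100".encode("latin-1")

# Under the old default the data round-trips as expected.
as_latin1 = legacy_bytes.decode("latin-1")

# Under a UTF-8 default the lone byte 0xA3 is not a valid sequence.
# Here it is replaced with U+FFFD; a strict decoder would raise instead.
as_utf8 = legacy_bytes.decode("utf-8", errors="replace")

# The reverse direction: data now written as UTF-8 (two bytes, 0xC2 0xA3)
# but read by a downstream consumer that still assumes latin-1.
new_bytes = "£100".encode("utf-8")
downstream = new_bytes.decode("latin-1")

print(repr(as_latin1), repr(as_utf8), repr(downstream))
```

Nothing in the application code changed in either direction, which is why the people affected have no obvious place to start looking.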

Now of course, people that naïve deserve to lose, don't they?  Well, yes, they do, except that those people write the code that runs my bank account[*] (definitely) and yours (probably).  Really, they do: my day job is supporting these people, and it is just horrifying that things can make it into production with no understanding of these issues at all, but it happens all the time.

Don't get me wrong: I can't decide whether making things dependent on locale or just saying they should be UTF-8 is the Right Thing, but one or the other is.  And I would dearly like to burn the people who write production code that does not think about encodings and external formats.  But I care more about having my money not all vanish.

So I think a cost / benefit thing would be:

Benefit.  Users who understand and care are already setting this: they would save a couple of lines in their init file.  In theory, if everyone else changes to use UTF-8, things would be better for data interchange.

Cost.  Users who don't understand and care will probably get nasty surprises when code that used to work stops working.  External data will still come in in latin-1 (or whatever) because we can't change that, so these things won't actually get better.
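For contrast, a sketch of what the careful user's code looks like: the encoding of the external data is named at the point of I/O, so a change to any global default cannot affect it.  Python again, purely as illustration; the function name, the constant and the sample file contents are hypothetical.

```python
import os
import tempfile

# Assumption: the upstream system sends latin-1 and will keep doing so,
# whatever default our own runtime happens to use.
SOURCE_ENCODING = "latin-1"

def read_amounts(path, encoding=SOURCE_ENCODING):
    """Read one amount per line, decoding with an explicitly named encoding."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]

# Simulate a file arriving from the external system.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("£100\n¥250\n".encode("latin-1"))

amounts = read_amounts(path)
os.remove(path)
print(amounts)
```

Code written this way neither gains nor loses from a change of default, which is the group described under Benefit above.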

I think the benefit is small, and the cost is, unfortunately, potentially large.

Again, remember: I think UTF-8 (or locale-dependent encodings) is The Right Thing. Just probably not in practice.

--tim

[*] Not in Lisp, but they would make the same mistakes in Lisp.

