[Openmcl-devel] pathname-encoding-name

Sun Sep 12 20:43:58 PDT 2010

On Mon, 13 Sep 2010, Pascal J. Bourguignon wrote:

> Gary Byers <gb at clozure.com> writes:
>
>> Something that I read a few years ago suggested that trying to (do the
>> equivalent of) set PATHNAME-ENCODING-NAME automatically (from locale
>> information) was a bad idea, because that locale information was often
>> wrong.  That might be true, but it might be reasonable to do that (and
>> suggest that people fix incorrect locale information, which might
>> affect other programs as well) rather than require that it be set in
>> one's init file (when it's only really needed/used on some platforms.)
>
> Even if the locale is correct, unix file names are but sequences of
> binary bytes, and should not be treated otherwise.
>
> If you choose to represent unix pathnames as strings decoded as ASCII,
> it will break because some bytes will be greater than 127.  If you
> consider strings decoded from UTF-8, it will break because some
> byte sequences will be invalid.  You could consider strings decoded
> from a 1-1 code such as ISO-8859-1, but then you will have some wrong
> characters.

Perhaps that's why Takehiko Abe reported this as a bug ?

>
> IMO, unix path processing functions should use the correct type which
> is (vector (unsigned-byte 8)), (which is what a C char[] is), and
> relegate conversion to string and dealing with conversion errors, to
> the user interface layers.  "Oops, I cannot convert that path to a
> string, do you want to see it as a hex dump?"

I prefer that users enter pathname information by toggling DIP switches
on the front panels of their computers; if they don't have DIP switches
or front panels, they should solder them on.  What are they, wimps ?

I have no idea how to refer to a filename (at the OS level) other than
as a vector of unsigned bytes (or as a vector of (UNSIGNED-BYTE 16) on
Windows, or as "whatever the OS wants" in general).  Unfortunately, these
wimpy users design and use languages where filenames are represented as
these things called "strings", and the issue of how these "string things"
are mapped to and from "whatever the OS wants" - what character encoding
should be used for that mapping - becomes relevant.  On some OSes (including
some Unix variants), the character encoding is dictated by the filesystem
(the vector of unsigned bytes passed to the "open" system call on OSX has
to be well-formed UTF-8); other OSes treat the filename as a more-or-less
arbitrary sequence of bytes, and a meaningful mapping depends on consistent
use of conventions.
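
The three decoding choices quoted above can be illustrated with a short
sketch.  It is written in Python rather than Lisp purely for illustration;
the file name bytes and the helper try_decode are made up for the example,
and nothing here is meant to describe how CCL itself does the mapping.

```python
# Hypothetical file name bytes: "cafe" with an accented e, as written by a
# program using ISO-8859-1, so the accent is the single byte 0xE9.
raw_name = b"caf\xe9.txt"


def try_decode(data, encoding):
    """Return the decoded string, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# ASCII fails: 0xE9 is greater than 127.
ascii_result = try_decode(raw_name, "ascii")

# UTF-8 fails: 0xE9 announces a multi-byte sequence, but "." is not a
# valid continuation byte.
utf8_result = try_decode(raw_name, "utf-8")

# ISO-8859-1 maps every byte to exactly one character, so it never fails
# and the original bytes can always be recovered.
latin1_result = try_decode(raw_name, "iso-8859-1")
round_trip = latin1_result.encode("iso-8859-1")

# The cost of a 1-1 code: the same name written by a UTF-8 program decodes
# without error, but to the wrong characters.
utf8_name = "caf\u00e9.txt".encode("utf-8")
mojibake = utf8_name.decode("iso-8859-1")

print(ascii_result, utf8_result, repr(latin1_result), repr(mojibake))
```

Which of these behaviors is acceptable depends on whether the filesystem
dictates an encoding or merely relies on convention, which is the
distinction drawn in the paragraph above.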

(Yes, this mapping has to happen at "the user interface layer", which is where
things like OPEN and DIRECTORY and RENAME-FILE and ... practically everything
above the levels of DIP switches and hex dumps reside.  Rest assured that
the way that you think things should work is both "how they do work" and
"about the only way that they could possibly work", and there doesn't seem
to be any confusion about that.)

>
> -- 
> __Pascal Bourguignon__                     http://www.informatimago.com/