[Openmcl-devel] Slime's utf-8-unix

Gary Byers gb at clozure.com
Fri Oct 27 15:32:02 PDT 2006



On Fri, 27 Oct 2006, Ben Hyde wrote:

>
> In the HTTP world the character set is very volatile.  Coming from
> that world my first attempt was to slam the encoding just after the
> accept.  At this moment I'm actually a bit unclear on how to do that
> in a safe and reliable manner.

The fact that STREAM-EXTERNAL-FORMAT is SETFable is intended to help
with this.

? (defvar *s* (make-socket :remote-host "clozure.com" :remote-port :smtp))
*S*

;;; Whoops; forgot that SMTP wants CRLF termination, and things should probably
;;; be in "NET-ASCII" at this point.

? (setf (stream-external-format *s*) '(:character-encoding :us-ascii
                                        :line-termination :crlf))
#<EXTERNAL-FORMAT :US-ASCII/:CRLF #x300040EBC27D>
? (read-line *s*)
"220 clozure.com ESMTP Sendmail 8.13.3/8.13.3; Fri, 27 Oct 2006 15:10:41 -0600 (MDT)"

So the stream handled CRLF translation for us (which is at least, and
at most, a small victory). In general, changing a stream's
external-format affects subsequent user-level character I/O
operations. Any octets that are already buffered stay buffered;
changing the stream's character encoding merely changes the way that
those octets will be interpreted as characters.
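To make the buffering point concrete, here is a small portable Common Lisp model. It is a sketch, not CCL's implementation: OCTET-BUFFER, NEXT-CHAR and BUFFER-READ-LINE are hypothetical names, and only :LATIN-1 and well-formed :UTF-8 are handled. Octets are decoded only when a line is read, so changing the format changes how the still-buffered octets are interpreted, while octets already consumed are unaffected.

```lisp
;;; A toy model, not CCL's implementation: raw octets sit in a buffer,
;;; and characters are decoded only when they are read.  All names here
;;; are hypothetical.

(defstruct octet-buffer
  (octets #() :type vector)      ; raw octets received but not yet consumed
  (position 0 :type fixnum)      ; index of the next unconsumed octet
  (encoding :latin-1)            ; :LATIN-1 or :UTF-8
  (line-termination :unix))      ; :UNIX (LF) or :CRLF

(defun next-char (buf)
  "Decode one character at BUF's current position; return NIL at the end.
Assumes any UTF-8 sequences are well-formed and complete."
  (let ((octets (octet-buffer-octets buf))
        (pos (octet-buffer-position buf)))
    (when (< pos (length octets))
      (let ((b0 (aref octets pos)))
        (if (or (eq (octet-buffer-encoding buf) :latin-1) (< b0 #x80))
            ;; Latin-1, or a single-octet UTF-8 character: octet = code point.
            (progn (incf (octet-buffer-position buf))
                   (code-char b0))
            ;; Multi-octet UTF-8: the lead octet determines the length N.
            (let* ((n (cond ((< b0 #xE0) 2) ((< b0 #xF0) 3) (t 4)))
                   (code (logand b0 (ash #xFF (- (1+ n))))))
              (loop for i from 1 below n
                    do (setf code (logior (ash code 6)
                                          (logand (aref octets (+ pos i)) #x3F))))
              (incf (octet-buffer-position buf) n)
              (code-char code)))))))

(defun buffer-read-line (buf)
  "Read up to the next LF octet; with :CRLF termination, drop a trailing CR."
  (let ((line (with-output-to-string (out)
                (loop for ch = (next-char buf)
                      until (or (null ch) (= (char-code ch) 10))
                      do (write-char ch out)))))
    (if (and (eq (octet-buffer-line-termination buf) :crlf)
             (plusp (length line))
             (= (char-code (char line (1- (length line)))) 13))
        (subseq line 0 (1- (length line)))
        line)))

;; "café" CR LF, encoded in UTF-8, twice.
(let ((buf (make-octet-buffer
            :octets (coerce '(99 97 102 195 169 13 10
                              99 97 102 195 169 13 10)
                            'vector))))
  (print (length (buffer-read-line buf)))  ; 6: "caf", 2 Latin-1 chars, CR
  (setf (octet-buffer-encoding buf) :utf-8
        (octet-buffer-line-termination buf) :crlf)
  (print (buffer-read-line buf)))          ; "café"
```

The first line is read as Latin-1 with LF termination, so the two UTF-8 octets of "é" come back as two characters and the CR survives; after the SETFs, the second (still-buffered) line decodes cleanly.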

Real-world protocols may involve additional layers (transfer encodings, if
that's the correct term, such as base64 and quoted-printable) that this doesn't
help with at all, but it should be possible, safe, and reliable to change the
character encoding and/or line termination on the fly.

Back to Swank: that seems to give us a third option, namely doing:

  (let* ((stream-socket (ccl:accept-connection ... :wait t)))
     (setf (stream-external-format stream-socket)
           (make-external-format ...))
     stream-socket)

in swank-openmcl.lisp's ACCEPT-CONNECTION.

Whether that's at all preferable to binding the special variables is unclear.
In theory, if CCL:ACCEPT-CONNECTION were interrupted and something tried
to create a socket at that time (from a break loop, say), that socket would
silently pick up the non-default values the defaults had been bound to, which
might cause unexpected problems. For a number of reasons that's an unlikely
scenario, so whether it's done by binding the defaults or via
(SETF STREAM-EXTERNAL-FORMAT) may not matter much.
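A minimal portable sketch of that trade-off follows. The names *DEFAULT-LINE-TERMINATION*, FAKE-SOCKET, ACCEPT-VIA-BINDING and ACCEPT-VIA-SETF are hypothetical stand-ins, not CCL's or Swank's actual variables and functions. Binding a special variable affects every socket created within its dynamic extent, including one created from a nested break loop, whereas SETF on the accepted stream affects only that stream.

```lisp
;;; Hypothetical stand-ins: *DEFAULT-LINE-TERMINATION* plays the role of
;;; an implementation's default external-format settings, and FAKE-SOCKET
;;; plays the role of a socket stream.

(defvar *default-line-termination* :unix)

(defstruct fake-socket
  ;; The initform is evaluated at each MAKE-FAKE-SOCKET call, so it sees
  ;; whatever dynamic binding of the default is in effect at that moment.
  (line-termination *default-line-termination*))

(defun accept-via-binding (nested-hook)
  "Option 1: bind the default around the accept.  NESTED-HOOK stands in
for code that runs during the accept (say, a break loop) and makes its
own socket; that socket inherits the non-default value too."
  (let ((*default-line-termination* :crlf))
    (values (make-fake-socket) (funcall nested-hook))))

(defun accept-via-setf ()
  "Option 2: accept with the ordinary defaults, then change only the
accepted socket's format, as (SETF STREAM-EXTERNAL-FORMAT) would."
  (let ((socket (make-fake-socket)))
    (setf (fake-socket-line-termination socket) :crlf)
    socket))
```

Both options give the accepted socket :CRLF; they differ only in what a socket created during the accept (the second value of ACCEPT-VIA-BINDING) ends up with.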

>> It'd probably be good - once some dust settles - to make utf-8 the
>> default;
>> getting Slime to support it would settle a lot of that dust.
>
> Absolutely!  Unicode support in the emacs world is just as tangled as
> it is in the lisp world.  Dust would come out of both sides.
>

I read a page recently which described the various ways that some versions
of XEmacs encoded characters internally; as one might expect, it was bloodcurdling.

>  - ben
>


