[Openmcl-devel] %ioblock-read-u16-encoded-char

Gary Byers gb at clozure.com
Tue Apr 24 21:20:21 PDT 2007


My first reaction is that IOBLOCK-LITERAL-CHAR-CODE-LIMIT - which is
inherited from a similarly-named field in the CHARACTER-ENCODING -
should be #xd800 for all variants of UTF-16 (i.e.,
CHARACTER-ENCODING-LITERAL-CHAR-CODE-LIMIT should be set to #xd800 in
the UTF-16 CHARACTER-ENCODINGs).

What I called the "literal-char-code-limit" is supposed to be the
exclusive upper bound on char-codes that can be passed through without
translation.  (The result of calling the decode/encode function should
be the same as the result of directly using the argument in those cases.)


(I haven't looked at this yet and might be misremembering something;
it's clearly the case that we want to do translation on some code
units >= #xd800 when decoding UTF-16, but I'm not 100% sure that the
parenthesised assertion above is correct.)
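
As a rough illustration of the split between "literal" and
"translated" code units, here is a minimal sketch in portable Common
Lisp.  It is not the actual CCL code: the names
+UTF-16-LITERAL-LIMIT+ and DECODE-UTF-16-UNIT are made up for this
example, and it returns integer code points rather than characters so
that it does not depend on an implementation's CHAR-CODE-LIMIT.

```lisp
;; Hypothetical names throughout; this is a sketch, not the CCL implementation.
(defconstant +utf-16-literal-limit+ #xd800
  "Exclusive upper bound on code units that pass through untranslated.")

(defun decode-utf-16-unit (1st-unit next-unit-fn)
  "Return the code point that starts at 1ST-UNIT.
NEXT-UNIT-FN is called with no arguments and returns the following
code unit, or :EOF when the input is exhausted."
  (cond ((< 1st-unit +utf-16-literal-limit+)
         ;; Fast path: below the limit the code unit is the code point.
         1st-unit)
        ((<= #xd800 1st-unit #xdbff)
         ;; High surrogate: needs a low surrogate to complete the pair.
         ;; (A real decoder would probably push back a non-matching
         ;; second unit instead of consuming it as this sketch does.)
         (let ((2nd-unit (funcall next-unit-fn)))
           (if (and (integerp 2nd-unit) (<= #xdc00 2nd-unit #xdfff))
               (+ #x10000
                  (ash (- 1st-unit #xd800) 10)
                  (- 2nd-unit #xdc00))
               #xfffd)))            ; malformed pair: replacement character
        ((<= #xdc00 1st-unit #xdfff)
         #xfffd)                    ; lone low surrogate
        (t
         ;; #xe000..#xffff: above the limit, but still its own code point.
         1st-unit)))

;; Usage: U+2000B is encoded in UTF-16 as the pair #xd840 #xdc0b.
(let ((units (list #xdc0b)))
  (decode-utf-16-unit #xd840
                      (lambda () (if units (pop units) :eof))))
;; => #x2000B
```

In this sketch, units in #xe000..#xffff also decode to themselves even
though they fall above the limit, so #xd800 is a conservative bound: it
only has to guarantee that nothing below it needs translation.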

On Tue, 24 Apr 2007, Takehiko Abe wrote:

> The test form in %ioblock-read-u16-encoded-char:
>
> (< 1st-unit
>   (ioblock-literal-char-code-limit ioblock))
>
> returns T for UTF-16 surrogate values. So,
> ioblock-decode-input-function will not be called when it should be.
>
> (defun %ioblock-read-u16-encoded-char (ioblock)
>  (declare (optimize (speed 3) (safety 0)))
>  (let* ((ch (ioblock-untyi-char ioblock)))
>    (if ch
>      (prog1 ch
>        (setf (ioblock-untyi-char ioblock) nil))
>      (let* ((1st-unit (%ioblock-read-u16-code-unit ioblock)))
>        (if (eq 1st-unit :eof)
>          1st-unit
>          (locally
>              (declare (type (unsigned-byte 16) 1st-unit))
>            (if (< 1st-unit
>                   (the (mod #x110000)
>                        (ioblock-literal-char-code-limit ioblock)))
>              (code-char 1st-unit)
>              (funcall (ioblock-decode-input-function ioblock)
>                       1st-unit
>                       #'%ioblock-read-u16-code-unit
>                       ioblock))))))))
>
> A possible fix is to remove the test form and always call
> ioblock-decode-input-function. Although I am not sure if this is the
> right fix, it works at least for now because
> %ioblock-read-u16-encoded-char is used only by utf-16 and ucs-2
> currently -- they both have ioblock-literal-char-code-limit set to
> #x10000. (I am not sure if the test will ever be necessary.)
>
> After the fix:
>
> (with-open-file (out "home:utf-16.txt" :external-format :utf-16
>                     :direction :output
>                     :if-does-not-exist :create)
>  (write-char #\U+2000B out))
>
> (with-open-file (in "home:utf-16.txt" :external-format :utf-16)
>  (read-char in))
> --> #\U+2000B  ;; was nil
>
> ;; ucs-2
> (with-open-file (in "home:utf-16.txt" :external-format :ucs-2)
>  (read-char in))
> --> #\Replacement_Character ;; as expected
>
>
> regards,
> T.


