[Openmcl-devel] %ioblock-read-u16-encoded-char
Gary Byers
gb at clozure.com
Tue Apr 24 21:20:21 PDT 2007
My first reaction is that IOBLOCK-LITERAL-CHAR-CODE-LIMIT - which is
inherited from a similarly-named field in the CHARACTER-ENCODING -
should be #xd800 for all variants of UTF-16 (i.e.,
CHARACTER-ENCODING-LITERAL-CHAR-CODE-LIMIT should be set to #xd800 in
the UTF-16 CHARACTER-ENCODINGs).
What I called the "literal-char-code-limit" is supposed to be the
exclusive upper bound on char-codes that can be passed through without
translation. (The result of calling the decode/encode function should
be the same as the result of directly using the argument in those cases.)
(I haven't looked at this yet and might be misremembering something;
it's clearly the case that we want to do translation on some code
units >= #xd800 when decoding UTF-16, but I'm not 100% sure that the
parenthesised assertion above is correct.)
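
To make the role of the limit concrete, here is a minimal, portable sketch of the same test applied to a list of 16-bit code units. It is not the OpenMCL code: DECODE-UTF-16-UNITS and its LITERAL-LIMIT argument are hypothetical names, and it returns integer code points rather than characters so that it does not depend on the implementation's CHAR-CODE-LIMIT. Units in the range #xe000-#xffff fall above a limit of #xd800, but the general decoding path returns them unchanged, so taking that path for them costs speed, not correctness.

```lisp
;; Hypothetical helper, for illustration only; not part of OpenMCL.
(defun decode-utf-16-units (units &optional (literal-limit #xd800))
  "Decode a list of 16-bit code units into a list of code points.
Units below LITERAL-LIMIT are passed through without translation.
Unpaired surrogates decode to U+FFFD (the replacement character)."
  (let ((result '()))
    (loop while units
          do (let ((u1 (pop units)))
               (cond ((< u1 literal-limit)
                      ;; Fast path: the code unit is the code point.
                      (push u1 result))
                     ((<= #xd800 u1 #xdbff)
                      ;; High surrogate: needs a following low surrogate.
                      (let ((u2 (first units)))
                        (if (and u2 (<= #xdc00 u2 #xdfff))
                            (progn
                              (pop units)
                              (push (+ #x10000
                                       (ash (- u1 #xd800) 10)
                                       (- u2 #xdc00))
                                    result))
                            (push #xfffd result))))
                     ((<= #xdc00 u1 #xdfff)
                      ;; Low surrogate with no preceding high surrogate.
                      (push #xfffd result))
                     (t
                      ;; #xe000-#xffff: still a literal BMP code point.
                      (push u1 result)))))
    (nreverse result)))
```

With the default limit, (decode-utf-16-units (list #xd840 #xdc0b)) combines the pair into the single code point #x2000B. Passing #x10000 as the limit sends both surrogates down the fast path and returns them undecoded, which is the behaviour reported below.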
On Tue, 24 Apr 2007, Takehiko Abe wrote:
> The test form in %ioblock-read-u16-encoded-char:
>
>   (< 1st-unit
>      (ioblock-literal-char-code-limit ioblock))
>
> returns T for UTF-16 surrogate values. So,
> ioblock-decode-input-function will not be called when it should be.
>
> (defun %ioblock-read-u16-encoded-char (ioblock)
>   (declare (optimize (speed 3) (safety 0)))
>   (let* ((ch (ioblock-untyi-char ioblock)))
>     (if ch
>       (prog1 ch
>         (setf (ioblock-untyi-char ioblock) nil))
>       (let* ((1st-unit (%ioblock-read-u16-code-unit ioblock)))
>         (if (eq 1st-unit :eof)
>           1st-unit
>           (locally
>             (declare (type (unsigned-byte 16) 1st-unit))
>             (if (< 1st-unit
>                    (the (mod #x110000) (ioblock-literal-char-code-limit
>                                         ioblock)))
>               (code-char 1st-unit)
>               (funcall (ioblock-decode-input-function ioblock)
>                        1st-unit
>                        #'%ioblock-read-u16-code-unit
>                        ioblock))))))))
>
> A possible fix is to remove the test form and always call
> ioblock-decode-input-function. Although I am not sure if this is the
> right fix, it works at least for now because
> %ioblock-read-u16-encoded-char is used only by utf-16 and ucs-2
> currently -- they both have ioblock-literal-char-code-limit set to
> #x10000. (I am not sure if the test will ever be necessary.)
>
> After the fix:
>
> (with-open-file (out "home:utf-16.txt" :external-format :utf-16
>                      :direction :output
>                      :if-does-not-exist :create)
>   (write-char #\U+2000B out))
>
> (with-open-file (in "home:utf-16.txt" :external-format :utf-16)
>   (read-char in))
> --> #\U+2000B ;; was nil
>
> ;; ucs-2
> (with-open-file (in "home:utf-16.txt" :external-format :ucs-2)
>   (read-char in))
> --> #\Replacement_Character ;; as expected
>
>
> regards,
> T.