[Openmcl-devel] Character Encoding Problem?

Pascal J. Bourguignon pjb at informatimago.com
Thu Dec 23 21:53:32 PST 2010


Philippe Sismondi <psismondi at arqux.com> writes:

> In the past day or so I posted a question regarding file
> :external-format usage before I had learned everything I should have.
>
> However, in attempting to sort out my character encoding problems I
> have observed some behaviour in ccl which seems problematic to
> me. This problem relates to the presence of a null character,
> i.e. #\Null, in a string.
>
> When my function outputs the following two strings (in this order) to
> a file using external-format :utf-8, the character encoding of the
> second string gets messed up:
>
> (format out "AB^@D~%")
> (format out "María~%")
>
> In the first string above, ^@ represents the null character. Notice
> that the second string contains an accented i, which is char-code
> 237. If the null character is not present, the second string is encoded
> properly on output. When the null is there I am getting something or
> other that is wrong, but I don't really know what ccl is trying to do
> with it.
>
> The nulls are getting into the strings from external binary files that
> I am parsing. Either the input data is corrupt, or my parser is
> buggy. In any case, the string containing the null was output a
> thousand lines or so before the messed up string, so it took me a long
> time to find the connection.
>
> However the nulls got in my strings, it does not seem right to me that
> the character encoding on output should be affected by this. At
> least, I tried the same thing in sbcl and did not observe this
> behaviour.
>
> Is this a bug? Or am I doin' it wrong?

You forgot the essential information, namely the output of:

    od -t x1 file.dat
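
For anyone trying to reproduce this, here is a minimal sketch. It is a hypothetical test, not code from the original report. It writes the two lines with :external-format :utf-8 and reads the file back as raw octets, which is the same information od -t x1 shows. The helper names WRITE-TEST-FILE and FILE-OCTETS and the path /tmp/encoding-test.dat are made up for illustration. The null and the accented i are produced with CODE-CHAR so the result does not depend on the encoding of the source file itself.

```lisp
;;; Hypothetical reproduction helpers (names and path are illustrative only).

(defun write-test-file (path)
  "Write a line containing #\\Null, then a line containing an i with acute accent."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :if-does-not-exist :create
                            :external-format :utf-8)
    ;; ~C prints its character argument; (code-char 0) is #\Null.
    (format out "AB~CD~%" (code-char 0))
    ;; (code-char 237) is LATIN SMALL LETTER I WITH ACUTE (U+00ED).
    (format out "Mar~Ca~%" (code-char 237)))
  path)

(defun file-octets (path)
  "Return the contents of PATH as a list of octets, like od -t x1 shows."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (loop for byte = (read-byte in nil nil)
          while byte
          collect byte)))
```

On a Unix-like system, where ~% writes a single LF, an implementation that encodes both lines correctly should return 41 42 00 44 0A 4D 61 72 C3 AD 61 0A from (file-octets (write-test-file "/tmp/encoding-test.dat")). In UTF-8 the i with acute (code 237, #xED) must appear as the two bytes C3 AD, so any other bytes at that position point to the encoding problem described above.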

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.