[Openmcl-devel] Objective-C constant strings with unicode

Pascal J. Bourguignon pjb at informatimago.com
Wed Apr 6 15:39:58 PDT 2011

Gary Byers <gb at clozure.com> writes:

> On Wed, 6 Apr 2011, Pascal J. Bourguignon wrote:
>>    (lisp-string (ccl:@ "??)) --> "??t??" ; fails.
>> with the "obvious":
>>    (defun lisp-string (a-nsstring)
>>      (ccl::%get-utf-8-cstring (objc:send a-nsstring 'utf8-string)))
>> The problem is that make-cstring doesn't deal with encodings.
> NSConstantString expects the bytepointer its "instances" contain to encode a
> string in "the default C string encoding"; on my system, that seems to
> be MacRoman.  I don't know if that's universally true, but if it isn't
> then @"..." couldn't work portably in ObjC if the string literal contained
> non-ASCII characters.
> The third message in
> <http://www.cocoabuilder.com/archive/cocoa/131727-how-to-code-nsstring-literal-with-utf8.html>
> claims that it's a well-known issue that literal strings created via @"..."
> in ObjC "can't contain non-ASCII characters"; it may be more accurate to
> say that they can only contain non-ASCII characters in ObjC if those
> characters are encoded in MacRoman.
> In any case, you're correct in noting that MAKE-CSTRING (which effectively
> tries to encode the string in ISO-8859-1) isn't the right way to deal
> with non-ASCII characters; I don't think that the approach that you
> suggest (which, if I understand the code correctly, tries to replace
> the use of NSConstantCString with other kinds of NSStrings) is desirable
> (and I'd be a little surprised if the bridge would bootstrap itself with
> the change you describe in effect.)
> NSConstantStrings can be (and usually are) created statically and with
> very little involvement of the ObjC runtime.  It's meaningless to retain
> or release them (and in fact those operations are no-ops.)  They can be
> used early in the bridge's bootstrapping process and early in the process
> by which an image reinitializes itself, before it's possible to send messages
> and before the class hierarchy is (re)initialized.
> Even if that code worked at the moment with your proposed changes in effect,
> there's still something to be said for "cheap to create", "impossible to
> release accidentally", and 'essentially the same thing that @"..." produces
> in ObjC'.  There's certainly something to be said for "and deals with non-ASCII
> characters" too, but addressing that - as far as it can be addressed by
> NSConstantString - is a separate issue.

Thanks, this is quite informative.

So I think we should distinguish several cases:

   - bootstrapping the bridge,
   - lisp application strings containing only MacRoman characters,
   - lisp application strings containing non-MacRoman characters.

NSConstantString can be used only in the first two cases.

Note that lisp application strings are not created at "compilation"
time, but once lisp is already running.
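
To make the three cases concrete, here is a rough sketch of how the
choice might be made at run time.  The names mac-roman-encodable-p and
nsstring-strategy are made up for illustration; they are not part of
CCL or of the bridge.  Under CCL the sketch assumes that
ccl:encode-string-to-octets with the :macintosh external format
substitutes a replacement for characters it cannot encode instead of
signaling an error, so that a failed round trip means the string is not
pure MacRoman.  On other implementations it falls back to a
conservative ASCII-only test.

```lisp
(defun mac-roman-encodable-p (string)
  "True if STRING can (as far as this sketch can tell) be encoded in MacRoman."
  ;; CCL: round-trip through the :macintosh (MacRoman) encoding.
  ;; Assumption: unencodable characters are replaced, not signaled as
  ;; errors, so they make the round trip differ from STRING.
  #+ccl
  (string= string
           (ccl:decode-string-from-octets
            (ccl:encode-string-to-octets string :external-format :macintosh)
            :external-format :macintosh))
  ;; Elsewhere: ASCII is a subset of MacRoman, so this is a safe
  ;; (if overly strict) approximation.
  #-ccl
  (every (lambda (c) (< (char-code c) 128)) string))

(defun nsstring-strategy (string &key bootstrapping)
  "Return :CONSTANT when an NSConstantString would do, :DYNAMIC otherwise."
  (cond (bootstrapping :constant)                      ; case 1: bridge bootstrap
        ((mac-roman-encodable-p string) :constant)     ; case 2: MacRoman only
        (t :dynamic)))                                 ; case 3: needs a real NSString
```

With this, (nsstring-strategy "hello") selects :constant, while a
string containing, say, a CJK character selects :dynamic, meaning an
ordinary NSString built from the encoded data once lisp is running.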

__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.
