[Openmcl-devel] Objective-C constant strings with unicode

Gary Byers gb at clozure.com
Wed Apr 6 02:49:21 PDT 2011



On Wed, 6 Apr 2011, Pascal J. Bourguignon wrote:

>
>
>    (lisp-string (ccl:@ "??)) --> "??t??" ; fails.
>
> with the "obvious":
>
>    (defun lisp-string (a-nsstring)
>      (ccl::%get-utf-8-cstring (objc:send a-nsstring 'utf8-string)))
>
>
> The problem is that make-cstring doesn't deal with encodings.

NSConstantString expects the bytepointer its "instances" contain to encode a
string in "the default C string encoding"; on my system, that seems to
be MacRoman.  I don't know if that's universally true, but if it isn't
then @"..." couldn't work portably in ObjC if the string literal contained
non-ASCII characters.

The third message in

<http://www.cocoabuilder.com/archive/cocoa/131727-how-to-code-nsstring-literal-with-utf8.html>

claims that it's a well-known issue that literal strings created via @"..."
in ObjC "can't contain non-ASCII characters"; it may be more accurate to
say that they can only contain non-ASCII characters in ObjC if those
characters are encoded in MacRoman.

In any case, you're correct in noting that MAKE-CSTRING (which effectively
tries to encode the string in ISO-8859-1) isn't the right way to deal
with non-ASCII characters.  I don't think that the approach that you
suggest (which, if I understand the code correctly, tries to replace
the use of NSConstantString with other kinds of NSStrings) is desirable,
and I'd be a little surprised if the bridge could bootstrap itself with
the change you describe in effect.
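
To make the ISO-8859-1 point concrete: that encoding uses exactly one
octet per character and has no representation at all for code points
above #xFF.  The following is a hypothetical, portable stand-in for what
MAKE-CSTRING effectively does (it is not CCL's actual code, and
LATIN-1-OCTETS is a made-up name).  It assumes that CHAR-CODE returns the
Unicode code point, as it does in CCL.

```lisp
;;; Hypothetical stand-in for MAKE-CSTRING's effective behavior:
;;; one octet per character, i.e. ISO-8859-1 (Latin-1).
;;; Assumes CHAR-CODE returns the Unicode code point (true in CCL).
(defun latin-1-octets (string)
  "Return a vector of (UNSIGNED-BYTE 8), one octet per character of STRING.
Signals an error for any character whose code is above #xFF, since
ISO-8859-1 cannot represent it."
  (map '(vector (unsigned-byte 8))
       (lambda (ch)
         (let ((code (char-code ch)))
           (if (< code 256)
               code
               (error "~S (U+~4,'0X) has no ISO-8859-1 encoding" ch code))))
       string))

;; "cafe" with an e-acute (U+00E9) as its last character.
(latin-1-octets (coerce (list #\c #\a #\f (code-char #xE9)) 'string))
;; => #(99 97 102 233)
```

So an e-acute comes out as the single octet #xE9, whereas UTF-8 encodes it
as #xC3 #xA9 and MacRoman as #x8E; a character such as the euro sign
(U+20AC) can't be encoded in ISO-8859-1 at all, and the function above
signals an error for it.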

NSConstantStrings can be (and usually are) created statically and with
very little involvement of the ObjC runtime.  It's meaningless to retain
or release them (and in fact those operations are no-ops.)  They can be
used early in the bridge's bootstrapping process and early in the process
by which an image reinitializes itself, before it's possible to send messages
and before the class hierarchy is (re)initialized.

Even if that code worked at the moment with your proposed changes in effect,
there's still something to be said for "cheap to create", "impossible to
release accidentally", and 'essentially the same thing that @"..." produces
in ObjC'.  There's certainly something to be said for "and deals with non-ASCII
characters" too, but addressing that - as far as it can be addressed by
NSConstantString - is a separate issue.


>
> So here is a replacement, which encodes the lisp string into utf-8, and
> which uses the NSString public API to build a string from the utf-8
> bytes.  (If you have any information about the internals of
> NSConstantString, you may want to "optimize" it).
>
>
> But as it is, I get what I expect:
>
>
>    (lisp-string @"??) --> "??
>
>
>
>
> ;;;
> ;;; Constant strings in objc-runtime don't support unicode characters,
> ;;; so we need to redo it here.
> ;;;
>
> (defun make-utf8-cstring (lstring)
>  (let* ((llen  (length lstring))
>         (clen  (ccl::utf-8-octets-in-string lstring 0 llen)))
>    (declare (fixnum llen clen))
>    (let* ((cstring (ccl::malloc (the fixnum (1+ clen)))))
>      (ccl::utf-8-memory-encode lstring cstring 0 0 llen)
>      (setf (ccl::%get-byte cstring clen) 0)
>      #+testing
>      (print (loop
>                for str = cstring
>                for i from 0
>                for ch = (CCL:%GET-UNSIGNED-BYTE str i)
>                while (plusp ch)
>                collect ch))
>      (values cstring clen))))
>
> (defun %make-constant-nsstring (string-literal)
>  (multiple-value-bind (bytes bytelen) (make-utf8-cstring string-literal)
>    (objc:send (objc:send ns:ns-string 'alloc)
>               :init-with-bytes-no-copy bytes
>               :length bytelen
>               :encoding #$|NSUTF8StringEncoding|
>               :free-when-done t)))
>
>
> (defvar *objc-constant-strings* (make-hash-table :test #'equal))
>
> (defstruct objc-constant-string
>  string
>  nsstringptr)
>
> (defun ns-constant-string (string)
>  (or (gethash string *objc-constant-strings*)
>      (setf (gethash string *objc-constant-strings*)
>            (make-objc-constant-string :string string
>                                       :nsstringptr (%make-constant-nsstring string)))))
>
> (defmethod make-load-form ((s objc-constant-string) &optional env)
>  (declare (ignore env))
>  `(ns-constant-string ,(objc-constant-string-string s)))
>
>
> (defmacro @ (string)
>  `(objc-constant-string-nsstringptr ,(ns-constant-string string)))
>
>
> (defun lisp-string (a-nsstring)
>  #+testing
>  (print (loop
>            for str = (objc:send a-nsstring 'utf8-string)
>            for i from 0
>            for ch = (CCL:%GET-UNSIGNED-BYTE str i)
>            while (plusp ch)
>            collect ch))
>  (ccl::%get-utf-8-cstring (objc:send a-nsstring 'utf8-string)))
>
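
Stripped of the CCL-internal calls (CCL::UTF-8-OCTETS-IN-STRING,
CCL::UTF-8-MEMORY-ENCODE, CCL::MALLOC and CCL::%GET-UTF-8-CSTRING), the
technique in MAKE-UTF8-CSTRING and LISP-STRING above is: count the UTF-8
octets, allocate one extra byte, encode, and NUL-terminate; decoding reads
octets until the NUL.  What follows is only a portable sketch of that
technique, using a Lisp octet vector instead of foreign memory.  The
function names are hypothetical, it assumes CHAR-CODE returns Unicode code
points (as in CCL), it does no validation, and it says nothing about
whether the result could stand in for an NSConstantString.

```lisp
;;; Portable sketch (hypothetical names) of the count/allocate/encode/
;;; NUL-terminate steps in MAKE-UTF8-CSTRING, plus the reverse direction.
;;; Assumes CHAR-CODE returns the Unicode code point (true in CCL).

(defun utf-8-octet-count (string)
  "Number of octets needed to encode STRING in UTF-8 (no NUL)."
  (loop for ch across string
        for code = (char-code ch)
        sum (cond ((< code #x80) 1)
                  ((< code #x800) 2)
                  ((< code #x10000) 3)
                  (t 4))))

(defun make-utf-8-octets (string)
  "Encode STRING as UTF-8 into a fresh octet vector with a trailing NUL.
Returns the vector and the octet count, not counting the NUL."
  (let* ((clen (utf-8-octet-count string))
         (octets (make-array (1+ clen) :element-type '(unsigned-byte 8)
                                       :initial-element 0))
         (i 0))
    (flet ((put (octet) (setf (aref octets i) octet) (incf i)))
      (loop for ch across string
            for code = (char-code ch)
            do (cond ((< code #x80)            ; 1 octet: 0xxxxxxx
                      (put code))
                     ((< code #x800)           ; 2 octets: 110xxxxx 10xxxxxx
                      (put (logior #xC0 (ldb (byte 5 6) code)))
                      (put (logior #x80 (ldb (byte 6 0) code))))
                     ((< code #x10000)         ; 3 octets
                      (put (logior #xE0 (ldb (byte 4 12) code)))
                      (put (logior #x80 (ldb (byte 6 6) code)))
                      (put (logior #x80 (ldb (byte 6 0) code))))
                     (t                        ; 4 octets
                      (put (logior #xF0 (ldb (byte 3 18) code)))
                      (put (logior #x80 (ldb (byte 6 12) code)))
                      (put (logior #x80 (ldb (byte 6 6) code)))
                      (put (logior #x80 (ldb (byte 6 0) code)))))))
    ;; OCTETS[CLEN] is still 0 from :INITIAL-ELEMENT: the NUL terminator.
    (values octets clen)))

(defun utf-8-octets-to-string (octets)
  "Decode NUL-terminated, well-formed UTF-8 OCTETS into a Lisp string."
  (with-output-to-string (out)
    (let ((i 0))
      (loop
        (let ((b0 (aref octets i)))
          (when (zerop b0) (return))   ; stop at the NUL terminator
          (multiple-value-bind (n code)
              (cond ((< b0 #x80) (values 1 b0))
                    ((< b0 #xE0) (values 2 (ldb (byte 5 0) b0)))
                    ((< b0 #xF0) (values 3 (ldb (byte 4 0) b0)))
                    (t           (values 4 (ldb (byte 3 0) b0))))
            ;; Fold in 6 payload bits from each continuation octet.
            (loop for k from 1 below n
                  do (setf code (logior (ash code 6)
                                        (ldb (byte 6 0) (aref octets (+ i k))))))
            (write-char (code-char code) out)
            (incf i n)))))))
```

For example, (make-utf-8-octets "abc") returns the 4-element vector
#(97 98 99 0) and 3.  Unlike MAKE-UTF8-CSTRING, nothing here needs to be
freed, since the octets live in a Lisp vector; passing them to NSString
would still require copying them into foreign memory.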
>
> -- 
> __Pascal Bourguignon__                     http://www.informatimago.com/
> A bad day in () is better than a good day in {}.
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel


