[Openmcl-devel] ICU sketch

Gary Byers gb at clozure.com
Wed May 2 03:34:18 PDT 2007


I've been playing around a little with interfaces derived from the
Fink libicu-dev package, which describe the ICU functionality that's
built into OSX (in CoreFoundation, /usr/lib/libicucore, and possibly
elsewhere).  As far as I can tell, everything's there (where
"everything" means "at least all of the C-callable functionality
described in those headers").

The Fink package I used describes ICU 3.2, which seems to match what
Apple ships with current Tiger releases.  I -think- that that's a
proper superset of what was available in Panther (ICU 2.8 ?).

I uploaded the Darwin PPC32 interfaces that I've been playing with
to

<ftp://clozure.com/pub/testing/darwinppc32-icu-interfaces-070501.tar.gz>

They let you do things like:

(use-interface-dir :icu)

;;; CoreFoundation might be overkill, but I don't know for sure
;;; that "everything" in ICU is available elsewhere.

#+darwin-target
(open-shared-library "CoreFoundation.framework/CoreFoundation")

#-darwin-target
(open-shared-library "libicuuc.so") ; maybe others ?

(defun u-error-name (code)
   (let* ((p (#_u_errorName code)))
     (if (%null-ptr-p p)
       (format nil "unknown error code ~d" code)
       (%get-cstring p))))

(define-condition icu-error (program-error)
   ((code :initarg :code))
   (:report (lambda (condition stream)
              (with-slots (code) condition
                (format stream "ICU error: ~a" (u-error-name code))))))


(defun ucharname (char)
   (rlet ((perr :<UE>rror<C>ode ))
     (do* ((bufsize 32))
          ()
       (%stack-block ((buf bufsize))
         ;; Clear the error return word before making the
         ;; call, since ICU usually doesn't bother to set
         ;; it on success.  (That must have made sense to
         ;; someone at some time.)
         (setf (pref perr :<UE>rror<C>ode) #$U_ZERO_ERROR)
         (let* ((len (#_u_charName (char-code char)
                                   #$U_UNICODE_CHAR_NAME
                                   buf
                                   bufsize
                                   perr))
                (err (pref perr :<UE>rror<C>ode)))
           (if (or (eql err #$U_ZERO_ERROR)
                   (eql err #$U_STRING_NOT_TERMINATED_WARNING))
             (return (%str-from-ptr buf len))
             (if (eql err #$U_BUFFER_OVERFLOW_ERROR)
               (setq bufsize (* bufsize 2))
               (error 'icu-error :code err))))))))
#|
? (ucharname #\u+12000)
"CUNEIFORM SIGN A"
|#

which is, as you pointed out a few weeks ago, a start. (One thing on the
TODO list should be using ICU's converters - or libiconv's - to handle
the few hundred character encodings that these libraries support that
OpenMCL doesn't.)
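
On the error check in ucharname above: ICU's C headers express the same
test with the U_SUCCESS and U_FAILURE macros, which treat zero and
negative codes (warnings) as success and positive codes as errors.  As
far as I know, function-like macros like these don't come through the
interface database, so here is a minimal sketch of Lisp equivalents.  The
names u-success-p and u-failure-p are made up for illustration, and the
code assumes U_ZERO_ERROR is 0, as in ICU's utypes.h.

```lisp
;;; Hypothetical helpers, not part of OpenMCL or ICU.
;;; Assumes U_ZERO_ERROR = 0; warnings are negative, errors positive.
(defun u-success-p (code)
  "True if CODE is U_ZERO_ERROR or a warning (mirrors C's U_SUCCESS)."
  (<= code 0))

(defun u-failure-p (code)
  "True if CODE is a real error (mirrors C's U_FAILURE)."
  (> code 0))
```

A predicate like this only replaces the "zero or warning" test;
ucharname still has to single out U_BUFFER_OVERFLOW_ERROR so that it can
retry with a bigger buffer.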

For other platforms, the story seems to be:
Linux, FreeBSD - ICU is generally widely available/easily installable.
                  It may be the case that function names are mangled,
                  so that #_u_charName becomes #_u_charName_3_6 in ICU 3.6.
                  This convention is easy to use from any language, as
                  long as it's C.
64-bit Darwin  - hey, how about those iPhones ?  (i.e., wait for Leopard.)
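
If the names do turn out to be version-suffixed on Linux or FreeBSD, one
way to cope might be to build the suffixed name as a string and resolve
it at run time.  A minimal sketch of the string-building half follows;
icu-versioned-name is a hypothetical helper, and the name_MAJOR_MINOR
pattern is assumed from the u_charName_3_6 example above.

```lisp
;;; Hypothetical helper, not part of OpenMCL or ICU.
;;; Builds the version-suffixed name of an ICU entry point,
;;; assuming the NAME_MAJOR_MINOR convention described above.
(defun icu-versioned-name (name major minor)
  (format nil "~a_~d_~d" name major minor))

;; (icu-versioned-name "u_charName" 3 6) => "u_charName_3_6"
```

The resulting string could presumably be looked up with something like
foreign-symbol-address instead of going through #_, though that part is
untested here.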


