[Openmcl-devel] ICU sketch
Gary Byers
gb at clozure.com
Wed May 2 03:34:18 PDT 2007
I've been playing around a little with interfaces derived from the
Fink libicu-dev package, which describe the ICU functionality that's
built into OSX (in CoreFoundation, /usr/lib/libicucore, and possibly
elsewhere.) As far as I can tell, everything's there (where
"everything" means "at least all of the C-callable functionality
described in those headers".)
The Fink package I used describes ICU 3.2, which seems to match what
Apple ships with current Tiger releases. I -think- that that's a
proper superset of what was available in Panther (ICU 2.8 ?).
I uploaded the Darwin PPC32 interfaces that I've been playing with
to
<ftp://clozure.com/pub/testing/darwinppc32-icu-interfaces-070501.tar.gz>
They let you do things like:
(use-interface-dir :icu)
;;; CoreFoundation might be overkill, but I don't know for sure
;;; that "everything" in ICU is available elsewhere.
#+darwin-target
(open-shared-library "CoreFoundation.framework/CoreFoundation")
#-darwin-target
(open-shared-library "libicuuc.so") ; maybe others ?
(defun u-error-name (code)
  (let* ((p (#_u_errorName code)))
    (if (%null-ptr-p p)
      ;; Pass CODE to FORMAT; the ~d directive needs an argument.
      (format nil "unknown error code ~d" code)
      (%get-cstring p))))
(define-condition icu-error (program-error)
  ((code :initarg :code))
  (:report (lambda (condition stream)
             (with-slots (code) condition
               (format stream "ICU error: ~a" (u-error-name code))))))
(defun ucharname (char)
  (rlet ((perr :<UE>rror<C>ode ))
    (do* ((bufsize 32))
         ()
      (%stack-block ((buf bufsize))
        ;; Clear the error return word before making the
        ;; call, since ICU usually doesn't bother to set
        ;; it on success.  (That must have made sense to
        ;; someone at some time.)
        (setf (pref perr :<UE>rror<C>ode) #$U_ZERO_ERROR)
        (let* ((len (#_u_charName (char-code char)
                                  #$U_UNICODE_CHAR_NAME
                                  buf
                                  bufsize
                                  perr))
               (err (pref perr :<UE>rror<C>ode)))
          (if (or (eql err #$U_ZERO_ERROR)
                  (eql err #$U_STRING_NOT_TERMINATED_WARNING))
            (return (%str-from-ptr buf len))
            (if (eql err #$U_BUFFER_OVERFLOW_ERROR)
              (setq bufsize (* bufsize 2))
              (error 'icu-error :code err))))))))
#|
? (ucharname #\u+12000)
"CUNEIFORM SIGN A"
|#
which is, as you pointed out a few weeks ago, a start. (One thing on the
TODO list should be using ICU's converters - or libiconv's - to handle
the few hundred character encodings that these libraries support that
OpenMCL doesn't.)
For other platforms, the story seems to be:

Linux, FreeBSD - ICU is generally widely available/easily installable.
                 It may be the case that function names are mangled,
                 so that #_u_charName becomes #_u_charName_3_6 in ICU 3.6.
                 This convention is easy to use from any language, as
                 long as it's C.

64-bit Darwin  - hey, how about those iPhones ? (i.e., wait for Leopard.)
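
To make the name-mangling point above concrete, here is a minimal sketch in
portable Common Lisp of how the version-suffixed names appear to be formed.
ICU-VERSIONED-NAME and ICU-CANDIDATE-NAMES are hypothetical helpers made up
for illustration (they are not part of OpenMCL or ICU), and the
"_MAJOR_MINOR" suffix rule is assumed from the single u_charName_3_6 example
rather than taken from ICU's documentation.

```lisp
;;; Hypothetical helper, not part of OpenMCL or ICU: build the
;;; version-suffixed name that a renamed ICU build might export.
;;; Assumes the suffix is always "_<major>_<minor>".
(defun icu-versioned-name (base major minor)
  "Return BASE with an _MAJOR_MINOR suffix appended."
  (format nil "~a_~d_~d" base major minor))

;;; Names worth trying for a given entry point, most specific first:
;;; the suffixed name, then the plain one.
(defun icu-candidate-names (base major minor)
  (list (icu-versioned-name base major minor) base))
```

With these definitions, (icu-versioned-name "u_charName" 3 6) returns
"u_charName_3_6". Whether a given installation exports the suffixed name,
the plain name, or both would still have to be checked against the library
itself.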