[Openmcl-devel] Extracting unicode from an external source via FFI
John McAleely
john at mcaleely.com
Sat Feb 21 17:07:07 PST 2009
Hi,
I'm attempting to get some Unicode strings from an external source (a
MySQL database) into a form I can use within CCL (this would be most
convenient if the C data 'became' a native Lisp string). I am having
problems reading them in, and want to ask what the options are
within the CCL FFI. If anyone's been down this route before, I'd be
grateful for pointers.
I'm using:
Welcome to Clozure Common Lisp Version 1.2-r72:73M-ccl (DarwinX8664)!
(Note that the revision number reflects storage in my own subversion
repository. I'm using an unmodified, locally built, version synced
about a month ago.)
My investigations to date (I'm also using clsql 4.0.3/uffi 1.6.0)
suggest that data can make it from a Lisp string into the SQL database
(I've not yet looked into how, but the mysql command line sees the
data correctly). When strings come back across the connection, they
arrive garbled. A two-character Chinese string in the SQL table
becomes a six-character Lisp string.
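The two-into-six count is what one would expect if UTF-8 octets were being read one character per octet, since each of these Chinese characters takes three octets in UTF-8. Here is a minimal, portable sketch of that arithmetic. The names *utf-8-octets*, octets-as-8-bit-chars and decode-utf-8 are made up for illustration and are not part of CCL, UFFI or CLSQL, and the decoder assumes well-formed input.

```lisp
;; UTF-8 octets of a two-character string (U+4E2D followed by U+6587).
(defparameter *utf-8-octets* #(#xE4 #xB8 #xAD #xE6 #x96 #x87))

(defun octets-as-8-bit-chars (octets)
  "Treat every octet as one character, as an 8-bit reader would."
  (map 'string #'code-char octets))

(defun decode-utf-8 (octets)
  "Minimal UTF-8 decoder for well-formed input; no error checking."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length octets)))
      (loop while (< i n)
            do (let* ((lead (aref octets i))
                      ;; Number of continuation octets implied by the lead octet.
                      (extra (cond ((< lead #x80) 0)
                                   ((< lead #xE0) 1)
                                   ((< lead #xF0) 2)
                                   (t 3)))
                      ;; Payload bits carried by the lead octet itself.
                      (code (case extra
                              (0 lead)
                              (1 (logand lead #x1F))
                              (2 (logand lead #x0F))
                              (t (logand lead #x07)))))
                 ;; Each continuation octet contributes its low six bits.
                 (loop for j from 1 to extra
                       do (setf code (logior (ash code 6)
                                             (logand (aref octets (+ i j)) #x3F))))
                 (write-char (code-char code) out)
                 (incf i (1+ extra)))))))

;; Compare the two readings of the same six octets.
(list (length (octets-as-8-bit-chars *utf-8-octets*))
      (length (decode-utf-8 *utf-8-octets*)))
```

The first length counts octets and the second counts decoded characters, which matches the symptom if the data on the wire is in fact UTF-8.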
Rummaging into CLSQL/UFFI, I think that ultimately this bit of code
reads strings from the MySQL C interface:
  #+openmcl ,@(if length
                  `((ccl:%str-from-ptr ,stored-obj ,length))
                  `((ccl:%get-cstring ,stored-obj)))
Having looked at the CCL source, there is a function near %get-cstring
called:
(defun %get-utf-8-cstring (pointer) ....)
This seems interesting. I speculate:
+ The MySQL C interface is sending over C-style strings, in a
character set of its choice.
+ The UFFI code chooses to read this with %get-cstring, which chops
the string into 8-bit bytes, and assumes each one is a character in
some 256-element character set.
+ The CLSQL code then takes this and passes it back to me as a Lisp
string.
+ I wonder: if I could convince MySQL to use UTF-8 within its C
strings, and the UFFI code to use %get-utf-8-cstring, could I then
successfully read Unicode from the database into Lisp strings?
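One way to check this speculation without involving MySQL at all is to put a known UTF-8 string into foreign memory and read it back with both functions. This is only a sketch, and it rests on assumptions: that the CCL build in use provides ccl:with-encoded-cstrs and accepts :utf-8 as an encoding name, and that %get-utf-8-cstring is reachable as ccl::%get-utf-8-cstring (the double colon is used because I have not checked whether it is exported). The name read-both-ways is made up for illustration.

```lisp
;; CCL-specific sketch; will not load in other Lisp implementations.
(defun read-both-ways (string)
  "Write STRING to foreign memory as a NUL-terminated UTF-8 C string,
then read it back with the 8-bit reader and with the UTF-8 reader."
  (ccl:with-encoded-cstrs :utf-8 ((ptr string))
    (values (ccl:%get-cstring ptr)            ; what UFFI currently calls
            (ccl::%get-utf-8-cstring ptr))))  ; the candidate replacement

;; Build the test string from code points so that the source file's own
;; encoding does not matter.
(defparameter *two-chars*
  (coerce (list (code-char #x4E2D) (code-char #x6587)) 'string))

(multiple-value-bind (as-8-bit as-utf-8) (read-both-ways *two-chars*)
  (list (length as-8-bit) (length as-utf-8)))
```

If the speculation is right, the first length should be the octet count and the second should be the original character count.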
So, if you've been down a similar path, does my speculation sound
correct?
Does my gentle use of grep and Google appear to have stumbled on the
'right' CCL functions for this work? Is there a 'better' CCL API for
reading a foreign string in some Unicode character set?
Looking at the API docs, telling MySQL to use UTF8 seems
straightforward, and I'm willing to hack UFFI/CLSQL to make this work.
Before I start hacking, I thought I'd ask what my options are for
interfacing into CCL's unicode support.
Thanks,
J