[Openmcl-devel] Extracting unicode from an external source via FFI

John McAleely john at mcaleely.com
Sat Feb 21 17:07:07 PST 2009


Hi,

I'm attempting to get some Unicode strings from an external source (a
MySQL database) into a form I can use within CCL (ideally, the C data
would 'become' a native Lisp string). I'm having problems reading them
in, and want to ask what the options are within the CCL FFI. If
anyone's been down this route before, I'd be grateful for pointers.

I'm using:

Welcome to Clozure Common Lisp Version 1.2-r72:73M-ccl  (DarwinX8664)!

(Note that the revision number reflects storage in my own Subversion
repository. I'm using an unmodified, locally built version synced
about a month ago.)

My investigations to date (I'm also using CLSQL 4.0.3 / UFFI 1.6.0)
suggest that data can make it from a Lisp string into the SQL database
(I haven't yet looked into how, but the mysql command-line client
displays the data correctly). When strings come back across the
connection, however, they arrive garbled: a two-character Chinese
string in the SQL table becomes a six-character Lisp string.
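
The 2-to-6 count would be consistent with the bytes on the wire already being UTF-8 while each byte is read back as one character. Each of these CJK characters takes three octets in UTF-8. The portable Common Lisp sketch below checks that arithmetic. It makes three assumptions. First, the example string is U+4E2D U+6587 (中文), chosen purely for illustration. Second, CHAR-CODE returns Unicode code points, as it does in CCL. Third, the helper names utf-8-octets, octets-as-latin-1 and *zhongwen* are hypothetical. This is not CLSQL's or CCL's own code.

```lisp
;; Illustrative sketch only: hypothetical helpers, not CLSQL/UFFI/CCL code.
;; Assumes CHAR-CODE returns the Unicode code point of a character.

(defun utf-8-octets (string)
  "Return a list of the UTF-8 octets that encode STRING."
  (let ((octets '()))
    (loop for ch across string
          for code = (char-code ch)
          do (cond ((< code #x80)                ; 1 octet: ASCII
                    (push code octets))
                   ((< code #x800)               ; 2 octets
                    (push (logior #xC0 (ash code -6)) octets)
                    (push (logior #x80 (logand code #x3F)) octets))
                   ((< code #x10000)             ; 3 octets: most CJK
                    (push (logior #xE0 (ash code -12)) octets)
                    (push (logior #x80 (logand (ash code -6) #x3F)) octets)
                    (push (logior #x80 (logand code #x3F)) octets))
                   (t                            ; 4 octets
                    (push (logior #xF0 (ash code -18)) octets)
                    (push (logior #x80 (logand (ash code -12) #x3F)) octets)
                    (push (logior #x80 (logand (ash code -6) #x3F)) octets)
                    (push (logior #x80 (logand code #x3F)) octets))))
    (nreverse octets)))

(defun octets-as-latin-1 (octets)
  "Read one character per octet, as a byte-per-character reader would."
  (map 'string #'code-char octets))

;; Two CJK characters, built from code points to avoid source-encoding issues.
(defparameter *zhongwen* (map 'string #'code-char '(#xE4E2D #x6587)))
```

Wait: the string above must be built from #x4E2D, not #xE4E2D; see the corrected definition used below.

```lisp
(defparameter *zhongwen* (map 'string #'code-char '(#x4E2D #x6587)))
```

In a Unicode-aware Lisp, (length *zhongwen*) is 2. (utf-8-octets *zhongwen*) is (228 184 173 230 150 135). Reading those six octets one character per byte gives a six-character string.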

Rummaging through CLSQL/UFFI, I think this is the bit of code that
ultimately reads strings from the MySQL C interface:

   #+openmcl ,@(if length
	   `((ccl:%str-from-ptr ,stored-obj ,length))
	   `((ccl:%get-cstring ,stored-obj)))

Looking through the CCL sources, I found a function near %get-cstring
called:

(defun %get-utf-8-cstring (pointer) ....)

This seems interesting. I speculate:

+ The MySQL C interface sends C-style strings in a character set of
its choosing.
+ The UFFI code reads these with %get-cstring, which chops the string
into 8-bit bytes and treats each byte as one character in some
256-element character set.
+ The CLSQL code then takes this and passes it back to me as a Lisp
string.
+ If I could convince MySQL to use UTF-8 within its C strings, and the
UFFI code to use %get-utf-8-cstring, could I then successfully read
Unicode from the database into Lisp strings?
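
The difference between reading one byte per character and reading UTF-8 comes down to how the octets are grouped into characters. Below is a rough, portable illustration. It is not CCL's %get-utf-8-cstring, whose body isn't shown here. The hypothetical function decode-utf-8 turns a vector of UTF-8 octets into a Lisp string. That vector stands in for the bytes behind the foreign pointer. The sketch does no validation of malformed or truncated input.

```lisp
;; Illustrative sketch only: a hypothetical, unvalidated UTF-8 decoder.
;; OCTETS stands in for the bytes read from the foreign C string.

(defun decode-utf-8 (octets)
  "Decode the vector OCTETS, assumed to be well-formed UTF-8, into a string."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length octets)))
      (loop while (< i n)
            do (let* ((b0 (aref octets i))
                      ;; The lead byte's high bits give the sequence length.
                      (len (cond ((< b0 #x80) 1)
                                 ((< b0 #xE0) 2)
                                 ((< b0 #xF0) 3)
                                 (t 4)))
                      ;; Keep only the payload bits of the lead byte.
                      (code (ecase len
                              (1 b0)
                              (2 (logand b0 #x1F))
                              (3 (logand b0 #x0F))
                              (4 (logand b0 #x07)))))
                 ;; Each continuation byte contributes 6 more bits.
                 (loop for j from 1 below len
                       do (setf code (logior (ash code 6)
                                             (logand (aref octets (+ i j)) #x3F))))
                 (write-char (code-char code) out)
                 (incf i len))))))
```

For example, (decode-utf-8 (vector #xE4 #xB8 #xAD #xE6 #x96 #x87)) returns a two-character string with codes #x4E2D and #x6587. A byte-per-character reading of the same six octets would give six characters.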

So, if you've been down a similar path, does my speculation sound  
correct?

Has my gentle use of grep and Google tumbled onto the 'right' CCL
functions for this work? Is there a 'better' CCL API for reading a
foreign string in some Unicode encoding?

From the MySQL API docs, telling the server to use UTF-8 seems
straightforward, and I'm willing to hack UFFI/CLSQL to make this work.
Before I start hacking, though, I thought I'd ask what my options are
for interfacing with CCL's Unicode support.

Thanks,

J
