[Openmcl-devel] encoding/decoding clozure strings to/from UTF-8
rm at seid-online.de
Tue Dec 30 19:49:16 UTC 2014
On Tue, Dec 30, 2014 at 11:46:36AM -0800, Ron Garret wrote:
> You’re working way too hard. Character streams automatically do work for you. All character streams in CCL have a parameter that sets their character encoding, and the default is UTF-8. So unless you want a different encoding, you should just be able to output characters to a stream and the Right Thing will happen. If it doesn’t, then either your default encoding has changed somehow, or you’ve found a bug in CCL.
Well, this is about HTTP, isn't it? Unfortunately, in HTTP you either need to use a byte stream or
need to use a stream which can switch encoding while open: an HTTP header always needs to be RFC 822
conformant (7-bit ASCII), while the content part can have a custom encoding.
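The byte-stream variant could look roughly like the sketch below. It is only an illustration under some assumptions: the names http-response-octets and *crlf* are made up for this example, the status line and Content-Type are placeholders, and the result is meant to be written to a binary stream with write-sequence. The only CCL function used is encode-string-to-octets, as mentioned elsewhere in this thread.

```lisp
;; Sketch only: HTTP-RESPONSE-OCTETS and *CRLF* are hypothetical names.
;; CCL:ENCODE-STRING-TO-OCTETS is the CCL function discussed in this thread.

(defparameter *crlf* (coerce '(#\Return #\Linefeed) 'string)
  "HTTP line terminator (CR LF).")

(defun http-response-octets (body)
  "Return one octet vector: an ASCII header block followed by BODY as UTF-8."
  (let* ((payload (ccl:encode-string-to-octets body :external-format :utf-8))
         (header (concatenate 'string
                              "HTTP/1.1 200 OK" *crlf*
                              "Content-Type: text/html; charset=utf-8" *crlf*
                              ;; Content-Length counts octets, not characters.
                              (format nil "Content-Length: ~D" (length payload))
                              *crlf*
                              ;; An empty line ends the header block.
                              *crlf*))
         (header-octets (ccl:encode-string-to-octets header
                                                     :external-format :us-ascii)))
    (concatenate '(vector (unsigned-byte 8)) header-octets payload)))
```

With a binary output stream in hand (called socket-stream here, also a hypothetical name), the response would go out with (write-sequence (http-response-octets "<p>10 €</p>") socket-stream). Encoding the body first is what makes the Content-Length correct for multi-byte characters.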
Mark, before you reinvent the wheel: have a look at flexi-streams (http://weitz.de/flexi-streams/), a library
written to solve exactly this problem (or you might want to base your server on hunchentoot which does all the
ugly encoding for you ...).
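For comparison, a rough sketch of the switch-encoding-while-open variant with flexi-streams. It assumes the library is already loaded (for example via Quicklisp), the function name flexi-response-octets is invented for this example, and the output goes to an in-memory octet sequence instead of a real socket so that it can be tried at the REPL.

```lisp
;; Sketch only: FLEXI-RESPONSE-OCTETS is a hypothetical name.
;; Load the library first, e.g. (ql:quickload :flexi-streams).

(defun flexi-response-octets (body)
  "Write an ASCII header, switch the same stream to UTF-8, then write BODY."
  (let ((crlf (coerce '(#\Return #\Linefeed) 'string))
        ;; The octet count of the body is needed before the header is written.
        (body-length (length (flexi-streams:string-to-octets
                              body :external-format :utf-8))))
    (flexi-streams:with-output-to-sequence (bytes)
      (let ((out (flexi-streams:make-flexi-stream
                  bytes :external-format :us-ascii)))
        (write-string "HTTP/1.1 200 OK" out)
        (write-string crlf out)
        (write-string "Content-Type: text/html; charset=utf-8" out)
        (write-string crlf out)
        (format out "Content-Length: ~D" body-length)
        (write-string crlf out)
        (write-string crlf out)
        ;; Change the encoding of the open stream for the content part.
        (setf (flexi-streams:flexi-stream-external-format out)
              (flexi-streams:make-external-format :utf-8))
        (write-string body out)
        (finish-output out)))))
```

On a real connection the flexi stream would wrap the binary socket stream instead of the in-memory one; the header and body code stays the same.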
HTH, and a happy new year
> On Dec 30, 2014, at 11:35 AM, Mark Klein <m_klein at MIT.EDU> wrote:
> > Karsten, Matthew,
> > Thanks for the pointers, that makes sense. My next challenge is to change a bunch of web page generation methods so that they output UTF-8. I wrote a little macro that I think will do the job. Am I on the right track? Can it be made substantially faster or more reliable somehow?
> > (defmacro c2utf8 (&rest body)
> > "Takes everything body sends to stream, and converts it to UTF-8"
> > `(let ((list (encode-string-to-octets (with-output-to-string (stream) ,@body) :external-format :utf-8)))
> > (loop for c across list do (write-char (code-char c) stream))))
> > It might be better, for example, if I could somehow redefine the stream so the UTF-32 text that goes in comes out as UTF-8?
> > Thanks!
> > Mark
> >> On Dec 30, 2014, at 2:13 PM, Karsten Poeck <karsten.poeck at gmail.com> wrote:
> >> On 30.12.14 19:46, R. Matthew Emerson wrote:
> >>>> On Dec 30, 2014, at 1:19 PM, Mark Klein <m_klein at MIT.EDU> wrote:
> >>>> I have Clozure 1.10 running on a Mac (OS 10.10.1). I'd like to be able to encode strings to and from UTF-8 (to handle multi-language text in a web application I am developing). Can someone give me some pointers on how I can do that?
> >>> http://ccl.clozure.com/docs/ccl.html#encoding-and-decoding-strings
> >>> Basically, use (encode-string-to-octets string :external-format :utf-8) to encode a lisp string into a vector of octets containing UTF-8-encoded data, and use (decode-string-from-octets vector :external-format :utf-8) to create a lisp string from a vector of octets containing UTF-8-encoded data.
> >> Since Matthew already answered the original question, here is another example with 1-, 2- and 3-byte encodings in UTF-8, taken from http://en.wikipedia.org/wiki/UTF-8
> >> (map 'vector #'(lambda(byte)(write-to-string byte :base 16)) (encode-string-to-octets "$¢€" :external-format :utf-8 :use-byte-order-mark nil))
> >> #("24" "C2" "A2" "E2" "82" "AC")
> >> To read a UTF-8 encoded file, use
> >> (defun read-utf-8-file (file)
> >> (with-open-file (stream file :direction :input :external-format :utf-8)
> >> .......)
> >> regards
> >> Karsten
> >> _______________________________________________
> >> Openmcl-devel mailing list
> >> Openmcl-devel at clozure.com
> >> https://lists.clozure.com/mailman/listinfo/openmcl-devel
> > -------------------------------
> > Mark Klein
> > http://cci.mit.edu/klein
> > Principal Research Scientist
> > Center for Collective Intelligence
> > Massachusetts Institute of Technology
> > Visiting Researcher
> > Dynamic and Distributed Information Systems Group
> > University of Zurich