[Openmcl-devel] string encoding problem

Mark Klein m_klein at mit.edu
Fri Aug 7 08:51:01 PDT 2015


Awesome! Thanks!

>> I had a set of UTF-8 strings (Ukrainian text) that I mistakenly wrote to disk as iso-8859-1 and then read back into Clozure as UTF-8, so my strings got garbled, e.g. into
>> 
>> Економічні реформи України
>> 
>> (#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\  #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth #\Micro_Sign #\Latin_Capital_Letter_N_With_Tilde #\U+0084 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_Eth #\Cedilla #\  #\Latin_Capital_Letter_Eth #\Pound_Sign #\Latin_Capital_Letter_Eth #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth #\Degree_Sign #\Latin_Capital_Letter_N_With_Tilde #\U+0097 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Cedilla)
>> 
>> Is there some way to recover the original Ukrainian text?
> (ccl:decode-string-from-octets
>  (ccl:encode-string-to-octets
>   (coerce '(#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\  #\Latin_Capital_Letter_N_With_Tilde #\U+0080) 'string)
>   :external-format :iso-8859-1)
>  :external-format :utf-8)
> 
> =>
> "Економічні р"
> 
> 
> 
> 
> -- 
> With best regards, Stas.
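
For anyone finding this thread later: the round trip above can be wrapped in a small helper. This is only a sketch. The names repair-mojibake and *garbled* are made up for illustration and are not part of CCL, and it assumes every character in the garbled string has a code below 256, so that re-encoding as iso-8859-1 gives back the original UTF-8 octets unchanged.

```lisp
;; Hypothetical helper (not part of CCL): undo "UTF-8 octets read as
;; iso-8859-1 characters" by reversing the two steps.
(defun repair-mojibake (garbled)
  "Re-encode GARBLED as iso-8859-1 to get the original octets back,
then decode those octets as UTF-8."
  (ccl:decode-string-from-octets
   (ccl:encode-string-to-octets garbled :external-format :iso-8859-1)
   :external-format :utf-8))

;; The first four garbled characters from the message above, given by
;; character code: #xD0 #x95 #xD0 #xBA are the UTF-8 octets of the
;; first two Cyrillic letters of the original text.
(defparameter *garbled*
  (map 'string #'code-char '(#xD0 #x95 #xD0 #xBA)))

(repair-mojibake *garbled*)
```

If the text was garbled through some other encoding (Windows-1252, say), or was mangled more than once, this will probably not recover it as is.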

-------------------------------
Mark Klein
http://cci.mit.edu/klein


