[Openmcl-devel] string encoding problem

Mark Klein m_klein at mit.edu
Fri Aug 7 07:38:33 PDT 2015


I had a set of UTF-8 strings (Ukrainian text) that I mistakenly wrote to disk as ISO-8859-1 and then read back into Clozure as UTF-8, so my strings get messed up, e.g. into

Економічні реформи України

(#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\  #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth #\Micro_Sign #\Latin_Capital_Letter_N_With_Tilde #\U+0084 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_Eth #\Cedilla #\  #\Latin_Capital_Letter_Eth #\Pound_Sign #\Latin_Capital_Letter_Eth #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth #\Degree_Sign #\Latin_Capital_Letter_N_With_Tilde #\U+0097 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Cedilla)

Is there some way to recover the original Ukrainian text?
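
One possible recovery, sketched under the assumption that each garbled character above stands for exactly one byte of the original UTF-8 data (the pairs in the list, such as Latin_Capital_Letter_Eth followed by U+0095, fit that pattern): encode the garbled string back to ISO-8859-1 octets, then decode those octets as UTF-8. The name recover-utf-8 is hypothetical; ccl:encode-string-to-octets and ccl:decode-string-from-octets are Clozure CL functions.

```lisp
;; Sketch for Clozure CL. Assumes every character in GARBLED has a
;; char-code below 256, i.e. it represents one original UTF-8 byte.
(defun recover-utf-8 (garbled)
  "Re-encode GARBLED as ISO-8859-1 octets, then decode those octets as UTF-8."
  (let ((octets (ccl:encode-string-to-octets garbled
                                             :external-format :iso-8859-1)))
    ;; DECODE-STRING-FROM-OCTETS returns the decoded string as its
    ;; first value; VALUES drops the remaining return values.
    (values (ccl:decode-string-from-octets octets
                                           :external-format :utf-8))))

;; First character pair from the list above: the bytes D0 95 are the
;; UTF-8 encoding of U+0415 (Cyrillic capital letter IE).
(recover-utf-8 (coerce (list (code-char #xD0) (code-char #x95)) 'string))
```

If any character in a string has a code above 255, the one-byte-per-character assumption does not hold and this sketch would not apply to that string as is.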

   Thanks!

	Mark 

-------------------------------
Mark Klein
http://cci.mit.edu/klein
