[Openmcl-devel] string encoding problem
Stas Boukarev
stassats at gmail.com
Fri Aug 7 08:09:15 PDT 2015
Mark Klein <m_klein at mit.edu> writes:
> I had a set of UTF-8 strings (ukrainian text) that I mistakenly wrote to disk as iso-8659-1, and then read back into clozure as UTF-8, so my strings get messed up e.g. into
>
> ÐкономÑÑÐ½Ñ ÑеÑоÑми УкÑаÑни
>
> (#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\ #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth #\Micro_Sign #\Latin_Capital_Letter_N_With_Tilde #\U+0084 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_Eth #\Cedilla #\ #\Latin_Capital_Letter_Eth #\Pound_Sign #\Latin_Capital_Letter_Eth #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth #\Degree_Sign #\Latin_Capital_Letter_N_With_Tilde #\U+0097 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Cedilla)
>
> Is there some way to recover the original Ukrainian text?
(ccl:decode-string-from-octets
(ccl:encode-string-to-octets
(coerce '(#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\ #\Latin_Capital_Letter_N_With_Tilde #\U+0080) 'string)
:external-format :iso-8859-1)
:external-format :utf-8)
=>
"Економічні р"
--
With best regards, Stas.
More information about the Openmcl-devel
mailing list