[Openmcl-devel] string encoding problem
Bill St. Clair
wws at clozure.com
Fri Aug 7 08:17:55 PDT 2015
Stas Boukarev's method was easier and faster than mine. My approach was to write
the garbled string to a file as Latin-1 and read it back as UTF-8; running the
code below gives:
"Економічні реформи України"
(Ukrainian for "Economic reforms of Ukraine".)
-Bill
====
(in-package :cl-user)
(defparameter *list*
'(#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth
#\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096
#\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Space
#\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth
#\Micro_Sign #\Latin_Capital_Letter_N_With_Tilde #\U+0084
#\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters
#\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_Eth #\Cedilla #\Space
#\Latin_Capital_Letter_Eth #\Pound_Sign #\Latin_Capital_Letter_Eth
#\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_N_With_Tilde #\U+0080
#\Latin_Capital_Letter_Eth #\Degree_Sign
#\Latin_Capital_Letter_N_With_Tilde #\U+0097 #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Cedilla))
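;; Rebuild the garbled string from its characters.
(defparameter *string*
  (coerce *list* 'string))
;; Write each character (every code is below 256) as one Latin-1 byte;
;; this puts the original UTF-8 byte sequence back on disk.
(with-open-file (stream "~/klein.txt"
                        :external-format :latin1
                        :direction :output
                        :if-exists :supersede)
  (write-string *string* stream))
;; Read the same bytes back, this time decoding them as UTF-8.
(with-open-file (stream "~/klein.txt"
                        :external-format :utf-8)
  (read-line stream))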
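Why both approaches work: in UTF-8 each Cyrillic letter is two bytes, a lead byte #xD0 or #xD1 followed by a continuation byte in #x80-#xBF, and reading those bytes as Latin-1 turned every byte into its own character. For example, Е (U+0415) is D0 95, which shows up as #\Latin_Capital_Letter_Eth followed by #\U+0095, and (logior (ash (logand #xD0 #x1F) 6) (logand #x95 #x3F)) gives back #x415. Repairing a string therefore means taking each character's code as a byte and decoding the bytes as UTF-8. Below is a minimal, portable sketch of that idea that avoids the CCL-specific ccl:encode-string-to-octets / ccl:decode-string-from-octets calls. The names LATIN1-OCTETS, DECODE-UTF-8 and FIX-MOJIBAKE are hypothetical helpers, it assumes every character code in the garbled string is below 256, and the decoder deliberately skips checks for overlong forms and surrogates.

```lisp
;;; Portable sketch: undo "UTF-8 bytes decoded as Latin-1" damage without
;;; CCL-specific calls.  All function names here are hypothetical helpers.

(defun latin1-octets (string)
  "Return STRING's characters as Latin-1 octets (one byte per character).
Signals an error for any character whose code is 256 or above."
  (map '(vector (unsigned-byte 8))
       (lambda (ch)
         (let ((code (char-code ch)))
           (if (< code 256)
               code
               (error "~S is not a Latin-1 character" ch))))
       string))

(defun decode-utf-8 (octets)
  "Decode the UTF-8 byte vector OCTETS into a string.
Signals an error on invalid lead bytes, bad continuation bytes and truncated
sequences; does not check for overlong forms, surrogates or code points above
#x10FFFF."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length octets)))
      (loop while (< i n)
            do (let* ((b0 (aref octets i))
                      ;; Number of continuation bytes implied by the lead byte.
                      (extra (cond ((< b0 #x80) 0)
                                   ((= (logand b0 #xE0) #xC0) 1)
                                   ((= (logand b0 #xF0) #xE0) 2)
                                   ((= (logand b0 #xF8) #xF0) 3)
                                   (t (error "Invalid UTF-8 lead byte #x~2,'0X at ~D"
                                             b0 i))))
                      ;; Payload bits of the lead byte (the whole byte for ASCII).
                      (code (if (zerop extra)
                                b0
                                (ldb (byte (- 6 extra) 0) b0))))
                 ;; Append 6 payload bits from each continuation byte.
                 (loop for k from 1 to extra
                       for pos = (+ i k)
                       do (unless (< pos n)
                            (error "Truncated UTF-8 sequence at ~D" i))
                          (let ((b (aref octets pos)))
                            (unless (= (logand b #xC0) #x80)
                              (error "Invalid continuation byte #x~2,'0X at ~D"
                                     b pos))
                            (setf code (logior (ash code 6) (logand b #x3F)))))
                 (write-char (code-char code) out)
                 (incf i (1+ extra)))))))

(defun fix-mojibake (string)
  "Recover the original bytes via Latin-1, then decode them as UTF-8."
  (decode-utf-8 (latin1-octets string)))
```

Applied to *string* from the code above, (fix-mojibake *string*) should return the same Ukrainian text as the file round trip without touching the disk, and (mapcar #'fix-mojibake strings), for a hypothetical list STRINGS, would repair a whole set of garbled strings in one pass.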
On Fri, Aug 7, 2015 at 11:09 AM, Stas Boukarev <stassats at gmail.com> wrote:
> Mark Klein <m_klein at mit.edu> writes:
>
> > I had a set of UTF-8 strings (Ukrainian text) that I mistakenly wrote to
> > disk as iso-8859-1, and then read back into Clozure as UTF-8, so my strings
> > get messed up, e.g. into
> >
> > Економічні реформи України
> >
> > (#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth
> > #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth
> > #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
> > #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth
> > #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
> > #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096
> > #\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth
> > #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Space
> > #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth
> > #\Micro_Sign #\Latin_Capital_Letter_N_With_Tilde #\U+0084
> > #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters
> > #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth
> > #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_Eth #\Cedilla #\Space
> > #\Latin_Capital_Letter_Eth #\Pound_Sign #\Latin_Capital_Letter_Eth
> > #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_N_With_Tilde #\U+0080
> > #\Latin_Capital_Letter_Eth #\Degree_Sign
> > #\Latin_Capital_Letter_N_With_Tilde #\U+0097 #\Latin_Capital_Letter_Eth
> > #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Cedilla)
> >
> > Is there some way to recover the original Ukrainian text?
> (ccl:decode-string-from-octets
> (ccl:encode-string-to-octets
> (coerce '(#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth
> #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096
> #\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Space
> #\Latin_Capital_Letter_N_With_Tilde #\U+0080) 'string)
> :external-format :iso-8859-1)
> :external-format :utf-8)
>
> =>
> "Економічні р"
>
> --
> With best regards, Stas.