[Openmcl-devel] string encoding problem

Bill St. Clair wws at clozure.com
Fri Aug 7 08:17:55 PDT 2015


Stas Boukarev's method is easier and faster than mine, which went through a
temporary file. Running my code below gives back the original text
("Economic reforms of Ukraine"):

"Економічні реформи України"

-Bill

====

(in-package :cl-user)

(defparameter *list*
  '(#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth
#\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096
#\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Space
#\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth
#\Micro_Sign #\Latin_Capital_Letter_N_With_Tilde #\U+0084
#\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters
#\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_Eth #\Cedilla #\Space
#\Latin_Capital_Letter_Eth #\Pound_Sign #\Latin_Capital_Letter_Eth
#\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_N_With_Tilde #\U+0080
#\Latin_Capital_Letter_Eth #\Degree_Sign
#\Latin_Capital_Letter_N_With_Tilde #\U+0097 #\Latin_Capital_Letter_Eth
#\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Cedilla))

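;; Turn the list into a string; every character has a code below 256.
(defparameter *string*
  (map 'string 'identity *list*))

;; Write the string out as Latin-1, so each character becomes exactly
;; one byte. Those bytes are the original UTF-8 bytes.
(with-open-file (stream "~/klein.txt"
                        :external-format :latin1
                        :direction :output
                        :if-exists :supersede)
  (write-string *string* stream))

;; Read the same bytes back, this time decoding them as UTF-8.
(with-open-file (stream "~/klein.txt"
                        :external-format :utf-8)
  (read-line stream))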
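
For reuse, the in-memory round trip from Stas's reply (quoted below) can be
wrapped in a small function. This is a minimal sketch assuming Clozure CL;
FIX-MOJIBAKE is a hypothetical name, not part of CCL, and the check that
rejects characters above code 255 is an addition not found in the original
messages:

```lisp
;; Minimal sketch, assuming Clozure CL. FIX-MOJIBAKE is a hypothetical
;; helper name; it is not part of CCL.
(defun fix-mojibake (string)
  "Recover text whose UTF-8 bytes were wrongly decoded as ISO-8859-1."
  ;; Latin-1 only covers codes 0-255; anything higher means STRING
  ;; cannot be the result of a Latin-1 misreading.
  (when (find-if (lambda (ch) (> (char-code ch) 255)) string)
    (error "~S contains characters outside ISO-8859-1." string))
  ;; Re-encode as Latin-1 to get the original bytes back, one byte per
  ;; character, then decode those bytes properly as UTF-8.
  ;; VALUES keeps only the decoded string (the primary value).
  (values
   (ccl:decode-string-from-octets
    (ccl:encode-string-to-octets string :external-format :iso-8859-1)
    :external-format :utf-8)))

;; Example: #\Latin_Capital_Letter_Eth #\U+0095 are the bytes #xD0 #x95,
;; the UTF-8 encoding of CYRILLIC CAPITAL LETTER IE (U+0415).
(fix-mojibake (coerce (list (code-char #xD0) (code-char #x95)) 'string))
```

Applied to *string* above, this should give the same result as the file
round trip, without the temporary file.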

On Fri, Aug 7, 2015 at 11:09 AM, Stas Boukarev <stassats at gmail.com> wrote:

> Mark Klein <m_klein at mit.edu> writes:
>
> > I had a set of UTF-8 strings (Ukrainian text) that I mistakenly wrote to
> disk as ISO-8859-1 and then read back into Clozure as UTF-8, so my strings
> got messed up, e.g. into
> >
> > Ð<U+0095>ÐºÐ¾Ð½Ð¾Ð¼Ñ<U+0096>Ñ<U+0087>Ð½Ñ<U+0096> Ñ<U+0080>ÐµÑ<U+0084>Ð¾Ñ<U+0080>Ð¼Ð¸ Ð£ÐºÑ<U+0080>Ð°Ñ<U+0097>Ð½Ð¸
> >
> > (#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth
> #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096
> #\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Space
> #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth
> #\Micro_Sign #\Latin_Capital_Letter_N_With_Tilde #\U+0084
> #\Latin_Capital_Letter_Eth #\Vulgar_Fraction_Three_Quarters
> #\Latin_Capital_Letter_N_With_Tilde #\U+0080 #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_Eth #\Cedilla #\Space
> #\Latin_Capital_Letter_Eth #\Pound_Sign #\Latin_Capital_Letter_Eth
> #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_N_With_Tilde #\U+0080
> #\Latin_Capital_Letter_Eth #\Degree_Sign
> #\Latin_Capital_Letter_N_With_Tilde #\U+0097 #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth #\Cedilla)
> >
> > Is there some way to recover the original Ukrainian text?
>
> (ccl:decode-string-from-octets
>  (ccl:encode-string-to-octets
>   (coerce '(#\Latin_Capital_Letter_Eth #\U+0095 #\Latin_Capital_Letter_Eth
> #\Masculine_Ordinal_Indicator #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_Three_Quarters #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Quarter #\Latin_Capital_Letter_N_With_Tilde #\U+0096
> #\Latin_Capital_Letter_N_With_Tilde #\U+0087 #\Latin_Capital_Letter_Eth
> #\Vulgar_Fraction_One_Half #\Latin_Capital_Letter_N_With_Tilde #\U+0096 #\Space
> #\Latin_Capital_Letter_N_With_Tilde #\U+0080) 'string)
>   :external-format :iso-8859-1)
>  :external-format :utf-8)
>
> =>
> "Економічні р"
>
> --
> With best regards, Stas.
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> https://lists.clozure.com/mailman/listinfo/openmcl-devel
>