[Openmcl-devel] Unicode issues, esp security
james anderson
james.anderson at setf.de
Mon Apr 13 13:24:13 PDT 2009
[the irony in this discussion is that utf-8b is non-conformant, by
definition.]
On 2009-04-13, at 20:37 , Dan Weinreb wrote:
> Luis,
>
> From two Unicode experts I have consulted come
> the following comments:
>
> See:
>
> http://www.unicode.org/reports/tr36/
>
> Cases like this, in which an illegal sequence is explicitly
> transformed into another illegal sequence, would meet with a lot of
> resistance from folks who care about security.
>
> It's important not to do anything outside the definition. Your
> objection to CODE-CHAR returning NIL is incompatible with the Unicode
> concept of "Noncharacters". See the Unicode report section 16.7.
is not 16.7 concerned with unicode interchange? kuhn's proposal, from
which oliveira's 8b efforts follow, is not.
it concerns an unambiguous internal representation. in any case,
kuhn's proposal would also appear to adhere to tr36's
recommendations, in that it neither deletes the initial invalid byte,
nor consumes successors.
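
to make the "neither deletes nor consumes" point concrete, here is a minimal sketch in python rather than lisp. it assumes that python's "surrogateescape" error handler (pep 383) uses the same mapping as kuhn's proposal, i.e. each undecodable byte 0xNN becomes the lone low surrogate U+DCNN. the function name decode_utf8b is hypothetical and is not part of babel or any lisp implementation.

```python
def decode_utf8b(data: bytes) -> str:
    """Decode UTF-8, mapping each invalid byte 0xNN to U+DCNN.

    Assumption: Python's built-in "surrogateescape" handler stands in
    for a utf-8b decoder here; it is not Babel's implementation.
    """
    return data.decode("utf-8", errors="surrogateescape")


# 0xFF can never occur in well-formed UTF-8. It is kept (as U+DCFF),
# not deleted, and the valid bytes around it are decoded normally.
lone_invalid = decode_utf8b(b"ab\xffcd")

# A truncated three-byte sequence followed by "A": each invalid byte
# is escaped on its own and the following "A" is not consumed.
truncated = decode_utf8b(b"\xe2\x82A")

print([hex(ord(c)) for c in lone_invalid])
print([hex(ord(c)) for c in truncated])
```

the decoded string has one element per input byte or per well-formed sequence, so nothing that follows an invalid byte can be swallowed.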
one may argue that the result is not a vector with element type
character.
one may also argue that the result should be permissible as input to
a utf-8b encoding only and any other attempted encoding would be an
error.
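
the "utf-8b only" restriction can be sketched the same way, again in python and again assuming that "surrogateescape" matches utf-8b. the names encode_utf8b and strict_utf8_rejects are hypothetical helpers for illustration only.

```python
def encode_utf8b(text: str) -> bytes:
    """Encode to UTF-8, turning each U+DCNN escape back into byte 0xNN."""
    return text.encode("utf-8", errors="surrogateescape")


def strict_utf8_rejects(text: str) -> bool:
    """Return True if a strict (conformant) UTF-8 encoder refuses the text."""
    try:
        text.encode("utf-8")  # default error mode is "strict"
    except UnicodeEncodeError:
        return True
    return False


escaped = "ab\udcffcd"  # the result of decoding b"ab\xffcd" as utf-8b

# The utf-8b encoder restores the original byte sequence ...
assert encode_utf8b(escaped) == b"ab\xffcd"
# ... while the strict encoder signals an error instead of emitting
# an ill-formed sequence.
assert strict_utf8_rejects(escaped)
assert not strict_utf8_rejects("abcd")
```

under this arrangement the escaped data can round-trip back to its original bytes, but it cannot leak out through a conformant encoder unnoticed.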
the question remains, should a runtime support efficient decoding of
this class of data and, if so, how should it do that with convenient,
efficient operations on the respective internal representation? if
the answer is "no lisp implementation should," then babel should
eliminate utf-8b. if the answer is "there should be some way," then,
particularly in light of the security issues, all implementations
_should_ behave the same.