[Openmcl-devel] new snapshot tarballs (finally)

Erik Pearson erik at adaptations.com
Wed Oct 25 11:24:39 PDT 2006


Great!

It compiled my stuff with no problems... but then I ran into:

Undefined function CCL::%BIVALENT-IOBLOCK-WRITE-U8-BYTE

It does not appear to exist where it might be expected, in l1-streams.lisp. 
In fact, none of the bivalent write functions appear to be there...

Erik.

--On October 24, 2006 1:39:41 PM -0600 Gary Byers <gb at clozure.com> wrote:

> There are now new (061024) tar archives for DarwinPPC (32 and 64-bit),
> LinuxPPC (32  and 64-bit), LinuxX8664 (64-bit), DarwinX8664 (64-bit), and
> FreeBSDX8664 (64-bit) in ftp://clozure.com/pub/testing
>
> These archives are all self-contained (contain sources, binaries,
> interfaces, the CVS ChangeLog, and release notes); the release-notes
> entry for this snapshot is included below.
>
> I'm sorry that it's taken so long to get things back in synch; now that
> they are, I hope that they'll stay that way for a while and that people
> who want to track the bleeding edge will have an easier time doing so.
>
> Please report bugs!
>
> OpenMCL 1.1-pre-061024
> - The FASL version changed (old FASL files won't work with this
>    lisp version), as did the version information which tries to
>    keep the kernel in sync with heap images.
> - Linux users: it's possible (depending on the distribution that
>    you use) that the lisp kernel will claim to depend on newer
>    versions of some shared libraries than the versions that you
>    have installed.  This is mostly just an artifact of the GNU
>    linker, which adds version information to dependent library
>    references even though no strong dependency exists.  If you
>    run into this, you should be able to simply cd to the appropriate
>    build directory under ccl/lisp-kernel and do a "make".
> - There's now a port of OpenMCL to FreeBSD/amd64; it claims to be
>    of beta quality.  (The problems that made it too unstable
>    to release as of a few months ago have been fixed;  I still run
>    into occasional FreeBSD-specific issues, and some such issues
>    may remain.)
> - The Darwin X8664 port is a bit more stable (no longer generates
>    obscure "Trace/BKPT trap" exits or spurious-looking FP exceptions.)
>    I'd never want to pass up a chance to speak ill of Mach, but both
>    of these bugs seemed to be OpenMCL problems rather than Mach kernel
>    problems, as I'd previously more-or-less assumed.
> - I generally don't use SLIME with OpenMCL, but limited testing
>    with the 2006-04-20 version of SLIME seems to indicate that no
>    changes to SLIME are necessary to work with this version.
> - CHAR-CODE-LIMIT is now #x110000, which means that all Unicode
>    characters can be directly represented.  There is one CHARACTER
>    type (all CHARACTERs are BASE-CHARs) and one string type (all
>    STRINGs are BASE-STRINGs.)  This change (and some other changes
>    in the compiler and runtime) made the heap images a few MB larger
>    than in previous versions.
> - As of Unicode 5.0, only about 100,000 of 1114112./#x110000 CHAR-CODEs
>    are actually defined; the function CODE-CHAR knows that certain
>    ranges of code values (notably #xd800-#xdfff) will never be valid
>    character codes and will return NIL for arguments in that range,
>    but may return a non-NIL value (an undefined/non-standard CHARACTER
>    object) for other unassigned code values.
> - The :EXTERNAL-FORMAT argument to OPEN/LOAD/COMPILE-FILE has been
>    extended to allow the stream's character encoding scheme (as well
>    as line-termination conventions) to be specified; see more
>    details below.  MAKE-SOCKET has been extended to allow an
>    :EXTERNAL-FORMAT argument with similar semantics.
> - Strings of the form "u+xxxx" - where "xxxx" is a sequence of one
>    or more hex digits - can be used as character names to denote
>    the character whose code is the value of the string of hex digits.
>    (The +  character is actually optional, so  #\u+0020, #\U0020, and
>    #\U+20 all refer to the #\Space character.)  Characters with codes
>    in the range #xa0-#x7ff (IIRC) also have symbolic names (the
>    names from the Unicode standard with spaces replaced with underscores),
>    so #\Greek_Capital_Letter_Epsilon can be used to refer to the character
>    whose CHAR-CODE is #x395.
> - The line-termination convention popularized with the CP/M operating
>    system (and used in its descendants) - i.e., CRLF - is now supported,
>    as is the use of Unicode #\Line_Separator (#\u+2028).
> - About 15-20 character encoding schemes are defined (so far); these
>    include UTF-8/16/32 and the big-endian/little-endian variants of
>    the latter two and ISO-8859-* 8-bit encodings.  (There is not
>    yet any support for traditional (non-Unicode) ways of externally
>    encoding characters used in Asian languages, support for legacy
>    MacOS encodings, legacy Windows/DOS/IBM encodings, ...)  It's hoped
>    that the existing infrastructure will handle most (if not all) of
>    what's missing; that may not be the case for "stateful" encodings
>    (where the way that a given character is encoded/decoded depends
>    on context, like the value of the preceding/following character.)
> - There isn't yet any support for Unicode-aware collation (CHAR>
>    and related CL functions just compare character codes, which
>    can give meaningless results for non-STANDARD-CHARs), case-inversion,
>    or normalization/denormalization.  There's generally good support
>    for this sort of thing in OS-provided libraries (e.g., CoreFoundation
>    on MacOSX), and it's not yet clear whether it'd be best to duplicate
>    that in lisp or leverage library support.
> - Unicode-aware FFI functions and macros are still in a sort of
>    embryonic state if they're there at all; things like WITH-CSTRs
>    continue to exist (and continue to assume an 8-bit character
>    encoding.)
> - Characters that can't be represented in a fixed-width 8-bit
>    character encoding are replaced with #\Sub (= (code-char 26) =
>    ^Z) on output, so if you do something like:
>
> ? (format t "~a" #\u+20a0)
>
>    you might see a #\Sub character (however that's displayed on
>    the terminal device/Emacs buffer) or a Euro currency sign or
>    practically anything else (depending on how lisp is configured
>    to encode output to *TERMINAL-IO* and on how the terminal/Emacs
>    is configured to decode its input.)
>
>    On output to streams with character encodings that can encode
>    the full range of Unicode - and on input from any stream -
>    "unencodable characters" are represented using the Unicode
>    #\Replacement_Character (= #\U+fffd); the presence of such a
>    character usually indicates that something got lost in translation
>    (data wasn't encoded properly or there was a bug in the decoding
>    process.)
> - Streams encoded in schemes which use more than one octet per code unit
>    (UTF-16, UTF-32, ...) and whose endianness is not explicit will be
>    written with a leading byte-order-mark character on (new) output and
>    will expect a BOM on input; if a BOM is missing from input data,
>    that data will be assumed to have been serialized in big-endian order.
>    Streams encoded in variants of these schemes whose endianness is
>    explicit (UTF-16BE, UCS-4LE, ...) will not have byte-order-marks
>    written on output or expected on input.  (UTF-8 streams might also
>    contain encoded byte-order-marks; even though UTF-8 uses a single
>    octet per code unit - and possibly more than one code unit per
>    character - this convention is sometimes used to advertise that the
>    stream is UTF-8-encoded.  The current implementation doesn't skip
>    over/ignore leading BOMs on UTF8-encoded input, but it probably
>    should.)
>
>    If the preceding paragraph made little sense, a shorter version is
>    that sometimes the endianness of encoded data matters and there
>    are conventions for expressing the endianness of encoded data; I
>    think that OpenMCL gets it mostly right, but (even if that's true)
>    the real world may be messier.
> - By default, OpenMCL uses ISO-8859-1 encoding for *TERMINAL-IO*
>    and for all streams whose EXTERNAL-FORMAT isn't explicitly specified.
>    (ISO-8859-1 just covers the first 256 Unicode code points, where
>    the first 128 code points are equivalent to US-ASCII.)  That should
>    be pretty much equivalent to what previous versions (that only
>    supported 8-bit characters) did, but it may not be optimal for
>    users working in a particular locale.  The default for *TERMINAL-IO*
>    can be set via a command-line argument (see below) and this setting
>    persists across calls to SAVE-APPLICATION, but it's not clear that
>    there's a good way of setting it automatically (e.g., by checking
>    the POSIX "locale" settings on startup.)  Things like POSIX locales
>    aren't always set correctly (even if they're set correctly for
>    the shell/terminal, they may not be set correctly when running
>    under Emacs ...) and in general, *TERMINAL-IO*'s notion of the
>    character encoding it's using and the "terminal device"/Emacs
>    subprocess's notion need to agree (and fonts need to contain glyphs
>    for the right set of characters) in order for everything to "work".
>    Using ISO-8859-1 as the default seemed to increase the likelihood that
>    most things would work even if things aren't quite set up ideally
>    (since no character translation occurs for 8-bit characters in
>    ISO-8859-1.)
> - In non-Unicode-related news: the rewrite of OpenMCL's stream code
>    that was started a few months ago should now be complete (no more
>    "missing method for BASIC-STREAM" errors, or at least there shouldn't
>    be any.)
> - I haven't done anything with the Cocoa bridge/demos lately, besides
>    a little bit of smoke-testing.
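
A minimal sketch of the byte-order rule described in the notes above (a leading BOM selects the byte order; a missing BOM means big-endian), written in portable Common Lisp. This is not OpenMCL's actual implementation, and the function names DETECT-UTF-16-BYTE-ORDER and DECODE-UTF-16-CODE-UNITS are hypothetical, chosen only for illustration:

```lisp
;; Portable Common Lisp sketch; both function names are hypothetical.
(defun detect-utf-16-byte-order (octets)
  "Return two values: the byte order (:BIG-ENDIAN or :LITTLE-ENDIAN)
and the index of the first octet after any byte-order mark."
  (if (>= (length octets) 2)
      (let ((o0 (aref octets 0))
            (o1 (aref octets 1)))
        (cond ((and (= o0 #xfe) (= o1 #xff)) (values :big-endian 2))
              ((and (= o0 #xff) (= o1 #xfe)) (values :little-endian 2))
              ;; No BOM: assume big-endian, as the notes describe.
              (t (values :big-endian 0))))
      (values :big-endian 0)))

(defun decode-utf-16-code-units (octets)
  "Return a list of the 16-bit code units in the vector OCTETS,
skipping a leading BOM if one is present."
  (multiple-value-bind (order start) (detect-utf-16-byte-order octets)
    (loop for i from start below (1- (length octets)) by 2
          collect (let ((o0 (aref octets i))
                        (o1 (aref octets (1+ i))))
                    (if (eq order :big-endian)
                        (logior (ash o0 8) o1)
                        (logior (ash o1 8) o0))))))
```

For example, (decode-utf-16-code-units #(#xff #xfe #x41 #x00)) and (decode-utf-16-code-units #(#x00 #x41)) both yield a single code unit, #x41. Surrogate pairs are left as separate code units here; combining them into characters is a further step that this sketch omits.
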
>
> Some implementation/usage details:
>
> Character encodings.
>
> CHARACTER-ENCODINGs are objects (structures) that're named by keywords
> (:ISO-8859-1, :UTF-8, etc.).  The structures contain attributes of
> the encoding and functions used to encode/decode external data, but
> unless you're trying to define or debug an encoding there's little
> reason to know much about the CHARACTER-ENCODING objects and it's
> generally desirable (and sometimes necessary) to refer to the encoding
> via its name.
>
> Most encodings have "aliases"; the encoding named :ISO-8859-1 can
> also be referred to by the names :LATIN1 and :IBM819, among others.
> Where possible, the keywordized name of an encoding is equivalent
> to the preferred MIME charset name (and the aliases are all registered
> IANA charset names.)
>
> NIL is an alias for the :ISO-8859-1 encoding; it's treated a little
> specially by the I/O system.
>
> The function CCL:DESCRIBE-CHARACTER-ENCODINGS will write descriptions
> of all defined character encodings to *terminal-io*; these descriptions
> include the names of the encoding's aliases and a doc string which
> briefly describes each encoding's properties and intended use.
>
> Line-termination conventions.
>
> As noted in the <=1.0 documentation, the keywords :UNIX, :MACOS, and
> :INFERRED can be used to denote a stream's line-termination conventions.
> (:INFERRED is only useful for FILE-STREAMs that're open for :INPUT or
> :IO.)  In this release, the keyword :CR can also be used to indicate
> that a stream uses #\Return characters for line-termination (equivalent
> to :MACOS), the keyword :UNICODE denotes that the stream uses Unicode
> #\Line_Separator characters to terminate lines, and the keywords :CRLF,
> :CP/M, :MSDOS, :DOS, and :WINDOWS all indicate that lines are terminated
> via a #\Return #\Linefeed sequence.
>
> In some contexts (when specifying EXTERNAL-FORMATs), the keyword :DEFAULT
> can also be used; in this case, it's equivalent to specifying the value
> of the variable CCL:*DEFAULT-LINE-TERMINATION*.  The initial value of
> this variable is :UNIX.
>
> Note that the set of keywords used to denote CHARACTER-ENCODINGs and
> the set of keywords used to denote line-termination conventions are
> disjoint: a keyword denotes at most a character encoding or a line
> termination convention, but never both.
>
> External-formats.
>
> EXTERNAL-FORMATs are also objects (structures) with two read-only
> fields that can be accessed via the functions
> EXTERNAL-FORMAT-LINE-TERMINATION and EXTERNAL-FORMAT-CHARACTER-ENCODING;
> the values of these fields are line-termination-convention-names and
> character-encoding names as described above.
>
> An EXTERNAL-FORMAT object is created via the function MAKE-EXTERNAL-FORMAT:
>
> MAKE-EXTERNAL-FORMAT &key domain character-encoding line-termination
>
> (Despite the function's name, it doesn't necessarily create a new,
> unique EXTERNAL-FORMAT object: two calls to MAKE-EXTERNAL-FORMAT
> with the same arguments made in the same dynamic environment will
> return the same (eq) object.)
>
> Both the :LINE-TERMINATION and :CHARACTER-ENCODING arguments default
> to :DEFAULT; if :LINE-TERMINATION is specified as or defaults to
> :DEFAULT, the value of CCL:*DEFAULT-LINE-TERMINATION* is used to
> provide a concrete value.
>
> When the :CHARACTER-ENCODING argument is specified as/defaults to
> :DEFAULT, the concrete character encoding name that's actually used
> depends on the value of the :DOMAIN argument to MAKE-EXTERNAL-FORMAT.
> The :DOMAIN argument's value can be practically anything; when it's
> the keyword :FILE and the :CHARACTER-ENCODING argument's value is
> :DEFAULT, the concrete character encoding name that's used will be
> the value of the variable CCL:*DEFAULT-FILE-CHARACTER-ENCODING*; the
> initial value of this variable is NIL (which is an alias for :ISO-8859-1).
> If the value of the :DOMAIN argument is :SOCKET and the
> :CHARACTER-ENCODING argument's value is :DEFAULT, the value of
> CCL:*DEFAULT-SOCKET-CHARACTER-ENCODING* is used as a concrete character
> encoding name.  The initial value of
> CCL:*DEFAULT-SOCKET-CHARACTER-ENCODING* is NIL, again denoting the
> :ISO-8859-1 encoding.
> If the value of the :DOMAIN argument is anything else, :ISO-8859-1 is
> also used (but there's no way to override this.)
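
A minimal sketch of the defaulting rules just described, using only the names given in the notes. It assumes these symbols live in the CCL package (the functions are referenced with CCL:: in case they are not exported), and it rebinds the file default to :UTF-8 purely for illustration:

```lisp
;; Sketch against the API named in these notes; untested against this snapshot.
(let ((ccl:*default-file-character-encoding* :utf-8))
  (let ((ef (ccl::make-external-format :domain :file)))
    ;; :CHARACTER-ENCODING was left as :DEFAULT, so the :FILE domain's
    ;; default (rebound above) should supply the concrete encoding name;
    ;; :LINE-TERMINATION should come from CCL:*DEFAULT-LINE-TERMINATION*.
    (list (ccl::external-format-character-encoding ef)
          (ccl::external-format-line-termination ef))))
```

If the rules hold as written, the first element of the result should be :UTF-8 and the second should be the current value of CCL:*DEFAULT-LINE-TERMINATION*.
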
>
> The result of a call to MAKE-EXTERNAL-FORMAT can be used as the value
> of the :EXTERNAL-FORMAT argument to OPEN, LOAD, COMPILE-FILE, and
> MAKE-SOCKET; it's also possible to use a few shorthand constructs
> in these contexts:
>
> * if ARG is unspecified or specified as :DEFAULT, the value of the
>    variable CCL:*DEFAULT-EXTERNAL-FORMAT* is used.  Since the value
>    of this variable has historically been used to name a default
>    line-termination convention, this case effectively falls into
>    the next one:
> * if ARG is a keyword which names a concrete line-termination convention,
>    an EXTERNAL-FORMAT equivalent to the result of calling
>    (MAKE-EXTERNAL-FORMAT :line-termination ARG)
>    will be used
> * if ARG is a keyword which names a character encoding, an EXTERNAL-FORMAT
>    equivalent to the result of calling
>    (MAKE-EXTERNAL-FORMAT :character-encoding ARG)
>    will be used
> * if ARG is a list, the result of (APPLY #'MAKE-EXTERNAL-FORMAT ARG)
>    will be used
>
> (When MAKE-EXTERNAL-FORMAT is called to create an EXTERNAL-FORMAT
> object from one of these shorthand designators, the value of the
> :DOMAIN keyword argument is :FILE for OPEN, LOAD, and COMPILE-FILE
> and :SOCKET for MAKE-SOCKET.)
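
A short sketch of the shorthand designators listed above, as they might be passed to OPEN through WITH-OPEN-FILE. The helper WRITE-AND-READ-BACK and the path "/tmp/ef-demo.txt" are hypothetical, and the behaviour shown is what the notes describe rather than something checked against this snapshot:

```lisp
;; WRITE-AND-READ-BACK is a hypothetical helper for illustration only.
(defun write-and-read-back (path text external-format)
  "Write TEXT to PATH using EXTERNAL-FORMAT, then read the first line back."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :external-format external-format)
    (write-string text out))
  (with-open-file (in path :direction :input
                           :external-format external-format)
    (read-line in)))

(let ((text (format nil "E=~c" (code-char #x395)))) ; Greek capital epsilon
  (list
   ;; A keyword naming a character encoding.
   (write-and-read-back "/tmp/ef-demo.txt" text :utf-8)
   ;; A list, to which MAKE-EXTERNAL-FORMAT is applied.
   (write-and-read-back "/tmp/ef-demo.txt" text
                        '(:character-encoding :utf-8
                          :line-termination :crlf))))
```

Both calls should return a string equal to TEXT, since :UTF-8 can encode the character; with the default :ISO-8859-1 encoding, the epsilon would instead be written as #\Sub.
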
>
> STREAM-EXTERNAL-FORMAT.
> The CL function STREAM-EXTERNAL-FORMAT - which is portably defined
> on FILE-STREAMs - can be applied to any open stream in this release
> and will return an EXTERNAL-FORMAT object when applied to an open
> CHARACTER-STREAM. For open CHARACTER-STREAMs (other than STRING-STREAMs),
> SETF can be used with STREAM-EXTERNAL-FORMAT to change the stream's
> character encoding, line-termination, or both.
>
> If a "shorthand" external-format designator is used in a call to
> (SETF STREAM-EXTERNAL-FORMAT), the "domain" used to construct an
> EXTERNAL-FORMAT is derived from the class of the stream in the
> obvious way (:FILE for FILE-STREAMs, :SOCKET for ... well, for
> sockets ...)
>
> Note that the effect of doing something like:
>
> (let* ((s (open "foo" ... :external-format :utf-8)))
>    ...
>    (unread-char ch s)
>    (setf (stream-external-format s) :us-ascii)
>    (read-char s))
>
> might or might not be what was intended.  The current behavior is
> that the call to READ-CHAR will return the previously unread character
> CH, which might surprise any code which assumes that the READ-CHAR
> will return something encodable in 7 or 8 bits.  Since functions
> like READ may call UNREAD-CHAR "behind your back", it may or may
> not be obvious that this has even occurred; the best approach to
> dealing with this issue might be to avoid using READ or explicit
> calls to UNREAD-CHAR when processing content encoded in multiple
> external formats.
>
> There's a similar issue with "bivalent" streams (sockets) which
> can do both character and binary I/O with an :ELEMENT-TYPE of
> (UNSIGNED-BYTE 8).  Historically, the sequence:
>
>     (unread-char ch s)
>     (read-byte s)
>
> caused the READ-BYTE to return (CHAR-CODE CH); that made sense
> when everything was implicitly encoded as :ISO-8859-1, but may not
> make any sense anymore.  (The only thing that seems to make sense
> in that case is to clear the unread character and read the next
> octet; that's implemented in some cases but I don't think that
> things are always handled consistently.)
>
> Command-line argument for specifying the character encoding to
> be used for *TERMINAL-IO*.
>
> Shortly after a saved lisp image starts up, it creates the standard
> CL streams (like *STANDARD-OUTPUT*, *TERMINAL-IO*, *QUERY-IO*, etc.);
> most of these streams are usually SYNONYM-STREAMS which reference
> the TWO-WAY-STREAM *TERMINAL-IO*, which is itself comprised of
> a pair of CHARACTER-STREAMs.  The character encoding used for
> any CHARACTER-STREAMs created during this process is the one
> named by the value of the variable CCL:*TERMINAL-CHARACTER-ENCODING-NAME*;
> this value is initially NIL.
>
> The -K or --terminal-encoding command-line argument can be used to
> set the value of this variable (the argument is processed before the
> standard streams are created.)  The string which is the value of
> the -K/--terminal-encoding argument is uppercased and interned in
> the KEYWORD package; if an encoding named by that keyword exists,
> CCL:*TERMINAL-CHARACTER-ENCODING-NAME* is set to the name of that
> encoding.  For example:
>
> shell> openmcl -K utf-8
>
> will have the effect of making the standard CL streams use :UTF-8
> as their character encoding.
>
> (It's probably possible - but a bit awkward - to use (SETF
> STREAM-EXTERNAL-FORMAT) from one's init file or --eval arguments or similar to
> change existing streams' character encodings; the hard/awkward parts of
> doing so include the difficulty of determining which standard streams are
> "real" character streams and which are aliases/composite streams.)
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel



--
Erik Pearson
Adaptations


