[Openmcl-devel] new snapshot tarballs (finally)
Erik Pearson
erik at adaptations.com
Wed Oct 25 11:24:39 PDT 2006
Great!
It compiled my stuff with no problems... but then I ran into:
Undefined function CCL::%BIVALENT-IOBLOCK-WRITE-U8-BYTE
It does not appear to exist where it might be expected, in l1-streams.lisp.
In fact, none of the bivalent write functions appear to be there...
Erik.
--On October 24, 2006 1:39:41 PM -0600 Gary Byers <gb at clozure.com> wrote:
> There are now new (061024) tar archives for DarwinPPC (32 and 64-bit),
> LinuxPPC (32 and 64-bit), LinuxX8664 (64-bit), DarwinX8664 (64-bit), and
> FreeBSDX8664 (64-bit) in ftp://clozure.com/pub/testing
>
> These archives are all self-contained (contain sources, binaries,
> interfaces, the CVS ChangeLog, and release notes); the release-notes
> entry for this snapshot is included below.
>
> I'm sorry that it's taken so long to get things back in synch; now that
> they are, I hope that they'll stay that way for a while and that people
> who want to track the bleeding edge will have an easier time doing so.
>
> Please report bugs!
>
> OpenMCL 1.1-pre-061024
> - The FASL version changed (old FASL files won't work with this
> lisp version), as did the version information which tries to
> keep the kernel in sync with heap images.
> - Linux users: it's possible (depending on the distribution that
> you use) that the lisp kernel will claim to depend on newer
> versions of some shared libraries than the versions that you
> have installed. This is mostly just an artifact of the GNU
> linker, which adds version information to dependent library
> references even though no strong dependency exists. If you
> run into this, you should be able to simply cd to the appropriate
> build directory under ccl/lisp-kernel and do a "make".
> - There's now a port of OpenMCL to FreeBSD/amd64; it claims to be
> of beta quality. (The problems that made it too unstable
> to release as of a few months ago have been fixed; I still run
> into occasional FreeBSD-specific issues, and some such issues
> may remain.)
> - The Darwin X8664 port is a bit more stable (no longer generates
> obscure "Trace/BKPT trap" exits or spurious-looking FP exceptions.)
> I'd never want to pass up a chance to speak ill of Mach, but both
> of these bugs seemed to be OpenMCL problems rather than Mach kernel
> problems, as I'd previously more-or-less assumed.
> - I generally don't use SLIME with OpenMCL, but limited testing
> with the 2006-04-20 version of SLIME seems to indicate that no
> changes to SLIME are necessary to work with this version.
> - CHAR-CODE-LIMIT is now #x110000, which means that all Unicode
> characters can be directly represented. There is one CHARACTER
> type (all CHARACTERs are BASE-CHARs) and one string type (all
> STRINGs are BASE-STRINGs.) This change (and some other changes
> in the compiler and runtime) made the heap images a few MB larger
> than in previous versions.
> - As of Unicode 5.0, only about 100,000 of 1114112./#x110000 CHAR-CODEs
> are actually defined; the function CODE-CHAR knows that certain
> ranges of code values (notably #xd800-#xdfff) will never be valid
> character codes and will return NIL for arguments in that range,
> but may return a non-NIL value (an undefined/non-standard CHARACTER
> object) for other unassigned code values.
> - The :EXTERNAL-FORMAT argument to OPEN/LOAD/COMPILE-FILE has been
> extended to allow the stream's character encoding scheme (as well
> as line-termination conventions) to be specified; see more
> details below. MAKE-SOCKET has been extended to allow an
> :EXTERNAL-FORMAT argument with similar semantics.
> - Strings of the form "u+xxxx" - where "x" is a sequence of one
> or more hex digits - can be used as character names to denote
> the character whose code is the value of the string of hex digits.
> (The + character is actually optional, so #\u+0020, #\U0020, and
> #\U+20 all refer to the #\Space character.) Characters with codes
> in the range #xa0-#x7ff (IIRC) also have symbolic names (the
> names from the Unicode standard with spaces replaced with underscores),
> so #\Greek_Capital_Letter_Epsilon can be used to refer to the character
> whose CHAR-CODE is #x395.
> - The line-termination convention popularized with the CP/M operating
> system (and used in its descendants) - i.e., CRLF - is now supported,
> as is the use of Unicode #\Line_Separator (#\u+2028).
> - About 15-20 character encoding schemes are defined (so far); these
> include UTF-8/16/32 and the big-endian/little-endian variants of
> the latter two and ISO-8859-* 8-bit encodings. (There is not
> yet any support for traditional (non-Unicode) ways of externally
> encoding characters used in Asian languages, support for legacy
> MacOS encodings, legacy Windows/DOS/IBM encodings, ...) It's hoped
> that the existing infrastructure will handle most (if not all) of
> what's missing; that may not be the case for "stateful" encodings
> (where the way that a given character is encoded/decoded depends
> on context, like the value of the preceding/following character.)
> - There isn't yet any support for Unicode-aware collation (CHAR>
> and related CL functions just compare character codes, which
> can give meaningless results for non-STANDARD-CHARs), case-inversion,
> or normalization/denormalization. There's generally good support
> for this sort of thing in OS-provided libraries (e.g., CoreFoundation
> on MacOSX), and it's not yet clear whether it'd be best to duplicate
> that in lisp or leverage library support.
> - Unicode-aware FFI functions and macros are still in a sort of
> embryonic state if they're there at all; things like WITH-CSTRs
> continue to exist (and continue to assume an 8-bit character
> encoding.)
> - Characters that can't be represented in a fixed-width 8-bit
> character encoding are replaced with #\Sub (= (code-char 26) =
> ^Z) on output, so if you do something like:
>
> ? (format t "~a" #\u+20a0)
>
> you might see a #\Sub character (however that's displayed on
> the terminal device/Emacs buffer) or a Euro currency sign or
> practically anything else (depending on how lisp is configured
> to encode output to *TERMINAL-IO* and on how the terminal/Emacs
> is configured to decode its input.)
>
> On output to streams with character encodings that can encode
> the full range of Unicode - and on input from any stream -
> "unencodable characters" are represented using the Unicode
> #\Replacement_Character (= #\U+fffd); the presence of such a
> character usually indicates that something got lost in translation
> (data wasn't encoded properly or there was a bug in the decoding
> process.)
> - Streams encoded in schemes which use more than one octet per code unit
> (UTF-16, UTF-32, ...) and whose endianness is not explicit will be
> written with a leading byte-order-mark character on (new) output and
> will expect a BOM on input; if a BOM is missing from input data,
> that data will be assumed to have been serialized in big-endian order.
> Streams encoded in variants of these schemes whose endianness is
> explicit (UTF-16BE, UCS-4LE, ...) will not have byte-order-marks
> written on output or expected on input. (UTF-8 streams might also
> contain encoded byte-order-marks; even though UTF-8 uses a single
> octet per code unit - and possibly more than one code unit per
> character - this convention is sometimes used to advertise that the
> stream is UTF-8-encoded. The current implementation doesn't skip
> over/ignore leading BOMs on UTF-8-encoded input, but it probably
> should.)
>
> If the preceding paragraph made little sense, a shorter version is
> that sometimes the endianness of encoded data matters and there
> are conventions for expressing the endianness of encoded data; I
> think that OpenMCL gets it mostly right, but (even if that's true)
> the real world may be messier.
> - By default, OpenMCL uses ISO-8859-1 encoding for *TERMINAL-IO*
> and for all streams whose EXTERNAL-FORMAT isn't explicitly specified.
> (ISO-8859-1 just covers the first 256 Unicode code points, where
> the first 128 code points are equivalent to US-ASCII.) That should
> be pretty much equivalent to what previous versions (that only
> supported 8-bit characters) did, but it may not be optimal for
> users working in a particular locale. The default for *TERMINAL-IO*
> can be set via a command-line argument (see below) and this setting
> persists across calls to SAVE-APPLICATION, but it's not clear that
> there's a good way of setting it automatically (e.g., by checking
> the POSIX "locale" settings on startup.) Things like POSIX locales
> aren't always set correctly (even if they're set correctly for
> the shell/terminal, they may not be set correctly when running
> under Emacs ...) and in general, *TERMINAL-IO*'s notion of the
> character encoding it's using and the "terminal device"/Emacs
> subprocess's notion need to agree (and fonts need to contain glyphs
> for the right set of characters) in order for everything to "work".
> Using ISO-8859-1 as the default seemed to increase the likelihood that
> most things would work even if things aren't quite set up ideally
> (since no character translation occurs for 8-bit characters in
> ISO-8859-1.)
> - In non-Unicode-related news: the rewrite of OpenMCL's stream code
> that was started a few months ago should now be complete (no more
> "missing method for BASIC-STREAM" errors, or at least there shouldn't
> be any.)
> - I haven't done anything with the Cocoa bridge/demos lately, besides
> a little bit of smoke-testing.
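
The CHAR-CODE-LIMIT, CODE-CHAR and character-name items above could be tried at the listener roughly like this. This is only a sketch: the #\u+xxxx and #\Greek_... reader syntax is specific to this snapshot (other lisps may refuse to read it), and the values in the comments are what the release notes describe, not results checked against the 061024 image.

```lisp
;; Every Unicode code point should fit below CHAR-CODE-LIMIT.
char-code-limit                               ; #x110000 according to the notes

;; Surrogate code values never denote characters, so CODE-CHAR is
;; described as returning NIL for them.
(code-char #xd800)                            ; NIL according to the notes

;; The three spellings from the notes should all read as #\Space.
(char= #\u+0020 #\U0020 #\U+20 #\Space)

;; Symbolic Unicode names, with spaces replaced by underscores.
(char-code #\Greek_Capital_Letter_Epsilon)    ; #x395 according to the notes
```
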
>
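
The byte-order-mark convention described above for UTF-16 can be illustrated in portable CL. This is a minimal sketch of the convention only, not OpenMCL's stream code; the function name UTF-16-BYTE-ORDER is made up for the example.

```lisp
(defun utf-16-byte-order (octets)
  "Guess the byte order of UTF-16 data held in OCTETS, a vector of octets.
Returns two values: :BIG-ENDIAN or :LITTLE-ENDIAN, and the number of
leading octets occupied by a byte-order mark (0 or 2)."
  (if (< (length octets) 2)
      (values :big-endian 0)            ; too short to hold a BOM
      (let ((b0 (aref octets 0))
            (b1 (aref octets 1)))
        (cond ((and (= b0 #xfe) (= b1 #xff)) (values :big-endian 2))
              ((and (= b0 #xff) (= b1 #xfe)) (values :little-endian 2))
              ;; No BOM: assume big-endian, as the notes describe.
              (t (values :big-endian 0))))))

(utf-16-byte-order #(#xff #xfe #x41 #x00))    ; => :LITTLE-ENDIAN, 2
```

A reader for an explicitly big- or little-endian variant (UTF-16BE, UTF-16LE) would skip this check entirely, since those variants neither write nor expect a BOM.
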
> Some implementation/usage details:
>
> Character encodings.
>
> CHARACTER-ENCODINGs are objects (structures) that're named by keywords
> (:ISO-8859-1, :UTF-8, etc.). The structures contain attributes of
> the encoding and functions used to encode/decode external data, but
> unless you're trying to define or debug an encoding there's little
> reason to know much about the CHARACTER-ENCODING objects and it's
> generally desirable (and sometimes necessary) to refer to the encoding
> via its name.
>
> Most encodings have "aliases"; the encoding named :ISO-8859-1 can
> also be referred to by the names :LATIN1 and :IBM819, among others.
> Where possible, the keywordized name of an encoding is equivalent
> to the preferred MIME charset name (and the aliases are all registered
> IANA charset names.)
>
> NIL is an alias for the :ISO-8859-1 encoding; it's treated a little
> specially by the I/O system.
>
> The function CCL:DESCRIBE-CHARACTER-ENCODINGS will write descriptions
> of all defined character encodings to *terminal-io*; these descriptions
> include the names of the encoding's aliases and a doc string which
> briefly describes each encoding's properties and intended use.
>
> Line-termination conventions.
>
> As noted in the <=1.0 documentation, the keywords :UNIX, :MACOS, and
> :INFERRED can be used to denote a stream's line-termination conventions.
> (:INFERRED is only useful for FILE-STREAMs that're open for :INPUT or
> :IO.) In this release, the keyword :CR can also be used to indicate
> that a stream uses #\Return characters for line-termination (equivalent
> to :MACOS), the keyword :UNICODE denotes that the stream uses Unicode
> #\Line_Separator characters to terminate lines, and the keywords :CRLF,
> :CP/M, :MSDOS, :DOS, and :WINDOWS all indicate that lines are terminated
> via a #\Return #\Linefeed sequence.
>
> In some contexts (when specifying EXTERNAL-FORMATs), the keyword :DEFAULT
> can also be used; in this case, it's equivalent to specifying the value
> of the variable CCL:*DEFAULT-LINE-TERMINATION*. The initial value of
> this variable is :UNIX.
>
> Note that the set of keywords used to denote CHARACTER-ENCODINGs and
> the set of keywords used to denote line-termination conventions are
> disjoint: a keyword denotes at most a character encoding or a line
> termination convention, but never both.
>
> External-formats.
>
> EXTERNAL-FORMATs are also objects (structures) with two read-only
> fields that can be accessed via the functions
> EXTERNAL-FORMAT-LINE-TERMINATION and EXTERNAL-FORMAT-CHARACTER-ENCODING;
> the values of these fields are line-termination-convention-names and
> character-encoding names as described above.
>
> An EXTERNAL-FORMAT object can be created via the function MAKE-EXTERNAL-FORMAT:
>
> MAKE-EXTERNAL-FORMAT &key domain character-encoding line-termination
>
> (Despite the function's name, it doesn't necessarily create a new,
> unique EXTERNAL-FORMAT object: two calls to MAKE-EXTERNAL-FORMAT
> with the same arguments made in the same dynamic environment will
> return the same (eq) object.)
>
> Both the :LINE-TERMINATION and :CHARACTER-ENCODING arguments default
> to :DEFAULT; if :LINE-TERMINATION is specified as or defaults to
> :DEFAULT, the value of CCL:*DEFAULT-LINE-TERMINATION* is used to
> provide a concrete value.
>
> When the :CHARACTER-ENCODING argument is specified as/defaults to
> :DEFAULT, the concrete character encoding name that's actually used
> depends on the value of the :DOMAIN argument to MAKE-EXTERNAL-FORMAT.
> The :DOMAIN argument's value can be practically anything; when it's
> the keyword :FILE and the :CHARACTER-ENCODING argument's value is
> :DEFAULT, the concrete character encoding name that's used will be
> the value of the variable CCL:*DEFAULT-FILE-CHARACTER-ENCODING*; the
> initial value of this variable is NIL (which is an alias for :ISO-8859-1).
> If the value of the :DOMAIN argument is :SOCKET and the
> :CHARACTER-ENCODING argument's value is :DEFAULT, the value of
> CCL:*DEFAULT-SOCKET-CHARACTER-ENCODING* is used as a concrete character
> encoding name. The initial value of
> CCL:*DEFAULT-SOCKET-CHARACTER-ENCODING* is NIL, again denoting the
> :ISO-8859-1 encoding.
> If the value of the :DOMAIN argument is anything else, :ISO-8859-1 is
> also used (but there's no way to override this.)
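
The defaulting rules above might look like this at the listener. This is a sketch under assumptions: it assumes MAKE-EXTERNAL-FORMAT and the two accessors are reachable with the CCL: prefix (the notes only show that prefix on the variables), and the values in the comments are what the description implies rather than verified output.

```lisp
;; Explicit encoding and line termination.
(defparameter *utf8-crlf*
  (ccl:make-external-format :character-encoding :utf-8
                            :line-termination :crlf))

(ccl:external-format-character-encoding *utf8-crlf*) ; :UTF-8, per the notes
(ccl:external-format-line-termination *utf8-crlf*)   ; :CRLF, per the notes

;; The same arguments in the same dynamic environment are described
;; as returning the same (EQ) object.
(eq *utf8-crlf*
    (ccl:make-external-format :character-encoding :utf-8
                              :line-termination :crlf))

;; With :DOMAIN :FILE and no explicit encoding, the encoding comes from
;; CCL:*DEFAULT-FILE-CHARACTER-ENCODING*, so rebinding that variable
;; should change the result.
(let ((ccl:*default-file-character-encoding* :utf-8))
  (ccl:external-format-character-encoding
   (ccl:make-external-format :domain :file)))
```
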
>
> The result of a call to MAKE-EXTERNAL-FORMAT can be used as the value
> of the :EXTERNAL-FORMAT argument to OPEN, LOAD, COMPILE-FILE, and
> MAKE-SOCKET; it's also possible to use a few shorthand constructs
> in these contexts:
>
> * if ARG is unspecified or specified as :DEFAULT, the value of the
> variable CCL:*DEFAULT-EXTERNAL-FORMAT* is used. Since the value
> of this variable has historically been used to name a default
> line-termination convention, this case effectively falls into
> the next one:
> * if ARG is a keyword which names a concrete line-termination convention,
> an EXTERNAL-FORMAT equivalent to the result of calling
> (MAKE-EXTERNAL-FORMAT :line-termination ARG)
> will be used
> * if ARG is a keyword which names a character encoding, an EXTERNAL-FORMAT
> equivalent to the result of calling
> (MAKE-EXTERNAL-FORMAT :character-encoding ARG)
> will be used
> * if ARG is a list, the result of (APPLY #'MAKE-EXTERNAL-FORMAT ARG)
> will be used
>
> (When MAKE-EXTERNAL-FORMAT is called to create an EXTERNAL-FORMAT
> object from one of these shorthand designators, the value of the
> :DOMAIN keyword argument is :FILE for OPEN, LOAD, and COMPILE-FILE
> and :SOCKET for MAKE-SOCKET.)
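
The shorthand designators listed above could be exercised with OPEN (via WITH-OPEN-FILE) along these lines. A sketch only: the file name is a made-up scratch file, and the final result assumes the initial ISO-8859-1 default for the :FILE domain is still in effect.

```lisp
(defparameter *demo-path* "external-format-demo.txt") ; hypothetical scratch file

;; A character-encoding keyword as the designator.
(with-open-file (out *demo-path* :direction :output
                                 :if-exists :supersede
                                 :external-format :utf-8)
  (format out "caf~c~%" (code-char #xe9)))

;; A list of MAKE-EXTERNAL-FORMAT arguments as the designator.
(defparameter *as-utf-8*
  (with-open-file (in *demo-path*
                      :external-format '(:character-encoding :utf-8
                                         :line-termination :unix))
    (read-line in)))

;; A line-termination keyword as the designator; the encoding then
;; falls back to the :FILE default.
(defparameter *as-default*
  (with-open-file (in *demo-path* :external-format :unix)
    (read-line in)))

;; If the behavior matches the notes this is (4 5): the two octets that
;; encode the accented letter are read as two ISO-8859-1 characters.
(list (length *as-utf-8*) (length *as-default*))
```
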
>
> STREAM-EXTERNAL-FORMAT.
> The CL function STREAM-EXTERNAL-FORMAT - which is portably defined
> on FILE-STREAMs - can be applied to any open stream in this release
> and will return an EXTERNAL-FORMAT object when applied to an open
> CHARACTER-STREAM. For open CHARACTER-STREAMs (other than STRING-STREAMs),
> SETF can be used with STREAM-EXTERNAL-FORMAT to change the stream's
> character encoding, line-termination, or both.
>
> If a "shorthand" external-format designator is used in a call to
> (SETF STREAM-EXTERNAL-FORMAT), the "domain" used to construct an
> EXTERNAL-FORMAT is derived from the class of the stream in the
> obvious way (:FILE for FILE-STREAMs, :SOCKET for ... well, for
> sockets ...)
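
Switching encodings on an open stream, as described above, might be used like this for a file whose header and body are encoded differently. A sketch under assumptions: the file name is made up, (SETF STREAM-EXTERNAL-FORMAT) is the extension the notes describe rather than portable CL, and the comments state what the description implies, not verified output.

```lisp
(defparameter *mixed-path* "mixed-demo.dat") ; hypothetical scratch file

;; Write an ASCII header line ("v1") followed by one UTF-8 encoded line
;; holding a single accented letter (octets #xc3 #xa9).
(with-open-file (out *mixed-path* :direction :output
                                  :if-exists :supersede
                                  :element-type '(unsigned-byte 8))
  (write-sequence #(#x76 #x31 #x0a #xc3 #xa9 #x0a) out))

(defparameter *mixed-lines*
  (with-open-file (s *mixed-path* :external-format :iso-8859-1)
    (let ((header (read-line s)))
      ;; Shorthand designator; the domain is derived from the stream's
      ;; class (:FILE here).
      (setf (stream-external-format s) :utf-8)
      (list header (read-line s)))))

;; Expected per the notes: ("v1" <a one-character string, code #xe9>)
*mixed-lines*
```

READ-LINE is used here rather than READ because, as the text below explains, anything that unreads a character before the switch can leak a character decoded under the old encoding.
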
>
> Note that the effect of doing something like:
>
> (let* ((s (open "foo" ... :external-format :utf-8)))
>   ...
>   (unread-char ch s)
>   (setf (stream-external-format s) :us-ascii)
>   (read-char s))
>
> might or might not be what was intended. The current behavior is
> that the call to READ-CHAR will return the previously unread character
> CH, which might surprise any code which assumes that the READ-CHAR
> will return something encodable in 7 or 8 bits. Since functions
> like READ may call UNREAD-CHAR "behind your back", it may or may
> not be obvious that this has even occurred; the best approach to
> dealing with this issue might be to avoid using READ or explicit
> calls to UNREAD-CHAR when processing content encoded in multiple
> external formats.
>
> There's a similar issue with "bivalent" streams (sockets) which
> can do both character and binary I/O with an :ELEMENT-TYPE of
> (UNSIGNED-BYTE 8). Historically, the sequence:
>
> (unread-char ch s)
> (read-byte s)
>
> caused the READ-BYTE to return (CHAR-CODE CH); that made sense
> when everything was implicitly encoded as :ISO-8859-1, but may not
> make any sense anymore. (The only thing that seems to make sense
> in that case is to clear the unread character and read the next
> octet; that's implemented in some cases but I don't think that
> things are always handled consistently.)
>
> Command-line argument for specifying the character encoding to
> be used for *TERMINAL-IO*.
>
> Shortly after a saved lisp image starts up, it creates the standard
> CL streams (like *STANDARD-OUTPUT*, *TERMINAL-IO*, *QUERY-IO*, etc.);
> most of these streams are usually SYNONYM-STREAMS which reference
> the TWO-WAY-STREAM *TERMINAL-IO*, which is itself comprised of
> a pair of CHARACTER-STREAMs. The character encoding used for
> any CHARACTER-STREAMs created during this process is the one
> named by the value of the variable CCL:*TERMINAL-CHARACTER-ENCODING-NAME*;
> this value is initially NIL.
>
> The -K or --terminal-encoding command-line argument can be used to
> set the value of this variable (the argument is processed before the
> standard streams are created.) The string which is the value of
> the -K/--terminal-encoding argument is uppercased and interned in
> the KEYWORD package; if an encoding named by that keyword exists,
> CCL:*TERMINAL-CHARACTER-ENCODING-NAME* is set to the name of that
> encoding. For example:
>
> shell> openmcl -K utf-8
>
> will have the effect of making the standard CL streams use :UTF-8
> as their character encoding.
>
> (It's probably possible - but a bit awkward - to use (SETF
> STREAM-EXTERNAL-FORMAT) from one's init file or --eval arguments or similar to
> change existing streams' character encodings; the hard/awkward parts of
> doing so include the difficulty of determining which standard streams are
> "real" character streams and which are aliases/composite streams.)
>
--
Erik Pearson
Adaptations