[Openmcl-devel] new images for testing

Gary Byers gb at clozure.com
Fri Apr 29 10:25:34 PDT 2005

There are new (050429) bleeding-edge images available in
ftp://clozure.com/pub/testing.  There are also some fairly new (050426 and
050420) interface (.cdb) files; this lisp needs interfaces that are at
least that recent.  (The differences between 050420 and 050426 are in
how some ObjC stuff is encoded; unless you plan on wrestling with GNUstep,
the (slightly) older interfaces should work fine under Linux.)

The FASL version changed (again); FASL files compiled by earlier images
won't work in this image and vice versa.

The biggest change (and the thing that needs pounding on the most) has
to do with how the EGC keeps track of "old" objects that're
destructively modified to point to "new" ones.  (GC literature tends
to call this "the write barrier".)  The old write barrier was imposed
in hardware (via write-protection); the new scheme tries to keep track
of things in software.  The old scheme seemed to incur a lot of OS
overhead (especially on Darwin/OSX), and hopefully this change will
reduce that substantially (there's at least one other source of that
OS overhead that I'm aware of.)
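
As a rough illustration of what "keeping track of things in software" means, here is a toy model in portable Common Lisp. It is only a sketch and not the kernel's actual implementation: the "heap" is a vector of slots, slots below *OLD-LIMIT* stand in for the old generation, and every name here (BARRIER-STORE, *REFBITS*, and so on) is made up for the example.

```lisp
;; Toy model: heap slots hold either NIL or the index of another slot.
;; Slots 0..(*OLD-LIMIT* - 1) play the role of "old" objects.
(defparameter *heap-size* 64)
(defparameter *old-limit* 32)
(defvar *heap* (make-array *heap-size* :initial-element nil))
(defvar *refbits* (make-array *heap-size* :element-type 'bit
                                          :initial-element 0))

(defun young-index-p (x)
  "True if X is the index of a slot in the young part of the toy heap."
  (and (integerp x) (<= *old-limit* x) (< x *heap-size*)))

(defun barrier-store (index value)
  "Store VALUE in slot INDEX; if an old slot now points at a young one,
set the matching bit so a later young-generation scan can find it."
  (setf (svref *heap* index) value)
  (when (and (< index *old-limit*) (young-index-p value))
    (setf (sbit *refbits* index) 1))
  value)

(defun memoized-slots ()
  "List the old slots recorded as pointing into the young generation."
  (loop for i below *old-limit*
        when (= 1 (sbit *refbits* i)) collect i))
```

In this model only stores that go through BARRIER-STORE are recorded; a hardware scheme would instead write-protect the old slots and take a fault on the first store.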

The EGC is off by default (mostly to make it easier to bootstrap
these changes); it can be turned on via:

? (egc t)

Setting a bit in a special variable whose address is known to the
lisp kernel enables some extra consistency checks in the GC:

? (setq ccl::*gc-event-status-bits*
         (logior ccl::*gc-event-status-bits* (ash 1 ccl::$GC-EVENT-STATUS-BIT)))
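
The LOGIOR/ASH idiom sets one bit and leaves the others alone. A minimal portable sketch of the same bit arithmetic follows. The bit position 2 and the function names are invented for illustration, and the idea that clearing the bit again would turn the checks back off is an assumption, not something verified against this image.

```lisp
;; Hypothetical bit position, for illustration only; the real position
;; is whatever the kernel constant names.
(defparameter *example-check-bit* 2)

(defun enable-flag (bits position)
  "Return BITS with the bit at POSITION set."
  (logior bits (ash 1 position)))

(defun disable-flag (bits position)
  "Return BITS with the bit at POSITION cleared."
  (logandc2 bits (ash 1 position)))

(defun flag-enabled-p (bits position)
  "True if the bit at POSITION is set in BITS."
  (logbitp position bits))
```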

The integrity checks in the GC often do more work than the GC itself does;
when the EGC is active, this means that instead of getting frequent, quick
GC invocations we get frequent, slow GC invocations.  The good news is that
GC-related problems are usually caught early, and it's often easier to
map back from the symptom to the actual bug in that case.

One of those consistency checks that's relevant here would be reported
(in the kernel debugger) as "Missing memoization in doubleword at
#xXXXXXXX"; this is the GC's quaint way of saying that there's a
problem in the write barrier (some old object has been destructively
modified to point to a new one, but this information hasn't been
recorded for the EGC's benefit.)
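
The kind of check that produces that message can be sketched in a toy model (portable Common Lisp, not the kernel's code). The model is an assumption made for the example: HEAP is a simple vector whose slots hold NIL or the index of another slot, slots below OLD-LIMIT count as "old", and REFBITS is a bit vector with one bit per slot. MISSING-MEMOIZATIONS is a made-up name.

```lisp
(defun missing-memoizations (heap refbits old-limit)
  "Return the indices of old slots that point at a young slot
but whose bit in REFBITS was never set."
  (loop for i below old-limit
        for value = (svref heap i)
        when (and (integerp value)
                  (<= old-limit value)
                  (< value (length heap))
                  (zerop (sbit refbits i)))
          collect i))

;; Usage: slots 1 and 2 both point into the young part (indices >= 4),
;; but only slot 1 was recorded, so slot 2 is reported.
(defun example-missing-memoization ()
  (let ((heap (vector nil 5 6 nil nil nil nil nil))
        (refbits (make-array 8 :element-type 'bit :initial-element 0)))
    (setf (sbit refbits 1) 1)
    (missing-memoizations heap refbits 4)))
```

A check like this has to look at every old slot, which is one way to see why the integrity checks can cost more than a young-generation collection.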

If some such consistency check fails, you'd basically want to note
whatever you can about what you were doing and kill the lisp session
('k' in the kernel debugger.)  I've run the current version under
fairly heavy stress without triggering this, though it's certainly
possible that problems remain.  There's some thread-related complexity
here: if thread A enters the GC while thread B is somewhere in the
middle of an operation that stores a new pointer into an old object,
the GC has to treat thread B's activity as if it happened atomically:
either thread B hadn't yet stored the pointer, or it had both stored the
pointer and set a bit in a bitmap to "memoize" the intergenerational
reference.
That's -supposed- to work, but I haven't yet tested it thoroughly.

Aside from the EGC changes, the other recent feature of note has to
do with how the Cocoa bridge learns about ObjC classes and methods;
it now bases its model on information derived from the header files
and stored in the interface database.  This seems to allow the demo
IDE to run under Tiger as well as Panther (I haven't checked Jaguar
yet, and the call to #_GetCurrentEventQueue that causes problems
on 10.3.9 is still present and may need to be commented out.)  The
bridge knows about substantially fewer ObjC classes than actually
exist, and may get confused if it encounters a "private" class or
an instance of one.  This doesn't seem to happen in the demo IDE,
but -may- happen in arbitrary code that uses the bridge.

So, if you find yourself with some free time and feel like testing
this, it'd be helpful to either:

  - leave the EGC off and just pound on things in general, especially
    if you have access to Tiger

  - turn the EGC on, enable GC integrity checks, and cons a lot
    (being prepared for a possible trip to the kernel debugger.)
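
For the second option, one simple way to "cons a lot" while also storing new objects into an older one is sketched below in portable Common Lisp. CHURN is a made-up name, and whether the vector actually ends up in an older generation by the time the stores happen depends on the GC, so treat this as a starting point rather than a guaranteed stress test.

```lisp
(defun churn (iterations &optional (table-size 1024))
  "Allocate ITERATIONS fresh lists, destructively storing each one into
a long-lived vector; return the number of slots that hold a cons."
  (let ((old (make-array table-size :initial-element nil)))
    (dotimes (i iterations)
      ;; Each store makes a (possibly old) vector point at a new object.
      (setf (svref old (mod i table-size))
            (list i (make-string 16 :initial-element #\x))))
    (count-if #'consp old)))
```

Calling something like (churn 10000000) in a loop, with the EGC on, should keep the collector busy.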

In the latter case, if you drop into the kernel debugger with a
"missing memoization ..." message, it'd be helpful to know (a)
that that happened at all and (b) as much as possible about the
sorts of things your code was doing when it happened.  (It's
sometimes very hard to reproduce this sort of problem, but general
hints can be helpful.)
