[Openmcl-devel] Another linux86-32: signed doubleword parameters.

Gary Byers gb at clozure.com
Mon Oct 13 05:20:04 PDT 2008


Very generally, that kind of nonsensical error suggests a GC bug (where
the GC moved some object in memory but didn't correctly update some
reference to that object.)  That doesn't help us much: that could
mean a literal bug in the GC, or it could mean that something else
is corrupting memory and the GC's just fouling things up worse.
(I suppose that it could also be a bug in the compiler or elsewhere
that's not handling the stack correctly.)

When sockets are involved, are threads also involved?  (Threads
generally offer more ways for things to go wrong; sockets are mostly
just streams and are probably no more likely to scribble over memory
than other streams are.)

Until last week, the 32-bit x86 ports were running with GC-integrity
checking enabled (which basically means that some fairly rigorous
consistency checks are performed before and after each GC.)  That
tends to slow things down a lot and hadn't caught anything in a
few months, so it seemed to make sense to turn it off.  That sort
of thing never fails to be ill-timed.

If GC-integrity-checking is enabled, the GC will break into the
kernel debugger and report something cryptic, like:

"object at #x1234567 has bogus header"

There are probably around a dozen equally cryptic things that
can be reported, and (unless you want to try to debug the GC)
there's generally nothing user-level that can be done to recover.
Continuing from the kernel debugger sometimes works (in the sense
that the inconsistency turned out to be in some area of memory
that was garbage), but may just as easily be the first of a large
number of problems.  Running these checks can help us find problems
by reporting the symptoms early (it might be several GCs before
a problem causes a lisp-visible error), and makes tracking down
the problem somewhat easier.

Whether or not the integrity checks are performed is controlled
by a bit in the fixnum which is the global value of the variable
CCL::*GC-EVENT-STATUS-BITS*; doing:

? (setq ccl::*gc-event-status-bits*
        (logior (ash 1 ccl::$gc-integrity-check-bit) ; aka 4
                ccl::*gc-event-status-bits*))

will turn integrity-checks on.  (You can also define -DGC_INTEGRITY_CHECKING
in the CDEFINES variable in the appropriate kernel Makefile; that's commented
out of the linuxx8632 Makefile at the moment.)
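
As a rough sketch (assuming the variable and the bit constant behave as
described above, and using only standard Common Lisp bit operations plus
CCL's exported GC function), the bit can be inspected, exercised, and
cleared again like this:

```lisp
;; True if integrity checking is currently enabled.
(logbitp ccl::$gc-integrity-check-bit ccl::*gc-event-status-bits*)

;; Force a full GC, so that the checks (if enabled) run right away
;; rather than whenever the next GC happens to occur.
(ccl:gc)

;; Turn the checks back off by clearing the same bit.
(setq ccl::*gc-event-status-bits*
      (logandc2 ccl::*gc-event-status-bits*
                (ash 1 ccl::$gc-integrity-check-bit)))
```

Clearing the bit afterwards is only worthwhile if the slowdown matters;
while chasing a problem like this one it's probably better left on.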

It's sometimes very hard to debug this kind of problem, and even harder to
explain to someone else how to do so.  It -might- be interesting to see
what gets reported if you run your code with integrity checking on, but
in practice it's probably necessary for us to either run that code or
run something similar that triggers the same problem.  If you can send us your
code, we can try to figure out what's going on.

On Sat, 11 Oct 2008, David Brown wrote:

> On Sat, Oct 11, 2008 at 04:59:07PM -0400, R. Matthew Emerson wrote:
>
>>>> Error: value 1073741824 is not of the expected type (UNSIGNED-BYTE 64).
>>>> While executing: BROKEN, in process listener(1).
>>
>> I think this should be fixed in the trunk now.
>
> Seems to work.  At least it gets me past that.  Actually, the
> non-socket part of my test suite now passes.
>
> It fails once sockets get involved, and there isn't a lot of
> consistency to what's going on.  I seem to get things like:
>
>    Signal: Bug (probably): can't determine class of #<BOGUS object @ #x15290B7E>
>
> I've also seen this:
>    Error: value (0 . 0) is not of the expected type CCL::IVECTOR
>
> Where the (0 . 0) should have come from a call to (make-sequence
> '(vector (unsigned-byte 8)) size).
>
> Any ideas on how to track either of these down?  All of these tests
> work fine on ccl64.
>
> David
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>


