[Openmcl-devel] Another linux86-32: signed doubleword parameters.

Gary Byers gb at clozure.com
Tue Oct 14 08:41:17 PDT 2008


Is the GC integrity-checking bit set?

The bugs in mark_xp()/forward_xp() can't directly cause something like:

>   Error: (272629760 . 301989888) is not a Lisp string or pointer.

e.g., a register that doesn't get processed correctly can't change
from a string or pointer to a CONS (it'd still have the same tag bits,
but the upper 29 bits would be pointing to nonsense.)
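
To make that concrete, here's a rough C sketch of a 32-bit tagged word
with a 3-bit tag and 29 address bits.  The tag values and the helper
names (relocate, skipped, stale_register_demo) are made up for
illustration and aren't taken from the kernel sources; the point is only
that a register the GC skips keeps its old tag and merely ends up with a
stale address.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 32-bit word, low 3 bits = tag, upper 29 bits = address.
   The tag values are illustrative placeholders, not CCL's real ones. */
#define TAG_MASK   0x7u
#define TAG_CONS   0x1u   /* hypothetical */
#define TAG_STRING 0x6u   /* hypothetical */

static uint32_t tag_of(uint32_t w)  { return w & TAG_MASK; }
static uint32_t addr_of(uint32_t w) { return w & ~TAG_MASK; }

/* What a correct GC does to a live register when the object moves:
   rewrite the address bits and keep the tag. */
static uint32_t relocate(uint32_t w, uint32_t new_addr)
{
    return addr_of(new_addr) | tag_of(w);
}

/* What happens to a register the GC failed to process: nothing at all. */
static uint32_t skipped(uint32_t w) { return w; }

static void stale_register_demo(void)
{
    uint32_t reg   = 0x14f2a308u | TAG_STRING;   /* made-up address */
    uint32_t ok    = relocate(reg, 0x15000010u);
    uint32_t stale = skipped(reg);

    assert(tag_of(ok) == TAG_STRING);
    assert(addr_of(ok) == 0x15000010u);

    /* The skipped register still claims to be a string ... */
    assert(tag_of(stale) == TAG_STRING);
    /* ... but its address bits point at wherever the object used to be. */
    assert(addr_of(stale) == 0x14f2a308u);
}
```

Calling stale_register_demo() runs the assertions.  Nothing in this model
can turn TAG_STRING into TAG_CONS, which is why a tag change suggests
that something overwrote the word itself.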

There are probably lots of scenarios that -can- lead to that sort of thing.
One canonical example involves unsafe code storing outside the bounds
of an object.  Suppose that memory contains

|string-header| bytes-of-string ..........| "ABC" . "DEF"|

e.g., a string followed by a CONS of two strings.  (CDRs happen to precede CARs,
for no particular reason.)

If S is the string and something does:

(declare (optimize (speed 3) (safety 0)))

(setf (schar s just-past-the-end) some-char)

then one or both of the strings in the cons would suddenly become ...
something else, and we'd probably get a weird error if we tried to
treat the CAR or CDR of that cons as a string.  (A secondary effect
of something like the register/node-mask bug could also lead to this:
a pointer tagged as a CONS would still be tagged as a CONS if it was
in a register and the GC misinterpreted the node-regs-mask, but the
CAR and CDR of that CONS could be practically anything, including
things that look kind of like tagged lisp objects but aren't.)
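
The string-then-CONS layout above can be modelled safely in C with a
small byte array standing in for the heap.  This is only a sketch under
some loud assumptions: one byte per character, little-endian words, a
"header" that is just the length, made-up addresses, and made-up tag
values.  All the names (heap, unsafe_set_char, checked_set_char, ...) are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK   0x7u
#define TAG_STRING 0x6u   /* hypothetical tag value */

/* Byte offsets in the toy heap: header, 8 chars, then the CONS
   (CDR word first, CAR word second). */
enum { HEADER_AT = 0, CHARS_AT = 4, STRING_LEN = 8,
       CDR_AT = 12, CAR_AT = 16, HEAP_SIZE = 20 };

static unsigned char heap[HEAP_SIZE];

/* Words are stored little-endian, as on x86. */
static void put_word(int at, uint32_t w)
{
    for (int i = 0; i < 4; i++)
        heap[at + i] = (unsigned char)(w >> (8 * i));
}

static uint32_t get_word(int at)
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint32_t)heap[at + i] << (8 * i);
    return w;
}

static void init_heap(void)
{
    put_word(HEADER_AT, STRING_LEN);             /* stand-in header */
    for (int i = 0; i < STRING_LEN; i++)
        heap[CHARS_AT + i] = 'x';
    put_word(CDR_AT, 0x14f2a400u | TAG_STRING);  /* made-up addresses */
    put_word(CAR_AT, 0x14f2a410u | TAG_STRING);
}

/* Analogue of the (safety 0) store: no check against STRING_LEN. */
static void unsafe_set_char(int index, char c)
{
    heap[CHARS_AT + index] = (unsigned char)c;
}

/* Analogue of a checked store: refuses out-of-range indices. */
static int checked_set_char(int index, char c)
{
    if (index < 0 || index >= STRING_LEN)
        return 0;
    heap[CHARS_AT + index] = (unsigned char)c;
    return 1;
}
```

After init_heap(), unsafe_set_char(STRING_LEN, 'H') lands on the low byte
of the CDR word, so in this model its tag bits change from TAG_STRING to
0 and the word no longer looks like a string pointer.
checked_set_char(STRING_LEN, 'H') returns 0 and leaves the CONS alone.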

The integrity checks usually catch this kind of memory corruption.
If they're enabled and things were OK after the most recent GC,
then ... well, the GC isn't totally exonerated, but it's a less
likely suspect.   Those checks aren't perfect, but they do catch
a lot of inconsistencies.
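
As a rough idea of the kind of inconsistency such a pass can notice (this
is not the kernel's actual check, just a sketch with hypothetical names
and tag values): walk a range of words and complain about any
pointer-tagged word whose address falls outside the heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_MASK   0x7u
#define TAG_CONS   0x1u   /* hypothetical */
#define TAG_STRING 0x6u   /* hypothetical; stands in for other heap objects */

static int is_pointer_tag(uint32_t tag)
{
    return tag == TAG_CONS || tag == TAG_STRING;
}

/* Returns the index of the first pointer-tagged word whose address lies
   outside [heap_start, heap_end), or -1 if every such word looks sane. */
static ptrdiff_t first_bad_word(const uint32_t *words, size_t n,
                                uint32_t heap_start, uint32_t heap_end)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t w = words[i];
        if (!is_pointer_tag(w & TAG_MASK))
            continue;                 /* immediates aren't checked here */
        uint32_t addr = w & ~TAG_MASK;
        if (addr < heap_start || addr >= heap_end)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

A check like this flags a wild pointer, but a clobbered word that happens
to look like an immediate slips through, which is one way for a check of
this sort to be imperfect.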

If you can get your hands on something that the lisp complains about,
calling CCL::DBG with that thing as an argument will cause the
kernel debugger to be entered with that thing in EBX.  Examining
memory "near" that thing might offer a clue about what's doing
the clobbering.

Unsafe Lisp code storing past the end of an object isn't the only
possible culprit, but something storing outside of the bounds of
an object (and clobbering something else in memory) is about the
only way for something's tag to change, which seems to be happening
in several of the cases you quote below.

On Tue, 14 Oct 2008, David Brown wrote:

> On Tue, Oct 14, 2008 at 12:30:03AM -0600, Gary Byers wrote:
>
>> Aha! The bits in the node-regs-mask are supposed to be ordered by the
>> register's "ordinal" number (e.g., the value used to encode the register
>> in an instruction.)
>> 
>> Assuming that I didn't botch something, this should be fixed in
>> r11087.
>
> Well, I haven't been able to get that particular failure to happen
> again, but unfortunately, I'm still seeing other problems.
>
> Once:
>   Overran end of memory range: start = 0x14f2a308, end = 0x15040000,
>   prev = 0x14f2a310, current = 0x181ff7f0
>
> Usually, it's just a scrambled object:
>
>   Error: (272629760 . 301989888) is not a Lisp string or pointer.
>
> (The error is from cffi, but it's not expecting a cons with two
> unusual numbers in it).
>
>   Error: value (#\H . 0) is not of the expected type (SATISFIES
>   CCL::PROPER-LIST-P).
>
> Unfortunately, there doesn't appear to be a pattern here, and most of
> them aren't caught until something tries to examine the corrupted
> object.
>
> David
>
>
