[Openmcl-devel] Elephant and CCL (Linux, 32-bit)

Gary Byers gb at clozure.com
Tue Apr 13 15:01:04 PDT 2010



On Tue, 13 Apr 2010, Ian Eslick wrote:

> I've had no problem running CCL and Elephant on 64-bit systems and Elephant and SBCL on 32-bit systems.  I haven't had the occasion to do much testing on 32-bit systems until now.  There is some sort of pointer shenanigans happening in the FFI-based serializer.
>
> CCL 1.3 release:
>
> Read error between positions 3064 and 6299 in /home/eslick/lisp-dist/libs/uffi-1.8.6/src/strings.lisp.
>> Error: Reader error: No external symbol named "ENCODE-STRING-TO-OCTETS" in package #<Package "CCL"> .
>> While executing: CCL::%PARSE-TOKEN, in process listener(1).
>> Type :GO to continue, :POP to abort, :R for a list of available restarts.
>> If continued: Use the internal symbol CCL::ENCODE-STRING-TO-OCTETS
>> Type :? for other options.
>

ENCODE-STRING-TO-OCTETS likely existed in some form in 1.3 (and earlier), but the 
symbol wasn't exported and the function wasn't documented until 1.4.  I remember
fiddling around with it for quite a while before being reasonably comfortable
with the current interface and behavior; I don't remember offhand when the
fiddling stopped relative to the 1.3 release.

If an old release doesn't publicly offer functionality that you need, you can
either:

   - try to determine whether it offers adequate private (unexported) support
     that you can use in some maintainable way.  (I don't remember whether there
     were actually any changes to the function in question between 1.3 and 1.4;
     there certainly were lots of changes before it became public.)
   - not support the old release in your application.



> Missing API call that Elephant depends on:
>
>
> CCL 1.4 release:
>
>> Error: Fault during read of memory address #x-471FE404
>> While executing: CCL::EQ-HASH-FIND, in process listener(1).
>> Type :POP to abort, :R for a list of available restarts.
>> Type :? for other options.
> 1 > :b
> *(3AEA58) : 0 (EQ-HASH-FIND #<HASH-TABLE :TEST EQ size 892/199404 #x152B0AB6> #<BOGUS object @ #x10C839>) 584
> (3AEA94) : 1 (GETHASH #<BOGUS object @ #x10C839> #<HASH-TABLE :TEST EQ size 892/199404 #x152B0AB6> NIL) 279
> (3AEABC) : 2 (FUNCALL #'#<(:INTERNAL ELEPHANT-SERIALIZER2::%SERIALIZE ELEPHANT-SERIALIZER2::SERIALIZE)> #<BOGUS object @ #x10C839>) 2007
> (3AEADC) : 3 (SERIALIZE #<BOGUS object @ #x10C839> #S(ELEPHANT-MEMUTIL:BUFFER-STREAM :BUFFER #<A Foreign Pointer #xB76148D0> :SIZE 1 ...) #<BDB-STORE-CONTROLLER /home/eslick/lisp-dist/libs/elephant-1.0/tests/testdb/>) 343
> (3AEAF8) : 4 (SERIALIZE #<BOGUS object @ #x10C839> #S(ELEPHANT-MEMUTIL:BUFFER-STREAM :BUFFER #<A Foreign Pointer #xB76148D0> :SIZE 1 ...) #<BDB-STORE-CONTROLLER /home/eslick/lisp-dist/libs/elephant-1.0/tests/testdb/>) 247
> (3AEB30) : 5 (FUNCALL #'#<Anonymous Function #x15491B2E> (#<BOGUS object @ #x10C839>)) 279
>

Something that prints as a #<BOGUS object> looks superficially like a real lisp
object (has reasonable tag bits) but fails some consistency checks (the address
it refers to is in some way invalid, that address should contain a header but
doesn't, or shouldn't contain a header but does, etc.)  It's usually the result
of one or more of:

   - violating a DYNAMIC-EXTENT declaration (referring to a stack-allocated object
     after it's been deallocated)
   - low-level mucking about (storing past the end of an object in
     incorrect and unsafe code)
   - the GC messing things up, either because of a GC bug or as a symptom of
     some kind of low-level mucking about.

The backtrace shows a bogus object being passed around until something tries to
do GETHASH on it and gets a memory fault.  The first step in figuring out why
the fault happened would seem to be determining where the bogus object came
from.

FWIW, the printed representation of that object (specifically the tag bits in
the address #x10c839) suggests that it was once a CONS.
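A rough sketch of the kind of reasoning behind that inference, assuming (this is an assumption, not something taken from CCL's sources) that a tagged 32-bit value keeps a 3-bit type tag in its low bits and that tag value 1 means CONS.  The helper names (fulltag, untagged_address, describe) are hypothetical and only illustrate the arithmetic:

```python
# ASSUMPTION: a tagged 32-bit value keeps a 3-bit type tag in its low bits,
# and tag value 1 means CONS.  This mirrors the inference in the message; it
# is not copied from CCL's source, and other tag values are not modeled here.
FULLTAG_MASK = 0x7
FULLTAG_CONS = 0x1


def fulltag(tagged: int) -> int:
    """Return the low 3 tag bits of a tagged 32-bit value."""
    return tagged & FULLTAG_MASK


def untagged_address(tagged: int) -> int:
    """Strip the tag bits, leaving the 8-byte-aligned object address."""
    return tagged & ~FULLTAG_MASK & 0xFFFFFFFF


def describe(tagged: int) -> str:
    kind = "cons" if fulltag(tagged) == FULLTAG_CONS else "other"
    return (f"#x{tagged:X}: tag {fulltag(tagged)} ({kind}), "
            f"object at #x{untagged_address(tagged):X}")


# The value from the backtrace: #x10C839 ends in binary ...1001, so tag 1.
print(describe(0x10C839))
```

Under those assumptions the bogus value decodes as a cons whose storage would start at #x10C838; whether that memory still holds a valid cons is exactly what the consistency checks found wanting.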

>
> CCL 1.5 current:
>
>> Error: Fault during read of memory address #x194B3AEC
>> While executing: CCL::%BIGNUM-SIGN, in process listener(1).
>> Type :POP to abort, :R for a list of available restarts.
>> Type :? for other options.
> 1> :b
> *(53EAE8) : 0 (%BIGNUM-SIGN ???) 12
> (53EB2C) : 1 (FUNCALL #'#<(:INTERNAL GEN-INTEGER)>) 159
> (53EB3C) : 2 (FUNCALL #<CCL:COMPILED-LEXICAL-CLOSURE (:INTERNAL GEN-INTEGER) #x18CB21A6>) 175
> (53EB54) : 3 (PERFORM-RANDOM-TESTING/RUN-ONCE (#<CCL:COMPILED-LEXICAL-CLOSURE # #x18CB21A6>) #<Anonymous Function #x18CB2736>) 231
> (53EB84) : 4 (PERFORM-RANDOM-TESTING (#<CCL:COMPILED-LEXICAL-CLOSURE # #x18CB21A6>) #<Anonymous Function #x18CB2736>) 319
> (53EBA4) : 5 (FUNCALL #'#<(:INTERNAL IT.BESE.FIVEAM::RUN-IT (IT.BESE.FIVEAM::RUN-TEST-LAMBDA (IT.BESE.FIVEAM::TEST-CASE)))>) 1239
>

Some function defined inside GEN-INTEGER called a function named
%BIGNUM-SIGN, which apparently didn't like its argument very much.
Both of those function names seem to be in the current package;
there's a function named %BIGNUM-SIGN in the CCL package, whose
implementation is word-size dependent.  CCL::%BIGNUM-SIGN would
certainly segfault if its argument isn't a BIGNUM; it's not intended
to be user-callable.
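To illustrate why an internal primitive like that can fault, here is a hypothetical model (not CCL's actual object layout): a bignum as a list of 32-bit two's complement "digits", least significant first, whose sign is the top bit of the most significant digit.  The unchecked function trusts its caller the way an internal primitive does; the checked wrapper validates the type first.  All names here are made up for the sketch:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LispObj:
    """Hypothetical stand-in for a tagged Lisp object."""
    kind: str                                        # "bignum", "fixnum", ...
    digits: List[int] = field(default_factory=list)  # 32-bit digits, LS first


def unchecked_bignum_minusp(obj: LispObj) -> bool:
    # Internal-style primitive: assumes obj really is a bignum and reads its
    # most significant digit without checking.  In a real Lisp, the analogous
    # read from a non-bignum can touch arbitrary memory and fault.
    return bool(obj.digits[-1] & 0x80000000)


def bignum_minusp(obj: LispObj) -> bool:
    # User-facing wrapper: validate the type before touching the digits.
    if obj.kind != "bignum" or not obj.digits:
        raise TypeError(f"expected a bignum, got {obj.kind}")
    return unchecked_bignum_minusp(obj)


neg = LispObj("bignum", [0x00000000, 0xFFFFFFFF])  # -2**32 in two's complement
pos = LispObj("bignum", [0x00000000, 0x00000001])  # 2**32
print(bignum_minusp(neg), bignum_minusp(pos))
try:
    bignum_minusp(LispObj("fixnum"))
except TypeError as err:
    print("rejected:", err)
```

The point of the split is that the check belongs in whatever code decides an object is a bignum; once something reaches the unchecked primitive, it's too late.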

> Help in tracking this down would be helpful.

If your code is dealing with low-level details of object representation:

   - many of those details are considered "architecture-specific" in CCL
   - this is the sort of thing where "close" doesn't count.
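One concrete example of a word-size-dependent detail is where fixnums stop and bignums start.  The sketch below assumes 30-bit fixnums on 32-bit CCL and 61-bit fixnums on 64-bit CCL (check MOST-POSITIVE-FIXNUM on the build you actually use); the helpers fixnum_range and is_fixnum are hypothetical.  Code that decides "fixnum or bignum" from a range that's only right on 64-bit systems would misclassify a whole band of integers on a 32-bit build:

```python
# ASSUMPTION: fixnums are 30 bits wide on 32-bit CCL and 61 bits wide on
# 64-bit CCL.  Verify against MOST-POSITIVE-FIXNUM before relying on this.
FIXNUM_BITS = {32: 30, 64: 61}


def fixnum_range(word_bits: int) -> tuple:
    """Inclusive (min, max) fixnum values for the given word size."""
    bits = FIXNUM_BITS[word_bits]
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def is_fixnum(n: int, word_bits: int) -> bool:
    lo, hi = fixnum_range(word_bits)
    return lo <= n <= hi


# Integers in [2**29, 2**60) are bignums on 32-bit but fixnums on 64-bit,
# so a hardcoded 64-bit boundary is "close" but wrong on a 32-bit build.
for n in (2**29 - 1, 2**29, 2**40, 2**60):
    print(n,
          "32-bit:", "fixnum" if is_fixnum(n, 32) else "bignum",
          "64-bit:", "fixnum" if is_fixnum(n, 64) else "bignum")
```

A random integer generator that exercises values near these boundaries is exactly the kind of test that would expose such an assumption on one word size but not the other.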

I don't know where the last two bugs are, but in both cases the first step
would seem to be making absolutely sure that code which deals with things
at that level gets them exactly right; if I see that something
constructed in a local function of a function named GEN-INTEGER is
passed to a function named %BIGNUM-SIGN (which may be CCL::%BIGNUM-SIGN),
I'd certainly want to look at that function.

>
> Thank you,
> Ian