[Openmcl-devel] SLOT-UNBOUND broken?

Gary Byers gb at clozure.com
Wed May 6 23:54:55 PDT 2009


I think that this is fixed in both the trunk and 1.3 as of r12012.

If anyone's interested: the problem was that while calling back to lisp
from foreign code, we pushed the register that contained the CPU flags/
condition codes on the stack, ensured that the stack was aligned on
a 16-byte boundary (as it is at that point on most x8632 platforms),
called out to lisp and, if we returned, popped the flags off of the
stack and returned to the lisp kernel code that did the callback. 
Unfortunately, if the "ensure that the stack was aligned" operation
actually changed the stack pointer - as it often would on Linux -
we weren't popping the flags from the place they'd been pushed, and
the CPU flags were set to unpredictable values.  I was often getting
the trace flag set (and crashing in a storm of SIGTRAP exceptions);
the "direction flag" (which controls whether memory-moving instructions
increment or decrement the registers that address their operands) could
also have been set to an unexpected value, and general wackiness would
have ensued.
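
As a rough illustration of the failure mode, here is a toy model in
Common Lisp.  It is not the kernel code: a hash table stands in for
stack memory, integers stand in for the stack pointer and the flags
register, and the names CALLBACK-FLAGS, *TRACE-FLAG* and
*DIRECTION-FLAG* are made up for this sketch.  Restoring the saved
stack pointer before the pop is one plausible remedy; it is an
assumption here, not a description of what r12012 actually changed.

```lisp
;;; Toy model only; none of these names come from the CCL sources.

(defparameter *trace-flag* #x100)      ; TF is bit 8 of EFLAGS
(defparameter *direction-flag* #x400)  ; DF is bit 10 of EFLAGS

(defun callback-flags (initial-sp flags &key (restore-sp t))
  "Return the flags value popped after a simulated callback.
INITIAL-SP is the byte address held in the stack pointer on entry."
  (let ((stack (make-hash-table))       ; address -> 32-bit word
        (sp initial-sp))
    (flet ((word-at (address)
             ;; Slots that were never written read back as garbage,
             ;; modelled here as a word with TF and DF both set.
             (gethash address stack
                      (logior *trace-flag* *direction-flag*))))
      ;; 1. Push the flags.
      (decf sp 4)
      (setf (gethash sp stack) flags)
      (let ((saved-sp sp))
        ;; 2. Force 16-byte alignment; this may move SP further down.
        (setf sp (logandc2 sp 15))
        ;; 3. (The call out to lisp would happen here.)
        ;; 4. Pop the flags, optionally restoring SP first.
        (when restore-sp
          (setf sp saved-sp))
        (word-at sp)))))
```

With an entry address such as #x1004 the push leaves the stack pointer
already aligned, so both variants pop the value that was pushed.  With
#x1010 the alignment step moves the pointer, and calling with
:RESTORE-SP NIL pops the garbage word instead.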

I had thought that continuing (via STORE-VALUE/USE-VALUE/CONTINUE)
after an UNBOUND-VARIABLE error was also affected by this.  It was,
but there are other problems with the way that those errors are
signaled and handled that keep those restarts from working.  Until
that's fixed, we won't automatically generate those restarts on
UNBOUND-VARIABLE errors.  (As far as I can tell, the restarts have
never worked on x86* versions of CCL; apparently, the feature isn't
used that often, since I don't think that its failure to work has
ever been reported.)
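
For code that wants to use such a restart when it exists without
depending on it, a minimal sketch follows.  The helper name
CALL-WITH-DEFAULT-FOR-UNBOUND is hypothetical; it uses only standard
HANDLER-BIND, FIND-RESTART and INVOKE-RESTART, and it declines to
handle the error when no USE-VALUE restart is visible.

```lisp
(defun call-with-default-for-unbound (default thunk)
  "Call THUNK.  On an UNBOUND-VARIABLE error, invoke the USE-VALUE
restart with DEFAULT if one is available; otherwise decline, so the
error propagates as usual."
  (handler-bind ((unbound-variable
                   (lambda (condition)
                     (let ((restart (find-restart 'use-value condition)))
                       (when restart
                         (invoke-restart restart default))))))
    (funcall thunk)))
```

Something like (call-with-default-for-unbound 1 (lambda () (foo)))
would then return a value only where the implementation really offers
USE-VALUE for that error; where it doesn't, the error is signaled
normally.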

On Wed, 6 May 2009, Gary Byers wrote:

> What I find is that it seems to work fine on all ppc and x86-64
> platforms that I've tried it on and on 32-bit x86 Darwin, Windows,
> Solaris, and FreeBSD.  On 32-bit x86 Linux, the unbound slot causes a
> trap (as it should on all platforms); the lisp kernel calls out to a
> lisp handler, the handler (seems to) call the appropriate SLOT-UNBOUND
> method, and when it returns to the lisp kernel, something is very,
> very wrong.
>
> Ah.  In general, returning/continuing after an error trap seems to be broken
> on 32-bit x86 Linux:
>
> ? (defvar *x*)
> *X*
> ? (defun foo () *x*)
> FOO
> ? (foo)
>> Error: Unbound variable: *X*
>> While executing: FOO, in process listener(1).
>> Type :GO to continue, :POP to abort, :R for a list of available restarts.
>> If continued: Retry getting the value of *X*.
>> Type :? for other options.
> 1 > (use-value 1)
> -> crashes very hard for me; it even kills the process.
>
> On Wed, 6 May 2009, Leslie P. Polzer wrote:
>
>>
>> The following code (from the test suite) fails horribly
>> on my machine:
>>
>> (defclass slot-unbound-class-01 ()
>>  ((a :reader sunb-a)
>>   (b :accessor sunb-b)
>>   (c :writer sunb-c)
>>   (e :reader sunb-e)
>>   (f :reader sunb-f)))
>>
>> (defmethod slot-unbound ((class t) (obj slot-unbound-class-01) (slot-name t))
>>  (list (class-name class) slot-name))
>>
>> (let ((obj (make-instance 'slot-unbound-class-01)))
>>  (values
>>   (slot-value obj 'a)
>>   (slot-value obj 'b)
>>   (slot-value obj 'c)))
>>
>> Here's some more information:
>>
>> [17190] Clozure CL kernel debugger: b
>> current thread: tcr = 0xb7c12a90, native thread ID = 0x432a, interrupts enabled
>>
>>
>> (#xB7AADD44) #x00000000 : #<Function %MAYBE-STD-SLOT-VALUE-USING-CLASS #x14142EDE> + ??
>> (#xB7AADD7C) #x14CB40C5 : #<Anonymous Function #x14CB4076> + 79
>> (#xB7AADD90) #x14A3001D : #<Function (:INTERNAL %DO DO-ENTRY) #x14A2FD3E> + 735
>> (#xB7AADDA0) #x14A309FD : #<Function DO-ENTRY #x14A3061E> + 991
>> (#xB7AADDE4) #x14A3B335 : #<Function DO-ENTRIES #x14A3B1AE> + 391
>> (#xB7AADDF8) #x14A3C00D : #<Function DO-TESTS #x14A3BD66> + 679
>> (#xB7AADE5C) #x14891EE5 : #<Function REPORT-TIME #x14891DAE> + 311
>> (#xB7AADE94) #x143802FD : #<Function CALL-CHECK-REGS #x14380206> + 247
>> (#xB7AADEB0) #x143E467D : #<Function (:INTERNAL EVAL-STRING STARTUP-CCL) #x143E44BE> + 447
>> (#xB7AADED4) #x143E4D5D : #<Function STARTUP-CCL #x143E4786> + 1495
>> (#xB7AADF04) #x14329A65 : #<Function (:INTERNAL (TOPLEVEL-FUNCTION (LISP-DEVELOPMENT-SYSTEM T))) #x14329A1E> + 71
>> (#xB7AADF14) #x143E357D : #<Function (:INTERNAL MAKE-MCL-LISTENER-PROCESS) #x143E3336> + 583
>> (#xB7AADF60) #x1433030D : #<Function RUN-PROCESS-INITIAL-FORM #x1433006E> + 671
>> (#xB7AADFA4) #x14330C9D : #<Function (:INTERNAL (%PROCESS-PRESET-INTERNAL (PROCESS))) #x14330B4E> + 335
>> (#xB7AADFCC) #x1431856D : #<Function (:INTERNAL THREAD-MAKE-STARTUP-FUNCTION) #x14318456> + 279
>>
>> [17190] Clozure CL kernel debugger: L
>> %ebx (arg_z) = #<header ? #xFFFFFFFF>
>> %esi (arg_y) = #<header ? #x08060F67>
>> ------
>> %edi (fn) = #<header ? #xFFFFFFFF>
>> ------
>> %ecx (temp0) = 0
>> %edx (temp1) = marked as unboxed (DF set)
>> ------
>> %edx (nargs) = -303020452 (maybe)
>>
>> Intriguingly one of the registers pointed at a vector in an earlier
>> session; now the addresses seem to be bogus.
>>
>> On a tangent, is there a way to instrument low-level functions
>> (like the %maybe-std-slot-value-using-class) with print statements?
>
> One issue with doing so is that you want to avoid the infinite recursion
> that'd occur when PRINT calls %MAYBE-STD-SLOT-VALUE-USING-CLASS (which now
> calls some flavor of PRINT, which eventually tries to tell you that the
> stack has overflowed but can't do that without PRINTing ...)
>
>>
>> I've tried it but then rebuilding CCL would fail at the heap
>> building stage.
>
> Yup.  (It's probably true that %MAYBE-STD-SLOT-VALUE-USING-CLASS gets called
> before anything has any idea of what a STREAM is, much less how to print to one.)
>
> What sort of works (doesn't scale well but avoids recursion/bootstrapping issues)
> is to use CCL::DBG to enter the kernel debugger, which can sort of (if you squint)
> print lisp objects (some better than others.)  CCL::DBG's optional argument gets
> loaded into the arg_z register (%ebx on x8632), and you can sometimes make
> sense of arg_z in the kernel debugger's L output.
>
> In this particular case, it looks to me like everything on the lisp side (at least
> up until the unbound slot is discovered) is OK:
>
> ? (slot-value (make-instance 'example) 'slot)
>> Error: Slot SLOT is unbound in #<EXAMPLE #x14B352CE>
>> While executing: #<STANDARD-KERNEL-METHOD SLOT-UNBOUND (T T T)>, in process listener(1).
>> Type :POP to abort, :R for a list of available restarts.
>> Type :? for other options.
> 1 >
>
> and we call (in this case) the default SLOT-UNBOUND method; the bug seems to have
> to do with being able to return to the point where the unbound slot was discovered
> in the case where SLOT-UNBOUND tries to return a value.  (And, apparently, with
> continuing after an error that generates a trap on x8632 Linux in general.  All
> of that code is machine- and OS-dependent.)
>
>>
>> I also attempted to look at some symbols but the kernel debugger
>> seems to have a different idea of what a symbol means as it couldn't
>> find any I asked it about.
>
> ? for help
> [4479] Clozure CL kernel debugger: s
>
>  symbol name :READ-CHAR            ; it doesn't upper-case the pname for you
> Symbol READ-CHAR at #x140A60E6
>   value    : #<Unbound>
>   function : #<Function READ-CHAR #x1470522E>
>
> All that that debugger function does is to scan the lisp heap until it finds
> a symbol whose pname matches the (case-sensitive) string that you enter; if
> it finds such a symbol, it shows its global function and value cell values.
>
>
>>
>>  Leslie
>>
>> --
>> http://www.linkedin.com/in/polzer
>>
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> http://clozure.com/mailman/listinfo/openmcl-devel
>>
>>


