[Openmcl-devel] CCL trunk on Linux ARM / Ubuntu - Unhandled exception 4

Gary Byers gb at clozure.com
Sat May 31 02:35:29 PDT 2014


I don't have a good guess, but another bad guess is that this is an instruction
cache problem.  (For anyone who cares: most RISC machines have separate instruction
and data caches that the hardware doesn't keep consistent with each other; the
x86 behaves as if it had a unified cache.  When machine code is written to
memory, it's often only written to the data cache and isn't loaded into the
instruction cache unless special steps are taken.  A typical symptom of
failing to take those steps is that the processor executes code "that isn't
there": what the processor actually executes isn't consistent with what
appears to be in memory.)
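To make the mechanism concrete, here is a toy model in Python. It is only a sketch: the class and method names are invented for illustration, real caches work on lines rather than single addresses, and nothing here reflects how CCL or the ARM hardware is actually implemented.

```python
class SplitCacheMachine:
    """Toy model: one memory, a write-back data cache, an instruction cache."""

    def __init__(self, memory):
        self.memory = dict(memory)  # address -> instruction word
        self.dcache = {}            # stored words not yet written back
        self.icache = {}            # words already fetched for execution

    def store(self, addr, word):
        # A store lands in the data cache only; memory and icache are untouched.
        self.dcache[addr] = word

    def load(self, addr):
        # Data loads see the newest value, which is what a debugger would print.
        return self.dcache.get(addr, self.memory.get(addr))

    def fetch(self, addr):
        # Instruction fetch goes through the icache and never consults the dcache.
        if addr not in self.icache:
            self.icache[addr] = self.memory.get(addr)
        return self.icache[addr]

    def sync_icache(self, addr):
        # The "special steps": write the data back, then invalidate the icache entry.
        if addr in self.dcache:
            self.memory[addr] = self.dcache.pop(addr)
        self.icache.pop(addr, None)


m = SplitCacheMachine({0x1000: "old-insn"})
m.fetch(0x1000)                 # the old instruction is now cached for execution
m.store(0x1000, "new-insn")     # new code is written as data
print(m.load(0x1000))           # memory "appears" to hold the new code
print(m.fetch(0x1000))          # but the processor still executes the old code
m.sync_icache(0x1000)
print(m.fetch(0x1000))          # after the sync, the new code is executed
```

In the model, the mismatch between `load` and `fetch` before `sync_icache` is the "code that isn't there" symptom.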

The symptoms that you're seeing are at least somewhat similar to that.
When that sort of cache-consistency problem occurs, it is sometimes
possible to continue after the exception (in the hope that the code
in memory will actually be executed this time.)

If continuing from the kernel debugger (X) dies in the same way at
the same address, then it's more likely that there really is an illegal
instruction there (and the problem becomes trying to figure out how it
got there); if it dies in some other way/place then the theory that
it's a cache consistency problem needs to be further explored.
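The "same way at the same address" check can be done mechanically on the kernel debugger's report lines. The sketch below is illustrative only; the helper names are made up, and the two sample lines are the reports quoted further down in this thread (they come from two separate runs, not from continuing with X).

```python
import re

# Matches a report such as
# "Unhandled exception 4 at 0x10083464, context->regs at #xbed57230"
_REPORT = re.compile(r"Unhandled exception (\d+) at 0x([0-9a-fA-F]+)")


def parse_report(line):
    """Return (exception_number, address) or None if the line doesn't match."""
    m = _REPORT.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2), 16)


def same_failure(before, after):
    """True if both reports show the same exception at the same address."""
    a, b = parse_report(before), parse_report(after)
    return a is not None and a == b


first = "? Unhandled exception 4 at 0x10083464, context->regs at #xbed4c288"
second = "? Unhandled exception 4 at 0x10083464, context->regs at #xbed57230"
print(same_failure(first, second))
```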

CCL "puts code in memory" all the time, but when it's starting up
(which seems to be when it's dying for you) the only code in memory
was mapped there from the image file and the kernel.

One other thing that'd be helpful: the kernel debugger prints the
process ID (PID) in its prompt (4597 below).  While the process is
still running, please do

$ cat /proc/PID/maps    # where PID is the process ID of the crashed CCL

and send me the output.
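For anyone who wants to look at that output themselves, the following Python sketch finds the mapping that contains a given address and shows its permissions. The function names are invented for illustration, and the sample text is made up to show the format; it is not output from the machine in this thread.

```python
def parse_maps(text):
    """Parse /proc/PID/maps text into a list of region dictionaries."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        regions.append({
            "start": int(start_s, 16),
            "end": int(end_s, 16),
            "perms": fields[1],
            "path": fields[5].strip() if len(fields) > 5 else "",
        })
    return regions


def find_region(regions, address):
    """Return the region containing address (end is exclusive), or None."""
    for region in regions:
        if region["start"] <= address < region["end"]:
            return region
    return None


# Hypothetical sample, for illustration only.
SAMPLE = """\
00008000-00030000 r-xp 00000000 b3:02 1111       /home/user/ccl/armcl
10000000-10a00000 rwxp 00001000 b3:02 2222       /home/user/ccl/armcl.image
bed37000-bed58000 rw-p 00000000 00:00 0          [stack]
"""

hit = find_region(parse_maps(SAMPLE), 0x10083464)
print(hit["perms"], hit["path"])
```

On a live process, the text would come from reading `/proc/<PID>/maps` while the kernel debugger is still waiting at its prompt.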


On Fri, 30 May 2014, Rainer Joswig wrote:

> A third one - I'll stop now ;-)
> 
> ... at odroidxu:~/Lisp/ccl/cclnew$ ./armcl -R 1G --no-init
> Welcome to Clozure Common Lisp Version 1.10-dev-r16089M-trunk  (LinuxARM32)!
> ? Unhandled exception 4 at 0x10083464, context->regs at #xbed57230
> ? for help
> [4597] Clozure CL kernel debugger: B
> current thread: tcr = 0x31448, native thread ID = 0x11f5, interrupts enabled
> 
> 
> (#xBED57500) #x102D8EA4 : #<Method-Function STREAM-FORCE-OUTPUT
> (BASIC-OUTPUT-STREAM) #x1410014e> + 248
> (#xBED57530) #x102D8E3C : #<Method-Function STREAM-FORCE-OUTPUT
> (BASIC-OUTPUT-STREAM) #x1410014e> + 144
> (#xBED57558) #x103C28EC : #<Function FORCE-OUTPUT #x1416dbfe> + 120
> (#xBED57568) #x1046282C : #<Function AUTO-FLUSH-INTERACTIVE-STREAMS
> #x141b096e> + 416
> (#xBED57598) #x104626E0 : #<Function AUTO-FLUSH-INTERACTIVE-STREAMS
> #x141b096e> + 84
> (#xBED575C0) #x10463400 : #<Function HOUSEKEEPING #x141b0d66> + 364
> (#xBED575D0) #x103A4890 : #<Function HOUSEKEEPING-LOOP #x1415c71e> + 348
> (#xBED57600) #x103A4840 : #<Function HOUSEKEEPING-LOOP #x1415c71e> + 268
> (#xBED57630) #x103A4830 : #<Function HOUSEKEEPING-LOOP #x1415c71e> + 252
> (#xBED57670) #x103A47F0 : #<Function HOUSEKEEPING-LOOP #x1415c71e> + 188
> (#xBED576E8) #x00009AFC : (subprimitive ret1valn)
> (#xBED576F8) #x103A4BD0 : #<Function (:INTERNAL (TOPLEVEL-FUNCTION
> (LISP-DEVELOPMENT-SYSTEM T))) #x1415c906> + 140
> (#xBED57728) #x103A4BB4 : #<Function (:INTERNAL (TOPLEVEL-FUNCTION
> (LISP-DEVELOPMENT-SYSTEM T))) #x1415c906> + 112
> (#xBED579B0) #x0000D430 : (subprimitive (null))
> (#xBED579E0) #x0000D420 : (subprimitive (null))
> (#xBED579F0) #x0000D4D4 : (subprimitive start_lisp)
> 
> 
> TCR = 0xb6900468, cstack area #xb6902aa0, native thread ID = 0x11f6,
> interrupts enabled
> 
> 
> [4597] Clozure CL kernel debugger: R
> r00 = 0x00000013    r08 = 0x1401D586
> r01 = 0x00000004    r09 = 0x14026556
> r02 = 0x00000008    r10 = 0xB6D75F7C
> r03 = 0x00031448    r11 = 0x14026556
> r04 = 0xBED5754E    r12 = 0x144B9950
> r05 = 0x144BCF3E    r13 = 0xBED57500
> r06 = 0xB6D75F90    r14 = 0x102D8EA4
> r07 = 0x00000000    r15 = 0x10083464
> [4597] Clozure CL kernel debugger:
> 
> 
> 
> Am 30.05.2014 um 15:08 schrieb Gary Byers <gb at clozure.com>:
>
>       One of the changes that I neglected to mention in my message the
>       other day is
>       that the ARM port now tries to reserve more address space on
>       startup than it
>       had in the past.  It traditionally reserved about .5GB and now
>       it tries to
>       reserve 1.5GB.
>
>       The thing that seems to kind of shoot that theory down is that
>       it looks
>       like the image loaded for you where it's supposed to now
>       (starting at
>       #x10000000).
>
>       A couple of things that may be worth trying:
>
>       1) do things behave differently if you tell the lisp to try to
>       reserve less
>       memory?  E.g.
>
>       $ ./armcl -R 1G --no-init
>
>       2) If it still crashes, could I see the output of the kernel
>       debugger's R
>       command (which just prints the machine registers in hex.)
>
>       It doesn't make sense that the main thread would die with an
>       illegal instruction
>       188 bytes into DRAIN-TERMINATION-QUEUE or that any thread would
>       die with an
>       "unhandled exception 4" (SIGILL/illegal instruction) at
>       #x10083464, and we may
>       just be seeing some artifact related to memory-mapping
>       limitations.
> 
>
>       On Fri, 30 May 2014, Rainer Joswig wrote:
>
>             Hi,
>
>             I've got a fresh CCL from the SVN trunk into a new
>             directory.
>
>             Machine is an ODROID XU. Ubuntu Linux. On my ODROID
>             U3 it seems to work, though.
>
>             Recompiling the kernel doesn't help. The error
>             appears immediately after starting armcl .
> 
>
>             ... at odroidxu:~/Lisp/ccl/ccl$ ./armcl --no-init
>             Welcome to Clozure Common Lisp Version
>             1.10-dev-r16089M-trunk  (LinuxARM32)!
>             ? Unhandled exception 4 at 0x10083464, context->regs
>             at #xbed4c288
>             ? for help
>             [3493] Clozure CL kernel debugger: B
>             current thread: tcr = 0x31448, native thread ID =
>             0xda5, interrupts enabled
> 
>
>             (#xBED4C558) #x1039C558 : #<Function
>             DRAIN-TERMINATION-QUEUE #x14158be6> + 188
>             (#xBED4C588) #x1039C4F0 : #<Function
>             DRAIN-TERMINATION-QUEUE #x14158be6> + 84
>             (#xBED4C5B0) #x108679B4 : #<Function (:INTERNAL
>             ADD-GC-HOOK) #x143dd686> + 88
>             (#xBED4C5C0) #x104632C0 : #<Function HOUSEKEEPING
>             #x141b0d66> + 44
>             (#xBED4C5D0) #x103A4890 : #<Function
>             HOUSEKEEPING-LOOP #x1415c71e> + 348
>             (#xBED4C600) #x103A4840 : #<Function
>             HOUSEKEEPING-LOOP #x1415c71e> + 268
>             (#xBED4C630) #x103A4830 : #<Function
>             HOUSEKEEPING-LOOP #x1415c71e> + 252
>             (#xBED4C670) #x103A47F0 : #<Function
>             HOUSEKEEPING-LOOP #x1415c71e> + 188
>             (#xBED4C6E8) #x00009AFC : (subprimitive ret1valn)
>             (#xBED4C6F8) #x103A4BD0 : #<Function (:INTERNAL
>             (TOPLEVEL-FUNCTION (LISP-DEVELOPMENT-SYSTEM T)))
>             #x1415c906> + 140
>             (#xBED4C728) #x103A4BB4 : #<Function (:INTERNAL
>             (TOPLEVEL-FUNCTION (LISP-DEVELOPMENT-SYSTEM T)))
>             #x1415c906> + 112
>             (#xBED4C9B0) #x0000D430 : (subprimitive (null))
>             (#xBED4C9E0) #x0000D420 : (subprimitive (null))
>             (#xBED4C9F0) #x0000D4D4 : (subprimitive start_lisp)
> 
>
>             TCR = 0xb6900468, cstack area #xb6902aa0, native
>             thread ID = 0xda6, interrupts enabled
>
>             Regards,
>
>             Rainer Joswig
>
>             _______________________________________________
>             Openmcl-devel mailing list
>             Openmcl-devel at clozure.com
>             http://lists.clozure.com/mailman/listinfo/openmcl-devel
> 
> 
> 
> 
>


