[Openmcl-devel] BUG: kernel debugger on incorrect args to QUOTE (armel)

Gary Byers gb at clozure.com
Fri Nov 18 05:39:36 PST 2011


Whoops.  I've been reading "linuxarm32" as "linuxx8632" throughout this
thread.  Please ignore most of what I've said (if you haven't done
so already).

The details are (very) different, but the implementation's similar:
symbols that name global macros or special operators have something
that traps when called in their function cells; symbols that aren't
FBOUNDP have something that traps a little differently when called.
In both cases, the lisp kernel handles the trap and calls out to
lisp; the lisp code decodes the trap and signals an error.  Decoding
the trap is a few orders of magnitude simpler on the ARM, simply because
the ARM was designed by sane people ...
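
As a rough illustration of that flow, here is a toy model in Python.  It is
not CCL's kernel or lisp code: the names (Trap, LispError, function_cell,
lisp_trap_handler, funcall) and the two trap "kinds" are invented for the
sketch, and a Python exception stands in for the hardware trap.  It only
shows the shape of the mechanism: a placeholder in the function cell traps,
the trap is caught, and a handler decodes it into an ordinary error.

```python
class Trap(Exception):
    """Stands in for the trap raised by the placeholder in a function cell."""

    def __init__(self, kind, symbol):
        super().__init__(kind, symbol)
        self.kind = kind
        self.symbol = symbol


class LispError(Exception):
    """Stands in for the lisp-level error that finally gets signaled."""


def make_trapping_cell(kind, symbol):
    # Calling the returned placeholder "traps" instead of running real code.
    def cell(*args):
        raise Trap(kind, symbol)
    return cell


# Hypothetical function cells: QUOTE names a special operator, CAR is a
# real function.  Anything missing from the table is not FBOUNDP.
function_cells = {
    "QUOTE": make_trapping_cell("special-or-macro", "QUOTE"),
    "CAR": lambda x: x[0],
}


def function_cell(symbol):
    # Unbound symbols get a placeholder that traps a little differently.
    return function_cells.get(symbol, make_trapping_cell("undefined", symbol))


def lisp_trap_handler(trap, args):
    # The "lisp side": decode the trap and signal an error.
    if trap.kind == "special-or-macro":
        raise LispError(
            f"Special operator or global macro-function {trap.symbol} "
            "can't be FUNCALLed or APPLYed")
    raise LispError(
        f"Undefined function {trap.symbol} called with arguments {args}")


def funcall(symbol, *args):
    # The "kernel side": catch the trap and call out to the handler.
    try:
        return function_cell(symbol)(*args)
    except Trap as trap:
        lisp_trap_handler(trap, args)


for call in (("QUOTE", 1), ("FOO",)):
    try:
        funcall(*call)
    except LispError as err:
        print("Error:", err)
```

In this model, the failure reported in this thread would correspond to the
trap never reaching the handler at all, so no lisp error gets signaled.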

This machine happens to be a Pandaboard, running Ubuntu 11.04:
-----------
[src/ccl-dev] gb at lingling> uname -a
Linux lingling 2.6.38-1208-omap4 #11-Ubuntu SMP PREEMPT Fri Apr 15 16:34:35 UTC 2011 armv7l armv7l armv7l GNU/Linux
[src/ccl-dev] gb at lingling> ccl
Welcome to Clozure Common Lisp Version 1.8-dev-r15074M-trunk  (LinuxARM32)!
? (funcall 'quote 1)
> Error: Special operator or global macro-function QUOTE can't be FUNCALLed or APPLYed
> While executing: FUNCALL, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 > 
? (foo)
> Error: Undefined function FOO called with arguments () .
> While executing: CCL::TOPLEVEL-EVAL, in process listener(1).
> Type :GO to continue, :POP to abort, :R for a list of available restarts.
> If continued: Retry applying FOO to NIL.
> Type :? for other options.
1 > 
-----------

Something that I didn't pick up on in your original message (don't laugh,
you'll all be old and decrepit someday, too):

>> | ? (funcall 'quote 1)
>> | Unhandled exception 4 at 0x500008fe, context->regs at #x405addf8

The short version is that that address isn't a valid ARM instruction
address (it's not a multiple of 4 bytes); it is a valid THUMB2 address,
and there may be some confusion somewhere about whether an illegal
ARM or THUMB2 instruction was being executed.  (CCL and the C/ASM
code in the kernel are all supposed to be ARM code.)
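
To make the alignment argument concrete, here is a minimal sketch in Python
(the function name is made up for illustration; this is not CCL code).  It
assumes only that ARM-state instructions are 4 bytes wide and word-aligned,
while Thumb/THUMB2 instructions are halfword-aligned:

```python
def possible_instruction_sets(address):
    """Return the instruction-set states in which `address` could be the
    start of an instruction, judged by alignment alone."""
    states = []
    if address % 4 == 0:
        states.append("ARM")      # ARM-state instructions are word-aligned
    if address % 2 == 0:
        states.append("THUMB2")   # Thumb instructions are halfword-aligned
    return states


# The faulting address from the report quoted above.
fault_address = 0x500008FE
print(hex(fault_address), "mod 4 =", fault_address % 4)
print(possible_instruction_sets(fault_address))
```

Alignment alone can't say what was actually being executed; it only rules
out the address as the start of an ARM-state instruction.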



On Fri, 18 Nov 2011, Gary Byers wrote:

> This isn't running in qemu:
>
> [src/ccl-dev] gb at havoc> cat /etc/issue
> Ubuntu 11.10 \n \l
>
> [src/ccl-dev] gb at havoc> ccl
> Welcome to Clozure Common Lisp Version 1.8-dev-r15074M-trunk  (LinuxX8632)!
> ? (funcall 'quote 1)
>> Error: Special operator or global macro-function QUOTE can't be FUNCALLed or APPLYed
>> While executing: CCL::TOPLEVEL-EVAL, in process listener(1).
>> Type :POP to abort, :R for a list of available restarts.
>> Type :? for other options.
> 1 > (ccl::dbg)
> Lisp Breakpoint
> ? for help
> [18295] Clozure CL kernel debugger: v
> Lisp kernel svn revision: 15074M
> Symbol *OPENMCL-VERSION* at #x14040D86
>  value    : "1.8-dev-r15074M-trunk  (~A)"
> [18295] Clozure CL kernel debugger:
>
> The kernel svn revision (shown by the kernel debugger 'v' command)
> doesn't usually have to match the image version exactly (though
> it will if everything's been rebuilt.)
>
> Trying to funcall a symbol that names a macro or special operator
> causes an illegal instruction to be executed, and signalling the
> error in response depends on lisp code being able to interpret some
> context information that the lisp kernel sets up.  Until a month
> or so ago, that context info represented the absolute address of
> the illegal instruction in this case.  Depending on where the
> image gets mapped into memory, that address may not fit in a fixnum
> (and this was true of the 32-bit x86 CCL running on 64-bit Fedora
> 15.)  We changed that (in both the kernel and the lisp), and that's
> why my first reaction was to suspect that you were running a slightly
> mismatched kernel and image.
>
> The particular illegal instruction is actually an "int $#xc9" instruction.
> On Linux, the exception that trying to execute such an instruction generates
> maps to the SIGSEGV signal (11); you're getting a SIGILL (4).  I don't
> know whether the SIGILL is caused by attempting to execute that "int"
> instruction or for some other reason, but virtual environments sometimes
> get this sort of thing wrong.  (If this was qemu's fault, it's pretty
> clearly getting it right a lot of the time; the image wouldn't run at
> all if it didn't.)
>
> When I ran into this on Fedora 15 a month or two ago, I fixed it in a
> pretty ugly way.  Matt was working on similar things at the time and
> came up with a cleaner fix; in the meantime, I installed Ubuntu on that
> machine.  The original problem was a symptom of the fact that Fedora 15
> was mapping the system libraries into a region of the address space that
> historically happened to be free for CCL's use; CCL mapped the image to
> a higher address range, the illegal instruction wound up at an address
> that wouldn't fit in a fixnum, and ... that was the problem.  I haven't
> seen that (system libraries at lower addresses) before or since.  Matt
> may have, but it'd be worth checking to be sure that what we thought
> we were fixing actually got fixed.
>
> Do you get a reasonable lisp error signaled if you try to call a symbol
> that's not FBOUNDP ? E.g.,
>
> ? (fmakunbound 'foo)
> FOO
> ? (foo)
>> Error: Undefined function FOO called with arguments () .
> [...]
>
> That uses a very similar mechanism, only the software interrupt is
> "int $#xc7"; the address of that instruction is very near that of the
> instruction used in the (FUNCALL 'QUOTE ...) case.  If calling FOO
> signals a lisp error and calling QUOTE gets lost,  I'd be very suspicious
> that qemu isn't treating those two interrupts the same way; real hardware
> does so.
>
>
> On Fri, 18 Nov 2011, Eric Marsden wrote:
>
>>>>>>> "gb" == Gary Byers <gb at clozure.com> writes:
>>
>>  gb> You need to make sure that the kernel and image that you're
>>  gb> running are both up to date; this looks very much like a symptom
>>  gb> of them being out of synch.  We changed some details about how
>>  gb> the kernel passes exception information out to lisp code a month
>>  gb> or so ago.  Sometimes when we make that kind of ABI change we also
>>  gb> change version numbers so that mismatched kernels and images couldn't
>>  gb> be used together, but didn't do so this time.
>>
>>  I did a full rebuild, on another machine, and am still seeing the same
>>  symptoms. Note that this is running in qemu.
>> 
>> ,----
>> | Welcome to Clozure Common Lisp Version 1.8-dev-r15074M-trunk  (LinuxARM32)!
>> | ? (funcall 'quote 1)
>> | Unhandled exception 4 at 0x500008fe, context->regs at #x405addf8
>> | ? for help
>> `----
>> 
>> -- 
>> Eric Marsden
>> 
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> http://clozure.com/mailman/listinfo/openmcl-devel
>> 
>> 


