[Openmcl-devel] BUG: kernel debugger on incorrect args to QUOTE (armel)

Gary Byers gb at clozure.com
Fri Nov 18 02:31:24 PST 2011


This isn't running in qemu:

[src/ccl-dev] gb at havoc> cat /etc/issue
Ubuntu 11.10 \n \l

[src/ccl-dev] gb at havoc> ccl
Welcome to Clozure Common Lisp Version 1.8-dev-r15074M-trunk  (LinuxX8632)!
? (funcall 'quote 1)
> Error: Special operator or global macro-function QUOTE can't be FUNCALLed or APPLYed
> While executing: CCL::TOPLEVEL-EVAL, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 > (ccl::dbg)
Lisp Breakpoint
? for help
[18295] Clozure CL kernel debugger: v
Lisp kernel svn revision: 15074M
Symbol *OPENMCL-VERSION* at #x14040D86
   value    : "1.8-dev-r15074M-trunk  (~A)"
[18295] Clozure CL kernel debugger:

The kernel svn revision (shown by the kernel debugger 'v' command)
doesn't usually have to match the image version exactly (though
it will if everything's been rebuilt.)

Trying to funcall a symbol that names a macro or special operator
causes an illegal instruction to be executed, and signalling the
error in response depends on lisp code being able to interpret some
context information that the lisp kernel sets up.  Until a month
or so ago, that context info represented the absolute address of
the illegal instruction in this case.  Depending on where the
image gets mapped into memory, that address may not fit in a fixnum
(and this was true of the 32-bit x86 CCL running on 64-bit Fedora
15.)  We changed that (in both the kernel and the lisp), and that's
why my first reaction was to suspect that you were running a slightly
mismatched kernel and image.
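
As a rough illustration of the fixnum limit described above, here is a
small sketch in portable Common Lisp.  It assumes a 32-bit word with 2
tag bits (so a 30-bit signed fixnum), which I believe matches the 32-bit
ports but should be treated as an assumption.  The variable and function
names are made up for the example, and the second address in the usage
note is hypothetical.

```lisp
;; Assumed layout: 32-bit word, 2 low tag bits, 30-bit signed fixnum.
;; These values are assumptions for illustration, not read from CCL.
(defparameter *assumed-word-bits* 32)
(defparameter *assumed-tag-bits* 2)

(defun fits-in-assumed-fixnum-p (address)
  "True if ADDRESS fits in a signed integer of (word - tag) bits."
  (let ((value-bits (- *assumed-word-bits* *assumed-tag-bits*)))
    ;; (SIGNED-BYTE 30) covers -2^29 .. 2^29-1.
    (typep address `(signed-byte ,value-bits))))
```

With those assumptions, (fits-in-assumed-fixnum-p #x14040D86) is true
for the symbol address shown in the transcript above, while a
hypothetical high address such as #xB0000000 fails the test.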

The particular illegal instruction is actually an "int $#xc9" instruction.
On Linux, the exception that trying to execute such an instruction generates
maps to the SIGSEGV signal (11); you're getting a SIGILL (4).  I don't
know whether the SIGILL is caused by attempting to execute that "int"
instruction or for some other reason, but virtual environments sometimes
get this sort of thing wrong.  (If this was qemu's fault, it's pretty
clearly getting it right a lot of the time; the image wouldn't run at
all if it didn't.)

When I ran into this on Fedora 15 a month or two ago, I fixed it in a
pretty ugly way.  Matt was working on similar things at the time and
came up with a cleaner fix; in the meantime, I installed Ubuntu on that
machine.  The original problem was a symptom of the fact that Fedora 15
was mapping the system libraries into a region of the address space that
historically happened to be free for CCL's use; CCL mapped the image to
a higher address range, the illegal instruction wound up at an address
that wouldn't fit in a fixnum, and ... that was the problem.  I haven't
seen that (system libraries at lower addresses) before or since.  Matt
may have, but it'd be worth checking to be sure that what we thought
we were fixing actually got fixed.

Do you get a reasonable lisp error signaled if you try to call a symbol
that's not FBOUNDP ? E.g.,

? (fmakunbound 'foo)
FOO
? (foo)
> Error: Undefined function FOO called with arguments () .
[...]

That uses a very similar mechanism, only the software interrupt is
"int $#xc7"; the address of that instruction is very near that of the
instruction used in the (FUNCALL 'QUOTE ...) case.  If calling FOO
signals a lisp error and calling QUOTE gets lost, I'd be very suspicious
that qemu isn't treating those two interrupts the same way; real hardware
does so.
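
To make the two cases above easy to tell apart from the listener, here
is a small sketch that classifies a symbol using only standard Common
Lisp predicates.  The function name is hypothetical, and this says
nothing about how the kernel itself decides which interrupt to use; it
only shows which of the two error paths a given symbol would be
expected to exercise.

```lisp
(defun classify-function-name (symbol)
  "Return a keyword describing what kind of operator SYMBOL names."
  (cond ((special-operator-p symbol) :special-operator) ; e.g. QUOTE
        ((macro-function symbol) :macro)                ; global macro
        ((fboundp symbol) :function)                    ; ordinary function
        (t :undefined)))                                ; not FBOUNDP
```

For example, (classify-function-name 'quote) returns :SPECIAL-OPERATOR,
and after (fmakunbound 'foo), (classify-function-name 'foo) returns
:UNDEFINED.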


On Fri, 18 Nov 2011, Eric Marsden wrote:

>>>>>> "gb" == Gary Byers <gb at clozure.com> writes:
>
>  gb> You need to make sure that the kernel and image that you're
>  gb> running are both up to date; this looks very much like a symptom
>  gb> of them being out of synch.  We changed some details about how
>  gb> the kernel passes exception information out to lisp code a month
>  gb> or so ago.  Sometimes when we make that kind of ABI change we also
>  gb> change version numbers so that mismatched kernels and images couldn't
>  gb> be used together, but didn't do so this time.
>
>  I did a full rebuild, on another machine, and am still seeing the same
>  symptoms. Note that this is running in qemu.
>
> ,----
> | Welcome to Clozure Common Lisp Version 1.8-dev-r15074M-trunk  (LinuxARM32)!
> | ? (funcall 'quote 1)
> | Unhandled exception 4 at 0x500008fe, context->regs at #x405addf8
> | ? for help
> `----
>
> -- 
> Eric Marsden
>


