[Openmcl-devel] ARM testing
Gary Byers
gb at clozure.com
Tue Jan 25 00:14:12 PST 2011
On Mon, 24 Jan 2011, David Brown wrote:
> On Fri, Jan 21 2011, Gary Byers wrote:
>
>> In the kernel debugger, the 'm' command will show the bounds of various
>> heap and stack memory areas that the lisp knows about. If you get a chance,
>> could you provoke the crash and look at the bounds of the 'cstack' area ?
>> There'll be only one of them, and the address that's generating the fault
>> should be in that area and near the lower bound. Is it ?
>
> I've caused the crash, but it doesn't drop into the kernel debugger, it
> just prints "Bus error" and returns me to a shell.
>
> Any suggestions?
This is a bit of a fishing expedition, but if you want to try running
under GDB:
0) Build the lisp kernel with C optimization disabled.

Edit ccl/lisp-kernel/linuxarm/Makefile; find the line
that may read:

    COPT = -O2

and ensure that the -O2 is commented out:

    COPT = #-O2

then

    $ cd ccl/lisp-kernel/linuxarm
    $ make cclean
    $ make
    $ cd ../..

1) Run GDB on the CCL kernel

    $ gdb ./armcl

"source" an init file that tells GDB to ignore signals that lisp handles

    (gdb) source lisp-kernel/linuxarm/.gdbinit

That'll incidentally set a breakpoint at the C function Bug().

Tell GDB to pass the right arguments to the lisp

    (gdb) set args -I arm-boot

Around line 435 in the file lisp-kernel/arm-gc.c is a line of C code:

    if (current_stack_pointer() > GCstack_limit) {

That's in the function rmark(), and rather than worrying about getting
the line number exactly right we can set a breakpoint at rmark(); neither
of the things being compared will change between the start of that function
and the comparison.

    (gdb) br rmark

And run the lisp:

    (gdb) r

After a few fasl files load, we'll hit the breakpoint and be back at
the GDB prompt. We want to examine the value of the variable GCstack_limit
and the value of the stack pointer register (r13):

    (gdb) p/x GCstack_limit
    (gdb) info reg r13

The stack pointer should be greater than the limit by somewhere around 1.2MB,
give or take a few tens of KB.
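
That difference is plain hex arithmetic, which the shell can do directly. A minimal sketch: headroom_kb is a made-up helper name, the limit is the GCstack_limit value quoted later in this message, and the sp value is invented for illustration; substitute whatever GDB prints for r13 and GCstack_limit.

```shell
#!/bin/sh
# headroom_kb SP LIMIT: print how far SP lies above LIMIT, in KB.
# Both arguments are hex values as GDB prints them (e.g. 0x7edfe000).
# A negative result means the stack pointer is already below the limit.
headroom_kb() {
    echo $(( ($1 - $2) / 1024 ))
}

# Illustrative readings only, not output from a real session.
sp=0x7edfe000
limit=0x7ecd3000

echo "headroom: $(headroom_kb "$sp" "$limit") KB"
```

With these sample values the result is 1196 KB, which is in the expected range.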
I don't have a good theory that says "if that's true, it means ___", but if
it's false - if we think that there's a lot more room for recursion than
there in fact is - that'd explain what you're seeing.
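
If retyping the session gets tedious, the GDB commands in step 1 can be collected into a command file. A sketch, assuming it is run from the top-level ccl directory; the helper name write_gdb_commands and the file name rmark-check.gdb are made up here, while the GDB commands themselves are the ones listed above.

```shell
#!/bin/sh
# write_gdb_commands FILE: write the GDB commands from step 1 to FILE.
write_gdb_commands() {
    cat > "$1" <<'EOF'
source lisp-kernel/linuxarm/.gdbinit
set args -I arm-boot
br rmark
r
p/x GCstack_limit
info reg r13
EOF
}

# Hypothetical output location; any writable path will do.
cmdfile="${TMPDIR:-/tmp}/rmark-check.gdb"
write_gdb_commands "$cmdfile"
echo "wrote $cmdfile"
echo "run with: gdb -batch -x $cmdfile ./armcl"
```

In batch mode GDB exits once the last command has run, which kills the lisp. For step 2 the process has to stay around, so drop -batch there (GDB then returns to its prompt after running the file) or use the interactive session.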
2) (for extra credit)

With the lisp still sitting in GDB, determine the pid of the lisp
process and, in another shell, do:

    $ cat /proc/PID/maps

That'll show a textual representation of the mapped memory regions
of the process. On my system, user processes seem to be limited to
the low 2GB of the address space, so the end of that output looks
like:

    7eca0000-7ecd2000 r-xp 00000000 00:00 0
    7edef000-7ee04000 rwxp 00000000 00:00 0 [stack]

On most other ARM Linux systems, the initial thread's stack is around
1GB (#x40000000) higher, so you may see numbers around #xbe****** instead
of #x7e******.
What that output shows is a write-protected region of around 200KB, a
gap of about 1.2MB, and the mapped pages of the initial thread's stack.
The value of GCstack_limit on my system is #x7ecd3000, i.e., one 4K page
beyond the write-protected region, so the comparison is supposed to be
saying "if the stack pointer's getting close to the write-protected
region, stop recursing." On your system, it seems to be recursing
into the write-protected region, or there's something else between
the currently mapped stack pages and those guard pages, or the comparison's
being done wrong, or something like that. Seeing the end of the memory
map might say something about which of those things is happening.
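
The sizes quoted above can be checked from the two maps lines with shell arithmetic, and the same can be done with the addresses from another system. A sketch using the addresses shown in this message; region_kb and gap_kb are made-up helper names, not anything in CCL.

```shell
#!/bin/sh
# region_kb RANGE: size in KB of a maps address range like "7eca0000-7ecd2000".
region_kb() {
    start=${1%-*}
    end=${1#*-}
    echo $(( (0x$end - 0x$start) / 1024 ))
}

# gap_kb LOWER UPPER: KB of unmapped space between the end of range LOWER
# and the start of range UPPER.
gap_kb() {
    lower_end=${1#*-}
    upper_start=${2%-*}
    echo $(( (0x$upper_start - 0x$lower_end) / 1024 ))
}

guard=7eca0000-7ecd2000   # the r-xp (write-protected) region
stack=7edef000-7ee04000   # the [stack] region

echo "guard region: $(region_kb "$guard") KB"
echo "gap:          $(gap_kb "$guard" "$stack") KB"
echo "limit offset: $(( 0x7ecd3000 - 0x${guard#*-} )) bytes past the guard region"
```

With the numbers above this reports a 200 KB region, a gap of 1140 KB, and a limit 4096 bytes past the end of the region, which matches the description.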
There's no reason to keep the breakpoint in GDB or to let the process
continue: we know that after it calls rmark() a few million times,
it'll crash ... You can just quit out of GDB (killing the lisp in the
process) unless you want to see that happen ...
Sorry you asked ?
>
> David
>
>