[Openmcl-devel] ARM testing

Tue Jan 25 00:14:12 PST 2011

On Mon, 24 Jan 2011, David Brown wrote:

> On Fri, Jan 21 2011, Gary Byers wrote:
>
>> In the kernel debugger, the 'm' command will show the bounds of various
>> heap and stack memory areas that the lisp knows about.  If you get a chance,
>> could you provoke the crash and look at the bounds of the 'cstack' area ?
>> There'll be only one of them, and the address that's generating the fault
>> should be in that area and near the lower bound.  Is it ?
>
> I've caused the crash, but it doesn't drop into the kernel debugger, it
> just prints "Bus error" and returns me to a shell.
>
> Any suggestions?

This is a bit of a fishing expedition, but if you want to try running
under GDB:

0) Build the lisp kernel with C optimization disabled.
    Edit ccl/lisp-kernel/linuxarm/Makefile; find the line
    that may read:

COPT = -O2

    and ensure that the -O2 is commented out:

COPT = #-O2

    then

$ cd ccl/lisp-kernel/linuxarm
$ make cclean
$ make
$ cd ../..

1) Run GDB on the CCL kernel

$ gdb ./armcl

    "source" an init file that tells GDB to ignore signals that lisp handles

(gdb) source lisp-kernel/linuxarm/.gdbinit

    That'll incidentally set a breakpoint at the C function Bug().
    Tell GDB to pass the right arguments to the lisp:

(gdb) set args -I arm-boot

    Around line 435 in the file lisp-kernel/arm-gc.c is a line of C code:

   if (current_stack_pointer() > GCstack_limit) {

    That's in the function rmark(), and rather than worrying about getting
    the line number exactly right we can set a breakpoint at rmark(); neither
    of the things being compared will change between the start of that function
    and the comparison.

(gdb) br rmark

    And run the lisp:

(gdb) r

    After a few fasl files load, we'll hit the breakpoint and be back at
    the GDB prompt.  We want to examine the value of the variable GCstack_limit
    and the value of the stack pointer register (r13):

(gdb) p/x GCstack_limit

(gdb) info reg r13

   The stack pointer should be greater than the limit by somewhere around 1.2MB,
   give or take a few tens of KB.

   I don't have a good theory that says "if that's true, it means ___", but if
   it's false - if we think that there's a lot more room for recursion than
   there in fact is - that'd explain what you're seeing.
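
   If you'd rather not subtract hex numbers by hand, the shell can do it.
   This is only a sketch: the function name is made up, the limit in the
   usage comment is the value I see on my system, and the r13 value there
   is hypothetical.  Substitute whatever GDB printed for you.

```shell
# Print how far the stack pointer sits above GCstack_limit, in KB.
# $1 = stack pointer (r13), $2 = GCstack_limit; both as 0x-prefixed hex.
stack_headroom_kb() {
    echo $(( ($1 - $2) / 1024 ))
}

# Usage (the r13 value here is hypothetical):
#   stack_headroom_kb 0x7edffe40 0x7ecd3000
```

   A result somewhere near 1200 is what I'd expect; a much smaller or
   negative number would be the interesting case.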

2) (for extra credit)
   With the lisp still sitting in GDB, determine the pid of the lisp
   process and, in another shell, do:

$ cat /proc/PID/maps

   That'll show a textual representation of the mapped memory regions
   of the process.  On my system, user processes seem to be limited to
   the low 2GB of the address space, so the end of that output looks
   like:

7eca0000-7ecd2000 r-xp 00000000 00:00 0 
7edef000-7ee04000 rwxp 00000000 00:00 0          [stack]
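
   One way to get just the tail of the map without hunting for the pid by
   hand.  This is a sketch that assumes pgrep is installed and that the
   kernel process is named armcl; the function name is made up.  (GDB's
   "info proc" command will also print the pid.)

```shell
# Print the last few lines of a process's memory map.
# $1 = pid; $2 (optional) = number of lines to show, default 4.
maps_tail() {
    tail -n "${2:-4}" "/proc/$1/maps"
}

# Usage, assuming the newest process named armcl is the one under GDB:
#   maps_tail "$(pgrep -n armcl)"
```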

   On most other ARM Linux systems, the initial thread's stack is around
   1GB (#x40000000) higher, so you may see numbers around #xbe****** instead
   of #x7e******.

   What that output shows is a write-protected region of around 200KB, a
   gap of about 1.2MB, and the mapped pages of the initial thread's stack.
   The value of GCstack_limit on my system is #x7ecd3000, i.e., one 4K page
   beyond the write-protected region, so the comparison is supposed to be
   saying "if the stack pointer's getting close to the write-protected
   region, stop recursing."  On your system, it seems to be recursing
   into the write-protected region, or there's something else between
   the currently mapped stack pages and those guard pages, or the comparison's
   being done wrong, or something like that.  Seeing the end of the memory
   map might say something about which of those things is happening.
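
   The sizes quoted above come straight from subtracting the addresses in
   the two map lines.  A rough sketch of that arithmetic in the shell (the
   function names are mine, not anything in CCL; the ranges are the ones
   from my output above):

```shell
# Size in KB of one "start-end" range as printed in /proc/PID/maps.
range_kb() {
    echo $(( (0x${1#*-} - 0x${1%-*}) / 1024 ))
}

# Gap in KB between the end of the first range and the start of the second.
gap_kb() {
    echo $(( (0x${2%-*} - 0x${1#*-}) / 1024 ))
}

range_kb 7eca0000-7ecd2000                    # write-protected region: prints 200
gap_kb   7eca0000-7ecd2000 7edef000-7ee04000  # gap below the stack: prints 1140
range_kb 7edef000-7ee04000                    # mapped stack pages: prints 84
```

   Run the same two functions on the last lines of your own map; a gap
   much smaller than mine, or an extra mapping sitting inside it, would be
   worth knowing about.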

There's no reason to keep the breakpoint in GDB or to let the process
continue: we know that after it calls rmark() a few million times,
it'll crash ...  You can just quit out of GDB (killing the lisp in the
process) unless you want to see that happen ...

Sorry you asked ?

>
> David
>
>