[Openmcl-devel] ARM testing

Thu Jan 27 06:38:23 PST 2011

On Tue, 25 Jan 2011, David Brown wrote:

> On Tue, Jan 25 2011, Gary Byers wrote:
>
>> If your stack limit is < 1.2MB and setting it to ~2MB in the shell before
>> running CCL fixes the problem, that would likely confirm the theory.  (If
>> it doesn't, that'd be disappointing, since it's otherwise a great theory ...)
>
> $ ulimit -s
> 8192
>

Certainly disappointing.

> Also note that giving a large -S value allows it to work.

Certainly ... confusing.  One theory (for which there doesn't seem to be 
any direct evidence) is that the behavior that you're seeing is simply
a bug in the Linux kernel that you're running.  There isn't enough
circumstantial evidence to convict, but there's probably enough to be
a bit suspicious of that.

If you want to try a couple more fishing expeditions, they might tell
us something.

1) The CCL kernel defines a function 'os_get_current_thread_stack_bounds'
that returns by reference the 'base' (bottom) and size of the calling
thread's C stack.  Most systems' thread libraries provide a non-portable
way of determining this, though they differ in the details.  On Linux,
'pthread_attr_getstack' returns the arithmetically lowest address and
size of the stack, and we add the size to the low address to find the
bottom.
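
For anyone who wants to check this outside of CCL, here is a minimal
sketch of the same idea in C.  It is not the CCL kernel's actual code:
the name 'stack_bounds_sketch' is made up for illustration, and the use
of glibc's 'pthread_getattr_np' to obtain the attributes of the running
thread is an assumption about how one would get at them on Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for pthread_getattr_np (glibc extension) */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical helper (not CCL's os_get_current_thread_stack_bounds).
 * Stores the "bottom" of the calling thread's stack (its highest
 * address, since the stack grows downward) in *base and the stack's
 * size in *size.  Returns 0 on success or an errno-style code.
 */
int stack_bounds_sketch(void **base, size_t *size)
{
    pthread_attr_t attr;
    void *low = NULL;
    int err;

    assert(base != NULL && size != NULL);

    /* Assumption: ask glibc for the attributes of the running thread. */
    err = pthread_getattr_np(pthread_self(), &attr);
    if (err != 0)
        return err;

    /* Yields the arithmetically lowest address and the size. */
    err = pthread_attr_getstack(&attr, &low, size);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;

    /* Add the size to the low address to find the bottom. */
    *base = (char *)low + *size;
    return 0;
}
```

Called from the initial thread, the size this reports would be expected
to track the stack resource limit fairly closely, though the exact value
depends on the C library.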

This function is actually called by the startup code to determine the
bottom of the initial thread's stack; we assume that the size is
determined by the resource limit and is much larger than we want to
use, so we ignore it.

That function will be called exactly once when the lisp loads the
bootstrapping image.  If you set a breakpoint on that function and
step through it until after the call to 'pthread_attr_getstack', what
does:

(gdb) p *size

print?

On the ARM Linux box I just tried that on here, I get 8MB, which is exactly
what 'limit stack' suggests that I should.

2) If you continue and let the image load fasl files, you'll reach the point
where it gets the SIGBUS in rmark().   If GDB gets control when the SIGBUS
is raised and the process isn't killed, you should be able to determine the
value of the stack pointer (r13) at the point where the fault occurred:

(gdb) info reg r13

If you then look at /proc/PID/maps, you'll likely see that r13 is near or
just past the lowest address of the stack region.

How far is it from the high address of that region, e.g., how large was
the stack allowed to grow before something decided that it had overflowed?
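
If doing the hex arithmetic by hand is a nuisance, something like the
following sketch would do it.  Both function names are made up for
illustration, and it assumes the usual maps line layout of
"low-high perms ..." with the initial thread's stack tagged "[stack]".

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if 'line' (one line of /proc/PID/maps) describes
 * the "[stack]" region, store its low and high addresses and return 1.
 * Returns 0 for any other line or if the addresses can't be parsed.
 */
int parse_stack_line(const char *line, uint64_t *low, uint64_t *high)
{
    if (strstr(line, "[stack]") == NULL)
        return 0;
    return sscanf(line, "%" SCNx64 "-%" SCNx64, low, high) == 2;
}

/*
 * Hypothetical helper: how far the stack had grown when the stack
 * pointer had the value 'sp'.  The stack grows down from 'high'.
 */
uint64_t stack_depth(uint64_t high, uint64_t sp)
{
    assert(sp <= high);
    return high - sp;
}
```

Feed 'parse_stack_line' the "[stack]" line from /proc/PID/maps and pass
the resulting high address, along with the r13 value from GDB, to
'stack_depth'; the result is the number of bytes the stack had grown to
at the time of the fault.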

>
> David
>
>