[Openmcl-devel] ARM testing

Tue Jan 25 17:09:38 PST 2011

On Tue, 25 Jan 2011, David Brown wrote:

> GCstack_limit is 0xbe9e8000 whereas the memory region is
>
> be9b5000-be9e7000 r-xp 00000000 00:00 0
> beaf8000-beb19000 rwxp 00000000 00:00 0    [stack]
>
> It looks like the GCstack_limit is a little over a MB earlier than the
> beginning of the actual stack.  (stack randomization is enabled, so it's
> different each time I run it).  It isn't obvious to me where this value
> is coming from for the initial stack.
>
> David
>

That actually looks sane: GCstack_limit is 4K bytes "before" (from the
point of view of a stack pointer growing down) the highest address of
the write-protected region, and the guard pages are about 1.2MB from
the bottom of the stack.  (That's supposed to be 1MB, but I think that
the size of the guard page region is being added to the size of the
usable stack.  That's generally harmless, but it might explain what's
happening here.)
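
For anyone who wants to check that arithmetic, here is a small Python sketch using only the addresses from the quoted report.  Reading the r-xp region as the guard pages is the interpretation given above, and the variable names are made up for illustration.

```python
# Addresses copied from the report quoted above.
GC_STACK_LIMIT = 0xBE9E8000
GUARD_LOW, GUARD_HIGH = 0xBE9B5000, 0xBE9E7000  # the r-xp (write-protected) region
STACK_LOW, STACK_HIGH = 0xBEAF8000, 0xBEB19000  # the region labelled [stack]

MB = 1 << 20
NOMINAL_STACK = 1 * MB  # the intended 1MB stack size

limit_above_guard = GC_STACK_LIMIT - GUARD_HIGH  # GCstack_limit vs. high end of guard region
guard_size = GUARD_HIGH - GUARD_LOW              # size of the write-protected region
guard_to_bottom = STACK_HIGH - GUARD_HIGH        # guard pages to the bottom of the stack
excess = guard_to_bottom - NOMINAL_STACK         # how far past 1MB that distance is

print(f"GCstack_limit is {limit_above_guard} bytes above the guard region")
print(f"guard region size: {guard_size // 1024} KB")
print(f"guard pages to stack bottom: {guard_to_bottom / MB:.2f} MB")
print(f"excess over 1MB equals guard size: {excess == guard_size}")
```

With these numbers the distance from the guard pages to the stack bottom is 0x132000 bytes, and the excess over 1MB is exactly the 0x32000-byte size of the write-protected region, which is what "guard size added to usable size" would predict.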

The initial thread's stack is a bit different from that of other
threads.  A stack created by pthread_create() is a memory region
that contains pages with (at least) read and write permissions.  Like
most accessible regions created with mmap(), the pages are "zero-filled,
copy-on-write" and don't actually have physical or virtual memory associated
with them until they're touched (at which point a physical page is faulted
in.)

The initial thread's stack is allocated in the high end of the process's
address space (randomization might change the exact address of the stack's
bottom somewhat, but AFAIK it's still "near" the high end of the address
space.)  Most of the stack's (potential) pages are initially unmapped (a
page or two right below the stack bottom may be mapped); as the stack
grows, the logical pages are added to the low address of the stack region
and physical pages are associated with them.

There are usually a LOT of unmapped pages below the top (= low address)
of that region; they may eventually fill up with heap data/other stacks/
whatever.  AFAIK, the growth of the initial thread's stack is limited
by the process's stack resource limit.
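
A rough way to look at both quantities from inside a process on Linux, sketched in Python.  It assumes /proc/self/maps is available and that the initial thread's stack is the region labelled [stack]; the helper names are made up.

```python
import resource

def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into (low, high, perms, name)."""
    fields = line.split()
    low, high = (int(part, 16) for part in fields[0].split("-"))
    name = fields[5] if len(fields) > 5 else ""
    return low, high, fields[1], name

def find_stack(maps_text):
    """Return (low, high) of the region labelled [stack], or None."""
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        low, high, _perms, name = parse_maps_line(line)
        if name == "[stack]":
            return low, high
    return None

def describe_limit(value):
    """Render an rlimit value in KB, the unit 'ulimit -s' uses."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KB"

def main():
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    print("stack limit (soft/hard):", describe_limit(soft), "/", describe_limit(hard))
    try:
        with open("/proc/self/maps") as maps:  # Linux-specific
            region = find_stack(maps.read())
    except OSError:
        region = None
    if region:
        low, high = region
        print(f"[stack] {low:#x}-{high:#x}: {(high - low) // 1024} KB mapped so far")

if __name__ == "__main__":
    main()
```

The mapped size is normally far below the limit, since pages are only added at the low end of the region as the stack grows.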

In 32-bit CCL, the size of (that flavor of) stack nominally defaults
to 1MB.  If the resource limit is below the intended stack size, we
try to increase the limit to match the stack size.  Unfortunately,
after ensuring that the stack resource limit is 1MB (by default),
we effectively create a ~1.2MB stack; the resource limit is smaller
than (what we intend to be) the usable size of the stack, as marked
by things like GCstack_limit.
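
The mismatch can be illustrated with a few lines of Python.  This is only a sketch of the reasoning, not CCL's actual code: the function names are hypothetical, the 200KB guard size is taken from the r-xp region in the report, and the 512KB starting limit is invented for the example.

```python
import resource

MB = 1 << 20
KB = 1 << 10
INF = resource.RLIM_INFINITY

def soft_limit_to_request(soft, hard, wanted):
    """Return the soft limit to request so the stack may reach `wanted` bytes,
    or None if the current soft limit already allows it."""
    if soft == INF or soft >= wanted:
        return None
    if hard != INF and hard < wanted:
        raise ValueError("hard limit is below the wanted stack size")
    return wanted

def shortfall(limit, nominal, guard):
    """Bytes by which the intended stack extent (nominal + guard) exceeds `limit`."""
    if limit == INF:
        return 0
    return max(0, nominal + guard - limit)

nominal, guard = 1 * MB, 200 * KB  # 200KB is the r-xp region size from the report
low_limit = 512 * KB               # hypothetical starting soft limit

# Asking only for the nominal size leaves the limit short by the guard size.
requested = soft_limit_to_request(low_limit, INF, nominal)
print(requested, shortfall(requested, nominal, guard))

# Asking for nominal + guard closes the gap.
requested = soft_limit_to_request(low_limit, INF, nominal + guard)
print(requested, shortfall(requested, nominal, guard))

# A typical 8MB limit needs no change under either policy.
print(soft_limit_to_request(8 * MB, INF, nominal + guard))
```

The point of the second case is only that the limit and the layout have to be computed from the same size; whether that means raising the limit or shrinking the layout is a separate decision.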

The bad news is that that's really, really bad; the good news is that
it provides a consistent explanation for the behavior that you're
seeing, and I've been having some difficulty coming up with one of
those.  (Part of what had been confusing is that you die with SIGBUS;
if the stack had sailed past GCstack_limit and something was written
to the guard pages beyond, you would have gotten a SIGSEGV.  I'm willing
to believe that trying to grow the stack region beyond the resource limit
results in SIGBUS on ARM Linux.)

On most platforms, the stack resource limit is large (a quick survey of
some of the machines I'm logged into - including an ARM netbook - shows
values ranging from 8MB to 512MB.)  This theory would be pretty much
proven if the stack resource limit when the bootstrapping image runs
on your system is < ~1.2MB, and if raising that limit before running
CCL with the bootstrapping image caused the problem to go away.

(In bash,

$ ulimit -s

will show the stack resource limit in KB, and

$ ulimit -S -s 2048

will set that limit to 2MB.  In tcsh,

$ limit stack

and

$ limit stack 2048

will have the same effects.)
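
If changing the shell's limit is inconvenient, the same thing can be done for a single child process.  A minimal Python sketch, assuming a POSIX system; the script name with_stack.py and the helper names are made up, and the command to run is whatever starts CCL on your machine.

```python
import resource
import subprocess
import sys

def new_limits(kb, hard):
    """Return the (soft, hard) pair that sets the soft stack limit to `kb` KB."""
    wanted = kb * 1024
    if hard != resource.RLIM_INFINITY and wanted > hard:
        raise ValueError(f"{kb} KB exceeds the hard limit of {hard // 1024} KB")
    return wanted, hard

def run_with_stack_limit(argv, kb=2048):
    """Run argv as a child process whose soft stack limit is `kb` KB."""
    def set_limit():
        # Runs in the child between fork() and exec(); the parent is unaffected.
        _soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
        resource.setrlimit(resource.RLIMIT_STACK, new_limits(kb, hard))
    return subprocess.run(argv, preexec_fn=set_limit).returncode

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):  python3 with_stack.py command [args...]
    sys.exit(run_with_stack_limit(sys.argv[1:]))
```

Like "ulimit -S -s 2048", this sets the soft limit to exactly 2MB, so it lowers the limit if it was already larger than that.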

If your stack limit is < 1.2MB and setting it to ~2MB in the shell before
running CCL fixes the problem, that would likely confirm the theory.  (If
it doesn't, that'd be disappointing, since it's otherwise a great theory ...)