[Openmcl-devel] ARM testing

David Brown lisp at davidb.org
Thu Jan 27 22:52:52 PST 2011


On Thu, Jan 27 2011, Gary Byers wrote:

> On Thu, 27 Jan 2011, David Brown wrote:

> In that case, we're running into the write-protected guard pages at the end
> of the listener thread's control stack; the same sequence of events happens
> for me if I do:
>
> ? (process-interrupt ccl::*initial-process* #'foo 0)

> If foreign code (including the GC, including rmark()) tries to write
> to those guard pages we expect to get a SIGSEGV; in general, it's
> harder to recover from an exception in foreign code, and I think that
> we just drop into the kernel debugger in that case.  (Or at least try to.)
>
> Do you get a STACK-OVERFLOW condition signaled in lisp ?  Or does this just
> die with SIGBUS ? Or does something else happen ?

It dies with a SIGBUS.

> Here's another theory that makes so much sense (at the moment) that it's probably
> completely wrong: it's possible that recent Linux kernels are refusing to map
> the last page of a stack region and signaling SIGBUS (at least on ARM) when
> attempts are made to write to that page.  (That's actually reminiscent of a
> Linux kernel change made last summer, where mmap() with the MAP_GROWSDOWN option
> refused to map the lowest page in the region it returned; that redefinition of
> mmap's behavior was - according to my possibly garbled understanding - related
> to stack growth/overflow detection.)

Probably this:

  commit 320b2b8de12698082609ebbc1a17165727f4c893
  Author: Linus Torvalds <torvalds at linux-foundation.org>
  Date:   Thu Aug 12 17:54:33 2010 -0700
   
      mm: keep a guard page below a grow-down stack segment
      
      This is a rather minimally invasive patch to solve the problem of the
      user stack growing into a memory mapped area below it.  Whenever we fill
      the first page of the stack segment, expand the segment down by one
      page.
      
      Now, admittedly some odd application might _want_ the stack to grow down
      into the preceding memory mapping, and so we may at some point need to
      make this a process tunable (some people might also want to have more
      than a single page of guarding), but let's try the minimal approach
      first.
      
      Tested with trivial application that maps a single page just below the
      stack, and then starts recursing.  Without this, we will get a SIGSEGV
      _after_ the stack has smashed the mapping.  With this patch, we'll get a
      nice SIGBUS just as the stack touches the page just above the mapping.
      
      Requested-by: Keith Packard <keithp at keithp.com>
      Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>

It blocks downward stack expansion when the stack would bump into an
adjacent mapping.  The patch explicitly causes a SIGBUS on the guard page.
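
For illustration only, here is a minimal user-space sketch of what catching
such a fault looks like, assuming a POSIX system with mmap(), mprotect() and
sigaction().  It is not CCL's handler and it does not reproduce the kernel's
grow-down guard page: an ordinary mprotect()-ed page stands in for the guard
page, and the names probe_guard_page, on_fault, fault_env and fault_signal
are made up for this example.  The handler is installed for both SIGSEGV and
SIGBUS because which of the two is delivered depends on the platform and on
how the page came to be unwritable.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;                 /* where the handler jumps back to */
static volatile sig_atomic_t fault_signal;   /* signal number seen by the handler */

/* Hypothetical handler: record the signal and unwind to the probe. */
static void on_fault(int signo)
{
    fault_signal = signo;
    siglongjmp(fault_env, 1);
}

/* Writes one byte into a write-protected page.
   Returns the signal that was raised (SIGSEGV or SIGBUS),
   0 if the write unexpectedly succeeded, -1 if setup failed. */
static int probe_guard_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t len = 2 * (size_t)page;
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    /* The lowest page plays the role of the guard page. */
    if (mprotect(region, (size_t)page, PROT_READ) != 0) {
        munmap(region, len);
        return -1;
    }

    struct sigaction sa, old_segv, old_bus;
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGSEGV, &sa, &old_segv) != 0) {
        munmap(region, len);
        return -1;
    }
    if (sigaction(SIGBUS, &sa, &old_bus) != 0) {
        sigaction(SIGSEGV, &old_segv, NULL);
        munmap(region, len);
        return -1;
    }

    volatile int result = 0;
    fault_signal = 0;
    if (sigsetjmp(fault_env, 1) == 0) {
        ((volatile char *)region)[0] = 1;    /* touch the "guard" page */
    } else {
        /* Only SIGSEGV and SIGBUS are routed to on_fault. */
        assert(fault_signal == SIGSEGV || fault_signal == SIGBUS);
        result = fault_signal;
    }

    /* Restore the previous handlers and release the mapping. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(region, len);
    return result;
}
```

Calling probe_guard_page() returns the number of whichever signal arrived, so
a caller can compare it against SIGSEGV and SIGBUS.  A handler that only
expects one of the two would miss the other, which is the situation described
above.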

I guess CCL is the "odd application".  I'm curious why I'm not seeing
this on x86/amd64, since I'm running a kernel with the same change there.
Even the process-interrupt above nicely produces a STACK-OVERFLOW condition.

> At the moment, I like this theory (but of course I liked the one from the other
> day, too.)  One way of testing it is to move GCstack_limit a page higher; it's
> set near the start of the function gc() in lisp-kernel/gc-common.c:
>
> /* ignore the other case of the containing 'if'.  This is around
>    line 1394 */
>
>     GCstack_limit = (natural)(tcr->cs_limit)+(natural)page_size;
>
> If we change 'page_size' to '2*page_size' in that line and recompile
> the kernel, does the problem (loading the bootstrapping image) persist ?

This fixes the bootstrap issue.  The process-interrupt still dies with
the SIGBUS, so maybe it just needs to be handled.
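
The arithmetic behind that one-line change can be sketched as follows.  This
is an illustration under stated assumptions, not the code in gc-common.c: the
helpers gc_stack_limit and gc_stack_has_room are hypothetical, and `natural`
is defined here as a stand-in for the kernel's word-sized unsigned type.  The
point is only that raising the multiplier from 1 to 2 moves the limit one page
further away from the bottom of the control stack.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for the lisp kernel's word-sized unsigned type. */
typedef uintptr_t natural;

/* Hypothetical helper: lowest address the GC's recursive marker should
   touch, keeping `guard_pages` pages above the bottom of the control
   stack (cs_limit).  guard_pages == 1 mirrors the quoted line;
   guard_pages == 2 mirrors the suggested experiment. */
static natural gc_stack_limit(natural cs_limit, natural page_size,
                              natural guard_pages)
{
    /* Page sizes are powers of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return cs_limit + guard_pages * page_size;
}

/* The stack grows down, so there is room as long as the stack pointer
   is still above the limit. */
static int gc_stack_has_room(natural sp, natural limit)
{
    return sp > limit;
}
```

With a 4096-byte page and cs_limit at 0x10000, the limit is 0x11000 for one
page and 0x12000 for two, so a stack pointer at 0x11800 is considered to have
room in the first case but not in the second.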

I was going to try running this on an earlier kernel, but I haven't been
able to get the old kernel to be stable enough to even boot fully.
Given the above, my guess is that it would work fine there.

David


