[Openmcl-devel] Unix signal handling

Gary Byers gb at clozure.com
Wed Jul 7 10:51:43 PDT 2010



On Wed, 7 Jul 2010, Ron Garret wrote:
>
> That makes perfect sense.  And I get that this is a real issue.
> Like I said in a followup to this thread, I was able to get Lisp to
> hang (easily) with an empty signal handler and a constantly-consing
> background thread.
>
> What I don't get now is how SIGINT can possibly work, even though it
> obviously does.  In between when the SIGINT is received and the
> underlying signal handler has a chance to run its first instruction
> it seems to me that the system must be in the same state that it
> would be if the signal handler for SIGINT were the empty signal
> handler.  It seems to me that at that point the scenario you
> describe above could play itself out.  And yet it obviously doesn't.

If the GC wants to run in some other thread, it sends all threads
a "suspend" signal.  (Since there aren't a whole lot of unused signal
numbers on Darwin, it may co-opt SIGUSR1 for this purpose.)  A thread
receiving that "suspend" signal saves its context (as of the point
of the "suspend") in a TCR slot, signals a semaphore (tcr->suspend) to
acknowledge the suspend request, and waits for the GC thread to signal
another semaphore (tcr->resume).  The handler for the suspend signal
is called "suspend_resume_handler()".

user_signal_handler() just sets a flag; it runs with all signals
masked.  If a suspend signal is sent to a thread that's inside
user_signal_handler(), it's not delivered until user_signal_handler()
returns (and the thread's signal mask is restored to what it was
when the SIGINT or whatever was delivered).


> Looking through the code I see that user_signal_handler calls
> something called DarwinSigReturn, which is a macro that invokes
> darwin_sigreturn, which is a little snippet of assembler code.  Is
> that the secret sauce?  How does it work?

The way that signal delivery works in general is that the OS saves the
thread's signal context on the stack (this doesn't work too well if
the stack's overflowed and the stack pointer's referencing unmapped
memory ...), pushes a siginfo_t structure, pushes the signal
number/siginfo_t/sigcontext args (or loads them into argument registers),
pushes a return address, points the PC at the first instruction of the
handler function, and awakens the thread.  On most platforms, the return
address points to some code that does a "sigreturn" system call, which
magically restores the thread's state/signal mask.

As the manual actually confesses, we don't like a lisp stack
(which ordinarily contains nothing but tagged lisp objects on x86) to
have a signal context dumped on it in a way that's visible to the GC,
so we sometimes copy the signal context/siginfo to another stack.  (What
a tangled web we weave ...).  On most platforms, we can just arrange
that the handler (running on some other stack) will just return to the
same return address and do the magic sigreturn syscall; on Darwin, the
handler routine would return to some C library glue code that does nothing
particularly useful before doing a sigreturn.  Unfortunately, that glue
code thinks that it knows where the sigcontext is, and returning to that
code would have the effect of restoring a context that hasn't been visible
to the GC.  (We had a bug a few years ago where that was happening; the
amazing part is that that didn't always crash immediately.)

So, when we've done this copying of signal contexts on Darwin, we generally
want to be sure that we restore the copied signal context; the DarwinSigReturn()
macro expands into code that executes a sigreturn() syscall.

user_signal_handler() doesn't do any stack switching, so we could just return
from it and let the library glue do the sigreturn.  I suspect that after the
bug of a few years back - where a DarwinSigReturn was missing - we got a little
paranoid and tried to make sure that all handlers exited that way on Darwin.
(It doesn't hurt anything to do the sigreturn ourselves, though I don't think
it's necessary in all cases.)

Sorry you asked?



> rg
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>


