[Openmcl-devel] Unix signal handling

Wed Jul 7 08:32:38 PDT 2010

On Jul 7, 2010, at 12:12 AM, Gary Byers wrote:

> 
> 
> On Tue, 6 Jul 2010, Ron Garret wrote:
> 
>> 
>> On Jul 6, 2010, at 9:18 PM, Gary Byers wrote:
>> 
>>> 
>>> 
>>> On Tue, 6 Jul 2010, Ron Garret wrote:
>>> 
>>>> Actually, on thinking about this some more, a message queue isn't necessary
>>>> because signals are already segregated by the OS.  So something like this
>>>> should work:
>>>> (defmacro set-signal-handler (signo &body body)
>>>>   (let ((sem (make-semaphore))
>>>>         (handler (gensym "HANDLER")))
>>>>     `(progn
>>>>        (defcallback ,handler (:int signo :void)
>>>>          (declare (ignore signo))
>>>>          (signal-semaphore ',sem))
>>>>        (#_signal ,signo ,handler)
>>>>        (process-run-function ,(format nil "SIGNAL ~A HANDLER" signo)
>>>>                              (lambda ()
>>>>                                (loop
>>>>                                  (wait-on-semaphore ',sem)
>>>>                                  ,@body))))))
>>>> I tried it and it actually does seem to work.  To really make this
>>>> bulletproof you'd want to tweak it so that calling set-signal-handler
>>>> multiple times on the same signals didn't leave garbage processes lying
>>>> around.
>>>> rg
>>> 
>>> [I realize that you're just thinking out loud here; sorry if this reply
>>> sounds like an overreaction.]
>>> 
>>> A few messages ago in this thread, I think that I said something to the
>>> effect that you can't just define signal handlers via the FFI like this:
>>> that it'd work some of the time, but that there were GC issues.  (If
>>> the GC runs in some thread at around the time that the signal handler
>>> runs in another thread, the GC has no way of seeing the state of the
>>> interrupted thread at the time that the signal occurred.)
>>> 
>>> I did in fact say that, so my conscience is clear in this case.
>> 
> 
>> Indeed you did say that.  And I actually read it.  This solution was
>> specifically designed with your caveats in mind.  The signal handler
>> only does one thing: call signal-semaphore, which is itself just an
>> FFI call.  All the Lispy stuff happens in a separate thread.  Is
>> there a reason that would not work reliably?  The kind of GC
>> interaction you describe would seem to me to be impossible if the
>> signal handler thread doesn't cons, and doesn't reference anything
>> that might become garbage.
> 
> CCL's GC moves lisp objects around in memory.  Functions are lisp objects.
> 
> So: some thread is minding its own business, running the function FOO.
> 
> A signal is delivered to that thread when the PC is N bytes into FOO.
> The OS saves the state of the thread (the signal context) and executes
> the signal handler.  The GC runs and stops all other threads; the
> thread in question is about to signal the semaphore.  The GC decides
> that memory would look better if #'FOO was moved somewhere.  It
> carefully updates all references to #'FOO and to PC values inside
> #'FOO on all stacks and in all signal contexts that it's aware of.
> (It's not aware of the signal context involving FOO and the signal
> handler.)
> 
> The GC finishes its work and resumes all other threads.  The thread
> running the signal handler signals the semaphore and the handler function
> returns; the OS then restores the thread's state (register values, mostly)
> to the values saved in the signal context.  Code resumes execution at
> an address where #'FOO used to be before the GC moved it.
> 
> That's not good.
> 
> The handler function itself is entirely well-behaved; it doesn't even cons.
> (The GC is invoked in this scenario because some other thread consed.)  The
> problem here has to do with the fact that there's this sort of magic control
> transfer/state change from "running FOO" to "running a signal handler" and
> there'll be another magic transition back to "running FOO", and the interrupted
> state - the signal context - isn't visible to the GC.
> 
> I hope that this makes sense.  There's a very real issue here; it's certainly
> an obscure one, but it's not just hypothetical.

That makes perfect sense.  And I get that this is a real issue.  Like I said in a followup to this thread, I was able to get Lisp to hang (easily) with an empty signal handler and a constantly-consing background thread.

What I don't get now is how SIGINT can possibly work, even though it obviously does.  In between when the SIGINT is received and the underlying signal handler has a chance to run its first instruction it seems to me that the system must be in the same state that it would be if the signal handler for SIGINT were the empty signal handler.  It seems to me that at that point the scenario you describe above could play itself out.  And yet it obviously doesn't.

Looking through the code I see that user_signal_handler calls something called DarwinSigReturn, which is a macro that invokes darwin_sigreturn, which is a little snippet of assembler code.  Is that the secret sauce?  How does it work?

rg