[Openmcl-devel] Unix signal handling
gb at clozure.com
Wed Jul 7 07:12:36 UTC 2010
On Tue, 6 Jul 2010, Ron Garret wrote:
> On Jul 6, 2010, at 9:18 PM, Gary Byers wrote:
>> On Tue, 6 Jul 2010, Ron Garret wrote:
>>> Actually, on thinking about this some more, a message queue isn't necessary
>>> because signals are already segregated by the OS. So something like this
>>> should work:
>>> (defmacro set-signal-handler (signo &body body)
>>>   (let ((sem (make-semaphore))
>>>         (handler (gensym "HANDLER")))
>>>     `(progn
>>>        (defcallback ,handler (:int signo :void)
>>>          (declare (ignore signo))
>>>          (signal-semaphore ',sem))
>>>        (#_signal ,signo ,handler)
>>>        (process-run-function ,(format nil "SIGNAL ~A HANDLER" signo)
>>>          (lambda ()
>>>            (loop
>>>              (wait-on-semaphore ',sem)
>>>              ,@body))))))
>>> I tried it and it actually does seem to work. To really make this
>>> bulletproof you'd want to tweak it so that calling set-signal-handler
>>> multiple times on the same signals didn't leave garbage processes lying
>>> around.
>> [I realize that you're just thinking out loud here; sorry if this reply
>> sounds like an overreaction.]
>> A few messages ago in this thread, I think that I said something to the
>> effect that you can't just define signal handlers via the FFI like this:
>> that it'd work some of the time, but that there were GC issues. (If
>> the GC runs in some thread at around the time that the signal handler
>> runs in another thread, the GC has no way of seeing the state of the
>> interrupted thread at the time that the signal occurred.)
>> I did in fact say that, so my conscience is clear in this case.
> Indeed you did say that. And I actually read it. This solution was
> specifically designed with your caveats in mind. The signal handler
> only does one thing: call signal-semaphore, which is itself just an
> FFI call. All the Lispy stuff happens in a separate thread. Is
> there a reason that would not work reliably? The kind of GC
> interaction you describe would seem to me to be impossible if the
> signal handler thread doesn't cons, and doesn't reference anything
> that might become garbage.
CCL's GC moves lisp objects around in memory. Functions are lisp objects.
So: some thread is minding its own business, running the function FOO.
A signal is delivered to that thread when the PC is N bytes into FOO.
The OS saves the state of the thread (the signal context) and executes
the signal handler. The GC runs and stops all other threads; the
thread in question is about to signal the semaphore. The GC decides
that memory would look better if #'FOO was moved somewhere. It
carefully updates all references to #'FOO and to PC values inside
#'FOO on all stacks and in all signal contexts that it's aware of.
(It's not aware of the signal context involving FOO and the signal
handler.)
The GC finishes its work and resumes all other threads. The thread
running the signal handler signals the semaphore and the handler function
returns; the OS then restores the thread's state (register values, mostly)
to the values saved in the signal context. Code resumes execution at
an address where #'FOO used to be before the GC moved it.
That's not good.
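
To make the sequence above concrete, here is a toy model in portable Common Lisp. It is only a sketch under loud assumptions: the "heap" is a hash table, addresses are plain integers, and every name in it (toy-gc-move, function-base-for-pc, stale-pc-demo) is made up for illustration. It says nothing about how CCL's GC is actually implemented; it only shows why a saved PC that the GC can't see ends up pointing at nothing.

```lisp
;;; Toy model only. Nothing here is CCL's real GC or a real signal context.

(defstruct context
  pc)                                   ; a saved program counter (absolute "address")

(defun function-base-for-pc (heap pc size)
  "Return the base address of the function in HEAP that contains PC, or NIL."
  (loop for base being the hash-keys of heap
        when (<= base pc (+ base size -1))
          return base))

(defun toy-gc-move (heap old-base new-base size known-contexts)
  "Move the object at OLD-BASE to NEW-BASE, fixing PCs only in KNOWN-CONTEXTS."
  (setf (gethash new-base heap) (gethash old-base heap))
  (remhash old-base heap)
  (dolist (ctx known-contexts)
    (let ((offset (- (context-pc ctx) old-base)))
      ;; Only PCs that point inside the moved object are rewritten.
      (when (< -1 offset size)
        (setf (context-pc ctx) (+ new-base offset))))))

(defun stale-pc-demo ()
  "Return the function base found for a GC-visible PC and for a hidden one."
  (let ((heap (make-hash-table))
        (size 64)
        (visible (make-context :pc 1016))  ; a frame the GC knows about
        (hidden (make-context :pc 1016)))  ; OS-saved signal context, unknown to the GC
    (setf (gethash 1000 heap) 'foo)        ; #'FOO "lives" at address 1000
    (toy-gc-move heap 1000 5000 size (list visible))
    (list (function-base-for-pc heap (context-pc visible) size)
          (function-base-for-pc heap (context-pc hidden) size))))
```

In this model (stale-pc-demo) returns (5000 NIL): the visible PC follows FOO to its new location, while the hidden one still holds 1016, an address where FOO no longer is.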
The handler function itself is entirely well-behaved; it doesn't even cons.
(The GC is invoked in this scenario because some other thread consed.) The
problem here has to do with the fact that there's this sort of magic control
transfer/state change from "running FOO" to "running a signal handler" and
there'll be another magic transition back to "running FOO", and the interrupted
state - the signal context - isn't visible to the GC.
I hope that this makes sense. There's a very real issue here; it's certainly
an obscure one, but it's not just hypothetical.