[Openmcl-devel] Unix signal handling
Gary Byers
gb at clozure.com
Tue Jul 6 21:18:39 PDT 2010
On Tue, 6 Jul 2010, Ron Garret wrote:
> Actually, on thinking about this some more, a message queue isn't necessary
> because signals are already segregated by the OS. So something like this
> should work:
> (defmacro set-signal-handler (signo &body body)
>   (let ((sem (make-semaphore))
>         (handler (gensym "HANDLER")))
>     `(progn
>        (defcallback ,handler (:int signo :void)
>          (declare (ignore signo))
>          (signal-semaphore ',sem))
>        (#_signal ,signo ,handler)
>        (process-run-function ,(format nil "SIGNAL ~A HANDLER" signo)
>          (lambda ()
>            (loop
>              (wait-on-semaphore ',sem)
>              ,@body))))))
>
> I tried it and it actually does seem to work. To really make this
> bulletproof you'd want to tweak it so that calling set-signal-handler
> multiple times on the same signals didn't leave garbage processes lying
> around.
>
> rg
[I realize that you're just thinking out loud here; sorry if this reply
sounds like an overreaction.]
A few messages ago in this thread, I think that I said something to the
effect that you can't just define signal handlers via the FFI like this:
that it'd work some of the time, but that there were GC issues. (If
the GC runs in some thread at around the time that the signal handler
runs in another thread, the GC has no way of seeing the state of the
interrupted thread at the time that the signal occurred.)
I did in fact say that, so my conscience is clear in this case.
I'm not making this stuff up.
If the thread on which the signal handler runs is just executing a blocking
system call (sleeping, waiting for I/O) when the signal handler runs, this'll
probably work reliably.
Suppose that the thread is executing lisp code when the signal arrives.
This code, for example:
(let* ((x (cons y z)))
  ; <<- interrupt happens here
  (foo x))
To keep from going mad, we won't even think about what happens if the interrupt
happens in the middle of the CONS operation; we'll just say that at the time
that the interrupt happens, the CONS bound to X is in some machine register.
The signal occurs; the OS saves the state of the interrupted thread (the "signal
context") and calls the handler function. When the signal handling function
returns, the OS will restore the interrupted thread's state from the signal
context and resume execution, generally as if nothing had happened.
As luck would have it, let's say that just as the thread enters the
signal handling function, some other thread decides to GC. If the GC
thread can't "see" the signal context of the interrupted thread, it
has no way of knowing that the CONS cell bound to X is referenced. (The
register or stack location that holds X is the only reference to that
newly-allocated cons cell in the entire lisp.) The GC frees the
unreferenced CONS, the signal handler returns, and FOO gets called with
a freed CONS cell (whatever that is ...) as an argument. A few
instructions/hours/days later, this causes a memory fault or a mysterious error.
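The scenario can be acted out in a few lines of portable Common Lisp. This is a toy model under stated assumptions, not CCL's collector or its signal protocol: the "heap" is a flat list, the GC only marks objects that a root names directly (no pointer tracing), and a thread's signal context is just a list of the objects its registers held when the signal arrived. TOY-GC, SIMULATE, and the structure names are hypothetical.

```lisp
;;; Toy model only (hypothetical names, not CCL internals): a flat
;;; "heap" and a mark/sweep pass that marks just the objects a root
;;; names directly.

(defstruct toy-object
  name
  (marked nil)
  (freed nil))

(defstruct toy-thread
  (stack '())            ; roots the GC can always see
  (signal-context '()))  ; register contents saved when a "signal" arrives

(defun toy-gc (heap threads &key scan-contexts)
  "Mark every object named by a thread's stack (and, if SCAN-CONTEXTS
is true, by its saved signal context), then mark the rest as freed."
  (dolist (o heap)
    (setf (toy-object-marked o) nil))
  (dolist (th threads)
    (dolist (o (toy-thread-stack th))
      (setf (toy-object-marked o) t))
    (when scan-contexts
      (dolist (o (toy-thread-signal-context th))
        (setf (toy-object-marked o) t))))
  (dolist (o heap)
    (unless (toy-object-marked o)
      (setf (toy-object-freed o) t)))
  heap)

(defun simulate (&key scan-contexts)
  "Return T if the cons bound to X survives a GC that runs while the
thread is inside its signal handler, NIL if it gets freed."
  (let* ((x (make-toy-object :name 'x))  ; stands in for (CONS Y Z)
         ;; X is held only in a register, so nothing on the stack names it.
         (th (make-toy-thread)))
    ;; Signal arrives: the OS saves the registers, including X, into
    ;; the signal context and calls the handler.
    (setf (toy-thread-signal-context th) (list x))
    ;; Meanwhile another thread runs the GC.
    (toy-gc (list x) (list th) :scan-contexts scan-contexts)
    ;; The handler returns and (FOO X) would run with whatever X is now.
    (not (toy-object-freed x))))

(simulate)                   ; => NIL (X was freed behind FOO's back)
(simulate :scan-contexts t)  ; => T   (X survives)
```

Passing :SCAN-CONTEXTS T models a collector that can find and scan the interrupted thread's signal context; guaranteeing that kind of visibility is what a GC-safe signal handling protocol has to do.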
How likely this scenario is depends on lots of factors (what thread the
signal's delivered to, when - on what instruction boundary - the delivery
occurs, what the thread was doing at the time), but it certainly can happen.
CCL's own signal/exception handlers go through a fairly strict protocol to
try to ensure GC safety, and anything we do to support user-defined signal
handlers would have to deal with similar issues in similar ways.