[Openmcl-devel] killing foreign thread in CCL::PREPARE-TO-QUIT

Mon Jun 4 05:04:26 PDT 2012

A foreign thread (in this context) is a thread that lisp didn't
create but which has called into lisp at some point.

CCL::PREPARE-TO-QUIT is effectively called by QUIT and by
SAVE-APPLICATION.  It does the same things in both cases, but the
constraints are different.  CCL::PREPARE-TO-QUIT tries to get other
(lisp) threads to shut themselves down via PROCESS-KILL (so that
UNWIND-PROTECT cleanups run, etc) and if they don't terminate fairly
quickly it tries to arrange that the thread uses pthread_exit() to
kill itself.  (pthread_exit() will run "cancellation handlers" but
won't run UNWIND-PROTECT cleanups, so a thread killed in this abrupt
manner won't release locks/flush buffers/close files and ... well,
there are other problems there, and those problems are occasionally
visible.)
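
To make the "cancellation handlers run, but nothing else does" point
concrete, here is a minimal C sketch.  The names (cleanup, worker,
run_cleanup_demo) are made up for illustration and have nothing to do
with CCL's actual sources; the only claim is about how
pthread_cleanup_push() and pthread_exit() interact.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

static int cleanup_ran = 0;     /* set by the cleanup handler */
static int after_exit_ran = 0;  /* would be set by code after pthread_exit() */

/* A "cancellation handler": the only kind of cleanup pthread_exit() knows about. */
static void cleanup(void *arg)
{
    (void)arg;
    cleanup_ran = 1;
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(cleanup, NULL);
    pthread_exit(NULL);         /* runs cleanup(), then the thread is gone */
    after_exit_ran = 1;         /* never reached: ordinary code after the exit is skipped */
    pthread_cleanup_pop(0);     /* required to pair lexically with the push */
    return NULL;
}

/* Returns 1 if the cleanup handler ran and the code after pthread_exit() did not. */
int run_cleanup_demo(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    return cleanup_ran && !after_exit_ran;
}
```

Anything the thread meant to do on its way out that wasn't registered
this way (which, per the above, includes UNWIND-PROTECT cleanups) is in
the same position as the assignment after pthread_exit() here: it just
doesn't happen.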

pthread_exit() isn't explicitly noted as being safe to call from a
signal handler.  We seem to get away with it in many cases, but if
it fails and/or confuses things we can't easily blame those other
things.  We could try to "cancel" the thread (via pthread_cancel()),
but that can take a long time to take effect.

Things like PROCESS-RESET and PROCESS-KILL make as much sense as they
do if lisp code (PROCESS-RUN-FUNCTION/MAKE-PROCESS) created the thread,
but probably don't make any sense at all for other (foreign) threads;
as far as I know, they'd be NOPs in that case.  (If this - exiting
cleanly from a foreign thread - meant something, what would that be?
If the foreign thread was running lisp code when it was told to exit
cleanly, the most that it could do would be to throw to the point where
lisp code was entered via a callback, and throwing to anything other
than the most recent callback would be left as an exercise.)

The abrupt termination mechanism works on Unix by sending the thread
a signal whose handler does a little bit of stuff and then calls
pthread_exit(), unwinding the stack (well, the foreign parts of the
stack) along the way.  If foreign code masks the signal, this can't
work at all; if "the stack" involves a lot of interleaved lisp and C++
frames, some things will be cleaned up and some things won't.
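
A rough C sketch of that mechanism, and of why a masked signal defeats
it, might look like the following.  This is not CCL's code: SIGUSR1 is
an arbitrary stand-in for whatever signal is really used, all of the
function names are invented, and (as noted above) calling pthread_exit()
from a handler isn't something POSIX promises will work.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sem_t ready;      /* worker -> main: "I'm running" */
static sem_t sent;       /* main -> worker: "the signal has been sent" */
static int was_pending;  /* set by the masking worker */

/* Handler for the "kill" signal: terminate the calling thread.
   pthread_exit() is not on the async-signal-safe list. */
static void on_kill(int sig)
{
    (void)sig;
    pthread_exit(NULL);
}

static void install_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_kill;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
}

/* A thread that leaves SIGUSR1 unblocked and just waits. */
static void *plain_worker(void *arg)
{
    (void)arg;
    sem_post(&ready);
    for (;;)
        pause();         /* the handler exits the thread from in here */
    return NULL;
}

/* A thread that blocks SIGUSR1: the signal is left pending, the handler never runs. */
static void *masking_worker(void *arg)
{
    sigset_t set, pending;
    (void)arg;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, NULL);
    sem_post(&ready);
    while (sem_wait(&sent) != 0)
        ;                /* retry if interrupted */
    sigpending(&pending);
    was_pending = sigismember(&pending, SIGUSR1);
    return NULL;         /* exits on its own schedule, not ours */
}

/* Returns 1 if the unmasked thread was terminated by the signal. */
int kill_unmasked_thread(void)
{
    pthread_t t;
    int rc;
    install_handler();
    sem_init(&ready, 0, 0);
    rc = pthread_create(&t, NULL, plain_worker, NULL);
    assert(rc == 0);
    while (sem_wait(&ready) != 0)
        ;
    pthread_kill(t, SIGUSR1);
    rc = pthread_join(t, NULL);   /* only returns because the handler exited the thread */
    sem_destroy(&ready);
    return rc == 0;
}

/* Returns 1 if the signal was still pending (undelivered) in the masking thread. */
int signal_stays_pending_when_masked(void)
{
    pthread_t t;
    int rc;
    install_handler();
    sem_init(&ready, 0, 0);
    sem_init(&sent, 0, 0);
    was_pending = 0;
    rc = pthread_create(&t, NULL, masking_worker, NULL);
    assert(rc == 0);
    while (sem_wait(&ready) != 0)
        ;
    pthread_kill(t, SIGUSR1);
    sem_post(&sent);
    rc = pthread_join(t, NULL);
    sem_destroy(&ready);
    sem_destroy(&sent);
    return rc == 0 && was_pending == 1;
}
```

In the first case the thread goes away as soon as the handler runs; in
the second the signal just sits there, and the thread keeps running
until it decides to return by itself.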

On Unix systems (I have a vague memory that a Windows process lives as
long as at least one thread in the process stays alive, but I hope that
that's just a minor hallucination), QUIT tries to kill other threads
"to be nice"; it's just going to call exit() anyway, and giving threads
a chance to exit as cleanly as they can is desirable but not critical.
SAVE-APPLICATION's situation is a little different: it generally wants
to save an image from a dynamic environment that's as much as possible
like the environment in which it's loaded, and part of trying to ensure
that involves actually saving the image in a single-threaded environment.
(Another thread running - and modifying lisp objects in the heap - while
the heap is being written doesn't sound like a good idea; another thread
interrupted or suspended while trying to modify some data structure doesn't
sound like a substantially better idea, and the most tractable case is
one in which all lisp threads have exited cleanly when the image is written.)

Back to your specific issues: I might be convinced otherwise, but I
don't feel too bad about the status quo if I express the issue as "CCL
can't reliably terminate threads that it didn't create".  (If Qt could
reliably terminate lisp threads that it became aware of because they'd
called Qt code at some point, I'd wonder how on earth it did that.)  I
don't think that trying to terminate a foreign thread cleanly (via
PROCESS-KILL) is likely to be more than a NOP (*), so it's not clear
that trying to PROCESS-KILL a foreign thread is worthwhile.

If there's a short-term fix, it'd probably involve:
- making CCL::PREPARE-TO-QUIT effectively take an argument indicating
   whether it's critical that other threads be killed cleanly (the
   SAVE-APPLICATION case) or merely desirable (QUIT).
- making CCL::PREPARE-TO-QUIT not even try to terminate foreign threads
- at least strongly considering getting rid of the whole concept of abrupt
   thread termination
- if thread termination is "critical" and foreign threads are present
   (or other threads just wouldn't die cleanly in a reasonable amount of
   time), don't just proceed to quietly save the image (print an error
   message or warning or something.)

Qt may offer a way to quit a Qt application and do or not do whatever 
thread cleanup of its threads is desirable; if Qt's use of threads is
a relatively new feature, there may be ways to disable it at runtime
or at build time.

There certainly could be things that I haven't thought of on the lisp
side, but my sense is that there are some pretty severe inherent
limitations here.  The "foreign thread" mechanism was intended to
allow a thread that wasn't created by Lisp code to call into lisp.
In the cases where it's been useful, the foreign threads in question
were relatively short-lived (the assumption was that the thread would
perform some computation and exit, and the issue(s) involved what's
necessary to allow that thread to access lisp data.)  Foreign threads
that're persistent parts of the application raise a different set of
issues, and my intuition is that those issues don't have particularly
clean solutions.

(*) Some Lisp Interaction Modes for Emacs think that it's a good idea
for them to handle SERIOUS-CONDITIONs. PROCESS-KILL involves
signalling a particular kind of SERIOUS-CONDITION and depending on how
this is handled (I don't remember whether it logs or prints a message
- always wise things to do when handling a SERIOUS-CONDITION like
ALMOST-OUT-OF-MEMORY-DAMNIT or
NOW-YOU|'|RE-ALMOST-OUT-OF-STACK-SPACE-ARE-YOU-GOING-TO-LOG-THIS-TOO),
you might see a message about an unhandled condition.  The Lisp Interaction
Mode for Emacs that I use isn't particularly Superior, but my recollection
is that this logging/message happened whenever a thread which established
this handler was reset.

On Mon, 4 Jun 2012, Ivan Shvedunov wrote:

>  Hello.
>  I've noticed an unpleasant effect when working with recent Qt
> versions using the CommonQt library. Namely, when using QtWebKit, which
> tends to start some extra worker threads, Qt 4.7.4 produces a strange
> warning about an unexpected exception and Qt 4.8/4.8.1 crashes after
> calling CCL:QUIT, sometimes losing some unflushed output. After
> spending many hours trying to debug the problem, I've discovered
> that exempting foreign threads from being killed in CCL::PREPARE-TO-QUIT
> prevents the app from crashing. Apparently the way CCL tries
> to kill them doesn't play well with QThread / QThreadPool mechanisms.
> What would you advise? Maybe CCL really shouldn't try to kill foreign
> threads? I'm afraid it's very difficult to force Qt to kill all
> these threads before quitting.
>
> -- 
> Ivan Shvedunov <ivan4th at gmail.com>
> ;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9 F7D0 613E C0F8 0BC5 2807