[Openmcl-devel] threads & crashes

Gary Byers gb at clozure.com
Sun Jun 12 00:11:52 UTC 2005



On Sun, 12 Jun 2005, Ralf Stoye wrote:

> Hi,
> I have the following problem with OpenMCL (tried in 0.14.3 and 0.14.2-p1) on
> OS X 10.3.8 / G5 Dual 2GHz.
> I experience crashes using some packages that use multithreading
> (one of these is portable aserve, another one a simple client-server system
> we use in-house).
> Reducing it to the basics, it seems that OpenMCL crashes when some threads
> producing garbage (e.g. short-lived lists)
> are running at the same time.
> Here are some examples to reproduce this behaviour;
> I would be very happy if anybody could comment on this.
> Thanks!
>
> Ralf
>

The good news is that I think that this is fixed in the current 
development sources; I was able to do:

(dotimes (k 1000)
    (process-run-function '(:name "hello" :priority -1)
                          #'make-garbage 1000)
    (sleep 0.01))

without experiencing any problems.

The bad news is that I think that there are 3 or 4 possible changes
that could have fixed this.  The most likely culprit (and the simplest
change) is in the following code, in the function "suspend_tcr()" in
"ccl:lisp-kernel;thread-manager.c":

Boolean
suspend_tcr(TCR *tcr)
{
  int suspend_count = atomic_incf(&(tcr->suspend_count));
  if (suspend_count == 1) {
#ifdef DARWIN
    if (mach_suspend_tcr(tcr)) {
      tcr->flags |= TCR_FLAG_BIT_ALT_SUSPEND;
      return true;
    }
#endif
    if (pthread_kill((pthread_t)ptr_from_lispobj(tcr->osid), thread_suspend_signal) == 0) {

It seems to be the case that if a thread has a pending exception when
it's suspended via mach_suspend_tcr(), Mach will send the same exception
message twice.  (That's bad.)  That can cause the symptom that you reported
and a few others.
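
For anyone reading the excerpt without the kernel sources at hand, the
counting idiom it relies on can be sketched in isolation.  This is only an
illustration under assumptions: demo_tcr, demo_tcr_init, demo_atomic_incf
and demo_suspend are hypothetical names, not kernel functions; I'm assuming
atomic_incf returns the incremented value (which the "== 1" test suggests);
and the "false" return for an already-suspended thread is made up for the
sketch.  The point is just that only the caller who takes the count from
0 to 1 performs the OS-level suspend.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TCR; only the counter matters here. */
typedef struct {
  atomic_int suspend_count;  /* number of outstanding suspend requests */
  int os_suspends;           /* how often the OS-level suspend actually ran */
} demo_tcr;

void demo_tcr_init(demo_tcr *tcr)
{
  atomic_init(&tcr->suspend_count, 0);
  tcr->os_suspends = 0;
}

/* Assumption: like atomic_incf in the excerpt, return the NEW value. */
int demo_atomic_incf(atomic_int *p)
{
  return atomic_fetch_add(p, 1) + 1;
}

/* Only the request that moves the count from 0 to 1 does real work;
   nested requests just bump the counter. */
bool demo_suspend(demo_tcr *tcr)
{
  if (demo_atomic_incf(&tcr->suspend_count) == 1) {
    /* Placeholder for mach_suspend_tcr() or pthread_kill(). */
    tcr->os_suspends++;
    return true;
  }
  return false;  /* assumed result when the thread is already suspended */
}
```

Calling demo_suspend() twice on a freshly initialized demo_tcr leaves
suspend_count at 2 and os_suspends at 1.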

I believe that the call to mach_suspend_tcr() was added to 0.14.2-p1
CVS last fall; I'm fairly sure that it wasn't in the released version
of 0.14.2-p1.  I'd ordinarily say that this is a fairly likely culprit,
but if you experience this in the released 0.14.2-p1, I'm less confident
of that.  (If it's in something built from CVS after 0.14.2-p1 was released,
that's a different story.)

If you conditionalize out the Darwin-specific code above, e.g.:

Boolean
suspend_tcr(TCR *tcr)
{
  int suspend_count = atomic_incf(&(tcr->suspend_count));
  if (suspend_count == 1) {
#if 0
#ifdef DARWIN
    if (mach_suspend_tcr(tcr)) {
      tcr->flags |= TCR_FLAG_BIT_ALT_SUSPEND;
      return true;
    }
#endif
#endif
    if (pthread_kill((pthread_t)ptr_from_lispobj(tcr->osid), thread_suspend_signal) == 0) {

and recompile the kernel, does the problem persist?
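
If you'd rather not edit the source back and forth while testing, the same
experiment can be driven by a compile-time switch.  What follows is a sketch
only: DEMO_USE_MACH_SUSPEND is a hypothetical macro (not an existing kernel
build flag), and the demo_* functions are stand-ins for mach_suspend_tcr()
and the pthread_kill() path.  It shows the Mach path being compiled in only
when DARWIN is defined and the switch is nonzero.

```c
#include <stdbool.h>

/* Hypothetical switch, NOT a real kernel build flag:
   compile with -DDEMO_USE_MACH_SUSPEND=0 to leave the Mach path out. */
#ifndef DEMO_USE_MACH_SUSPEND
#define DEMO_USE_MACH_SUSPEND 1
#endif

typedef enum { DEMO_PATH_MACH, DEMO_PATH_SIGNAL, DEMO_PATH_NONE } demo_path;

/* Stand-ins that always succeed; the real calls can fail. */
bool demo_mach_suspend(void)   { return true; }
bool demo_signal_suspend(void) { return true; }

/* Report which suspend mechanism this build would try first. */
demo_path demo_suspend_path(void)
{
#if defined(DARWIN) && DEMO_USE_MACH_SUSPEND
  if (demo_mach_suspend())
    return DEMO_PATH_MACH;
#endif
  if (demo_signal_suspend())
    return DEMO_PATH_SIGNAL;
  return DEMO_PATH_NONE;
}
```

Built without -DDARWIN, or with -DDEMO_USE_MACH_SUSPEND=0, this falls
through to the signal-based path, which is the configuration the "#if 0"
above selects.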




