[Openmcl-devel] Random crashing

Gail Zacharias gz at clozure.com
Thu Jul 17 12:43:06 PDT 2008


At 7/17/2008 03:28 PM, Osei Poku wrote:
>Hello,
>
>I updated today from svn but this thing happened again.  Again the PC
>was in the pthread memory region and %rdi was 0.  I verified that the
>fix (r9997 I think) was in my ccl working directory (somewhere in
>thread_manager.c right?).
>
>My current version is:
>Clozure Common Lisp Version 1.2-r10073M-RC1  (LinuxX8664)!
>
>Is there anything other than (rebuild-ccl :force t) that I need to do
>to recompile the c source for the lisp kernel?


To rebuild the kernel, you need to do (rebuild-ccl :FULL t).



>On Jul 9, 2008, at 3:05 PM, Gary Byers wrote:
>
> >
> >
> > --On July 9, 2008 2:26:56 PM -0400 Osei Poku <osei.poku at gmail.com> wrote:
> >
> >> Hi,
> >>
> >> It crashed again for me.  This time I managed to grab the contents of
> >> /proc/pid/maps before I killed it.  Logs of the tty session and memory
> >> maps are attached.  I had also managed to update from the repository
> >> to r9890-RC1.
> >>
> >> Osei
> >>
> >
> >
> > It seems to have crashed in the threads library (libpthread.so).
> >
> > There's a race condition in the code which suspends threads
> > on entry to the GC: the thread that's running the GC looks
> > at each thread that it wants to suspend to see if it's
> > still alive (the data structure that represents a thread
> > might still be around, even if the OS-level thread has
> > exited.)  The suspending thread looks at the tcr->osid
> > field of the target, notes that it's non-zero, then
> > calls a function to send the os-level thread a signal.
> > That function accesses the tcr->osid field again (which,
> > when non-zero, represents a POSIX thread ID) and calls
> > pthread_kill().
> >
> > When a thread dies, it clears its tcr->osid field, so
> > if the target thread dies between the point when the
> > suspending thread looks and the point where it leaps,
> > we wind up calling pthread_kill() with a first argument
> > of 0, and it crashes.  That's consistent with the
> > register information: we're somewhere in the threads
> > library (possibly in pthread_kill()), and the register
> > in which C functions receive their first argument (%rdi)
> > is 0.
> >
> > I'll try to check in a fix for that (look before leaping)
> > soon.  As I understand it, SLIME will sometimes (depending
> > on the setting of a "communication style" variable)
> > spawn a thread in which to run each form being evaluated
> > (via C-M-x or whatever); whether that's a good idea or
> > not, consing short-lived threads all the time is probably
> > a good way to trigger this bug.  I don't use SLIME, and
> > don't know what the consequences of changing the communication
> > style variable would be.
> >
> >
> >
>
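
For readers following the thread, the race Gary describes can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the actual code in thread_manager.c: the tcr_t struct, the function names, and the use of C11 atomics are all hypothetical, and the cast between uintptr_t and pthread_t assumes a platform such as Linux where pthread_t is an integer type.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's per-thread record (the "tcr").
   osid holds the POSIX thread ID, or 0 once the thread has exited. */
typedef struct {
    _Atomic uintptr_t osid;
} tcr_t;

/* The racy pattern: osid is read once for the check and again for the call.
   If the target thread exits between the two reads, pthread_kill() is
   called with a first argument of 0. */
int signal_thread_racy(tcr_t *tcr, int signo)
{
    assert(tcr != NULL);
    if (atomic_load(&tcr->osid) != 0) {
        /* <-- the target may clear tcr->osid right here */
        return pthread_kill((pthread_t)atomic_load(&tcr->osid), signo);
    }
    return ESRCH;
}

/* "Look before leaping": read osid exactly once into a local and use
   that snapshot for both the check and the call. */
int signal_thread_checked(tcr_t *tcr, int signo)
{
    assert(tcr != NULL);
    uintptr_t id = atomic_load(&tcr->osid);
    if (id == 0) {
        return ESRCH;           /* thread already gone; nothing to signal */
    }
    return pthread_kill((pthread_t)id, signo);
}
```

Taking a single snapshot only rules out the zero-argument call. The target can still exit after the snapshot is taken, which leaves a stale thread ID, so a complete fix may also need some synchronization with the thread-exit path. Whether the checked-in fix does that is not stated in this thread.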