[Openmcl-devel] Random crashing

Osei Poku osei.poku at gmail.com
Thu Jul 17 12:44:42 PDT 2008


On Jul 17, 2008, at 3:43 PM, Gail Zacharias wrote:

> At 7/17/2008 03:28 PM, Osei Poku wrote:
>> Hello,
>>
>> I updated today from svn but this thing happened again.  Again the PC
>> was in the pthread memory region and %rdi was 0.  I verified that the
>> fix (r9997, I think) was in my ccl working directory (somewhere in
>> thread_manager.c, right?).
>>
>> My current version is:
>> Clozure Common Lisp Version 1.2-r10073M-RC1  (LinuxX8664)!
>>
>> Is there anything other than (rebuild-ccl :force t) that I need to do
>> to recompile the c source for the lisp kernel?
>
>
> To rebuild the kernel, you need to do (rebuild-ccl :FULL t).

Ah.  Cool, thanks.

>
>
>
>
>> On Jul 9, 2008, at 3:05 PM, Gary Byers wrote:
>>
>> >
>> >
>> > --On July 9, 2008 2:26:56 PM -0400 Osei Poku <osei.poku at gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> It crashed again for me.  This time I managed to grab the contents
>> >> of /proc/pid/maps before I killed it.  Logs of the tty session and
>> >> memory maps are attached.  I had also managed to update from the
>> >> repository to r9890-RC1.
>> >>
>> >> Osei
>> >>
>> >
>> >
>> > It seems to have crashed in the threads library (libpthread.so).
>> >
>> > There's a race condition in the code which suspends threads
>> > on entry to the GC: the thread that's running the GC looks
>> > at each thread that it wants to suspend to see if it's
>> > still alive (the data structure that represents a thread
>> > might still be around, even if the OS-level thread has
>> > exited.)  The suspending thread looks at the tcr->osid
>> > field of the target, notes that it's non-zero, then
>> > calls a function to send the os-level thread a signal.
>> > That function accesses the tcr->osid field again (which,
>> > when non-zero, represents a POSIX thread ID) and calls
>> > pthread_kill().
>> >
>> > When a thread dies, it clears its tcr->osid field, so
>> > if the target thread dies between the point when the
>> > suspending thread looks and the point where it leaps,
>> > we wind up calling pthread_kill() with a first argument
>> > of 0, and it crashes.  That's consistent with the
>> > register information: we're somewhere in the threads
>> > library (possibly in pthread_kill()), and the register
>> > in which C functions receive their first argument (%rdi)
>> > is 0.
>> >
>> > I'll try to check in a fix for that (look before leaping)
>> > soon.  As I understand it, SLIME will sometimes (depending
>> > on the setting of a "communication style" variable)
>> > spawn a thread in which to run each form being evaluated
>> > (via C-M-x or whatever); whether that's a good idea or
>> > not, consing short-lived threads all the time is probably
>> > a good way to trigger this bug.  I don't use SLIME, and
>> > don't know what the consequences of changing the communication
>> > style variable would be.
>> >
>> >
>> >
>>
>
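
For anyone reading this in the archive, here is a minimal C sketch of
the race Gary describes above and of the "look before leaping" idea.
It is not the code in thread_manager.c and not the actual r9997 change:
the struct layout, the function names, the -1 return value and the use
of C11 atomics are all assumptions made for illustration.  Only the
tcr->osid field and pthread_kill() come from the message.  It also
assumes a platform (such as Linux) where pthread_t converts cleanly to
and from uintptr_t.

```c
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's thread context record.
   Only the osid field is named in the thread; the layout is invented. */
typedef struct tcr {
    _Atomic uintptr_t osid;  /* POSIX thread ID, or 0 once the thread exits */
} tcr_t;

/* Racy pattern: osid is read twice, so the target thread can clear it
   between the check and the call, and pthread_kill() may receive 0. */
int signal_thread_racy(tcr_t *target, int signo)
{
    if (atomic_load(&target->osid) != 0) {                 /* look */
        /* ...the target may exit and zero osid right here... */
        return pthread_kill((pthread_t)atomic_load(&target->osid),
                            signo);                        /* leap */
    }
    return -1;  /* hypothetical "thread already gone" result */
}

/* Safer pattern: read osid once, then test and use the same snapshot.
   pthread_kill() returns 0 or a positive error number, so -1 is a
   distinct sentinel meaning no signal was attempted. */
int signal_thread_checked(tcr_t *target, int signo)
{
    uintptr_t osid = atomic_load(&target->osid);  /* single read */
    if (osid == 0) {
        return -1;
    }
    return pthread_kill((pthread_t)osid, signo);
}
```

Reading the field once guarantees that pthread_kill() is never handed
0, which is the crash seen here.  It does not by itself stop the ID from
going stale after the check, so a complete fix presumably also needs
some coordination with thread exit.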



