[Openmcl-devel] Random crashing

Osei Poku osei.poku at gmail.com
Thu Jul 17 12:28:34 PDT 2008


Hello,

I updated from svn today, but the crash happened again.  Again the PC
was in the pthread memory region and %rdi was 0.  I verified that the
fix (r9997, I think) is in my ccl working directory (somewhere in
thread_manager.c, right?).

My current version is:
Clozure Common Lisp Version 1.2-r10073M-RC1  (LinuxX8664)!

Is there anything other than (rebuild-ccl :force t) that I need to do
to recompile the C source for the lisp kernel?

Thanks,
Osei

On Jul 9, 2008, at 3:05 PM, Gary Byers wrote:

>
>
> --On July 9, 2008 2:26:56 PM -0400 Osei Poku <osei.poku at gmail.com>  
> wrote:
>
>> Hi,
>>
>> It crashed again for me.  This time I managed to grab the contents of
>> /proc/pid/maps before I killed it.  Logs of the tty session and memory
>> maps are attached.  I had also managed to update from the repository to
>> r9890-RC1.
>>
>> Osei
>>
>
>
> It seems to have crashed in the threads library (libpthread.so).
>
> There's a race condition in the code which suspends threads
> on entry to the GC: the thread that's running the GC looks
> at each thread that it wants to suspend to see if it's
> still alive (the data structure that represents a thread
> might still be around, even if the OS-level thread has
> exited.)  The suspending thread looks at the tcr->osid
> field of the target, notes that it's non-zero, then
> calls a function to send the os-level thread a signal.
> That function accesses the tcr->osid field again (which,
> when non-zero, represents a POSIX thread ID) and calls
> pthread_kill().
>
> When a thread dies, it clears its tcr->osid field, so
> if the target thread dies between the point when the
> suspending thread looks and the point where it leaps,
> we wind up calling pthread_kill() with a first argument
> of 0, and it crashes.  That's consistent with the
> register information: we're somewhere in the threads
> library (possibly in pthread_kill()), and the register
> in which C functions receive their first argument (%rdi)
> is 0.
>
> I'll try to check in a fix for that (look before leaping)
> soon.  As I understand it, SLIME will sometimes (depending
> on the setting of a "communication style" variable)
> spawn a thread in which to run each form being evaluated
> (via C-M-x or whatever); whether that's a good idea or
> not, consing short-lived threads all the time is probably
> a good way to trigger this bug.  I don't use SLIME, and
> don't know what the consequences of changing the communication
> style variable would be.
>
>
>
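
For anyone following along, the race Gary describes above can be sketched
in C roughly as follows.  This is only an illustration under stated
assumptions, not the actual code in thread_manager.c: the TCR struct is
cut down to the one field mentioned in the thread, both function names are
made up, and it assumes Linux, where pthread_t is an integer type.

```c
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>

/* Hypothetical, stripped-down thread record.  Only the osid field comes
   from the discussion above; the real structure has many more fields. */
typedef struct tcr {
    /* POSIX thread ID while the thread is alive, 0 once it has exited.
       volatile so that every access in the source is a real load. */
    volatile uintptr_t osid;
} TCR;

/* Racy shape: osid is read twice.  If the target thread exits between
   the check and the call, pthread_kill() is handed 0. */
int send_signal_racy(TCR *target, int signo)
{
    if (target->osid != 0) {                              /* first read  */
        /* <-- target may exit here and clear osid */
        return pthread_kill((pthread_t)target->osid, signo); /* second read */
    }
    return ESRCH;
}

/* "Look before leaping": read osid exactly once into a local, test the
   local, and pass that same value on.  (Assumption: this is the general
   shape of such a fix, not necessarily what was checked in.) */
int send_signal_checked(TCR *target, int signo)
{
    uintptr_t id = target->osid;      /* single read */
    if (id == 0) {
        return ESRCH;                 /* thread already gone */
    }
    return pthread_kill((pthread_t)id, signo);
}
```

Note that the second version only guarantees that pthread_kill() never
sees 0.  The target can still exit after the read, leaving a stale ID in
the local, so a complete fix may need additional synchronization between
the exiting thread and the suspending thread.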



