[Openmcl-devel] Random crashing
Osei Poku
osei.poku at gmail.com
Thu Jul 17 12:28:34 PDT 2008
Hello,
I updated from svn today, but this happened again. Again the PC
was in the pthread memory region and %rdi was 0. I verified that the
fix (r9997, I think) was in my ccl working directory (somewhere in
thread_manager.c, right?).
My current version is:
Clozure Common Lisp Version 1.2-r10073M-RC1 (LinuxX8664)!
Is there anything other than (rebuild-ccl :force t) that I need to do
to recompile the C source for the lisp kernel?
Thanks,
Osei
On Jul 9, 2008, at 3:05 PM, Gary Byers wrote:
>
>
> --On July 9, 2008 2:26:56 PM -0400 Osei Poku <osei.poku at gmail.com>
> wrote:
>
>> Hi,
>>
>> It crashed again for me. This time I managed to grab the contents of
>> /proc/pid/maps before I killed it. Logs of the tty session and memory
>> maps are attached. I had also managed to update from the repository
>> to r9890-RC1.
>>
>> Osei
>>
>
>
> It seems to be crashed in the threads library (libpthread.so).
>
> There's a race condition in the code which suspends threads
> on entry to the GC: the thread that's running the GC looks
> at each thread that it wants to suspend to see if it's
> still alive (the data structure that represents a thread
> might still be around, even if the OS-level thread has
> exited). The suspending thread looks at the tcr->osid
> field of the target, notes that it's non-zero, then
> calls a function to send the OS-level thread a signal.
> That function reads the tcr->osid field again (which,
> when non-zero, represents a POSIX thread ID) and calls
> pthread_kill().
>
> When a thread dies, it clears its tcr->osid field, so
> if the target thread dies between the point when the
> suspending thread looks and the point where it leaps,
> we wind up calling pthread_kill() with a first argument
> of 0, and it crashes. That's consistent with the
> register information: we're somewhere in the threads
> library (possibly in pthread_kill()), and the register
> in which C functions receive their first argument (%rdi)
> is 0.
>
> I'll try to check in a fix for that (look before leaping)
> soon. As I understand it, SLIME will sometimes (depending
> on the setting of a "communication style" variable)
> spawn a thread in which to run each form being evaluated
> (via C-M-x or whatever); whether that's a good idea or
> not, consing short-lived threads all the time is probably
> a good way to trigger this bug. I don't use SLIME, and
> don't know what the consequences of changing the communication
> style variable would be.