[Openmcl-devel] Random crashing
osei.poku at gmail.com
Thu Jul 17 19:44:42 UTC 2008
On Jul 17, 2008, at 3:43 PM, Gail Zacharias wrote:
> At 7/17/2008 03:28 PM, Osei Poku wrote:
>> I updated today from svn but this thing happened again. Again the PC
>> was in the pthread memory region and %rdi was 0. I verified that the
>> fix (r9997, I think) was in my ccl working directory (somewhere in
>> thread_manager.c right?).
>> My current version is:
>> Clozure Common Lisp Version 1.2-r10073M-RC1 (LinuxX8664)!
>> Is there anything other than (rebuild-ccl :force t) that I need to do
>> to recompile the c source for the lisp kernel?
> To rebuild the kernel, you need to do (rebuild-ccl :FULL t).
Ah. Cool, thanks.
>> On Jul 9, 2008, at 3:05 PM, Gary Byers wrote:
>> > --On July 9, 2008 2:26:56 PM -0400 Osei Poku <osei.poku at gmail.com>
>> > wrote:
>> >> Hi,
>> >> It crashed again for me. This time I managed to grab the contents of
>> >> /proc/pid/maps before I killed it. Logs of the tty session and memory
>> >> maps are attached. I had also managed to update from the repository
>> >> to r9890-RC1.
>> >> Osei
>> > It seems to be crashed in the threads library (libpthread.so).
>> > There's a race condition in the code which suspends threads
>> > on entry to the GC: the thread that's running the GC looks
>> > at each thread that it wants to suspend to see if it's
>> > still alive (the data structure that represents a thread
>> > might still be around, even if the OS-level thread has
>> > exited.) The suspending thread looks at the tcr->osid
>> > field of the target, notes that it's non-zero, then
>> > calls a function to send the OS-level thread a signal.
>> > That function accesses the tcr->osid field again (which,
>> > when non-zero, represents a POSIX thread ID) and calls
>> > pthread_kill().
>> > When a thread dies, it clears its tcr->osid field, so
>> > if the target thread dies between the point when the
>> > suspending thread looks and the point where it leaps,
>> > we wind up calling pthread_kill() with a first argument
>> > of 0, and it crashes. That's consistent with the
>> > register information: we're somewhere in the threads
>> > library (possibly in pthread_kill()), and the register
>> > in which C functions receive their first argument (%rdi)
>> > is 0.
>> > I'll try to check in a fix for that (look before leaping)
>> > soon. As I understand it, SLIME will sometimes (depending
>> > on the setting of a "communication style" variable)
>> > spawn a thread in which to run each form being evaluated
>> > (via C-M-x or whatever); whether that's a good idea or
>> > not, consing short-lived threads all the time is probably
>> > a good way to trigger this bug. I don't use SLIME, and
>> > don't know what the consequences of changing the communication
>> > style variable would be.
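The race Gary describes, and the "look before leaping" fix, can be
sketched in C roughly as follows. This is not the code from
thread_manager.c: the struct layout, the function names racy_signal and
checked_signal, and the use of C11 atomics are all assumptions made for
illustration. Only the tcr->osid field and the call to pthread_kill()
come from the message above. The sketch also assumes pthread_t is an
integer type, as it is on Linux.

```c
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's thread record. Only the osid
   field is named in the original message; everything else is assumed. */
typedef struct tcr {
    _Atomic uintptr_t osid;  /* POSIX thread ID, or 0 once the thread has died */
} tcr_t;

/* Racy pattern: osid is read twice. If the target thread clears osid
   between the check and the second read, pthread_kill() receives 0
   as its first argument. */
int racy_signal(tcr_t *target, int signo)
{
    assert(target != NULL);
    if (atomic_load(&target->osid) != 0) {
        /* window: the target may die and zero osid right here */
        return pthread_kill((pthread_t)atomic_load(&target->osid), signo);
    }
    return -1;
}

/* Look before leaping: read osid once, test that value, and pass the
   same value to pthread_kill(). Returns -1 if the thread is already
   marked dead, otherwise whatever pthread_kill() returns. */
int checked_signal(tcr_t *target, int signo)
{
    assert(target != NULL);
    uintptr_t id = atomic_load(&target->osid);
    if (id == 0) {
        return -1;
    }
    return pthread_kill((pthread_t)id, signo);
}
```

Reading the field once only closes the window in which a zero reaches
pthread_kill(). It does not by itself guarantee that the thread is still
alive when the signal is sent, so a real fix may need more than this.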