[Openmcl-devel] process-enable issue
gb at clozure.com
Fri Jul 25 20:25:18 UTC 2008
In the original bug report, the backtrace for what was thread #35 showed
(2AAAAD619B18) : 0 (PROCESS-ENABLE #<PROCESS Worker thread(38) [Active] #x300043A1C8ED> [...]) 405
(2AAAAD619B68) : 1 (%PROCESS-RUN-FUNCTION '(:NAME "Worker thread") #<COMPILED-LEXICAL-CLOSURE (:INTERNAL ACL2::RUN-THREAD) #x300043A1CD7F> NIL) 1373
(2AAAAD619C58) : 2 (PROCESS-RUN-FUNCTION "Worker thread" #<COMPILED-LEXICAL-CLOSURE (:INTERNAL ACL2::RUN-THREAD) #x300043A1CD7F> [...]) 213
and :proc showed
38 : Worker thread [Active]
35 : Worker thread [semaphore wait] (Requesting terminal input)
14 : Worker thread [semaphore wait]
1 : -> listener [Active]
0 : Initial [Active]
In other words, thread 35 created thread 38 and was waiting for it
to signal a semaphore that would indicate that it's reset itself
and is ready to be enabled (given a function to run). :PROC shows
that thread 38 is already running, which doesn't make much sense.
The Linux kernel that David Rager was running was one that allegedly
had just fixed a bug which could cause the wrong thread to be
awakened via FUTEX_WAIT, and it seemed plausible that that bug hadn't
really been fixed there. The case that failed reliably for David
on his machine worked reliably for me, similar
cases seemed to work for others, and blaming this on something at
the OS level makes more sense than anything else that I can think
of. (Another fuzzy explanation is that malloc() - when called
from two threads at the same time - returned the same block of
memory to both callers because of a locking problem, so two
threads wound up sharing the same "pointer to semaphore".)
There's a separate issue in that PROCESS-ENABLE waits for the target
thread to indicate that it's "ready" with a timeout of 1 second.
That's usually long enough, but it's entirely arbitrary (how long
it actually takes depends on the load on and the whims of the
scheduler). Taking longer than a second might indicate that the
newly-created thread isn't getting enough CPU time to signal its
readiness to run. The whole notion of having a timeout for
something that can take an indeterminate amount of time is
questionable, so it probably makes sense to not use a one-second
timeout in PROCESS-ENABLE by default, at the very least.
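
As a rough sketch of one way to sidestep the one-second default from
user code, assuming the optional timeout argument that PROCESS-ENABLE
accepts in Clozure CL: the name START-WORKER and the 30-second value
below are hypothetical, chosen only for illustration, and a longer
timeout does nothing for the first problem described above.

```lisp
;; Sketch only, assuming Clozure CL. START-WORKER and the 30-second
;; default are illustrative and not part of CCL.
(defun start-worker (name function &optional (enable-timeout 30))
  "Create a thread called NAME, preset it to call FUNCTION, and enable
it, waiting up to ENABLE-TIMEOUT seconds for it to report readiness."
  (let ((p (ccl:make-process name)))
    ;; Tell the new thread which function to run once it is enabled.
    (ccl:process-preset p function)
    ;; Pass an explicit timeout instead of relying on the default.
    (ccl:process-enable p enable-timeout)
    p))
```

Something like (start-worker "Worker thread" #'some-function) would
then take the place of the PROCESS-RUN-FUNCTION call, where
SOME-FUNCTION stands in for whatever the thread is supposed to run.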
Can you tell whether it was the first case (where PROCESS-ENABLE
was waiting to enable a thread that - somehow - seems to have
already been enabled) or the second (the one-second timeout is
too short, and quite possibly the entire idea of a timeout is
misguided)?
In the former case, the thread being enabled would be on the
list returned by (ALL-PROCESSES) or in the output displayed
by :PROC, and in the latter case it wouldn't.
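
A minimal sketch of that check, assuming Clozure CL and that the
error handler or break loop still has the process object (or at least
its name) at hand. PROCESS-LISTED-P and THREADS-NAMED are hypothetical
helper names, not part of CCL.

```lisp
;; Sketch only, assuming Clozure CL. Both helpers are illustrative.
(defun process-listed-p (process)
  "True if PROCESS is on the list returned by (ALL-PROCESSES)."
  (and (member process (ccl:all-processes)) t))

(defun threads-named (name)
  "Return (process whostate) pairs for every known thread called NAME."
  (loop for p in (ccl:all-processes)
        when (string= (ccl:process-name p) name)
          collect (list p (ccl:process-whostate p))))
```

Calling (threads-named "Worker thread") when the error is reported
should show whether the thread being enabled is listed, and in what
state.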
On Fri, 25 Jul 2008, Milan Jovanovic wrote:
> Hi, i have problems with multi-threading on linux, i think it's the same
> like "http://trac.clozure.com/openmcl/ticket/297"
> First it was "Unable to enable process #<PROCESS ...have been trying for 1
> seconds" and inferior-list segmentation fault after 2-3 hours of running
> (this was on SUSE LINUX 10.0 X86-64 2.6.13-15-smp)
> After Gary Byers' suggestion that it is maybe a linux kernel bug i tried on
> SUSE Server 10 (x86_64) - kernel 2.6.24. After more than a day of running
> with no errors i saw one more "Unable to enable process #<PROCESS ...have
> been trying for 1 seconds" but this time no segmentation fault.
> So I'm asking is it problem/bug if this happens or only if it happens with
> segmentation fault following ?
> btw. i tried code on sbcl to be sure that it's not something there and it's
> running couple of days with no problems