[Openmcl-devel] process-enable issue

Fri Jul 25 15:46:14 PDT 2008

My mistake: if you just do:

? (make-process "foo")

the process will run a little bit of code, add itself to the list of
all processes, then signal a semaphore and wait to be preset and
enabled.
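
For concreteness, here is a minimal sketch of the full create/preset/enable sequence. It assumes a Clozure CL image; the names *RESULT* and *DONE* and the body of the preset function are made up for illustration and are not from the original report.

```lisp
;; Sketch only: assumes Clozure CL. *RESULT* and *DONE* are
;; hypothetical names used just for this example.
(defparameter *result* nil)
(defparameter *done* (ccl:make-semaphore))

(let ((p (ccl:make-process "foo")))
  ;; The new thread now exists but has nothing to run yet:
  ;; it is waiting to be preset and enabled.
  (ccl:process-preset p (lambda ()
                          (setq *result* :ran)
                          (ccl:signal-semaphore *done*)))
  ;; PROCESS-ENABLE is what allows the preset function to start.
  (ccl:process-enable p)
  ;; Block until the preset function has signaled completion.
  (ccl:wait-on-semaphore *done*))
```

Until PROCESS-PRESET and PROCESS-ENABLE have both been called, the thread has no function to run, which is why reporting it as "Active" is misleading.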

In David's case, the creating thread's wait had timed out, but by the
time he did :proc, interrupted the waiting thread, and printed a
backtrace, the thread was initialized and ready to go, and its
whostate was "Active".  That's a change in how whostates are
implemented; in 1.1, the newly-reset thread would have reported itself
as "Reset" instead of "Active", and the former's more accurate.  The
thread isn't really "Active" - it's still waiting to be preset and
enabled - and I started postulating that the thread had somehow been
enabled due to very low-level wires getting crossed somewhere.

So, there are two bugs here:

1) the whole idea of a timeout in PROCESS-ENABLE is wrong (since we
don't generally know how long it'll take for the target thread to
get ready to run), and we should just wait indefinitely.

2) a newly-created or newly-reset thread should not have a whostate of
"Active"; that's an unintentional change which can cause at least one
person (the person who made the change) to get very confused.
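
The difference between a timed wait and an indefinite wait in (1) can be sketched with an ordinary semaphore handshake. This is not the PROCESS-ENABLE source; WAIT-FOR-READY and SLOW-START-DEMO are hypothetical helpers, and the SLEEP merely stands in for a thread that is slow to get scheduled.

```lisp
;; Illustrative only: not the actual PROCESS-ENABLE implementation.
;; WAIT-FOR-READY and SLOW-START-DEMO are hypothetical helpers.
(defun wait-for-ready (ready &optional timeout)
  "Wait for the semaphore READY.  With TIMEOUT (in seconds), return
NIL if it expires first; with no TIMEOUT, wait indefinitely."
  (if timeout
      (ccl:timed-wait-on-semaphore ready timeout)
      (progn (ccl:wait-on-semaphore ready) t)))

(defun slow-start-demo (delay timeout)
  "Start a thread that signals readiness after DELAY seconds, then
wait for that signal using TIMEOUT (or indefinitely if NIL)."
  (let ((ready (ccl:make-semaphore)))
    (ccl:process-run-function "slow starter"
                              (lambda ()
                                ;; Stand-in for a thread that takes a
                                ;; while to get CPU time.
                                (sleep delay)
                                (ccl:signal-semaphore ready)))
    (wait-for-ready ready timeout)))
```

With these definitions, (slow-start-demo 0.5 0.1) should return NIL because the wait gives up before the other thread signals, while (slow-start-demo 0.5 nil) should return T once the signal arrives. The target thread was fine in both cases; only the arbitrary deadline made the first one look like a failure.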

Sorry; will fix.

On Fri, 25 Jul 2008, David Rager wrote:

> In the case described, when I (:y 35), and type :go (or whatever made the
> lisp system ignore the warning), IIRC, it all worked.  Therefore, IIRC,
> it's probably the latter, where one second isn't enough (or something new is
> occurring to make threads not swap in as much).
>
> The thing that may be indicative that it's not an OS problem, is that this
> just started happening when I upgraded to the RC version of CCL (RC 1.2?).  I
> can inquire of our IT department if you would find whether there was an OS
> change during this period to be relevant information.  RC 1.2 fixed another
> OpenMCL problem (which I was quite pleased about), so it wasn't like I could
> just keep using the old OpenMCL.
>
> At least now our group is no longer the only group seeing and reporting this
> behavior.
>
> On Fri, Jul 25, 2008 at 1:25 PM, Gary Byers <gb at clozure.com> wrote:
>
>> In the original bug report, the backtrace for what was thread #35 showed
>>
>>  (2AAAAD619B18) : 0 (PROCESS-ENABLE #<PROCESS Worker thread(38) [Active]
>> #x300043A1C8ED> [...]) 405
>>  (2AAAAD619B68) : 1 (%PROCESS-RUN-FUNCTION '(:NAME "Worker thread")
>> #<COMPILED-LEXICAL-CLOSURE (:INTERNAL ACL2::RUN-THREAD) #x300043A1CD7F> NIL)
>> 1373
>>  (2AAAAD619C58) : 2 (PROCESS-RUN-FUNCTION "Worker thread"
>> #<COMPILED-LEXICAL-CLOSURE (:INTERNAL ACL2::RUN-THREAD) #x300043A1CD7F>
>> [...]) 213
>>
>> and :proc showed
>>
>> 38 :    Worker thread  [Active]
>> 35 :    Worker thread  [semaphore wait]  (Requesting terminal input)
>> 14 :    Worker thread  [semaphore wait]
>> 1 : -> listener     [Active]
>> 0 :    Initial      [Active]
>>
>> In other words, thread 35 created thread 38 and was waiting for it
>> to signal a semaphore that would indicate that it's reset itself
>> and is ready to be enabled (given a function to run).  :PROC shows
>> that thread 38 is already running, which doesn't make much sense.
>> The Linux kernel that David Rager was running was one that allegedly
>> had just fixed a bug which could cause the wrong thread to be
>> awakened via FUTEX_WAIT, and it seemed plausible that that bug hadn't
>> really been fixed there.  The case that failed reliably for David
>> on the machine that David was using worked reliably for me, similar
>> cases seemed to work for others, and blaming this on something at
>> the OS level makes more sense than anything else that I can think
>> of.  (Another fuzzy explanation is that malloc() - when called
>> from two threads at the same time - returned the same block of
>> memory to both callers because of a locking problem, so two
>> threads wound up sharing the same "pointer to semaphore".)
>>
>> There's a separate issue in that PROCESS-ENABLE waits for the target
>> thread to indicate that it's "ready" with a timeout of 1 second.
>> That's usually long enough, but it's entirely arbitrary (how long
>> it actually takes depends on the load on and the whims of the
>> scheduler.)  Taking longer than a second might indicate that the
>> newly-created thread isn't getting enough CPU time to signal its
>> readiness to run.  The whole notion of having a timeout for
>> something that can take an indeterminate amount of time is
>> questionable, so it probably makes sense to not use a one-second
>> timeout in PROCESS-ENABLE by default, at the very least.
>>
>> Can you tell whether it was the first case (where PROCESS-ENABLE
>> was waiting to enable a thread that - somehow - seems to have
>> already been enabled) or the second (the one-second timeout is
>> too short, and quite possibly the entire idea of a timeout is
>> misguided)?
>>
>> In the former case, the thread being enabled would be on the
>> list returned by (ALL-PROCESSES) or in the output displayed
>> by :PROC, and in the latter case it wouldn't.
>>
>> On Fri, 25 Jul 2008, Milan Jovanovic wrote:
>>
>>> Hi, I have problems with multi-threading on Linux; I think it's the same
>>> as "http://trac.clozure.com/openmcl/ticket/297".
>>> First it was "Unable to enable process #<PROCESS ...have been trying for 1
>>> seconds" and an inferior-list segmentation fault after 2-3 hours of running
>>> (this was on SUSE LINUX 10.0 X86-64 2.6.13-15-smp).
>>>
>>> After Gary Byers' suggestion that it may be a Linux kernel bug, I tried on
>>> SUSE Server 10 (x86_64) - kernel 2.6.24. After more than a day of running
>>> with no errors I saw one more "Unable to enable process #<PROCESS ...have
>>> been trying for 1 seconds", but this time no segmentation fault.
>>> So I'm asking: is it a problem/bug if this happens at all, or only if it
>>> is followed by a segmentation fault?
>>>
>>> BTW, I tried the code on SBCL to be sure that it's not something there, and
>>> it's been running for a couple of days with no problems.
>>>
>>> Thanks
>>> Best,Milan
>>>