[Openmcl-devel] process-enable issue

Sat Jul 26 04:07:07 PDT 2008

I think it's the first case, where process-enable tries to enable a process
that is already running.
If I'm reading the manual right, process-run-function uses process-enable
when creating a process. In my case I create a fixed number of worker
processes at the start of the program, and I get those messages/errors after
hours of running (when no more processes are being created), so why is
process-enable being called then?
If I've got this completely wrong... sorry :)

On Fri, Jul 25, 2008 at 10:25 PM, Gary Byers <gb at clozure.com> wrote:

> In the original bug report, the backtrace for what was thread #35 showed
>
>  (2AAAAD619B18) : 0 (PROCESS-ENABLE #<PROCESS Worker thread(38) [Active] #x300043A1C8ED> [...]) 405
>  (2AAAAD619B68) : 1 (%PROCESS-RUN-FUNCTION '(:NAME "Worker thread") #<COMPILED-LEXICAL-CLOSURE (:INTERNAL ACL2::RUN-THREAD) #x300043A1CD7F> NIL) 1373
>  (2AAAAD619C58) : 2 (PROCESS-RUN-FUNCTION "Worker thread" #<COMPILED-LEXICAL-CLOSURE (:INTERNAL ACL2::RUN-THREAD) #x300043A1CD7F> [...]) 213
>
> and :proc showed
>
> 38 :    Worker thread  [Active]
> 35 :    Worker thread  [semaphore wait]  (Requesting terminal input)
> 14 :    Worker thread  [semaphore wait]
>  1 : -> listener     [Active]
>  0 :    Initial      [Active]
>
> In other words, thread 35 created thread 38 and was waiting for it
> to signal a semaphore that would indicate that it had reset itself
> and is ready to be enabled (given a function to run).  :PROC shows
> that thread 38 is already running, which doesn't make much sense.
> The Linux kernel that David Rager was running was one that allegedly
> had just fixed a bug which could cause the wrong thread to be
> awakened via FUTEX_WAIT, and it seemed plausible that that bug hadn't
> really been fixed there.  The case that failed reliably for David
> on the machine that David was using worked reliably for me, similar
> cases seemed to work for others, and blaming this on something at
> the OS level makes more sense than anything else that I can think
> of.  (Another fuzzy explanation is that malloc() - when called
> from two threads at the same time - returned the same block of
> memory to both callers because of a locking problem, so two
> threads wound up sharing the same "pointer to semaphore".)
>
> There's a separate issue in that PROCESS-ENABLE waits for the target
> thread to indicate that it's "ready" with a timeout of 1 second. That's
> usually long enough, but it's entirely arbitrary (how long
> it actually takes depends on the load on and the whims of the
> scheduler.)  Taking longer than a second might indicate that the
> newly-created thread isn't getting enough CPU time to signal its
> readiness to run. The whole notion of having a timeout for
> something that can take an indeterminate amount of time is
> questionable, so it probably makes sense to not use a one-second
> timeout in PROCESS-ENABLE by default, at the very least.
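The handshake and timeout described above can be sketched as follows. This is a minimal model written in Python, not CCL's implementation. The `Worker` class, its `enable` method and the `startup_delay` parameter are hypothetical, and the error text only imitates the message from the bug report. The creator waits on a `ready` semaphore that the new thread signals once it has reset itself. With a finite timeout, a thread that is slow to get CPU time produces the "Unable to enable process" error. With `timeout=None` the creator simply waits.

```python
import threading
import time


class Worker:
    """Hypothetical model of a newly created process that must signal
    readiness before it can be enabled (given a function to run)."""

    def __init__(self, name, startup_delay=0.0):
        self.name = name
        self.startup_delay = startup_delay    # models a thread starved of CPU time
        self.ready = threading.Semaphore(0)   # released by the new thread once reset
        self.go = threading.Semaphore(0)      # released by the creator to start work
        self.func = None
        self.result = None
        self.thread = threading.Thread(target=self._run, name=name, daemon=True)
        self.thread.start()

    def _run(self):
        time.sleep(self.startup_delay)  # time before the thread gets to run at all
        self.ready.release()            # "I've reset myself and am ready to be enabled"
        self.go.acquire()               # block until given a function to run
        self.result = self.func()

    def enable(self, func, timeout=1.0):
        """Wait for the readiness signal, then hand FUNC to the thread.
        timeout=None waits as long as it takes instead of failing."""
        if not self.ready.acquire(timeout=timeout):
            raise RuntimeError(f"Unable to enable process {self.name}: "
                               f"have been trying for {timeout} seconds")
        self.func = func
        self.go.release()


# A thread that needs 0.3 s to become ready fails a 0.1 s timeout...
slow = Worker("Worker thread", startup_delay=0.3)
try:
    slow.enable(lambda: 42, timeout=0.1)
except RuntimeError as err:
    print(err)

# ...but succeeds when the creator simply waits.
patient = Worker("Worker thread", startup_delay=0.3)
patient.enable(lambda: 42, timeout=None)
patient.thread.join()
print(patient.result)  # 42
```

In this model the failed `enable` does not mean the worker is broken. It becomes ready a moment later, but nobody is waiting for it any more. That is why a fixed one-second limit is an arbitrary choice.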
>
> Can you tell whether it was the first case (where PROCESS-ENABLE
> was waiting to enable a thread that - somehow - seems to have
> already been enabled) or the second (the one-second timeout is
> too short, and quite possibly the entire idea of a timeout is
> misguided)?
>
> In the former case, the thread being enabled would be on the
> list returned by (ALL-PROCESSES) or in the output displayed
> by :PROC, and in the latter case it wouldn't.
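The rule for telling the two cases apart can be written as a tiny check. This Python sketch is illustrative only. `ProcInfo` and `diagnose` are hypothetical helpers, and the process table is transcribed by hand from the :PROC output in the original bug report. In CCL itself you would inspect (ALL-PROCESSES) or :PROC directly. The sketch assumes the process number shown by :PROC identifies the process named in the error message.

```python
from typing import NamedTuple


class ProcInfo(NamedTuple):
    pid: int     # process number as shown by :PROC
    name: str
    state: str   # bracketed status, e.g. "Active" or "semaphore wait"


# Hypothetical table, hand-transcribed from the :PROC output in the bug report.
PROC_OUTPUT = [
    ProcInfo(38, "Worker thread", "Active"),
    ProcInfo(35, "Worker thread", "semaphore wait"),
    ProcInfo(14, "Worker thread", "semaphore wait"),
    ProcInfo(1, "listener", "Active"),
    ProcInfo(0, "Initial", "Active"),
]


def diagnose(target_pid, processes):
    """Classify an 'Unable to enable process' failure.

    If the process PROCESS-ENABLE was waiting on is already listed, it was
    somehow enabled already (first case). If it is absent, it simply never
    signalled readiness before the timeout (second case)."""
    if any(p.pid == target_pid for p in processes):
        return "first case: already enabled"
    return "second case: timeout too short"


print(diagnose(38, PROC_OUTPUT))  # first case: already enabled
```

With the table from that report, thread 38 is listed as [Active], which points to the first case. This is consistent with the reading of that report given above.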
>
>
> On Fri, 25 Jul 2008, Milan Jovanovic wrote:
>
>> Hi, I'm having problems with multithreading on Linux; I think it's the
>> same as "http://trac.clozure.com/openmcl/ticket/297".
>> First it was "Unable to enable process #<PROCESS ...have been trying for 1
>> seconds" and an inferior-lisp segmentation fault after 2-3 hours of running
>> (this was on SUSE LINUX 10.0 X86-64 2.6.13-15-smp).
>>
>> After Gary Byers suggested that it might be a Linux kernel bug, I tried
>> SUSE Server 10 (x86_64) with kernel 2.6.24. After more than a day of
>> running with no errors I saw one more "Unable to enable process #<PROCESS
>> ...have been trying for 1 seconds", but this time with no segmentation fault.
>> So I'm asking: is it a problem/bug if this happens on its own, or only if
>> it's followed by a segmentation fault?
>>
>> BTW, I tried the code on SBCL to make sure the problem isn't in my code,
>> and it has been running for a couple of days with no problems.
>>
>> Thanks
>> Best, Milan
>>
>>