[Openmcl-devel] process-enable issue

Fri Jul 25 15:13:09 PDT 2008

In the case described, when I did (:y 35) and typed :go (or whatever made the
lisp system ignore the warning), IIRC, it all worked.  Therefore, IIRC,
it's probably the latter case, where one second isn't enough (or something new
is happening that keeps threads from being scheduled as often).

One thing that may indicate it's not an OS problem is that this only
started happening when I upgraded to the RC version of CCL (RC 1.2?).  If
it would be relevant, I can ask our IT department whether there was an OS
change during this period.  RC 1.2 fixed another OpenMCL problem (which I
was quite pleased about), so I couldn't just keep using the old OpenMCL.

At least now our group is no longer the only group seeing and reporting this
behavior.

On Fri, Jul 25, 2008 at 1:25 PM, Gary Byers <gb at clozure.com> wrote:

> In the original bug report, the backtrace for what was thread #35 showed
>
>  (2AAAAD619B18) : 0 (PROCESS-ENABLE #<PROCESS Worker thread(38) [Active] #x300043A1C8ED> [...]) 405
>  (2AAAAD619B68) : 1 (%PROCESS-RUN-FUNCTION '(:NAME "Worker thread") #<COMPILED-LEXICAL-CLOSURE (:INTERNAL ACL2::RUN-THREAD) #x300043A1CD7F> NIL) 1373
>  (2AAAAD619C58) : 2 (PROCESS-RUN-FUNCTION "Worker thread" #<COMPILED-LEXICAL-CLOSURE (:INTERNAL ACL2::RUN-THREAD) #x300043A1CD7F> [...]) 213
>
> and :proc showed
>
> 38 :    Worker thread  [Active]
> 35 :    Worker thread  [semaphore wait]  (Requesting terminal input)
> 14 :    Worker thread  [semaphore wait]
> 1 : -> listener     [Active]
> 0 :    Initial      [Active]
>
> In other words, thread 35 created thread 38 and was waiting for it
> to signal a semaphore that would indicate that it's reset itself
> and is ready to be enabled (given a function to run).  :PROC shows
> that thread 38 is already running, which doesn't make much sense.
> The Linux kernel that David Rager was running was one that allegedly
> had just fixed a bug which could cause the wrong thread to be
> awakened from FUTEX_WAIT, and it seemed plausible that that bug hadn't
> really been fixed there.  The case that failed reliably for David
> on the machine that David was using worked reliably for me, similar
> cases seemed to work for others, and blaming this on something at
> the OS level makes more sense than anything else that I can think
> of.  (Another fuzzy explanation is that malloc() - when called
> from two threads at the same time - returned the same block of
> memory to both callers because of a locking problem, so two
> threads wound up sharing the same "pointer to semaphore".)
>
> There's a separate issue in that PROCESS-ENABLE waits for the target
> thread to indicate that it's "ready" with a timeout of 1 second.
> That's usually long enough, but it's entirely arbitrary (how long
> it actually takes depends on the system load and the whims of the
> scheduler).  Taking longer than a second might indicate that the
> newly-created thread isn't getting enough CPU time to signal its
> readiness to run.  The whole notion of having a timeout for
> something that can take an indeterminate amount of time is
> questionable, so it probably makes sense to not use a one-second
> timeout in PROCESS-ENABLE by default, at the very least.
>
> Can you tell whether it was the first case (where PROCESS-ENABLE
> was waiting to enable a thread that - somehow - seems to have
> already been enabled) or the second (where the one-second timeout is
> too short, and quite possibly the entire idea of a timeout is
> misguided)?
>
> In the former case, the thread being enabled would be on the
> list returned by (ALL-PROCESSES) or in the output displayed
> by :PROC, and in the latter case it wouldn't.
>
> On Fri, 25 Jul 2008, Milan Jovanovic wrote:
>
> > Hi, I have problems with multi-threading on Linux; I think it's the same
> > as "http://trac.clozure.com/openmcl/ticket/297".
> > First it was "Unable to enable process #<PROCESS ...have been trying for 1
> > seconds" and an inferior-lisp segmentation fault after 2-3 hours of running
> > (this was on SUSE LINUX 10.0 X86-64 2.6.13-15-smp).
> >
> > After Gary Byers suggested that it might be a Linux kernel bug, I tried it on
> > SUSE Server 10 (x86_64) with kernel 2.6.24. After more than a day of running
> > with no errors I saw one more "Unable to enable process #<PROCESS ...have
> > been trying for 1 seconds", but this time with no segmentation fault.
> > So I'm asking: is it a problem/bug if this happens at all, or only if it
> > happens with a segmentation fault following?
> >
> > BTW, I tried the code on SBCL to make sure the problem isn't in the code
> > itself, and it's been running for a couple of days with no problems.
> >
> > Thanks
> > Best, Milan
> >
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
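The readiness handshake described in the quoted message (the creating thread waits on a semaphore until the new thread signals that it has reset itself, then hands it a function to run) can be illustrated outside of CCL. The following is a minimal Python sketch, not CCL's implementation: the class `Worker`, its `enable` method, and the `ready_timeout` parameter are hypothetical, and Python's `threading.Semaphore` stands in for a CCL semaphore. Passing `ready_timeout=None` waits indefinitely, which corresponds to not using a one-second timeout by default; passing a number reproduces the arbitrary deadline that can fail on a loaded system.

```python
import threading
import queue


class Worker:
    """Hypothetical worker thread that signals readiness before accepting work.

    Loosely modeled on the PROCESS-ENABLE handshake discussed above;
    this is an illustration, not how CCL implements it.
    """

    def __init__(self):
        self._ready = threading.Semaphore(0)  # released once the thread is ready
        self._work = queue.Queue()            # carries the function to run
        self.result = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Any per-thread setup ("resetting itself") would happen here.
        self._ready.release()                 # tell the creator we can be enabled
        fn = self._work.get()                 # block until given a function
        self.result = fn()

    def enable(self, fn, ready_timeout=None):
        """Wait for the thread to signal readiness, then hand it FN.

        ready_timeout=None waits as long as it takes; a number of seconds
        gives up after that long, even if the thread is merely slow to be
        scheduled.
        """
        if not self._ready.acquire(timeout=ready_timeout):
            # Hypothetical message, mimicking the error reported in the thread.
            raise TimeoutError(
                f"Unable to enable worker: have been trying for "
                f"{ready_timeout} seconds")
        self._work.put(fn)

    def join(self, timeout=None):
        self._thread.join(timeout)


if __name__ == "__main__":
    w = Worker()
    w.enable(lambda: 6 * 7)   # no timeout: waits however long readiness takes
    w.join()
    print(w.result)           # prints 42
```

The design point is that the timeout only turns "the new thread hasn't been scheduled yet" into an error; it does not make the thread start any sooner. A distinct failure, like the one where :PROC showed the target thread already running, would not be fixed by removing the timeout.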