[Openmcl-devel] Semaphore troubles

James M. Lawrence llmjjmll at gmail.com
Thu May 10 01:46:53 UTC 2012


I appreciate the time you've taken to respond.

When I wrote "0.5 seconds or whatever", I meant that it fails for
apparently any amount of time. As I mentioned in response to another
reply, a 10 second sleep produces the failure as well. Is that
consonant with your explanation?

It is also unclear why

* it does not fail when the loop is removed (along with the push nils)

* it does not fail when (format t ".") is removed

Perhaps these are just curiosities due to entropy in the underlying
system calls.

The upshot of what you're saying is that Clozure cannot reliably
distribute work across threads, while other CL implementations can. I
would not call it a bug, but it's at least unfortunate. In fact
Clozure scales better than SBCL for parallel mapping and other
functions (stats available upon request), barring these peculiar
hiccups.


On Wed, May 9, 2012 at 8:44 PM, Gary Byers <gb at clozure.com> wrote:
>
>
> On Wed, 9 May 2012, James M. Lawrence wrote:
>
>> I thought my example was straightforward enough, though as I mentioned
>> I wish it were smaller. Following your suggestion, I have replaced the
>> queue with a stack. I have also taken out the condition-wait function
>> copied from bordeaux-threads. My pop function now resembles your
>> consume function.
>>
>> The same assertion failure occurs.
>>
>> I am unable to reproduce it with high debug settings, or with tracing,
>> or with logging.
>
>
>>
>> The test consists of a pair of worker threads pulling from a task
>> queue. We push two tasks: one task returns immediately, the other task
>> sleeps for 0.2 seconds (it can be 0.5 seconds or whatever, it just
>> takes longer to fail). Since we have two workers, we should always
>> obtain the result of the sleeping task second. A signal is getting
>> missed, or something.
>
>
> You're assuming that whatever thread pulls the lambda that returns
> 'SOONER will off of TASKS will push 'SOONER onto RECEIVER before
> another thread pulls another lambda that sleeps for .2 seconds before
> returning 'LATER pushes 'LATER on RECEIVER.  That assumption is likely
> to hold a high percentage of the time, but I can't think of anything
> that guarantees it. (The OS scheduler may have decided that it should
> let Emacs re-fontify some buffers for a while, or let the kernel
> process all of those network packets that've been gumming up the
> works, and when it gets back to CCL it finds that it's time for the
> sleeping thread to wake up and it gets scheduled and pushes LATER
> on RECEIVER before the other thread even wakes up.  This kind of scenario
> isn't as likely as one where 'SOONER is pushed first, but
> it's not wildly improbable, either.  It's "likely" that 'SOONER will
> be pushed first - maybe even "highly likely".  It's more likely (more
> highly likely ?) if the sleeping thread sleeps longer, but non-realtime
> OSes (like most flavors of Linux, like OSX, like ...) don't make the
> scheduling guarantees that you seem to be assuming.
>
> While you're thinking "this thread should run before the other one because
> it's ready to run and the other one is sleeping", the scheduler's thinking
> "that CPU has been really active lately; better shut it down for a little
> while so that it doesn't get too hot or consume too much power", or
> something
> equally obscure and unintuitive.  If you change compiler options, or
> do printing or logging (or otherwise change how threads use the CPU cycles
> they're given), your code looks different to the scheduler and behaves
> differently (in subtle and not-always-predictable ways.)
>
> Of all the thread-related bugs that've ever existed in CCL, the most
> common cause has probably been "code wasn't prepared to deal with
> concurrency"; a close second is probably "code is making unwarranted
> assumptions about scheduler behavior."  After many years of getting beaten
> by those things, I think and hope that I'm more inclined to question some
> assumptions that I used to make automatically and implicitly, and my first
> reaction is to question the assumption that you're making.  It's more likely
> that the thread that doesn't sleep will push 'SOONER before the thread that
> sleeps pushes 'LATER, but nothing guarantees this, lots of factors affect
> what happens, and all that I can see is that things that're statistically
> unlikely happen occasionally.
>
> Scheduling behavior is likely beyond the grasp of mere mortals; we can have
> a reasonable, largely accurate model of how things will behave, but we have
> to bear in mind that that's all we have.
>
> Semaphores in CCL are very thin wrappers around whatever the OS provides
> (POSIX
> semaphores, Mach semaphores, something-or-other on Windows.)  If you say "a
> [semaphore] must be getting dropped", you're either saying that there's a
> problem
> in that very thin wrapper or that we're all doomed (because what the OS
> provides
> doesn't work), and you're also saying that your code demonstrates this
> problem
> and no one else's notices.  Some or all of those things could be true, but
> you're
> claiming that they must be because you think that you know which thread will
> run before which other thread.  You don't know that; all you really know is
> that's
> probably true.
>
>
>  (defun test ()
>>
>>  (let ((tasks (make-stack)))
>>   (loop
>>      :repeat 2
>>      :do (ccl:process-run-function
>>           "test"
>>           (lambda ()
>>             (loop (funcall (or (pop-stack tasks)
>>                                (return)))))))
>>   (let ((receiver (make-stack)))
>>     (push-stack (lambda ()
>>                   (push-stack (progn (sleep 0.2) 'later)
>>                               receiver))
>>                 tasks)
>>     (push-stack (lambda ()
>>                   (push-stack 'sooner receiver))
>>                 tasks)
>>     (let ((result (pop-stack receiver)))
>>       (assert (eq 'sooner result)))
>>     (let ((result (pop-stack receiver)))
>>       (assert (eq 'later result))))
>>   (push-stack nil tasks)
>>   (push-stack nil tasks))
>>  (format t "."))
>>
>> (defun run ()
>>  (loop (test)))
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> http://clozure.com/mailman/listinfo/openmcl-devel
>>
>>
>



More information about the Openmcl-devel mailing list