[Openmcl-devel] Semaphore troubles

Erik Pearson erik at defunweb.com
Wed May 9 16:43:30 PDT 2012


Hi James,

Interesting question.

What about the case where the two tasks pushed onto the task stack complete
their subsequent pushes to the receive stack before the first pop from the
receive stack? Because the receiver is a stack, the main thread would then
pop 'later first. Since this is all a matter of timing, it looks like this
should become a viable case at certain timing boundary conditions.
Specifically, when the sleep time is close to the time it takes the initial
thread to slog through those pushes, the ensuing threads are tickled to life
and disrupt the machine. Meanwhile, of course, CCL is fiddling with its
thread internals and perhaps doing a GC during all this. It seems to me that
this time would be quite variable, depending on how busy CCL is and what
other threads are running on the system.

For that matter, when the sleep time is small enough the "later" task will
sometimes beat the "sooner" one.
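Below is a minimal single-threaded sketch of the interleavings above, written in portable Common Lisp. It starts no threads and uses nothing from CCL. REPLAY-SCHEDULE and its schedule format are hypothetical helpers made up for illustration and are not part of the original test. The sketch only shows what LIFO order does to each schedule. It says nothing about how likely any schedule is.

```lisp
;;; Single-threaded replay of possible interleavings on the receive
;;; stack.  REPLAY-SCHEDULE is a hypothetical illustration helper: it
;;; starts no threads, it just applies the given events in order.

(defun replay-schedule (schedule)
  "Apply SCHEDULE to an empty receive stack and return the values the
main thread pops, in order.  Each event is either (:PUSH VALUE), meaning
a worker pushed its result, or :POP, meaning the main thread popped."
  (let ((receiver '())   ; receive stack, modeled as a list (LIFO)
        (popped '()))    ; values popped by the main thread, newest first
    (dolist (event schedule (nreverse popped))
      (if (eq event :pop)
          (push (pop receiver) popped)      ; main thread pops the top
          (push (second event) receiver)))))  ; a worker pushes its result

;; Intended order: SOONER is pushed and popped before LATER arrives.
(replay-schedule '((:push sooner) :pop (:push later) :pop))
;; => (SOONER LATER)

;; Both workers finish before the first pop: LIFO hands back LATER first.
(replay-schedule '((:push sooner) (:push later) :pop :pop))
;; => (LATER SOONER)

;; Very short sleep: LATER is pushed and popped before SOONER is pushed.
(replay-schedule '((:push later) :pop (:push sooner) :pop))
;; => (LATER SOONER)
```

In the last two schedules the first pop yields LATER. The first (assert (eq 'sooner result)) in the test would therefore fail without any signal being lost.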

Also, the printing of the "." in the test function may cause even more
thread mayhem. I found that eliminating that decreased the likelihood of
this effect.

I modified your test a bit and have run several iterations with various
sleep times, and the results pretty much bear this out, I think. At a sleep
of 0.03 I get an average of about 1500 iterations before encountering a
case of popping 'later first, after 'sooner and 'later are on the stack.
The likelihood of this happening generally increases as the sleep time
decreases:

  sleep (s)   iterations before failure
  0.02        171
  0.01        978
  0.001       296
  0.0001      78

And at very short sleeps, such as 0.00001, the main thread can pop 'later
before 'sooner is even pushed.

Erik.

On Wed, May 9, 2012 at 11:55 AM, James M. Lawrence <llmjjmll at gmail.com> wrote:

> I thought my example was straightforward enough, though as I mentioned
> I wish it were smaller. Following your suggestion, I have replaced the
> queue with a stack. I have also taken out the condition-wait function
> copied from bordeaux-threads. My pop function now resembles your
> consume function.
>
> The same assertion failure occurs.
>
> I am unable to reproduce it with high debug settings, or with tracing,
> or with logging.
>
> The test consists of a pair of worker threads pulling from a task
> queue. We push two tasks: one task returns immediately, the other task
> sleeps for 0.2 seconds (it can be 0.5 seconds or whatever, it just
> takes longer to fail). Since we have two workers, we should always
> obtain the result of the sleeping task second. A signal is getting
> missed, or something.
>
> Clozure does not pass the stress tests for my library, while other CL
> implementations do. I've put much effort into narrowing down this
> Clozure-only bug to this test case.
>
> I have found and fixed race conditions in Ruby which persisted for
> years. We both know that multi-threaded code can seem OK until poked
> in the right (wrong?) place.
>
> My first inclination was to point the finger at bordeaux-threads,
> which is why I asked about its condition-wait function. It may not
> have a race condition since Clozure uses atomic counts (which remember
> the signal) instead of condition variables (which don't). However, it
> is not obvious what happens for arbitrary numbers of threads waiting
> and signaling at arbitrary times. I had hoped that someone would
> reject the validity of bordeaux's condition-wait.
>
> This is now moot since condition-wait is out of the picture.
> Incidentally, if bordeaux-threads has a bogus implementation on
> Clozure, then this is news to me. If not, then my original pop-queue
> should work, albeit in a roundabout way from Clozure's point of view.
>
> I also wondered if threads were somehow accumulating, causing Clozure
> to become overwhelmed, but ccl:all-processes reports the same number
> of threads on each iteration.
>
> ;;; raw-stack
>
> (defstruct raw-stack
>  (data nil))
>
> (defun push-raw-stack (value q)
>  (setf (raw-stack-data q) (cons value (raw-stack-data q))))
>
> (defun pop-raw-stack (q)
>  (if (raw-stack-data q)
>      (multiple-value-prog1 (values (car (raw-stack-data q)) t)
>        (setf (raw-stack-data q) (cdr (raw-stack-data q))))
>      (values nil nil)))
>
> ;;; stack
>
> (defstruct stack
>  (impl (make-raw-stack))
>  (lock (ccl:make-lock))
>  (sema (ccl:make-semaphore)))
>
> (defun push-stack (object stack)
>  (ccl:with-lock-grabbed ((stack-lock stack))
>    (push-raw-stack object (stack-impl stack))
>    (ccl:signal-semaphore (stack-sema stack))))
>
> (defun pop-stack (stack)
>  (ccl:wait-on-semaphore (stack-sema stack))
>  (ccl:with-lock-grabbed ((stack-lock stack))
>    (multiple-value-bind (value presentp)
>        (pop-raw-stack (stack-impl stack))
>      (assert presentp)
>      value)))
>
> ;;; run
>
> (defun test ()
>  (let ((tasks (make-stack)))
>    (loop
>       :repeat 2
>       :do (ccl:process-run-function
>            "test"
>            (lambda ()
>              (loop (funcall (or (pop-stack tasks)
>                                 (return)))))))
>    (let ((receiver (make-stack)))
>      (push-stack (lambda ()
>                    (push-stack (progn (sleep 0.2) 'later)
>                                receiver))
>                  tasks)
>      (push-stack (lambda ()
>                    (push-stack 'sooner receiver))
>                  tasks)
>      (let ((result (pop-stack receiver)))
>        (assert (eq 'sooner result)))
>      (let ((result (pop-stack receiver)))
>        (assert (eq 'later result))))
>    (push-stack nil tasks)
>    (push-stack nil tasks))
>  (format t "."))
>
> (defun run ()
>  (loop (test)))
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>

