[Openmcl-devel] Semaphore troubles

Wed May 9 21:04:05 PDT 2012

Try this code.

Had to borrow code for queue.  (My is comment is *damn* stack and *damn* dequeueing threads)

try (test 50000)

;;; -*- Mode: Lisp; Syntax: Common-Lisp; -*- File: utilities/queue.lisp

;;;; The Queue datatype

;;; We can remove elements form the front of a queue.  We can add elements in
;;; three ways: to the front, to the back, or ordered by some numeric score.
;;; This is done with the following enqueing functions, which make use of the
;;; following implementations of the elements:
;;;   ENQUEUE-AT-FRONT - elements are a list
;;;   ENQUEUE-AT-END   - elements are a list, with a pointer to end
;;;   ENQUEUE-BY-PRIORITY - elements are a heap, implemented as an array
;;; The best element in the queue is always in position 0.

;;; The heap implementation is taken from "Introduction to Algorithms" by
;;; Cormen, Lieserson & Rivest [CL&R], Chapter 7.  We could certainly speed
;;; up the constant factors of this implementation.  It is meant to be clear
;;; and simple and O(log n), but not super efficient.  Consider a Fibonacci
;;; heap [Page 420 CL&R] if you really have large queues to deal with.

(defstruct q
  (key #'identity)
  (last nil)
  (elements nil))

;;;; Basic Operations on Queues

(defun make-empty-queue () (make-q))

(defun empty-queue? (q)
  "Are there no elements in the queue?"
  (= (length (q-elements q)) 0))

(defun queue-front (q)
  "Return the element at the front of the queue."
  (elt (q-elements q) 0))

(defun remove-front (q)
  "Remove the element from the front of the queue and return it."
  (pop (q-elements q)))

;;;; The Three Enqueing Functions

(defun enqueue-at-front (q items)
  "Add a list of items to the front of the queue."
  (setf (q-elements q) (nconc items (q-elements q))))

(defun enqueue-at-end (q items)
  "Add a list of items to the end of the queue."
  ;; To make this more efficient, keep a pointer to the last cons in the queue
  (cond ((null items) nil)
	((or (null (q-last q)) (null (q-elements q)))
	 (setf (q-last q) (last items)
	       (q-elements q) (nconc (q-elements q) items)))
	(t (setf (cdr (q-last q)) items
		 (q-last q) (last items)))))

(defstruct queue
 (impl (make-empty-queue))
 (lock (ccl:make-lock))
 (sema (ccl:make-semaphore)))

(defun queue (object queue)
 (ccl:with-lock-grabbed ((queue-lock queue))
   (enqueue-at-end (queue-impl queue) (list object)))
 (ccl:signal-semaphore (queue-sema queue)))

(defun dequeue (queue)
  (ccl:wait-on-semaphore (queue-sema queue))
  (ccl:with-lock-grabbed ((queue-lock queue))
    (if (empty-queue? (queue-impl queue))
	(values nil nil)
	(values (remove-front (queue-impl queue)) t))))

(defun test (n)
  (let ((tasks (make-queue))
	(receiver (make-queue))
	(lock (ccl:make-lock)))
    (loop repeat 2 do
	 (ccl:process-run-function
	  "test"
	  (lambda ()
	    (loop 
	       (ccl:with-lock-grabbed (lock)
		 (queue (funcall (or (dequeue tasks)
				     (return)))
			receiver))))))
    (loop repeat n do
	 (queue (lambda () 'sooner) tasks)
	 (queue (lambda () 'later) tasks)
	 (let ((result (dequeue receiver)))
	   (assert (eq 'sooner result)))
	 (let ((result (dequeue receiver)))
	   (assert (eq 'later result)))
	 (format t "."))
    (queue nil tasks)))

On 2012-05-09, at 7:46 PM, James M. Lawrence wrote:

> I appreciate the time you've taken to respond.
> 
> When I wrote "0.5 seconds or whatever", I meant that it fails for
> apparently any amount of time. As I mentioned in response to another
> reply, a 10 second sleep produces the failure as well. Is that
> consonant with your explanation?
> 
> It is also unclear why
> 
> * it does not fail when the loop is removed (along with the push nils)
> 
> * it does not fail when (format t ".") is removed
> 
> Perhaps these are just curiosities due to entropy in the underlying
> system calls.
> 
> The upshot of what you're saying is that Clozure cannot reliably
> distribute work across threads, while other CL implementations can. I
> would not call it a bug, but it's at least unfortunate. In fact
> Clozure scales better than SBCL for parallel mapping and other
> functions (stats available upon request), barring these peculiar
> hiccups.
> 
> 
> On Wed, May 9, 2012 at 8:44 PM, Gary Byers <gb at clozure.com> wrote:
>> 
>> 
>> On Wed, 9 May 2012, James M. Lawrence wrote:
>> 
>>> I thought my example was straightforward enough, though as I mentioned
>>> I wish it were smaller. Following your suggestion, I have replaced the
>>> queue with a stack. I have also taken out the condition-wait function
>>> copied from bordeaux-threads. My pop function now resembles your
>>> consume function.
>>> 
>>> The same assertion failure occurs.
>>> 
>>> I am unable to reproduce it with high debug settings, or with tracing,
>>> or with logging.
>> 
>> 
>>> 
>>> The test consists of a pair of worker threads pulling from a task
>>> queue. We push two tasks: one task returns immediately, the other task
>>> sleeps for 0.2 seconds (it can be 0.5 seconds or whatever, it just
>>> takes longer to fail). Since we have two workers, we should always
>>> obtain the result of the sleeping task second. A signal is getting
>>> missed, or something.
>> 
>> 
>> You're assuming that whatever thread pulls the lambda that returns
>> 'SOONER will off of TASKS will push 'SOONER onto RECEIVER before
>> another thread pulls another lambda that sleeps for .2 seconds before
>> returning 'LATER pushes 'LATER on RECEIVER.  That assumption is likely
>> to hold a high percentage of the time, but I can't think of anything
>> that guarantees it. (The OS scheduler may have decided that it should
>> let Emacs re-fontify some buffers for a while, or let the kernel
>> process all of those network packets that've been gumming up the
>> works, and when it gets back to CCL it finds that it's time for the
>> sleeping thread to wake up and it gets scheduled and pushes LATER
>> on RECEIVER before the other thread even wakes up.  This kind of scenario
>> isn't as likely as one where 'SOONER is pushed first, but
>> it's not wildly improbable, either.  It's "likely" that 'SOONER will
>> be pushed first - maybe even "highly likely".  It's more likely (more
>> highly likely ?) if the sleeping thread sleeps longer, but non-realtime
>> OSes (like most flavors of Linux, like OSX, like ...) don't make the
>> scheduling guarantees that you seem to be assuming.
>> 
>> While you're thinking "this thread should run before the other one because
>> it's ready to run and the other one is sleeping", the scheduler's thinking
>> "that CPU has been really active lately; better shut it down for a little
>> while so that it doesn't get too hot or consume too much power", or
>> something
>> equally obscure and unintuitive.  If you change compiler options, or
>> do printing or logging (or otherwise change how threads use the CPU cycles
>> they're given), your code looks different to the scheduler and behaves
>> differently (in subtle and not-always-predictable ways.)
>> 
>> Of all the thread-related bugs that've ever existed in CCL, the most
>> common cause has probably been "code wasn't prepared to deal with
>> concurrency"; a close second is probably "code is making unwarranted
>> assumptions about scheduler behavior."  After many years of getting beaten
>> by those things, I think and hope that I'm more inclined to question some
>> assumptions that I used to make automatically and implicitly, and my first
>> reaction is to question the assumption that you're making.  It's more likely
>> that the thread that doesn't sleep will push 'SOONER before the thread that
>> sleeps pushes 'LATER, but nothing guarantees this, lots of factors affect
>> what happens, and all that I can see is that things that're statistically
>> unlikely happen occasionally.
>> 
>> Scheduling behavior is likely beyond the grasp of mere mortals; we can have
>> a reasonable, largely accurate model of how things will behave, but we have
>> to bear in mind that that's all we have.
>> 
>> Semaphores in CCL are very thin wrappers around whatever the OS provides
>> (POSIX
>> semaphores, Mach semaphores, something-or-other on Windows.)  If you say "a
>> [semaphore] must be getting dropped", you're either saying that there's a
>> problem
>> in that very thin wrapper or that we're all doomed (because what the OS
>> provides
>> doesn't work), and you're also saying that your code demonstrates this
>> problem
>> and no one else's notices.  Some or all of those things could be true, but
>> you're
>> claiming that they must be because you think that you know which thread will
>> run before which other thread.  You don't know that; all you really know is
>> that's
>> probably true.
>> 
>> 
>>  (defun test ()
>>> 
>>>  (let ((tasks (make-stack)))
>>>   (loop
>>>      :repeat 2
>>>      :do (ccl:process-run-function
>>>           "test"
>>>           (lambda ()
>>>             (loop (funcall (or (pop-stack tasks)
>>>                                (return)))))))
>>>   (let ((receiver (make-stack)))
>>>     (push-stack (lambda ()
>>>                   (push-stack (progn (sleep 0.2) 'later)
>>>                               receiver))
>>>                 tasks)
>>>     (push-stack (lambda ()
>>>                   (push-stack 'sooner receiver))
>>>                 tasks)
>>>     (let ((result (pop-stack receiver)))
>>>       (assert (eq 'sooner result)))
>>>     (let ((result (pop-stack receiver)))
>>>       (assert (eq 'later result))))
>>>   (push-stack nil tasks)
>>>   (push-stack nil tasks))
>>>  (format t "."))
>>> 
>>> (defun run ()
>>>  (loop (test)))
>>> _______________________________________________
>>> Openmcl-devel mailing list
>>> Openmcl-devel at clozure.com
>>> http://clozure.com/mailman/listinfo/openmcl-devel
>>> 
>>> 
>> 
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel