If anyone feels like testing a slightly different version of the patch ...

The current definition of CCL::%NANOSLEEP in ccl/level-1/l1-lisp-threads.lisp
looks like:

(defun %nanosleep (seconds nanoseconds)
   (with-process-whostate ("Sleep")
     (rlet ((a :timespec)
            (b :timespec))
       (setf (pref a :timespec.tv_sec) seconds
             (pref a :timespec.tv_nsec) nanoseconds)
       (let* ((aptr a)
              (bptr b))
         (loop
           (let* ((result
                   (external-call #+darwin-target "_nanosleep"
                                  #-darwin-target "nanosleep"
                                  :address aptr
                                  :address bptr
                                  :signed-fullword)))
             (declare (type (signed-byte 32) result))
             (if (and (< result 0)
                      (eql (%get-errno) (- #$EINTR)))
               ;; x86-64 Leopard bug.
               (let* ((asec (pref aptr :timespec.tv_sec))
                      (bsec (pref bptr :timespec.tv_sec)))
                 (if (and (>= bsec 0)
                          (or (< bsec asec)
                              (and (= bsec asec)
                                   (< (pref bptr :timespec.tv_nsec)
                                      (pref aptr :timespec.tv_nsec)))))
                   (psetq aptr bptr bptr aptr)
                   (return)))
               (return))))))))

(It should look like that in all relevant recent versions of CCL; the code
hasn't changed in years.)  Erik suggested replacing the LET* which follows
the comment ";; x86-64 Leopard bug" with just the PSETQ (so that we do the
PSETQ and try to sleep a little longer, unconditionally); I'm curious about
whether it would also work if we did the sanity-checking a little more
rigorously, by replacing:

                                   (< (pref bptr :timespec.tv_nsec)
                                      (pref aptr :timespec.tv_nsec)))))

with:

                                   (<= (pref bptr :timespec.tv_nsec)
                                       (pref aptr :timespec.tv_nsec)))))

As things stand, if the "seconds" and "nanoseconds" fields in both
"a" and "b" are exactly equal, we won't go back to sleep at all (and this
could conceivably happen if we get interrupted before nanosleep goes to

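To make the difference between the two tests concrete, here is a rough
sketch of just the comparison, pulled out into a standalone function that
works on plain integers.  The name PLAUSIBLE-REMAINDER-P and its :INCLUSIVE
argument are made up for illustration; nothing like this exists in CCL.

```lisp
;; Hypothetical helper, not part of CCL: the same sanity check that
;; %NANOSLEEP does, but on integers rather than :timespec fields.
;; (ASEC, ANSEC) is the requested time, (BSEC, BNSEC) the reported remainder.
(defun plausible-remainder-p (asec ansec bsec bnsec &key inclusive)
  "True if the remainder is non-negative and less than the requested time.
With INCLUSIVE true, a remainder equal to the request also counts."
  (and (>= bsec 0)
       (or (< bsec asec)
           (and (= bsec asec)
                (if inclusive
                    (<= bnsec ansec)
                    (< bnsec ansec))))))

;; Interrupted before any time elapsed: remainder equals the request.
(plausible-remainder-p 2 0 2 0)               ; current test: NIL, we stop sleeping
(plausible-remainder-p 2 0 2 0 :inclusive t)  ; proposed test: T, we sleep again
;; A remainder larger than the request is rejected either way.
(plausible-remainder-p 2 0 5 0 :inclusive t)  ; NIL
```
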
If that change fixes the problem that James reported, I'm marginally more
comfortable with it than I am with removing the sanity checking altogether,
simply because:

  - I don't know if the bug that the sanity-checking was intended to defend
    against is still present in some supported version of OSX
  - if it is, it's really nasty.  IIRC, it was present in pre-releases of
    10.5, I reported it to Apple (and I think that my bug report was marked
    as a duplicate), it wasn't fixed in the final 10.5, and ... that was
    5 years ago and I don't know what's happened since.


On Thu, 10 May 2012, James M. Lawrence wrote:

> On Thu, May 10, 2012 at 12:16 PM, Erik Pearson <erik at defunweb.com> wrote:
>> Hi James,
>> I'm sure Gary et al. will have a fix soon -- today if past performance is
>> any measure -- but for now try this. In your ccl directory (/opt/ccl/ccl in
>> my system, because I install my ccl from svn in /opt/ccl), in the level-1
>> directory, in the file l1-lisp-threads.lisp, hunt down and replace the
>> %nanosleep function with this:
>> #-windows-target
>> (defun %nanosleep (seconds nanoseconds)
>>   (with-process-whostate ("Sleep")
>>     (rlet ((a :timespec)
>>            (b :timespec))
>>       (setf (pref a :timespec.tv_sec) seconds
>>             (pref a :timespec.tv_nsec) nanoseconds)
>>       (let ((aptr a)
>>             (bptr b))
>>         (loop
>>           (let ((result
>>                  (external-call #+darwin-target "_nanosleep"
>>                                 #-darwin-target "nanosleep"
>>                                 :address aptr
>>                                 :address bptr
>>                                 :signed-fullword)))
>>             (declare (type (signed-byte 32) result))
>>             (if (and (< result 0)
>>                      (eql (%get-errno) (- #$EINTR)))
>>                 (psetq aptr bptr bptr aptr)
>>                 (return))))))))
>> All I did was remove the OS X workaround code. I'm working with the
>> up-to-date trunk, v 1.9.
> That appears to have fixed it. I went back and forth between the old
> and new %nanosleep for good measure. Congrats to all.
> Using latest lx86cl in trunk with 2 second sleeps.
> With old %nanosleep:
> fail at 18 iterations
> fail at 32
> fail at 46
> fail at 11
> fail at 74
> With new %nanosleep:
> no fail after 166 iterations
> restart CCL
> no fail after 189
> restart CCL
> no fail after 221
> restart CCL
> no fail after 159
> restart CCL
> no fail after 653 and still running
