Hi Gary,<div><br></div><div>Note this passage in the definition of nanosleep:</div><div><br></div><div><span style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">If the interval specified in </span><i style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">req</i><span style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)"> is not an exact multiple of the granularity underlying clock (see </span><b style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)"><a href="http://linux.die.net/man/7/time" style="color:rgb(102,0,0)">time</a></b><span style="font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">(7)), then the interval will be rounded up to the next multiple.</span></div>
<div><a href="http://linux.die.net/man/2/nanosleep">http://linux.die.net/man/2/nanosleep</a></div><div><br></div><div>or from glibc docs:</div><div><br></div><div>The actual elapsed time of the sleep interval might be
longer since the system rounds the elapsed time you request up to the
next integer multiple of the actual resolution the system can deliver. </div><div><a href="http://www.gnu.org/software/libc/manual/html_node/Sleeping.html">http://www.gnu.org/software/libc/manual/html_node/Sleeping.html</a></div>
<div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">So here is the crux: the time remaining after an interrupted nanosleep may actually be greater than the requested time if the interrupt happens right after the timer starts, within the first increment of the timer's resolution. (Printing debugging text from within the workaround code of %nanosleep shows that this is indeed the cause of our problems.)</font></div>
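<div><br></div><div>To make the failure mode concrete, here is a minimal sketch of the sanity check from the ";; x86-64 Leopard bug" branch of %nanosleep (quoted further down), pulled out into a standalone predicate. The name remaining-plausible-p is hypothetical and not part of CCL, and the timespec fields are passed as plain integers rather than read through pref. The 4,000,000 ns value is made up for illustration; 10,000,551 ns is the smallest over-request value reported in the measurements that follow.</div>

```lisp
;; Hypothetical helper, not part of CCL: the workaround's sanity check
;; with the timespec fields passed as plain integers.
;; (ASEC, ANSEC) is the requested time; (BSEC, BNSEC) is the remaining
;; time that nanosleep reported after being interrupted.
(defun remaining-plausible-p (asec ansec bsec bnsec)
  "True when the remaining time is non-negative and strictly less than
the requested time, i.e. when the workaround agrees to sleep again."
  (and (>= bsec 0)
       (or (< bsec asec)
           (and (= bsec asec)
                (< bnsec ansec)))))

;; Interrupted partway through a 0.01 s sleep (illustrative value):
;; the check passes, so the loop swaps the pointers and sleeps again.
(remaining-plausible-p 0 10000000 0 4000000)

;; Interrupted within the first timer increment: the reported remaining
;; time is slightly MORE than the request, the check fails, and
;; %nanosleep returns early.
(remaining-plausible-p 0 10000000 0 10000551)
```

<div>The first call returns true and the second returns NIL, which is the premature-return case.</div>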
<div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">With 100 captures of the remaining time being greater than the requested time (0.01 sec, or 10,000,000 ns), the range was from 10,000,551 to 10,045,251 ns (and for 750 captures of 0.001 sec sleeps, from 1,000,272 to 1,044,971 ns). So on my computer, assuming that the timer is set to the requested time plus at most one increment of the timer resolution, the timer resolution is about 50,000 ns. So if the interrupt happens within 50,000 ns of the timer being set, the workaround code will cause %nanosleep to return prematurely.</font></div>
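<div><br></div><div>The arithmetic behind that estimate can be sketched as follows. The function name overshoot-range is hypothetical; the inputs are the measured extremes quoted above, and the overshoot is simply the reported remaining time minus the requested time.</div>

```lisp
;; Hypothetical helper: smallest and largest overshoot, in nanoseconds,
;; given the requested time and the extremes of the reported remaining time.
(defun overshoot-range (requested-ns min-remaining-ns max-remaining-ns)
  (list (- min-remaining-ns requested-ns)
        (- max-remaining-ns requested-ns)))

;; 0.01 s sleeps, 100 captures
(overshoot-range 10000000 10000551 10045251) ; => (551 45251)

;; 0.001 s sleeps, 750 captures
(overshoot-range 1000000 1000272 1044971)    ; => (272 44971)
```

<div>Both runs top out at roughly 45,000 ns over the request, which is why I round the guess at the timer increment up to about 50,000 ns; the true granularity is an assumption, not something these numbers prove.</div>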
<div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">So I'd vote for conditionalizing the workaround code so that it applies only to that version of OS X (where having timers fail early may be better than some other disaster), although I'm sure the workaround code could also be tweaked to work better in that situation. </font></div>
<div><div><br></div><div>Erik.<br><br><div class="gmail_quote">On Thu, May 10, 2012 at 12:28 PM, Gary Byers <span dir="ltr"><<a href="mailto:gb@clozure.com" target="_blank">gb@clozure.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
If anyone feels like testing a slightly different version of the patch ...<br>
<br>
The current definition of CCL::%NANOSLEEP in ccl/level-1/l1-lisp-threads.lisp<br>
looks like:<div class="im"><br>
<br>
#-windows-target<br>
(defun %nanosleep (seconds nanoseconds)<br>
(with-process-whostate ("Sleep")<br>
(rlet ((a :timespec)<br>
(b :timespec))<br>
(setf (pref a :timespec.tv_sec) seconds<br>
(pref a :timespec.tv_nsec) nanoseconds)<br></div>
(let* ((aptr a)<br>
(bptr b))<br>
(loop<br>
(let* ((result<div class="im"><br>
(external-call #+darwin-target "_nanosleep"<br>
#-darwin-target "nanosleep"<br>
:address aptr<br>
:address bptr<br>
:signed-fullword)))<br>
(declare (type (signed-byte 32) result))<br>
(if (and (< result 0)<br>
(eql (%get-errno) (- #$EINTR)))<br></div>
;; x86-64 Leopard bug.<br>
(let* ((asec (pref aptr :timespec.tv_sec))<br>
(bsec (pref bptr :timespec.tv_sec)))<br>
(if (and (>= bsec 0)<br>
(or (< bsec asec)<br>
(and (= bsec asec)<br>
(< (pref bptr :timespec.tv_nsec)<br>
(pref aptr :timespec.tv_nsec)))))<div class="im"><br>
(psetq aptr bptr bptr aptr)<br>
(return)))<br></div>
(return))))))))<br>
<br>
(It should look like that in all relevant recent versions of CCL; the code<br>
hasn't changed in years.) Erik suggested replacing the LET* which follows<br>
the comment ";; x86-64 Leopard bug" with just the PSETQ (so that we do the<br>
PSETQ and try to sleep a little longer, unconditionally); I'm curious about<br>
whether it would also work if we did the sanity-checking a little more<br>
rigorously, by replacing the:<br>
<br>
(< (pref bptr :timespec.tv_nsec)<br>
(pref aptr :timespec.tv_nsec)))))<br>
<br>
with<br>
<br>
(<= (pref bptr :timespec.tv_nsec)<br>
(pref aptr :timespec.tv_nsec)))))<br>
<br>
As things have stood, if the "seconds" and "nanoseconds" fields in both<br>
"a" and "b" are exactly equal, we won't go back to sleep at all (and this<br>
could conceivably happen if we get interrupted before nanosleep goes to<br>
sleep.)<br>
<br>
If that change fixes the problem that James reported, I'm marginally more<br>
comfortable with it than I am with removing the sanity checking altogether,<br>
simply because:<br>
<br>
- I don't know if the bug that the sanity-checking was intended to defend<br>
against is still present in some supported version of OSX<br>
- if it is, it's really nasty. IIRC, it was present in pre-releases of<br>
10.5, I reported it to Apple (and I think that my bug report was marked<br>
as a duplicate), it wasn't fixed in the final 10.5, and ... that was<br>
5 years ago and I don't know what's happened since.<br>
<br>
Thanks.<div class="im"><br>
<br>
<br>
<br>
<br>
On Thu, 10 May 2012, James M. Lawrence wrote:<br>
<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
On Thu, May 10, 2012 at 12:16 PM, Erik Pearson <<a href="mailto:erik@defunweb.com" target="_blank">erik@defunweb.com</a>> wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
Hi James,<br>
<br>
I'm sure Gary et al. will have a fix soon -- today if past performance is<br>
any measure -- but for now try this. In your ccl directory (/opt/ccl/ccl in<br>
my system, because I install my ccl from svn in /opt/ccl), in the level-1<br></div>
directory, in the file l1-lisp-threads.lisp, hunt down and replace the<div class="im"><br>
%nanosleep function with this:<br>
<br>
#-windows-target<br>
(defun %nanosleep (seconds nanoseconds)<br></div>
(with-process-whostate ("Sleep")<br>
(rlet ((a :timespec)<br>
(b :timespec))<br>
(setf (pref a :timespec.tv_sec) seconds<br>
(pref a :timespec.tv_nsec) nanoseconds)<br>
(let ((aptr a)<br>
(bptr b))<br>
(loop<br>
(let ((result<br>
(external-call #+darwin-target "_nanosleep"<br>
#-darwin-target "nanosleep"<br>
:address aptr<br>
:address bptr<br>
:signed-fullword)))<div class="im"><br>
(declare (type (signed-byte 32) result))<br>
(if (and (< result 0)<br></div>
(eql (%get-errno) (- #$EINTR)))<br>
(psetq aptr bptr bptr aptr)<br>
(return))))))))<div><div class="h5"><br>
<br>
All I did was remove the OS X workaround code. I'm working with the<br>
up-to-date trunk, v 1.9.<br>
</div></div></blockquote><div><div class="h5">
<br>
That appears to have fixed it. I went back and forth between the old<br>
and new %nanosleep for good measure. Congrats to all.<br>
<br>
Using latest lx86cl in trunk with 2 second sleeps.<br>
<br>
With old %nanosleep:<br>
<br>
fail at 18 iterations<br>
fail at 32<br>
fail at 46<br>
fail at 11<br>
fail at 74<br>
<br>
With new %nanosleep:<br>
<br>
no fail after 166 iterations<br>
restart CCL<br>
no fail after 189<br>
restart CCL<br>
no fail after 221<br>
restart CCL<br>
no fail after 159<br>
restart CCL<br>
no fail after 653 and still running<br>
_______________________________________________<br>
Openmcl-devel mailing list<br>
<a href="mailto:Openmcl-devel@clozure.com" target="_blank">Openmcl-devel@clozure.com</a><br>
<a href="http://clozure.com/mailman/listinfo/openmcl-devel" target="_blank">http://clozure.com/mailman/listinfo/openmcl-devel</a><br>
<br>
<br>
</div></div></blockquote><div class="HOEnZb"><div class="h5">
</div></div></blockquote></div><br></div></div>