[Openmcl-devel] high res timer for CCL

Tue Dec 16 12:43:46 PST 2008

thanks Gary. Great feedback and code. I used the code to make a  
Hemlock interface integrated timer:

http://www.agentsheets.com/lisp/Clozure_CL/ns%20Timer.lisp

One curious thing you will find in the code is that the timer function  
is called 4 times. This is because the it takes quite a bit of time to  
call the timer itself  (~3us on my intel mac) which needs to be  
compensated.  Further, calling the timer function, or any foreign  
function, appears to take more time the first time around. You can run  
this test:

(without-interrupts
(let (t1 t2 t3 t4)
  (setq t1 (#_mach_absolute_time))
  (setq t2 (#_mach_absolute_time))
  (setq t3 (#_mach_absolute_time))
  (setq t4 (#_mach_absolute_time))
  (format t "~%t2-t1=~A, t3-t2=~A, t4-t3=~A" (- t2 t1) (- t3 t2) (- t4  
t3))))

I am getting t2-t1 to be significantly higher on average than t3 - t2  
or t4 - t3

by inserting the code to be measured between t3 and t4 one can  
estimate the time is takes to run the code to be (t4 - t3) - (t3 - t2)  
and ignoring t1

Just wondering: why would it take more time the first time around? Is  
there some kind of FFI call caching going on?

Alex

On Dec 9, 2008, at 7:05 PM, Gary Byers wrote:

> Carbon functions for measuring wall-clock time are deprecated (and  
> in some
> cases they seem to be missing on x8664 Leopard.)
>
> Supposedly, the most accurate and lowest-overhead means of measuring
> elapsed time on MacOS is #_mach_absolute_time, which returns an  
> unsigned
> 64-bit integer representing real time in "Mach Time Units".  Exactly
> how many Mach Time Units there are in a second is CPU-dependent; on
> a few Core 2 Duo systems that I tried this on, there seem to be  
> exactly
> 1000000000 of them (e.g., a Mach Time Unit is a nanosecond.)  On an
> iMac G5, a Mach Time Unit appears to be around 300 nanoseconds.
>
> We can determine the ratio of Mach Time Units to nanoseconds by
> calling #_mach_timebase_info, letting it fill in the fields of a
> "mach_timebase_info" structure.  (Those fields are a 32-bit unsigned
> numerator and a 32-bit unsigned denominator.)  We'd presumably want to
> cache this info and only compute it the first time we need it in a
> lisp session.
>
> (defvar *mach-time-unit-ratio*
> (rlet ((info #>mach_timebase_info))
>  (#_mach_timebase_info info)
>  (/ (pref info #>mach_timebase_info.numer)
>     (pref info #>mach_timebase_info.denom))))
>
> (defun measure-elapsed-time (thunk)
> (let* ((start (#_mach_absolute_time)))
>  (funcall thunk)
>  (let* ((elapsed-mach-time-units (- (#_mach_absolute_time) start)))
>    (values (round (* elapsed-mach-time-units *mach-time-unit- 
> ratio*))))))
>
> ? (measure-elapsed-time (lambda () (sleep 1)))
> 1000051891
>
> which is probably pretty accurate (e.g, we sleep for 1000000000
> nanoseconds, and it's another 50 or more microseconds before we
> actually get scheduled again.)  You might see more lor less latency
> depending on what else was runnable and how many CPU cores there are
> to run it on;  (It's really neat that this has a resolution of a
> nanosecond, but it can be very difficult to consistently get results
> whose accuracy is particularly close to their precision.)
>
> On x86, #_mach_absolute_time seems to be implemented in terms of the
> rdtsc instruction, which returns the value of an on-chip counter that
> increments on every clock cycle (e.g, somewhere around twice a  
> nanosecond
> on a modern x86 CPU).  Each CPU core has its own time-stamp counter,
> OSX seems to do a good job of getting keeping and keeping them in
> synch, but other OSes may not.  On many consumer machines, the CPU
> frequency can be scaled down (to conserve poweer/generate less heat)
> when system software decides (by some measure) that the system is
> "idle" and scaled back up after it notices that it has gotten "busy",
> so it can be hard to map from TSC values (cycles) to real time units
> unless the mapping's done by code that's aware of these issues.
>
> The old, deprecated Carbon functions (those that still exist) are
> implemented in terms of #_mach_absolute_time.  Many of these
> functions conceptually "return structures", which in C means that
> the caller and callee conspire somehow to side-effect an instance
> of the structure type in the caller.  In C this is written to look
> like a regular assignment:
>
>   AbsoluteTime start;
>
>   start = UpTime();
>
> but what goes on behind-the-scenes to arrange that 'start' is
> side-effected by the call depends on the platform (and sometimes
> on the size and type of the structure and its fields.)
>
> It's supposed to be the case that structure assignment/return is
> handled in CCL's FFI by passing a pointer to the structure as
> a first argument in the call; that -should- cause the appropriate
> behind-the-scenes activity to take place to maintain the illusion
> of (and have the effect of) assignment:
>
> (rlet ((start #>AbsoluteTime))
>  (_#UpTime start)
>  ...)
>
> Back in the day (on the 32-bit PPC, mostly), that was literally the
> extent of the behind-the-scenes activity: the caller would pass an
> invisible first arg - a pointer to a structure - to the callee, and
> the callee would fill in that structure's fields.  The interface  
> translator would silently translate "function returning structure" to
> "function that accepted structure pointer as extra first arg and  
> returned
> :void".
>
> Technically, I don't think that that was always correct for PPC32
> (structures that contained a single 32-bit word may have been handled
> differently, though the details are a bit blurry ...)  On PPC64 and
> x86-64 platforms (and on Dawin on x86-32), the conventions are more
> complicated: sometimes, the behind-the-scenes activity involved
> returning a structure's fields or some composite words in some  
> combination of general-purpose and floating-point registers.  The
> FFI translator was changed to make structure return explicit, and
> the FFI itself was changed to use platform-specific translation
> rules for some parts of foreign-function call forms and callback
> definitions.  We want to hide all of that gunk, and use syntax
> like that in the #_UpTime call above regardless of what happens
> behand-the-scenes/under-the-hood, but the FFI can only arange for
> that stuff to happen if we know that the function being called
> "returns a structure".
>
> Unfortunately, that change to the FFI translator was controlled by
> a DEFVAR, and at some point the initial value of that DEFVAR was
> changed to revert back to the old PPC32 behavior.  As of a couple
> of days ago, most of the interfaces distributed with CCL on all of
> the platforms it runs on were missing infomation about explicit
> structure return.  (This information was present on ObjC method
> definitions, and structure-return conventions are most heavily
> used on Darwin and there mostly in Carbon/Cocoa/CoreFoundation.)
>
> I've been updating the interfaces to ensure that information about
> structure returns is present when it should be over the last few
> days.  If you update from svn, you should get a set of interfaces
> where code like the above would work (if you have the right libraries
> loaded - as you would in the IDE - and are using the right set of
> interfaces, which might depend on how the IDE was built.  (For reasons
> that seemed to make sense a year or so ago, the IDE is sometimes built
> with a copy of the interfaces built into the .app bundle; that may
> make them easier to find, but may also make them harder to update via
> svn.)
>
> If the last few paragraphs don't make sense, a shorter version is
> that you can get the code fragment above to do the right thing
> by avoiding #_:
>
> (rlet ((start #>AbsoluteTime))
>  (external-call "UpTime" start #>AbsoluteTime)
>  (let* ((mach-absolute-time (#_mach_absolute_time)))
>    (format t "~&   uptime is ~d,~&mach time is ~d"
>       (dpb (pref start #>AbsoluteTime.hi)
>            (byte 32 32)
>            (pref start #>AbsoluteTime.lo))
>       mach-absolute-time)))
>
> The values printed by this code snippet will differ by a few ms (if
> we could call UpTime and mach_asolute_time at exactly the same time,
> we'd get exactly the same answer.  I'd personally prefer to just
> use the underlying #_mach_absolute_time directly and forget about
> the deprecated Carbon stuff, but until it totally disappears it's
> still possible to party like it's 1999.
>
>
>
>
>
> On Mon, 8 Dec 2008, Alexander Repenning wrote:
>
>> This new version of the high resolution timer makes it simple to
>> measure the time spent in s-expressions down to the microsecond
>> level.  This version is integrated into CCL editor. Select an
>> expression and get the time it takes  to run. Quite precise actually.
>>
>> http://www.agentsheets.com/lisp/Clozure_CL/us%20Timer.lisp
>>
>> The only shame is that there would be an even better time function,
>> good old upTime, with nanosecond resolution. Here is some code:
>>
>> #include <CoreServices/CoreServices.h>
>>
>> int main(){
>>  AbsoluteTime start = UpTime();
>>  //Do stuff
>>  AbsoluteTime end = UpTime();
>>
>>  Nanoseconds diffNS = AbsoluteDeltaToNanoseconds(end,start);
>>  long nanoseconds = UnsignedWideToUInt64(diffNS);
>>
>>
>> How would one translate this into CCL  code?
>>
>> Alex
>>
>>
>>
>>
>>
>> Prof. Alexander Repenning
>>
>> University of Colorado
>> Computer Science Department
>> Boulder, CO 80309-430
>>
>> vCard: http://www.cs.colorado.edu/~ralex/AlexanderRepenning.vcf
>>
>>
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> http://clozure.com/mailman/listinfo/openmcl-devel
>>
>>
>

Prof. Alexander Repenning

University of Colorado
Computer Science Department
Boulder, CO 80309-430

vCard: http://www.cs.colorado.edu/~ralex/AlexanderRepenning.vcf