[Openmcl-devel] time travel

Sat May 16 20:03:51 PDT 2009

On Sat, 16 May 2009, Alexander Repenning wrote:

> thanks Gary,
>
> I think you are right that the reported time warps are just suspiciously 
> close to 2^32 nanoseconds (~4.3 seconds) and the frequency of these time
> drift events is also matching, i.e., in the 10 second loop the problem 
> appears 2-3 times on average.
>
> I did add the less than max safety and it does appear to do the trick but 
> just leaves me worried a bit that the "less safe" code works better than the
> safe one.

There's a function named CL:CDR.  There isn't much to it: it checks
that it gets exactly one argument, checks that that argument is a
LIST, does a simple memory reference, and returns the result.  The
function #'CL:CDR exists; we can call it from the REPL, pass #'CL:CDR
as an argument to mapping functions, etc., but we hardly ever actually
call #'CL:CDR at runtime: the compiler can usually perform the
number-of-args check at compile-time, and generating the typecheck (in
safe code) and the memory reference inline is a lot faster (and no
less safe in most cases) than actually calling #'CL:CDR would be.  The
only real costs of open-coding calls to CDR are:

   1) you can't use things like TRACE to observe calls to CDR, since no
      actual function call occurs.
   2) the compiler may interpret whatever combination of TYPE and OPTIMIZE
      declarations are in effect as a request to skip the typecheck.

Code compiled with an (OPTIMIZE (SAFETY 3) ...) declaration in effect
has to be as safe as the implementation can make it.  Rather than
looking at every primitive operation that the compiler knows about and
ensuring that potentially unsafe optimizations are inhibited when
SAFETY 3 is in effect, CCL basically tries to avoid open-coding calls
to any functions (including primitive things like CDR and AREF and +
...).  That's generally correct, but it imposes a huge performance hit
in some cases, in exchange for what may be little or no actual gain in
safety.  (Rather than ensuring that all calls to CDR were maximally
safe by compiling a function call to #'CDR, the compiler could
continue to open-code CDR but simply refuse to omit the type-check
when SAFETY 3 was in effect.)
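
To make the two strategies concrete, here's a rough sketch in portable
Common Lisp.  FULL-CALL-CDR and CHECKED-INLINE-CDR are made-up names, and
this is not how CCL's compiler actually does either thing: NOTINLINE just
asks for a real function call, and the explicit CHECK-TYPE stands in for a
typecheck that the compiler would refuse to drop.

```lisp
;; Strategy 1 (roughly the "don't open-code anything" approach):
;; ask for a real call to #'CDR, so that all of the checking happens
;; inside the function itself.
(defun full-call-cdr (x)
  (declare (notinline cdr))
  (cdr x))

;; Strategy 2 (the alternative): the compiler remains free to open-code
;; the CDR, but the typecheck is always performed.  CHECK-TYPE is only a
;; stand-in here for a check that the compiler itself would emit.
(defun checked-inline-cdr (x)
  (check-type x list)
  (cdr x))
```

Both return the same thing for a list; the difference is only in how
much work happens per call and in whether TRACE can see the call to CDR.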

Suppose (just to strain credibility a little) that rather than being
"rarely actually called", #'CDR was "almost never actually called", and
that because of a cut-and-paste mishap it returned the CAR of its argument
rather than the CDR.  That would mean that calls to CDR from code compiled
with SAFETY 3 in effect (the way that CCL currently implements SAFETY 3)
would return the wrong answer; that'd expose a bug in CDR, but that'd be ...
a bug in CDR, not a safety issue (assuming that the buggy CDR function
was "safe" - checked that it received one arg of type LIST - but returned
the wrong answer.)
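
The thought experiment can be mocked up like this.  TOY-CDR and
TOY-CDR-INLINE are hypothetical names; the macro is only a stand-in for
whatever the compiler would generate when it open-codes the call.

```lisp
;; The out-of-line function: "safe" (it checks its argument) but wrong,
;; thanks to the imagined cut-and-paste mishap.
(defun toy-cdr (x)
  (check-type x list)
  (car x))                              ; bug: should be (cdr x)

;; Stand-in for the open-coded path: same check, correct memory reference.
(defmacro toy-cdr-inline (form)
  (let ((x (gensym "X")))
    `(let ((,x ,form))
       (check-type ,x list)
       (cdr ,x))))
```

Code that takes the inline path never notices the bug; code that's forced
to make the real call (as SAFETY 3 code would be) gets the CAR.  Both
paths still reject a non-list, so neither one is "unsafe".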

CCL::%FF-CALL is a function (used to do foreign function calls); it
takes a strange argument list (consisting of alternating keywords that
denote foreign argument types and values to be used for the
corresponding argument, an optional keyword denoting the foreign type
of the result, and maybe some nightmarish platform-specific gunk that
describes how to deal with ABI arcana).  The function has to interpret
these arguments, reserve space for the argument values (somewhere
where the foreign code can find them), store the unboxed equivalent of
each argument in the right place in that reserved space (according to
ABI details), arrange to call the foreign code, and return a lisp
representation of the foreign result.  That's all incredibly
complicated, and it has to be done each time that CCL::%FF-CALL is
actually called.
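
As a very rough illustration of the "interpret the arguments at runtime"
part, here's a toy parser for that kind of argument list.  Everything in
it is made up for the example (the TOY- names, the size table, the
assumption of a 32-bit target with no alignment padding); the real
function also has to unbox the values and actually make the call.

```lisp
;; Assumed sizes in bytes for a handful of foreign type keywords on a
;; hypothetical 32-bit target.
(defparameter *toy-type-sizes*
  '((:unsigned-fullword . 4) (:signed-fullword . 4) (:address . 4)
    (:double-float . 8) (:unsigned-doubleword . 8)))

(defun toy-type-size (type)
  (or (cdr (assoc type *toy-type-sizes*))
      (error "Unknown foreign type ~S" type)))

(defun toy-parse-ff-args (specs)
  "SPECS is (type value type value ... [result-type]).
Returns three values: a list of (offset type value) entries, the total
number of bytes to reserve, and the result type (:VOID if none given)."
  (let ((offset 0) (slots '()) (result-type :void))
    (loop
      (cond ((null specs) (return))
            ((null (cdr specs))         ; a lone trailing keyword
             (setq result-type (car specs))
             (return))
            (t (let ((type (pop specs))
                     (value (pop specs)))
                 (push (list offset type value) slots)
                 (incf offset (toy-type-size type))))))
    (values (nreverse slots) offset result-type)))
```

All of that walking and table lookup happens on every call, which is the
cost that the out-of-line function can't avoid.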

Fortunately, most uses of CCL::%FF-CALL are the results of
macroexpanding whatever the #_ reader macro generates (or of
macroexpanding CCL:EXTERNAL-CALL or CCL:FF-CALL) and the keywords
that describe the foreign arg and result types are usually constants,
which means that the compiler can open-code calls to CCL::%FF-CALL,
knowing how much space will be required for the argument values, knowing
the foreign type of each argument and therefore how it's to be unboxed
and passed to the foreign code, knowing the foreign type of the result,
and knowing how to deal with any platform-specific arcana.  It's still
very complicated, but most of the complexity happens at compile-time,
and the compiler mostly just generates code to reserve N bytes of foreign
argument space, store M values into precomputed locations in that reserved
space, etc.  That code's pretty heavily exercised, and any outright bugs
in it are probably pretty obscure.
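
A toy version of the same idea: when the type keywords are constants, a
macro can work out the layout while the code is being compiled, so that
only the stores are left for runtime.  TOY-LAYOUT and TOY-FF-CALL are
hypothetical names, the sizes assume a 32-bit target with no alignment
padding, and the expansion just builds a list describing what would be
done rather than calling anything.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  ;; Runs at macroexpansion time.  Returns (offset type form) entries,
  ;; the total size in bytes, and the result type.
  (defun toy-layout (specs)
    (let ((offset 0) (slots '()))
      (loop while (cdr specs)
            do (let ((type (pop specs))
                     (form (pop specs)))
                 (push (list offset type form) slots)
                 (incf offset
                       (ecase type
                         ((:address :unsigned-fullword :signed-fullword) 4)
                         ((:double-float :unsigned-doubleword) 8)))))
      (values (nreverse slots) offset (if specs (car specs) :void)))))

(defmacro toy-ff-call (name &rest specs)
  (multiple-value-bind (slots total result-type) (toy-layout specs)
    ;; Offsets, total size and result type are constants in the
    ;; expansion; only the value forms are evaluated at runtime.
    `(list :call ,name
           :reserve ,total
           :stores (list ,@(mapcar (lambda (slot)
                                     `(list ,(first slot)
                                            ,(second slot)
                                            ,(third slot)))
                                   slots))
           :result ,result-type)))
```

For example, (toy-ff-call "foo" :address p :double-float (* 2 x)
:unsigned-fullword) expands into code with the 12-byte reservation and
the offsets 0 and 4 already filled in.  If the types weren't constants,
none of that could be precomputed and the work would have to fall back
to something that parses the list at runtime.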

There still needs to be a #'CCL::%FF-CALL function (for the same reason
that there needs to be a #'CL:CDR).  (For a long time in MCL and possibly
for some time in CCL, the implementation of #'CCL::%FF-CALL called the
compiler at runtime and then FUNCALLed the result.)  These days in CCL,
#'CCL::%FF-CALL is implemented as a hairy, platform-specific mixture
of low-level lisp and LAP code.  It generally works, but it's not exercised
nearly as much as the open-coded case is.  (Someone found a bug in the PPC32
version a week or so ago; that bug's probably been there for 10 years or so.)

It's relatively rare for foreign functions on 32-bit platforms to return
64-bit integer values (and when they do so, they do so somewhat awkwardly,
in two 32-bit pieces.)  The x8632 version of #'CCL::%FF-CALL didn't handle
this correctly (Matt checked in a fix a few hours ago), so calls to
#_mach_absolute_time were returning an incorrect value.
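
The arithmetic behind the "suspiciously close to 2^32" observation can be
sketched as follows.  The function names are made up, and this doesn't
try to reproduce exactly what the x8632 code got wrong; it only shows why
mishandling one of the two 32-bit pieces makes a time difference jump by
about 2^32 ticks, and how often that would happen in a 10-second loop if
the ticks are nanoseconds (as the quoted message assumes).

```lisp
(defun high-half (n) (ldb (byte 32 32) n))
(defun low-half (n) (ldb (byte 32 0) n))

;; What a correct 64-bit return has to do with the two pieces.
(defun combine-halves (high low)
  (logior (ash high 32) low))

;; Number of multiples of 2^32 in the interval (START, START+DURATION],
;; i.e. how many times the low half wraps around during that interval.
(defun boundary-crossings (start duration)
  (- (floor (+ start duration) (expt 2 32))
     (floor start (expt 2 32))))

;; Two readings 512 ticks apart that straddle a 2^32 boundary.  Using
;; only the low halves, the difference comes out as 512 - 2^32.
(let* ((t1 #x1FFFFFF00)
       (t2 (+ t1 512)))
  (- (low-half t2) (low-half t1)))      ; => -4294966784
```

2^32 nanoseconds is about 4.295 seconds, and a 10-second (10^10 ns)
interval contains either 2 or 3 multiples of 2^32 depending on where it
starts, which lines up with the problem showing up 2-3 times per loop.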

So, it's possible for code compiled with SAFETY 3 to encounter bugs
that aren't encountered at other SAFETY levels, and the same may be
true of other optimization settings.  The wrong value was returned
not because #'CCL::%FF-CALL was "unsafe", but because it didn't handle
64-bit return values correctly.