[Openmcl-devel] Effects of optimization settings

Gary Byers gb at clozure.com
Mon Apr 6 21:51:12 PDT 2009



On Mon, 6 Apr 2009, Osei Poku wrote:

> Hello,
>
> Are there any guidelines on the effects that the various optimization
> settings have on the compiler?

<http://trac.clozure.com/openmcl/wiki/DeclareOptimize> is linked to from
the front page.

> After playing with them for a while,
> I'm not convinced that some settings have any effect (esp. (speed
> 3)).  The output of disassemble looks exactly the same for the
> following functions.
>
> (defun mathop (i j)
>    (1+ (ash i j)))
>
> and..
>
> (defun mathop-fast (i j)
>    (declare (optimize (speed 3) (safety 0) (debug 0)) (fixnum i j))
>    (1+ (ash i j)))

No matter how important speed is, about the only way that the compiler
could produce different code in this case (where we have no idea
whether (ASH I J) fits int a fixnum) is to open-code the case where it
doesn't.  (For example, when I is 1 and J is 100.)  That -might-
conceivable be worth doing, but the first thing that ASH would need to
do in your example is to determine whether J is positive or negative
(whether it's a left or a right-shift) and, if it's a left shift,
whether the result fits in a fixnum or not.  The code in question
(that doesn't even deal with the cases where I and J are not both
fixnums) would be roughly equivalent to what you see at the label
"builtin_ash" in lisp-kernel/x86-spentry64.s, and even the simpler case
where J is negative (and the code that determines that to be true) -
down to the local label 2 in that code - is 10 or so intructions (and
you'd still have to inline the left-shift case or go out-of-line to
handle it.)  I would guess that the cost of the call/return to do this
out-of-line is probably pretty small, and that there isn't likely
a whole lot of bang-for-the-buck to be had there.

Something like:

(defun foo (i j)
   (declare (fixnum i) (type (integer -16 -1) j))
   (ash i j))

might be worth paying attention to (though I'm not sure how often code
like that appears in the wild); it at least provides a hint that the
shift will be a right shift and therefore that the result will fit
in a fixnum.

(defun foo (i j)
   (declare (fixnum i j))
   (the fixnum (ash i j)))

might also be worth paying attention to: we have to determine at runtime
whether it's a left or right shift (or a noop if J is 0), but we don't
have to worry about a left shift overflowing.

Certainly the case where J is a small constant is worth paying attention
to (and is paid attention to ...) and that case does appear in the wild
and reasonable people would reasonably expect it to be open-coded and
a lot less ... generic than ASH.  (Your example is basically saying
"make the general case of ASH faster"; good luck with that.)

So, if we're all in (violent) agreement that looking at the code
produced for a generic call to ASH doesn't tell us much about the
effect of OPTIMIZE declarations, let's look at something where those
declarations (speed/space/safety tradeoffs) have more of an effect:

(defun foo (v i)
   (svref v i))

At "normal" safety:

- we want to ensure that FOO receives exactly two arguments when called.
   (Failing to do this check can cause horrible stack-discipline errors if
   FOO was ever called with more arguments.)
- since SVREF's first argument must be a SIMPLE-VECTOR, the first argument
   to FOO must be a SIMPLE-VECTOR (or the results would be undefined and
   possibly unpleasant.)
- SVREF's second argument must be a non-negative FIXNUM less than the length
   of the SIMPLE-VECTOR V.
- Once we've gone through all of this, the actual operation is just a simple
   load from memory.

If we disassemble FOO (compiled at default safety settings and under the
default policy), the code looks like:

L0
   [0]     (leaq (@ (:^ L0) (% rip)) (% fn))
   [7]     (cmpl ($ 16) (% nargs))  ; exactly 2 args ?
   [10]    (jne L81)                ; go trap if not
   [12]    (pushq (% rbp))          ; build stack frame ...
   [13]    (movq (% rsp) (% rbp))
   [16]    (pushq (% arg_y))
   [17]    (pushq (% arg_z))
   [18]    (movq (@ -8 (% rbp)) (% arg_y)) ; recover V to arg_y register
   [22]    (movq (@ -16 (% rbp)) (% arg_z)) ; and I to arg_z register
   [26]    (movl (% arg_y.l) (% imm0.l)) ; is V
   [28]    (andl ($ 7) (% imm0.l))       ; some kind of
   [31]    (cmpl ($ 5) (% imm0.l))       ; vector-like object ?
   [34]    (jne L40)
   [36]    (movsbl (@ -13 (% arg_y)) (% imm0.l)) ; if so, is it a SIMPLE-VECTOR ?
L40
   [40]    (cmpl ($ -74) (% imm0.l))
   [43]    (jne L89)                     ; go trap if not
   [45]    (testb ($ 7) (% arg_z.b))     ; Is I a fixnum ?
   [49]    (jne L97)                     ; trap if not
   [51]    (movq (@ -13 (% arg_y)) (% imm0)) ; V is a SIMPLE-VECTOR; get its
   [55]    (shrq ($ 8) (% imm0))         ; length
   [59]    (shlq ($ 3) (% imm0))
   [63]    (cmpq (% imm0) (% arg_z))     ; is I > that length ?
   [66]    (jae L105)                    ; trap if so
   [68]    (movq (@ -5 (% arg_y) (% arg_z)) (% arg_z)) ; a simple memory reference
   [73]    (leaveq)                      ; discard stack frame
   [74]    (retq)                        ; return to caller
L81
   [81]    (uuo-error-wrong-number-of-args)
L89
   [89]    (uuo-error-reg-not-tag (% arg_y) ($ 182))
L97
   [97]    (uuo-error-reg-not-fixnum (% arg_z))
L105
   [105]   (uuo-error-vector-bounds (% arg_z) (% arg_y))


None of those checks to enforce safety are hugely expensive, but they're
each about as expensive as the simple memory reference (so we're spending
"much" more time enforcing safety than we are referencing memory, though
each of these operations is likely just a few machine cycles.)  If we
compiled the same code with disregard for SAFETY:

? (defun foo (v i)
   (declare (optimize (speed 3) (safety 0)))
   (svref v i))
FOO
? (disassemble *)
L0
   [0]     (leaq (@ (:^ L0) (% rip)) (% fn))
   [7]     (pushq (% rbp))
   [8]     (movq (% rsp) (% rbp))
   [11]    (pushq (% arg_y))
   [12]    (pushq (% arg_z))
   [13]    (movq (@ -8 (% rbp)) (% arg_y))
   [17]    (movq (@ -16 (% rbp)) (% arg_z))
   [21]    (movq (@ -5 (% arg_y) (% arg_z)) (% arg_z))
   [26]    (leaveq)
   [27]    (retq)

well. I'd like to see some other stuff optimized out (no need to build
a stack frame for a leaf function, no need to save args on stack ...)
but the body of the function is just a memory reference (at [21]), and
the code's about 4X smaller.  Millions of calls to the unsafe version
will probably run measurably faster than the same number of calls to
the "normal" version, but the normal version will signal errors if
called with the wrong number or types of arguments, and the unsafe
version may crash and burn in that case.



>
> Interestingly, (debug 3) in the above function gives a shorter
> assembly listing.

For extra credit, try to understand why ...

>
> Thanks,
>
> Osei
>
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>



More information about the Openmcl-devel mailing list