[Openmcl-devel] To sqrt or to expt, that is the question

Mon Dec 22 20:08:34 PST 2003

On Mon, 22 Dec 2003, Gary King wrote:

> I was looking at some code where someone had used expt to compute the
> square root of a number. "Hmm," I thought, "it all depends on the
> compiler and the implementation, but I would have thought that (sqrt x)
> would be faster than (expt x 0.5)." So I tried the following,
> admittedly unscientific, test:
>
> (time
>   (let ((x 0d0))
>     (loop repeat 100000 do
>           (setf x (expt 101 0.5)))
>     x))
>
> (time
>   (let ((x 0d0))
>     (loop repeat 100000 do
>           (setf x (sqrt 101)))
>     x))
>
> To my surprise in both MCL and OpenMCL, not only was expt faster, but
> sqrt conses. Note that I tried the same test in SBCL and found that
> there was no significant difference between expt and sqrt. This
> probably isn't a big deal, but it might be worth optimizing sqrt in MCL
> at some point...
> --
> Gary Warren King, Lab Manager
> EKSL East, University of Massachusetts * 413 577 0176

Boy, this optimization stuff is easy!  I don't know why everyone says
otherwise:

? (progn (ccl::can-constant-fold '(sqrt)) nil)
NIL

For those who aren't amused by sarcasm: no, I don't remember why
SQRT isn't known to be constant-foldable by default; folding it would
clearly make programs that call SQRT on constants run faster and (with
a straight face) I'd agree that there might really be some such programs
(or macros might expand into code like this), and there's no good reason
-not- to constant-fold SQRT at compile-time. (I would guess that the
decision to not constant-fold SQRT calls was made - if it was made
consciously at all - at a time when cons cells were seen as being
in short supply.)
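
If the goal is to time the run-time cost of the two calls rather than the
compiler's willingness to fold constants, the argument can be routed
through a function parameter so that its value is not visible at compile
time. What follows is only a minimal sketch: the names SQRT-VIA-SQRT,
SQRT-VIA-EXPT and RUN-MANY are made up for illustration and are not part
of MCL or OpenMCL, and a sufficiently aggressive compiler could in
principle still see through it.

```lisp
;; Hypothetical helpers, not part of any implementation.
;; X arrives as a parameter, so neither call has a constant argument
;; that could be folded when these functions are compiled.
(defun sqrt-via-sqrt (x)
  (sqrt x))

(defun sqrt-via-expt (x)
  (expt x 0.5))

(defun run-many (fn x n)
  "Call FN on X N times and return the last result."
  (let ((result nil))
    (loop repeat n
          do (setf result (funcall fn x)))
    result))

;; Usage, mirroring the original test:
;; (time (run-many #'sqrt-via-sqrt 101 100000))
;; (time (run-many #'sqrt-via-expt 101 100000))
```

Both timings then include the same FUNCALL overhead, so any remaining
difference is more plausibly attributable to SQRT and EXPT themselves.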

Also with a straight face: there -are- cases where CMUCL/SBCL generate
better numeric code than MCL/OpenMCL.  Whether it's any better in the
case of SQRT or not is hard to say: that may depend on what's known
about the type and sign of SQRT's argument and may also depend on
whether (floating-point) SQRT's implemented in hardware on a given
platform, whether library routines for all floating point types are
available, how much overhead's involved in a foreign function call
and why, etc.
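
To make the "type and sign" point concrete: in Common Lisp, SQRT of a
negative real returns a complex number, so a compiler that knows nothing
about the argument has to allow for that case. Declaring the argument to
be a non-negative DOUBLE-FLOAT removes the ambiguity. The sketch below
uses a made-up function name (SQRT-NONNEG-DOUBLE); whether a given
compiler actually exploits the declaration is an assumption, not
something the standard promises.

```lisp
;; Hypothetical example function, not part of any implementation.
;; (DOUBLE-FLOAT 0D0) is the type of double-floats >= 0, so the result
;; is known to be a real DOUBLE-FLOAT, never a complex number.
(defun sqrt-nonneg-double (x)
  (declare (type (double-float 0d0) x)
           (optimize (speed 3) (safety 1)))
  (sqrt x))

;; Without such a declaration the general case applies:
;; (sqrt -4d0) returns a complex number, #C(0.0d0 2.0d0).
```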

Unlike other PPC variants, the G5 does implement the "optional" FSQRT
instruction, which suggests that there might be some cases where the
compiler could do something better than a function call for SQRT.  Even
if this were implemented, I think that your example clearly shows that
it's much better to do SQRT at compile time than at run time, and I
hope that you'll forgive me for not keeping a straight face for three
whole paragraphs.

(spoken as someone who's raised the alarm prematurely from time to time
myself.)

Gary Byers
gb at clozure.com