[Openmcl-devel] Optimizing a stupid benchmark

Thu May 4 16:28:06 PDT 2017

On Thu, 4 May 2017, Jonathan Fischer wrote:

> This is a stupid thing to test, I was just curious about it, and I’m wondering what obvious thing it is I’m doing wrong to get really bad results.
>
> For starters, a silly little function to sum up an arithmetic series:
>
> (defun sum-test-iterative ()
>  (let ((sum 0))
>    (dotimes (i 2147483647)
>      (setf sum (+ sum i)))
>    sum))
>
> In ClozureCL 1.11, this is horribly slow and conses like crazy:
>
> ? (time (sum-test-iterative))
> (SUM-TEST-ITERATIVE)
> took 62,875,836 microseconds (62.875835 seconds) to run.
>      2,818,179 microseconds ( 2.818179 seconds, 4.48%) of which was spent in GC.
> During that period, and with 4 available CPU cores,
>     55,949,019 microseconds (55.949020 seconds) were spent in user mode
>      7,153,305 microseconds ( 7.153305 seconds) were spent in system mode
> 20,127,468,688 bytes of memory allocated.
> 10,397 minor page faults, 8 major page faults, 0 swaps.
> 2305843005992468481
>
> SBCL 1.3.17 does much better:
>
> * (time (sum-test-iterative))
>
> Evaluation took:
>  7.741 seconds of real time
>  7.677832 seconds of total run time (7.594504 user, 0.083328 system)
>  99.19% CPU
>  18,536,585,073 processor cycles
>  0 bytes consed
>
> 2305843005992468481
>
> If I sprinkle in some declarations I can get SBCL down to a bit over 3 seconds, but ClozureCL’s still pretty bad:
>
> (defun sum-test-iterative ()
>  (declare (optimize speed (safety 0)))
>  (let ((sum 0))
>    (declare ((signed-byte 64) sum))
>    (dotimes (i 2147483647)
>      (setf sum (the (signed-byte 64) (+ sum i))))
>    sum))
>
> ? (time (sum-test-iterative))
> (SUM-TEST-ITERATIVE)
> took 48,308,393 microseconds (48.308390 seconds) to run.
>      2,875,113 microseconds ( 2.875113 seconds, 5.95%) of which was spent in GC.
> During that period, and with 4 available CPU cores,
>     42,849,312 microseconds (42.849310 seconds) were spent in user mode
>      6,040,092 microseconds ( 6.040092 seconds) were spent in system mode
> 20,127,468,707 bytes of memory allocated.
> 14,039 minor page faults, 1 major page faults, 0 swaps.
> 2305843005992468481
>
> * (time (sum-test-iterative))
>
> Evaluation took:
>  3.267 seconds of real time
>  3.248668 seconds of total run time (3.228591 user, 0.020077 system)
>  99.45% CPU
>  7,821,424,512 processor cycles
>  0 bytes consed
>
> 2305843005992468481
>
> How can I help ClozureCL out here? Both of these are running 64-bit on macOS, btw.

For numbers larger than most-positive-fixnum, Lisp switches to bignums, which are slower.
Bignums are boxed, so they cons.

CCL switches earlier than SBCL: 2305843005992468481 is a FIXNUM in SBCL, but a BIGNUM in CCL.
With large enough numbers SBCL will slow down too. Try 3221225471 iterations in your example.
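
A quick way to see which side of the boundary a given iteration count lands on is to compute the final sum in closed form and ask the running Lisp whether it is a fixnum. This is only a sketch: SERIES-SUM and FIXNUM-REPORT are names made up for illustration, and the TYPEP answer depends on the implementation you run it in.

```lisp
;; Closed form of what (sum-test-iterative) computes for N iterations:
;; 0 + 1 + ... + (N - 1) = N * (N - 1) / 2.
(defun series-sum (n)
  (/ (* n (- n 1)) 2))

;; Hypothetical helper: returns the final sum, whether that sum is a
;; fixnum in the current implementation, and how many bits it needs.
(defun fixnum-report (n)
  (let ((sum (series-sum n)))
    (list sum
          (typep sum 'fixnum)
          (integer-length sum))))

(fixnum-report 2147483647)   ; the original benchmark: sum needs 61 bits
(fixnum-report 3221225471)   ; the larger count: sum needs 63 bits
```

A 61-bit sum is past CCL's fixnum range but still inside SBCL's; a 63-bit sum is past both.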

If you are !ABSOLUTELY! sure that your numbers are always smaller than most-positive-fixnum,
you can declare them as fixnum to get some (dangerous) speedup.
The speedup from (signed-byte 64) is smaller, btw.
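
As a rough sketch of what such a fixnum-declared version could look like (SUM-TEST-FIXNUM and SUM-FITS-FIXNUM-P are hypothetical names, and the iteration count is passed as an argument here rather than hard-coded): note that with (safety 0) nothing checks the declarations, so the caller has to guarantee that the running sum never leaves the fixnum range. The original count of 2147483647 does NOT satisfy that in CCL.

```lisp
;; Hypothetical guard: true if the final sum 0 + 1 + ... + (N - 1)
;; is still a fixnum in the current implementation.
(defun sum-fits-fixnum-p (n)
  (typep (/ (* n (- n 1)) 2) 'fixnum))

;; Same loop as the benchmark, but with everything declared FIXNUM.
;; With (safety 0) the result is undefined if the sum overflows, so
;; only call this when SUM-FITS-FIXNUM-P returns true for N.
(defun sum-test-fixnum (n)
  (declare (optimize speed (safety 0))
           (fixnum n))
  (let ((sum 0))
    (declare (fixnum sum))
    (dotimes (i n)
      (setf sum (the fixnum (+ sum i))))
    sum))

;; A count whose sum stays below CCL's most-positive-fixnum on 64-bit.
(when (sum-fits-fixnum-p 1000000000)
  (time (sum-test-fixnum 1000000000)))
```

Doing the range check once, outside the loop, keeps the inner loop free of overflow tests while still refusing inputs that would silently give wrong answers.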

most-positive-fixnum (64-bit builds):
CCL   #x0FFFFFFFFFFFFFFF  1152921504606846975  2^60 - 1 (60 bits)
SBCL  #x3FFFFFFFFFFFFFFF  4611686018427387903  2^62 - 1 (62 bits)
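
These values can be read off at the REPL of whichever Lisp you are using. A small sketch, where FIXNUM-INFO is just an illustrative name:

```lisp
;; Hypothetical helper: returns MOST-POSITIVE-FIXNUM as a decimal
;; integer, as a hex string, and as a count of magnitude bits.
(defun fixnum-info ()
  (list most-positive-fixnum
        (format nil "#x~X" most-positive-fixnum)
        (integer-length most-positive-fixnum)))

(fixnum-info)
```

The third element is the bit count from the table above (60 for CCL, 62 for SBCL on 64-bit platforms).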