[Openmcl-devel] Fun with Measurements
Brent Fulgham
bfulg at pacbell.net
Sun Oct 29 11:45:52 PST 2006
On Oct 29, 2006, at 12:05 AM, Gary Byers wrote:
> On Sat, 28 Oct 2006, Brent Fulgham wrote:
>> While the 64-bit build of OpenMCL looks great on nearly all tests
>> (and substantially improves over the 32-bit build in some cases),
>> something seems very wrong with the 64-bit support for BIGNUMS (see
>> BIGNUM/PARI-200-5, PI-DECIMAL/BIG and PI-DECIMAL/SMALL for example).
>> Remember that the non-reference columns are relative scales, so on
>> PI-DECIMAL/SMALL, 64-bit OpenMCL is 10x as slow as OpenMCL 1.1/1.0 in
>> 32-bit. One small bright spot is the CRC40 benchmark, which due to its
>> heavy use of 40-bit integers is substantially improved using native
>> 64-bit words.
>
> I haven't looked at this (Apple's CHUD performance tools don't yet work
> in 64-bit mode); can you tell whether it's more accurate to say "bignum
> arithmetic is slower in ppc64 OpenMCL" or "bignum -multiplication- is
> slower ..." ?
Yes, it probably is multiplication. The benchmark consists of:
;; code from Bruno Haible <haible at ilog.fr>
;;
;; A. Elementary integer computations:
;; The tests are run with N = 100, 1000, 10000, 100000 decimal digits.
;; Precompute *x1* = floor((sqrt(5)+1)/2 * 10^(2N))
;;            *x2* = floor(sqrt(3) * 10^N)
;;            *x3* = 10^N+1
;; Then time the following operations:
;; 1. Multiplication *x1* * *x2*,
;; 2. Division (with remainder) *x1* / *x2*,
;; 3. integer_sqrt (*x3*),
;; 4. gcd (*x1*, *x2*),
;;
;; B. (from Pari)
;; u=1;v=1;p=1;q=1;for(k=1..1000){w=u+v;u=v;v=w;p=p*w;q=lcm(q,w);}
So, it is greatly focused on bignum multiplication.
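For anyone who wants to poke at part B in isolation, the Pari loop above translates to Common Lisp roughly as follows. This is only a sketch written from the comment, not the code from the benchmark sources; the name PARI-LOOP and its argument are made up here for illustration.

```lisp
;; Hypothetical transcription of the Pari loop from the comment above:
;; u=1;v=1;p=1;q=1;for(k=1..n){w=u+v;u=v;v=w;p=p*w;q=lcm(q,w);}
(defun pari-loop (n)
  "Run N iterations of the loop; return the product P and the lcm Q."
  (let ((u 1) (v 1) (p 1) (q 1))
    (dotimes (k n)
      (declare (ignorable k))
      (let ((w (+ u v)))          ; next Fibonacci-style term
        (setf u v
              v w
              p (* p w)           ; bignum multiplication once P grows
              q (lcm q w))))      ; lcm involves gcd, multiply and divide
    (values p q)))
```

Wrapping a call such as (pari-loop 1000) in TIME in both the 32-bit and 64-bit builds should show whether this loop alone reproduces the slowdown.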
Here's an example test case that shows bad performance in 64-bit mode:
=========================================================
;; calculating pi using ratios
(defun compute-pi-decimal (n)
  (let ((p 0)
        (r nil)
        (dpi 0))
    (dotimes (i n)
      (incf p (/ (- (/ 4 (+ 1 (* 8 i)))
                    (/ 2 (+ 4 (* 8 i)))
                    (/ 1 (+ 5 (* 8 i)))
                    (/ 1 (+ 6 (* 8 i))))
                 (expt 16 i))))
    (dotimes (i n)
      (multiple-value-setq (r p) (truncate p 10))
      (setf dpi (+ (* 10 dpi) r))
      (setf p (* p 10)))
    dpi))

;; this can be 1e-6 on most compilers, but for COMPUTE-PI-DECIMAL on
;; OpenMCL we lose lotsa precision
(defun fuzzy-eql (a b)
  (< (abs (/ (- a b) b)) 1e-4))

(defun run-pi-decimal/small ()
  (assert (fuzzy-eql pi (/ (compute-pi-decimal 200) (expt 10 198)))))

(defun run-pi-decimal/big ()
  (assert (fuzzy-eql pi (/ (compute-pi-decimal 1000) (expt 10 998)))))
=========================================================
The comment regarding OpenMCL is not mine; it was in the benchmark
sources. Does anyone know why it was made?
At any rate, the above example is 8 to 10x as slow for 64-bit OpenMCL
as for 32-bit. (pi-decimal/small is 10x as slow, pi-decimal/big is 8x).
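Since the test above works on ratios, it exercises gcd and division as well as multiplication. One way to separate them would be to time each operation on its own, along these lines. This is a rough sketch only; SECONDS-FOR and COMPARE-BIGNUM-OPS are names invented for this message, and the operand sizes and iteration counts are arbitrary.

```lisp
;; Hypothetical helpers for timing individual bignum operations.
(defun seconds-for (thunk iterations)
  "Call THUNK ITERATIONS times; return elapsed real time in seconds."
  (let ((start (get-internal-real-time)))
    (dotimes (i iterations)
      (declare (ignorable i))
      (funcall thunk))
    (/ (- (get-internal-real-time) start)
       internal-time-units-per-second)))

(defun compare-bignum-ops (&key (digits 2000) (iterations 1000))
  "Return a plist of elapsed seconds for multiply, truncate and gcd."
  ;; A is roughly sqrt(3) * 10^(2*DIGITS), B is roughly sqrt(2) * 10^DIGITS.
  (let ((a (isqrt (* 3 (expt 10 (* 4 digits)))))
        (b (isqrt (* 2 (expt 10 (* 2 digits))))))
    (list :multiply (seconds-for (lambda () (* a b)) iterations)
          :truncate (seconds-for (lambda () (truncate a b)) iterations)
          :gcd      (seconds-for (lambda () (gcd a b)) iterations))))
```

Comparing the three numbers from (compare-bignum-ops) between the 32-bit and 64-bit builds should indicate which operation accounts for the difference.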
> The ppc32 version does Karatsuba multiplication
> <http://en.wikipedia.org/wiki/Karatsuba_multiplication> if the
> operands are big enough; neither the ppc64 nor x86-64 versions do
> (for no particularly good reason.) I'd be a little surprised if that
> was the difference, but I'm also surprised by the results: a year or
> so ago, the ppc64 bignum code seemed as fast or faster in all of the
> cases that I tried.
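For reference, the Karatsuba scheme replaces one full-size multiplication with three half-size ones. A minimal portable sketch on non-negative integers follows. It is not OpenMCL's ppc32 implementation, which is not shown in this thread; the function name and the CUTOFF-BITS parameter (with its default of 512) are assumptions chosen for illustration.

```lisp
;; Illustrative Karatsuba multiplication; not OpenMCL's actual code.
(defun karatsuba (x y &optional (cutoff-bits 512))
  "Multiply non-negative integers X and Y.
CUTOFF-BITS should be at least 3 so that the recursion shrinks."
  (let ((n (max (integer-length x) (integer-length y))))
    (if (<= n cutoff-bits)
        (* x y)                               ; small case: builtin multiply
        (let* ((half (ceiling n 2))
               (x1 (ash x (- half)))          ; high part of X
               (x0 (ldb (byte half 0) x))     ; low HALF bits of X
               (y1 (ash y (- half)))
               (y0 (ldb (byte half 0) y))
               (z2 (karatsuba x1 y1 cutoff-bits))
               (z0 (karatsuba x0 y0 cutoff-bits))
               ;; (x1+x0)(y1+y0) - z2 - z0 = x1*y0 + x0*y1
               (z1 (- (karatsuba (+ x1 x0) (+ y1 y0) cutoff-bits) z2 z0)))
          (+ (ash z2 (* 2 half)) (ash z1 half) z0)))))
```

Whether this beats the builtin multiply depends on the operand size and on the cutoff, so any comparison would need measuring on the build in question.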
I'm not qualified to assess whether Karatsuba multiplication would
provide significant gain. It might be useful to see if there is a
more full-featured BIGNUM test suite we could use to assess the full
set of operations. I'll see what I can find.
Thanks,
-Brent