[Openmcl-devel] Bignum speed

Thu Oct 16 20:24:49 PDT 2008

Gary Byers wrote:
> Sorry for not responding sooner.
> 
> "Plans" in the sense of "in N weeks, someone plans to do X, Y, and Z
> and therefore speed things up by M percent" ?  I don't know of any
> such plans, no.
> 
> "Plans" in the sense of many people thinking "someday, someone should
> take the time to do X, Y, and Z, and therefore speed things up ..." ?
> Sure.
> 
> Designing and implementing really good bignum code takes significant
> time.  Taking bignum code that's often pretty bad (worse than it seems
> that it should be ...) and making it "not so bad" or "not bad at all"
> or "maybe kind of good" takes far less time.  In some cases, one would
> hit a wall where there's no more low-hanging fruit left and further
> improvements involve better algorithms (or better support for doing
> unboxed arithmetic, or ...), but I don't think that we're particularly
> close to that wall, or that anyone's put any effort/time into getting
> there.
> 
> I don't know exactly how to quantify what it would mean to go from
> where things are now to where we could get with a moderate amount of
> effort; I've poked at some of the code in the last few days and seem
> to have gotten some improvement (maybe 2X-3X in some cases, though
> it's hard to know if those cases are representative and - since memory
> allocation/GC costs factor into this - should note that those results
> weren't obtained in a controlled environnent.  (The kinds of stuff
> that I did just involved the obvious first steps one might take in
> trying to improve code: removing function call overhead, declaring the
> types of things that're heavily used and non-obvious, removing
> unnecesary typechecking, etc.  Again, there's a limit to how far this
> approach can take you; again, it's not surprising that doing obvious
> and simple things to improve performance ...  improves performance, at
> least some.)
> 
> Two things that someone interested in helping CCL's bignum performance
> could provide that'd help are:
> 
> 1) A fistful of cash.  (Hey, I had to mention it.)  We -do- do work
> on CCL that isn't directly funded, but a significant amount of work
> (multiple person-months) is much more likely to happen if it's funded
> than if it isn't.
> 

FriCAS is an unpaid project, so no cash from here... OTOH I do not
think multiple person-months are needed.  Current performance leader
for large integer calculations is GMP library -- it gets performance
via combination of advanced algorithms and tuned assembler code
(for several popular architectures).  Independently getting
comparable performance is a big effort.  But calling GMP via
FFI should not be that hard.  Alternatively, getting some
intermediate level of performance would make huge improvement
and is easier.

> 2) Concrete examples of things where performance is bad.  (It may
> be the case that things are just bad across the board; I don't know.)
> The simpler it is to reproduce a performance problem (or an outright
> bug, for that matter), the more time someone can put into trying to
> improve/fix the problem.
> 

I mean multiplication of large numbers.  Here size matters.  First
range is just above fixnums: it is important because such numbers
are relatively frequent.  The second is between tens and thousends
of digits.  Such numberes are less frequent but routinely appear
in intermediate steps in some problems where all input and output is
in fixnum range.  Finally, there are huge numbers, million digits
or more.  Huge numers are more specialized than merely large ones:
natural problems which small numbers as input that need million digit
numbers in intermediate steps are expected to need lot of them, so
to run out memory anyway.

A little benchmark below.  To make time easier to measure
dot-product adds all products of elements of two lists.  To
avoid possiblity of trivial optimizations all numbers are
different, products are added, result is assigned to a variable.
I make sure that result is not printed, because printing is
usually quite time consuming and could change timing in undesired way.

(setf a (expt 9 1000))
(defun dot-product (l1 l2)
    (let ((s 0)
          (ll1 l1))
        (loop
            (cond
                ((null l2)
                    (return-from dot-product s))
                ((null ll1)
                    (setf ll1 l1)
                    (setf l2 (cdr l2)))
                (t
                    (setf s (+ s (* (car ll1) (car l2))))
                    (setf ll1 (cdr ll1)))))))

(compile 'dot-product)

(setf l1 nil)
(dotimes (i 100) (setf l1 (cons (- a i) l1)))
(setf l2 nil)
(dotimes (i 100) (setf l2 (cons (+ i a)  l2)))

(time (progn (setf ress (dot-product l1 l2)) nil))

Using Clozure CL 1.2-r10552 on 2.4GHz Core Duo time is  2.200137s
On the same machine sbcl needs 0.104s, clisp 0.297694 s, gcl 0.09s

For larger numbers ratio goes even worse.

-- 
                              Waldek Hebisch
hebisch at math.uni.wroc.pl