[Openmcl-devel] Wrong multiplication

Sat Jan 5 09:11:44 PST 2013

If X is a FIXNUM, then (- X) is a FIXNUM.  Unless X is
MOST-NEGATIVE-FIXNUM ...  I'm fairly sure that I've seen other bugs
that were caused by failing to account for that special case, and
(once I finally realized that the test case involved
MOST-NEGATIVE-FIXNUM) it was very easy to see that that was the cause
of the problem.  Given the way that FIXNUM x BIGNUM multiplication
works in CCL and the fact that MOST-NEGATIVE-FIXNUM doesn't fit (and
given some hindsight), testing the (* most-negative-fixnum bignum) and
(* any-other-fixnum bignum) cases is fairly exhaustive.  (Of course
there are other things that can go wrong, including other things
that're only obvious in hindsight ...)
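
A minimal sketch of that special case in portable Common Lisp, for
anyone who wants to poke at it in a REPL.  It assumes (as is true of
CCL) that MOST-NEGATIVE-FIXNUM is -(2^k) for some k, so the expected
product can be computed with ASH rather than with the multiplication
being tested.  Both function names are made up for illustration; this
is not the CCL bignum code.

```lisp
;; Hypothetical helper (not part of CCL): true if negating the
;; fixnum X yields another fixnum.
(defun negation-stays-fixnum-p (x)
  (check-type x fixnum)
  (typep (- x) 'fixnum))

(negation-stays-fixnum-p 42)                    ; true
(negation-stays-fixnum-p most-positive-fixnum)  ; true
(negation-stays-fixnum-p most-negative-fixnum)  ; false: (- x) is a bignum

;; Hypothetical check of (* MOST-NEGATIVE-FIXNUM N) in both argument
;; orders.  Assumes MOST-NEGATIVE-FIXNUM is -(2^k), so the expected
;; product is N shifted left by k bits and then negated.
(defun most-negative-fixnum-product-ok-p (n)
  (let* ((k (integer-length most-negative-fixnum))
         (expected (- (ash n k))))
    ;; Make the assumption explicit rather than silently relying on it.
    (assert (= most-negative-fixnum (- (ash 1 k))))
    (and (= expected (* most-negative-fixnum n))
         (= expected (* n most-negative-fixnum)))))
```

On an implementation where the multiplication is correct,
(most-negative-fixnum-product-ok-p (expt 10 40)) should return true.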

In 32-bit versions of CCL, there are 2^30 (~10^9) different FIXNUMs; in
64-bit versions, there are 2^61 (~10^18).  If you did 1,000,000 tests
of (* (random-fixnum) (random-bignum)) in 32-bit CCL, then you'd miss
this failure 99.9% of the time.  (I -think- that a test that iterated
over all 2^30 FIXNUMs in 32-bit CCL and multiplied each by a bignum and
checked the result would likely take minutes rather than hours or days
to run; that might be viable and worthwhile.  However long it took, the
64-bit case would likely take ~2^31 times longer, and that probably
wouldn't be too practical.)
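
The arithmetic behind those figures can be sketched as follows.  This
assumes the random fixnums are drawn independently and uniformly, and
it uses the approximation (1 - 1/N)^n ~ e^(-n/N), which is very good
when N is large.  MISS-PROBABILITY is a made-up name for illustration.

```lisp
;; Approximate chance that NTESTS independent, uniformly random draws
;; from NFIXNUMS possible values never hit one particular value.
;; Uses (1 - 1/N)^n ~ exp(-n/N), which is accurate for large N.
(defun miss-probability (ntests nfixnums)
  (exp (- (/ ntests (float nfixnums 1d0)))))

;; 1,000,000 random tests against the 2^30 fixnums of 32-bit CCL:
(miss-probability 1000000 (expt 2 30))  ; a little over 0.999

;; How many times more fixnums there are in 64-bit CCL than in 32-bit:
(/ (expt 2 61) (expt 2 30))             ; = (expt 2 31) = 2147483648
```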

What's now CCL forked off from MCL in 1998 (IIRC), and I'm fairly sure
that this bug was present at that time.  I checked the (R)MCL sources
and it had been fixed there, so sometime in the last ~15 years either
someone (Alice Hartley ?) looked at the code and noticed that it was
wrong or someone ran into the bug and reported it.  (Also: the code in
question was introduced in MCL when it was ported to the PPC in
1994-1995; prior to that, MCL only ran on 68K Macs, and most of the
math code was written in 68K assembly language.  I don't remember any
version of this bug in that code, but I barely remember 1990 at this
point ...  Saying "18 years" instead of "25 years" doesn't sound much
different, but it's a little more accurate.)

I don't know whether people have encountered this bug in CCL and either
not noticed the error or not reported it, but I don't think that it's
entirely unbelievable that it hasn't been triggered.  Aside from code
which tries to exhaustively test FIXNUM x BIGNUM multiplication, it's
-probably- true that some fixnums are more likely to be involved than others.
(I'd guess that (* 2 bignum) is more common than (* 6889 bignum), and that
even if there are "only" 2^30 FIXNUMs the distribution isn't even here.)

On Sat, 5 Jan 2013, Eric Marsden wrote:

>>>>>> "gb" == Gary Byers <gb at clozure.com> writes:
>
>  gb> (* most-negative-fixnum some-bignum) ; either arg order
>  gb>
>  gb> will produce an incorrect result on all architectures that CCL runs on.
>  gb> (The incorrect result will also likely differ on each invocation.)
>  gb>
>  gb> The bad news: this bug has likely been around forever.
>
>  This really is quite astonishing! CCL has been around for 25 years; it has
>  been considered as a platform for use in space; it is used intensively
>  as a theorem prover (ACL2) runtime; I assume that people have run
>  the Maxima test suite on it. I myself have run many days worth of random
>  integer testing. And no one had detected the bug until now ...
>
> -- 
> Eric Marsden