[Openmcl-devel] how does one cause stack-allocation for floats? [2]

Wed Mar 29 03:38:28 PST 2006

It's certainly necessary for the compiler to know that floating-point
arithmetic is going on in order for it to generate reasonably good
floating-point code, and some combination of declarations/explicit
constants would be necessary in order for it to know that.
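
As a minimal sketch of what "declarations/explicit constants" means in
portable Common Lisp (the function name SCALED-SUM is made up for
illustration, and how much any given compiler does with these hints
is implementation-dependent):

```lisp
;; Hypothetical example: SCALED-SUM is not part of OpenMCL.
;; The DOUBLE-FLOAT declarations and the explicit 2.0d0 constant
;; tell the compiler that only double-float arithmetic is involved.
(defun scaled-sum (a x y)
  (declare (type double-float a x y)
           (optimize (speed 3) (safety 1)))
  (+ (* 2.0d0 a x) y))
```

Even with all of that declared, the value returned from the function
is still an ordinary lisp object, so the declarations say nothing
about where intermediate or final results end up living.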

That's necessary, but it's not really sufficient: OpenMCL will often
correctly and semi-reasonably inline a floating-point operation (with
operands in FP registers) and then correctly but not-very-reasonably
cons up a DOUBLE-FLOAT object to hold the result.  The declaration
that would have the desired effect of avoiding this when it's
unnecessary might be:

   (declare have-a-clue-about-object-lifetimes)

or

   (declare be-smarter-about-this-than-you've-been-for-the-last-20-years)

Of course, if that declaration had any effect in OpenMCL, it would be
implicit.  Unless and until the compiler's smarter about this, it's
kind of like asking a chimpanzee to do algebra: there aren't really
the right kinds of synapses and neurons and other stuff in the right
places.

If that infrastructure were in place (in the OpenMCL compiler, not
necessarily in the chimp), then it'd also be good if the compiler put
effort into trying to keep FP values that're likely to be frequently
used in registers.  (In the example that I posted earlier, it didn't
seem to be the GC per se that was killing performance, but I'd bet
that memory traffic was a major culprit.)

Without that infrastructure, the whole concept of keeping a floating-
point result in a register is (sadly) foreign: FP results are things
that have to (quickly) be discharged ("who knows how the value will
be used?  Best to put it in a tagged lisp object, and if it happens
to have a next-use any code that needs the value can deal with that").
That (repeated thousands or millions of times) is obviously pretty
undesirable.

The functionality that you (James) asked about - the ability to
stack-allocate floats and destructively modify them - may sound like
it's a partial step in the right direction (i.e., better than nothing).
(It's sort of like manual register allocation, where the "registers"
are stack-allocated floats.)

One way in which it's pretty clearly a step in the wrong direction
is that it gets in the way of real register allocation.  If code is
written like:

  (ccl::with-stack-double-floats ((fsum 0.0d0))
   (ccl::%setf-double-float fsum (the double-float (+ fsum (the double-float ...)))))

would a smarter compiler - which was planning on keeping the result of
that addition in an FP register - have any idea of why it's being told
to put things in memory and take them back out (when, after all, that
largely defeats the purpose of any analysis it has done)?
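
For contrast, here is a rough sketch of the same kind of accumulation
written with an ordinary declared local variable and no explicit
storage management.  SUM-OF-SQUARES is a made-up name, and whether
FSUM actually stays in an FP register depends entirely on the
compiler; the point is only that nothing in the source forces it into
memory:

```lisp
;; Hypothetical example: SUM-OF-SQUARES is not an OpenMCL function.
;; FSUM is a plain lexical variable, so a compiler that does real
;; register allocation is free to keep it wherever it likes.
(defun sum-of-squares (xs)
  (declare (type (simple-array double-float (*)) xs)
           (optimize (speed 3) (safety 1)))
  (let ((fsum 0.0d0))
    (declare (type double-float fsum))
    (dotimes (i (length xs))
      (setq fsum (+ fsum (* (aref xs i) (aref xs i)))))
    fsum))
```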

Perhaps more significantly, should users be encouraged to think of
floats as being mutable objects and of exercising control over their
allocation?  I think that there are lots of aesthetic reasons for
saying "no", and the fact that this doesn't have much to do with
really good code generation is a strong practical reason for doing
so as well.

On Wed, 29 Mar 2006, james anderson wrote:

> hello again;
>
> thank you for your note.
>
> i agree with the sentiment, but it's not very satisfying.
>
> let me see if i can reframe this.
>
> i know that it is not good form to slip notes under the table to the
> compiler to cajole it into doing the right thing.
> on the other hand, some annotation is necessary, since, without the
> benefit of global analysis, only the application writer will know
> that, in the end, it's just fine to do the whole thing in registers.
> yes, i would like to have annotation sufficient to achieve beer's
> floating-point efficiency, but i would like the directives to be
> understood by any standard lisp. maybe ansi will reconvene one day.
> maybe they will admit that a "register" declaration is not
> unmitigated evil. until then i am limited to dynamic-extent.
>
> yes, i have trashed zero and had to figure out how to reconstruct it.
> that's life.
>
> conscious of which, if i just gather together the results, with a
> count of 10000000, from lisps i have at hand, they look like this:
> (results are | total milliseconds . gc milliseconds . bytes consed | )
>
> ----------------------|    single-float     | single/stack  |    double-float     |   double/stack   |
> allegro 6.2 / P4-2.8G | 4484 . 3436 . 320MB | 406 . 0 . 496 | 6562 . 4638 . 480MB | 406 .  0 .  24   |
> open-mcl 1.0/ G5-2.5G |  655 .    0 .   0   | 421 . 0 .   0 |  932 .   28 . 320MB | 819 . 27 . 320MB |
> mcl         / G5-2.5G | 4098 .  547 . 160MB | 320 . 8 .  24 | 7844 .  305 . 320MB | 361 .  0 .  16   |
> cmucl 19c   / G5-2.5G |   70 .    0 .  24   |  60 . 0 .   0 |  280 .    0 .  32   |   7 .  0 .  32   |
> ----------------------|---------------------|---------------|---------------------|------------------|
> cmucl-only double w/o declarations                           11050 .  350 . 320MB
>
> (whereby the gc figures are intra-runtime only, as the heap sizes
> vary widely)
> these reveal open-mcl's dark secret in the relatively small
> difference between 819 and 932.
> they also imply that cmucl knows even more secrets: the type
> declarations alone accomplish a lot.
> despite which, the differences between 361 and 819 and between 320MB
> and 16, and between 280 and 7 will sometimes contribute to whether a
> process completes in time or not.
>
> what is wrong with letting the program specify that it wants scratch
> storage?
>
> would it not be the right thing to support (declare (register result))?
> it would avoid the (= 0.0 pi) problem. no?
>
> ...