[Openmcl-devel] consing iterating through a loop

Thu Jan 27 13:41:54 PST 2005

On Thu, 27 Jan 2005, Cyrus Harmon wrote:

>
> Randy et al.,
>
> This is, of course, very cool, but I'd like to point out that other
> implementations such as SBCL (SBCL on PPC) don't cons on this example.
> Clearly different compilers have strengths and weaknesses and OpenMCL
> is great at a lot of things. While use fpc-ppc is a reasonable answer,
> two other alternatives come to mind. First, I'm by no means an expert
> in lisp compiler declarations, so it seemed reasonable that there might
> be something I was forgetting here that would keep OpenMCL from
> consing. Second, assuming all the declare options are correct, is there
> anything that can be done in the compiler to fix this? I know Gary
> Byers has better things to do than explain compiler internals to folks
> like me all day, but if anyone knows why we're consing here, I'd love
> to hear the answer.
>
> Thanks,
>
> Cyrus
>

Here's a short explanation:

It's necessary in general for a lisp implementation to be able to
determine the type of an object at runtime.  There are a few ways
of doing this, but the most common schemes generally involve:

a) using a single machine word to encode either an immediate
   value (FIXNUM, CHAR, ...) or a pointer to something that's
   too large to fit in a machine word (CONS, STRING, FLOAT ...)

b) ensuring that all lisp objects that're allocated in memory
   have a small but consistent memory alignment (typically
   4 or 8 bytes, sometimes different for different object
   types)

c) using a few of the low bits in a machine word used as the
   uniform representation of a lisp object to differentiate
   between immediate and memory-allocated objects and, in
   some cases, to differentiate between the type of memory-
   allocated object.  (The number of bits used - typically
   2 or 3 in a 32-bit implementation - depends on the alignment
   chosen in (b)).  These low bits are typically called "tag
   bits", and a 32-bit implementation typically has between
   4 and 8 primary types that can be encoded in these 2 or 3
   bits.  Depending on the value of those low 2 or 3 bits (the
   "tag"), the upper 29 or 30 bits either encode an immediate
   value or encode the (32-bit or 64-bit aligned) address of
   an object in memory.

A tagged representation of an object is sometimes called a
"boxed" representation; converting between a tagged values
and untagged values is often called "boxing" and "unboxing".
It's common to use tag bits with value 0 to represent fixnums;
this allows boxed values to be used directly for fixnum addition
and subtraction, and means that unboxing is just an "arithmetic
shift right" and boxing is just "shift left and maybe check for
overflow."

The value of a 64-bit DOUBLE-FLOAT (or even a 32-bit SINGLE-FLOAT)
can't be encoded in the 32-bit lisp object used to represent it;
the boxed representation of a FLOAT object is therefore a tagged
pointer to a value that's allocated in memory.  (68K MCL had an
immediate SHORT-FLOAT type, as do some other implementations,
but - since such things have to lose precision and/or magnitude
relative to even a SINGLE-FLOAT, they tend to be of limited
utility.)

In the general case, any operation that produces a value of type
FLOAT has to allocate room for the (32-or-64-bit) value in memory
(+ some extra bits to identify the chunk of memory as containing
a float value of the indicated type) and represent the FLOAT as
a tagged pointer to the block of memory which contains the value.
The -general case- of something like:

  (+ (the double-float x) (the double-float y))

involves something like:

  1) unbox x, by loading a double-float value from the tagged pointer
     X into a floating-point register, FPx.

  2) unbox y, loading the value into another FP register (FPy.)

  3) add the contents of FPx and FPy, storing the result in FPz.

  4) allocate an aligned chunk of memory; store FPz (and a header
     word describing that chunk as containing a DOUBLE-FLOAT),
     and return a tagged pointer to that chunk of memory.

It's mandatory that the general case happens this way, and this
is likely to be true in all implementations.

In specific cases like your example, 499,999 of the 500,000 loop
iterations don't need to follow the general case: the intermediate
accumulated result is only used to provide an operand for the next
loop iteration.  It's not necessary to box and unbox all of these
intermediate results, and steps 1 and 2 (which are cheap) and
4 (which is potentially expensive) can be moved out of the loop
and/or avoided completely.

Some CL compilers do enough analysis to recognize and exploit
this (fairly typical) case; others (including OpenMCL's) don't.

The reasons for this are somewhat circular: because floating-point
performace can be so bad, people who do extensive floating-point
don't use MCL/OpenMCL (or learn to use tools like Randall's to
lessen the pain.)  People who -do- use MCL/OpenMCL (and therefore
influence and sometimes fund) its development tend to be people
who don't care much about floating-point performance.