[Openmcl-devel] What is the status of OpenMCL Altivec support

Sat Feb 21 03:12:38 PST 2004

On 21.02.2004, at 08:46, Gary Byers wrote:

> What support there is is pretty much limited to the fact that the
> assembler supports AltiVec instructions.  (Someone reported a bug in
> the way that immediate operands were encoded in some vector instructions
> not too long ago, suggesting that they may have been trying to actually
> use AltiVec.  I don't know exactly what they were doing or how far they
> got.)

That would've been me. I'm planning to do some 3D programming
with OpenMCL, so I've been exploring the possibility of using AltiVec
for the simpler vector operations (normalise, cross-product etc.) in
addition to providing native Lisp versions, so I can just switch some
function definitions depending on ALTIVEC-AVAILABLE-P.
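A minimal sketch of that switching idea, in portable Common Lisp. The
names MY-ALTIVEC-AVAILABLE-P, %VECTOR-NORMALISE!-LISP, VECTOR-NORMALISE!
and INSTALL-VECTOR-OPS are made up for illustration; the feature test is
a stub that always returns NIL, and %VECTOR-NORMALISE!-ALTIVEC is assumed
to be a LAP function defined elsewhere, so on a plain Lisp this simply
installs the fallback.

```lisp
;; Hypothetical stand-in for the real feature test (ALTIVEC-AVAILABLE-P in
;; the text above). It always answers NIL so the sketch runs anywhere.
(defun my-altivec-available-p ()
  nil)

(defun %vector-normalise!-lisp (vec)
  "Portable fallback: scale VEC (a vector of floats) to unit length in place."
  (let ((len (sqrt (reduce #'+ (map 'list (lambda (x) (* x x)) vec)))))
    (unless (zerop len)
      (map-into vec (lambda (x) (/ x len)) vec))
    vec))

(defun install-vector-ops ()
  "Point the public name at the AltiVec version if usable, else the Lisp one."
  (setf (fdefinition 'vector-normalise!)
        (if (and (my-altivec-available-p)
                 ;; Assumed name of a LAP version; absent in this sketch.
                 (fboundp '%vector-normalise!-altivec))
            (fdefinition '%vector-normalise!-altivec)
            #'%vector-normalise!-lisp)))

(install-vector-ops)
```

Callers then use (vector-normalise! v) and never need to know which
implementation is installed. Doing the selection once at load time keeps
the feature test out of the inner loop.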

The code generally looks similar to this:

(in-package :ccl)
(defppclapfunction %vector-dosomething!-altivec ((vec arg_z))
   ;; Register usage as follows:
   ;; vr1 = Holds the MSQ when loading/storing
   ;; vr2 = Holds the LSQ when loading/storing
   ;; vr3 = Holds the contents of the vector
   ;; vr27 = Alignment vector for loading/storing
   ;; vr28 = Select mask when storing
   ;; vr29 = Constant: every byte -1 (#xFF)
   ;; vr30 = Constant: every byte 0
   ;; vr31 = Constant: every byte 1
   (with-altivec-registers (vr1 vr2 vr3 vr27 vr28 vr29 vr30 vr31)
     ;; load the given vector into vr3
     (li imm0 arch::misc-data-offset) ; get the offset to the (unaligned) data
     (lvx vr1 arg_z imm0)             ; load the MSQ
     (lvsl vr27 arg_z imm0)           ; load the alignment vector
     (addi imm0 imm0 16)              ; address of LSQ
     (lvx vr2 arg_z imm0)             ; load the LSQ
     (vperm vr3 vr1 vr2 vr27)         ; permute the result into vr3
     ;; initialize some useful constants
     (vspltisb vr31 1)                ; vr31 = every byte 1
     (vspltisb vr30 0)                ; vr30 = every byte 0
     (vspltisb vr29 -1)               ; vr29 = every byte -1 (#xFF)
     ;; do the calculation
     (...)
     ;; store the result into the given vector
     (li imm0 arch::misc-data-offset) ; get the offset to the (unaligned) data
     (lvx vr1 arg_z imm0)             ; load the MSQ for update
     (lvsr vr27 arg_z imm0)           ; load the alignment vector
     (addi imm0 imm0 16)              ; address of LSQ
     (lvx vr2 arg_z imm0)             ; load the LSQ
     (vperm vr28 vr30 vr29 vr27)      ; right shift the select mask
     (vperm vr3 vr3 vr3 vr27)         ; right rotate the data
     (vsel vr2 vr3 vr2 vr28)          ; insert LSQ component
     (vsel vr1 vr1 vr3 vr28)          ; insert MSQ component
     (stvx vr2 arg_z imm0)            ; store LSQ
     (addi imm0 imm0 -16)             ; address of MSQ
     (stvx vr1 arg_z imm0))           ; store MSQ
   (blr))
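
Since the MSQ/LSQ shuffling is the confusing part, here is a rough model
of the same load and store sequences in plain Common Lisp, operating on a
byte vector instead of real memory. It is only a sketch for reasoning
about the permutes: the SIM- functions are made-up names that mimic my
understanding of the corresponding instructions, a "register" is just a
list of 16 bytes, and nothing here touches AltiVec.

```lisp
;; Model of the unaligned load/store idiom. MEM is a byte vector and a
;; "register" is a list of 16 bytes. All SIM- names are hypothetical.

(defun sim-lvx (mem addr)
  "Load 16 bytes, ignoring the low four bits of ADDR."
  (let ((base (logandc2 addr 15)))
    (coerce (subseq mem base (+ base 16)) 'list)))

(defun sim-stvx (reg mem addr)
  "Store 16 bytes, ignoring the low four bits of ADDR."
  (replace mem reg :start1 (logandc2 addr 15))
  mem)

(defun sim-lvsl (addr)
  "Control vector sh, sh+1, ..., sh+15 with sh = ADDR mod 16."
  (let ((sh (logand addr 15)))
    (loop for i from 0 below 16 collect (+ sh i))))

(defun sim-lvsr (addr)
  "Control vector 16-sh, ..., 31-sh with sh = ADDR mod 16."
  (let ((sh (logand addr 15)))
    (loop for i from 0 below 16 collect (+ (- 16 sh) i))))

(defun sim-vperm (a b control)
  "Pick bytes out of the 32-byte concatenation of A and B."
  (let ((both (append a b)))
    (mapcar (lambda (c) (nth (logand c 31) both)) control)))

(defun sim-vsel (a b mask)
  "Take bits from B where MASK has a 1 bit, from A elsewhere."
  (mapcar (lambda (x y m) (logior (logandc2 x m) (logand y m)))
          a b mask))

(defun load-unaligned (mem addr)
  "Same steps as the load in the listing: MSQ, LSQ, lvsl, vperm."
  (sim-vperm (sim-lvx mem addr)
             (sim-lvx mem (+ addr 16))
             (sim-lvsl addr)))

(defun store-unaligned (reg mem addr)
  "Same steps as the store in the listing: rotate, mask, merge, write back."
  (let* ((msq   (sim-lvx mem addr))
         (lsq   (sim-lvx mem (+ addr 16)))
         (align (sim-lvsr addr))
         (zeros (make-list 16 :initial-element 0))
         (ones  (make-list 16 :initial-element #xFF))
         (mask  (sim-vperm zeros ones align))  ; shifted select mask
         (data  (sim-vperm reg reg align)))    ; rotated data
    (sim-stvx (sim-vsel data lsq mask) mem (+ addr 16)) ; store LSQ
    (sim-stvx (sim-vsel msq data mask) mem addr)        ; store MSQ
    mem))
```

With a 64-byte MEM, (load-unaligned mem 20) picks up bytes 20 through 35,
and STORE-UNALIGNED writes the same range back while leaving the bytes on
either side of it alone, which is the whole point of the select mask.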

I ran some simple tests and it worked like a charm. Normalising a
vector, for example, was 50-60x faster than the pure Lisp version.
Of course, it also took me 50-60 times longer to write.

There are some caveats: writing the code itself is difficult enough,
but debugging it is pretty much hell on earth. I ended up attaching
GDB to the running OpenMCL process, looking up the address of the
function each time, and stepping through it instruction by instruction
in the debugger. Not exactly my definition of "fun".
Additionally, I haven't considered what effects the GC might have, so
writing robust AltiVec functions is probably even more interesting.
I'm pretty much bogged down with schoolwork right now, so I won't
be able to pursue anything in that direction until April (this also
includes working some more on the XREF code...).

-- 
   Oliver Markovic