[Openmcl-devel] What is the status of OpenMCL Altivec support
Oliver Markovic
entrox at entrox.org
Sat Feb 21 03:12:38 PST 2004
On 21.02.2004, at 08:46, Gary Byers wrote:
> What support there is is pretty much limited to the fact that the
> assembler supports AltiVec instructions. (Someone reported a bug in
> the way that immediate operands were encoded in some vector instructions
> not too long ago, suggesting that they may have been trying to actually
> use AltiVec. I don't know exactly what they were doing or how far they
> got.)
That would've been me. I'm planning to do some 3D programming
with OpenMCL, so I've been exploring the possibility of using AltiVec
for the simpler vector operations (normalise, cross-product etc.) in
addition to providing native Lisp versions, so I can just switch some
function definitions depending on ALTIVEC-AVAILABLE-P.
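A minimal sketch of that switching idea, in portable Common Lisp. The names VECTOR-NORMALISE-LISP, VECTOR-NORMALISE-ALTIVEC and ALTIVEC-AVAILABLE-P-STUB are hypothetical; the stub stands in for ALTIVEC-AVAILABLE-P (assumed here to be a predicate of no arguments) so the example runs on any implementation, and the "AltiVec" version is only a placeholder that calls the Lisp one.

```lisp
;; Hypothetical stand-in for ALTIVEC-AVAILABLE-P, so this runs anywhere.
(defun altivec-available-p-stub ()
  nil)

(defun vector-normalise-lisp (vec)
  "Return a new vector with the direction of VEC and length 1.
Does not guard against a zero-length VEC."
  (let ((norm (sqrt (reduce #'+ (map 'list (lambda (x) (* x x)) vec)))))
    (map 'vector (lambda (x) (/ x norm)) vec)))

(defun vector-normalise-altivec (vec)
  "Placeholder: a real version would wrap a LAP function using AltiVec."
  (vector-normalise-lisp vec))

;; Pick the definition once, at load time.
(setf (fdefinition 'vector-normalise)
      (if (altivec-available-p-stub)
          #'vector-normalise-altivec
          #'vector-normalise-lisp))
```

Callers then only ever use VECTOR-NORMALISE, e.g. (vector-normalise #(3.0 4.0 0.0)), and never need to know which version was installed.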
The AltiVec code generally looks similar to this:
(in-package :ccl)
(defppclapfunction %vector-dosomething!-altivec ((vec arg_z))
;; Register usage as follows:
;; vr1 = Holds the MSQ when loading/storing
;; vr2 = Holds the LSQ when loading/storing
;; vr3 = Holds the contents of the vector
;; vr27 = Alignment vector for loading/storing
;; vr28 = Select mask when storing
;; vr29 = Constant: all -1s
;; vr30 = Constant: all 0s
;; vr31 = Constant: all 1s
(with-altivec-registers (vr1 vr2 vr3 vr27 vr28 vr29 vr30 vr31)
;; load the given vector into vr3
(li imm0 arch::misc-data-offset) ; offset to the (unaligned) data
(lvx vr1 arg_z imm0) ; load the MSQ
(lvsl vr27 arg_z imm0) ; load the alignment vector
(addi imm0 imm0 16) ; address of LSQ
(lvx vr2 arg_z imm0) ; load the LSQ
(vperm vr3 vr1 vr2 vr27) ; permute the result into vr3
;; initialize some useful constants
(vspltisb vr31 1) ; vr31 = all 1s
(vspltisb vr30 0) ; vr30 = all 0s
(vspltisb vr29 -1) ; vr29 = all -1s
;; do the calculation
(...)
;; store the result into the given vector
(li imm0 arch::misc-data-offset) ; offset to the (unaligned) data
(lvx vr1 arg_z imm0) ; load the MSQ for update
(lvsr vr27 arg_z imm0) ; load the alignment vector
(addi imm0 imm0 16) ; address of LSQ
(lvx vr2 arg_z imm0) ; load the LSQ
(vperm vr28 vr30 vr29 vr27) ; right shift the select mask
(vperm vr3 vr3 vr3 vr27) ; right rotate the data
(vsel vr2 vr3 vr2 vr28) ; insert LSQ component
(vsel vr1 vr1 vr3 vr28) ; insert MSQ component
(stvx vr2 arg_z imm0) ; store LSQ
(addi imm0 imm0 -16) ; address of MSQ
(stvx vr1 arg_z imm0)) ; store MSQ
(blr))
I did some simple tests and it worked like a charm. Normalising
a vector, for example, was 50-60x faster than the pure Lisp version.
Of course, it also took me 50-60 times longer to write.
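To make the unaligned load and store sequences in the listing easier to follow, here is a rough emulation in portable Common Lisp that works on ordinary vectors of bytes instead of vector registers. All EMU- names, LOAD-UNALIGNED and STORE-UNALIGNED are hypothetical helpers written for this illustration; they only model the documented behaviour of lvx, stvx, lvsl, lvsr, vperm and vsel, and say nothing about speed or about how OpenMCL lays out objects.

```lisp
;; MEMORY is a vector of bytes; ADDRESS is an index into it.

(defun emu-lvx (memory address)
  "Load the 16-byte block containing ADDRESS (low 4 address bits ignored)."
  (let ((base (logandc2 address 15)))
    (subseq memory base (+ base 16))))

(defun emu-stvx (value memory address)
  "Store VALUE into the 16-byte block containing ADDRESS."
  (replace memory value :start1 (logandc2 address 15)))

(defun emu-lvsl (address)
  "Permute control sh, sh+1, ..., sh+15, where sh = ADDRESS mod 16."
  (let ((sh (logand address 15))
        (control (make-array 16)))
    (dotimes (i 16 control)
      (setf (aref control i) (+ sh i)))))

(defun emu-lvsr (address)
  "Permute control 16-sh, ..., 31-sh, where sh = ADDRESS mod 16."
  (let ((sh (logand address 15))
        (control (make-array 16)))
    (dotimes (i 16 control)
      (setf (aref control i) (+ (- 16 sh) i)))))

(defun emu-vperm (a b control)
  "Pick byte CONTROL[i] (mod 32) of the 32-byte concatenation of A and B."
  (let ((source (concatenate 'vector a b)))
    (map 'vector (lambda (index) (aref source (logand index 31))) control)))

(defun emu-vsel (a b mask)
  "Take each bit from B where MASK has a 1 bit, otherwise from A."
  (map 'vector
       (lambda (x y m) (logior (logandc2 x m) (logand y m)))
       a b mask))

(defun load-unaligned (memory address)
  "Same steps as the load in the listing: MSQ, LSQ, lvsl, vperm."
  (emu-vperm (emu-lvx memory address)
             (emu-lvx memory (+ address 16))
             (emu-lvsl address)))

(defun store-unaligned (value memory address)
  "Same steps as the store in the listing: lvsr, select mask, rotate, vsel."
  (let* ((msq (emu-lvx memory address))
         (lsq (emu-lvx memory (+ address 16)))
         (align (emu-lvsr address))
         (zeros (make-array 16 :initial-element 0))
         (ones (make-array 16 :initial-element 255))
         (mask (emu-vperm zeros ones align))       ; sh zero bytes, then #xFF
         (rotated (emu-vperm value value align)))  ; VALUE rotated right by sh
    (emu-stvx (emu-vsel rotated lsq mask) memory (+ address 16))
    (emu-stvx (emu-vsel msq rotated mask) memory address)
    memory))
```

The point of the select mask is that the store has to rewrite two whole aligned 16-byte blocks, so the bytes before and after the target range are merged back in from the freshly loaded MSQ and LSQ rather than being overwritten.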
There are some caveats: writing the code itself is difficult enough,
but debugging it is pretty much hell on earth. I ended up attaching
GDB to the running OpenMCL process, getting the address of
the function each time and stepping through each instruction with
the debugger. Not exactly my definition of "fun".
Additionally, I haven't yet considered what effects the GC might have,
so writing robust AltiVec functions is probably even more interesting.
I'm pretty much bogged down with school-work right now, so I won't
be able to pursue anything in that direction until April (this also
includes working some more on the XREF code...).
--
Oliver Markovic