[Openmcl-devel] Using vecLib framework from OpenMCL?

Tue Aug 29 00:21:04 PDT 2006

The function (CCL:MAKE-HEAP-IVECTOR <element-count> <element-type>)
will allocate some foreign memory (at least enough to store <element-count>
contiguous objects of immediate type <element-type>); it then slaps
some lisp header information on the front of that block of memory
and returns three values: a tagged lisp vector, a pointer to the first
usable byte of data (past the lisp header), and the logical size, in
bytes, of the data that pointer addresses.

? (make-heap-ivector 10 'character)
""		; the 10 #\NUL characters may or may not print visibly
#<A Mac Pointer #x301E6C>
10
?

Note that the address in question (#x301e6c) is 4 bytes (the size of
the lisp header on ppc32) past an address that's aligned on a 64-bit
boundary.  AltiVec objects generally have to be aligned on 128-bit
boundaries in memory (I believe that there's at least a performance
penalty in the SSE2 case if the vector isn't 128-bit aligned).
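
The alignment arithmetic above can be checked with a few lines of
portable Common Lisp. This is a sketch, not OpenMCL code: the helper
names ALIGNED-P and DESCRIBE-HEAP-IVECTOR-ADDRESS are hypothetical, and
the 4-byte header default is simply the ppc32 figure quoted above.

```lisp
;;; Hypothetical helpers (not part of OpenMCL) for checking alignment.

(defun aligned-p (address boundary-bytes)
  "True if ADDRESS is a multiple of BOUNDARY-BYTES (a power of 2)."
  (zerop (logand address (1- boundary-bytes))))

(defun describe-heap-ivector-address (data-address &optional (header-bytes 4))
  "Given the data pointer of a heap ivector (as an integer), return the
address of the block (data pointer minus the lisp header), whether that
block is 64-bit aligned, and whether the data itself is 128-bit aligned.
HEADER-BYTES defaults to 4, the ppc32 header size."
  (let ((block-address (- data-address header-bytes)))
    (values block-address
            (aligned-p block-address 8)      ; 64 bits = 8 bytes
            (aligned-p data-address 16))))   ; 128 bits = 16 bytes
```

For the address in the example, (describe-heap-ivector-address #x301E6C)
returns the integer #x301E68, T and NIL: the block is 64-bit aligned,
but the data sits 12 bytes past a 128-bit boundary.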

So, you could allocate a little more than you need, and only use
the aligned part of the result:

;;; Create a 128-bit aligned vector of 4 SINGLE-FLOATs inside a
;;; heap-allocated vector of 7 SINGLE-FLOATs.  Return the lisp
;;; vector, the aligned (offset) foreign pointer, and the index of the
;;; first aligned SINGLE-FLOAT in the vector.

(defun allocate-aligned-single-float-vector ()
   (multiple-value-bind (lisp-vector foreign-pointer)
       (ccl:make-heap-ivector 7 'single-float)
     (let* ((address (ccl:%ptr-to-int foreign-pointer))
            ;; Round ADDRESS up to the next multiple of 16 bytes.
            (aligned-address (logandc2 (+ address 15) 15)))
       (values lisp-vector
               ;; Convert the aligned integer address back into a pointer.
               (ccl::%int-to-ptr aligned-address)
               ;; The "4" below is the size of a SINGLE-FLOAT in bytes
               (floor (- aligned-address address) 4)))))
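
Why 7 elements are enough: the data pointer is at least 4-byte aligned,
so rounding it up to a 16-byte boundary skips at most 12 bytes (3
SINGLE-FLOATs), and 3 + 4 = 7. A portable sketch of that rounding,
outside of OpenMCL (the function names are hypothetical):

```lisp
;;; Hypothetical, portable versions of the arithmetic used in
;;; ALLOCATE-ALIGNED-SINGLE-FLOAT-VECTOR.

(defun round-up-to-16 (address)
  "Smallest multiple of 16 that is >= ADDRESS."
  (logandc2 (+ address 15) 15))

(defun first-aligned-index (address &optional (element-bytes 4))
  "Index of the first 16-byte-aligned element in a vector whose data
starts at ADDRESS."
  (floor (- (round-up-to-16 address) address) element-bytes))

(defun worst-case-padding-elements (&optional (element-bytes 4))
  "Largest FIRST-ALIGNED-INDEX over all ELEMENT-BYTES-aligned addresses
within one 16-byte period."
  (loop for address from 0 below 16 by element-bytes
        maximize (first-aligned-index address element-bytes)))
```

With these, (first-aligned-index #x301E6C) is 1, and
(worst-case-padding-elements) is 3, which is where the 7 comes from.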

So (after some huffing and puffing) we could initialize a foreign
vector with some values whose square roots we'd like to determine
in parallel:

(multiple-value-bind (vector arg-pointer first-index)
     (allocate-aligned-single-float-vector)
   (declare (ignorable arg-pointer))
   (dotimes (i 4)
     (setf (aref vector (+ first-index i)) (float i 1.0f0)))
   ;; Allocate another aligned vector to hold the eagerly awaited result.
   (multiple-value-bind (result-vector result-pointer result-first-index)
        (allocate-aligned-single-float-vector)
     (declare (ignorable result-vector result-pointer result-first-index))
     ;; Now, we're ready to call vsqrtf.  Oops, no we're not:
     ;; (#_vsqrtf result-pointer arg-pointer)
     nil))

#_vsqrtf wants its argument to be passed in (and will return its
result in) a vector register (vN for AltiVec, xmmN for SSE2).
OpenMCL's FFI has no real concept of what this means.

For most foreign types, there's a corresponding lisp type, and (in
general) a foreign function call involves coercing the lisp
representations of integer/float/pointer arguments into raw (unboxed)
representations, then coercing the raw unboxed result back into a
lisp value.  This also involves following the target ABI's
conventions, such as "pass the first N FP args in the first N FP
registers".

One of those ABI conventions involves how SIMD vector arguments
and results should be handled.  There isn't really a corresponding
Lisp "SIMD vector" type, and there hasn't been an obvious candidate
on PPC32 because (a) lisp vectors aren't aligned stringently enough
and (b) the alignment of a lisp vector (relative to 128-bit alignment)
can change at any instruction boundary because of GC activity.
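
To make (b) concrete: under the ppc32 layout described earlier (objects
on 8-byte boundaries, 4-byte header), the index of the first 128-bit
aligned SINGLE-FLOAT depends on where the GC happens to put the vector.
The functions below are a hypothetical model of that arithmetic, not
OpenMCL internals.

```lisp
;;; Hypothetical model of the ppc32 layout described in the text:
;;; a lisp object starts on an 8-byte boundary and an ivector's data
;;; follows a 4-byte header.

(defun ppc32-first-aligned-single-float-index (object-address)
  "Index of the first 16-byte-aligned SINGLE-FLOAT in an ivector whose
header starts at OBJECT-ADDRESS (assumed to be a multiple of 8)."
  (let* ((data-address (+ object-address 4))
         (aligned-address (logandc2 (+ data-address 15) 15)))
    (floor (- aligned-address data-address) 4)))

(defun ppc32-alignment-after-move (object-address)
  "Aligned index for the same vector before and after a GC that moves
it up by 8 bytes."
  (list (ppc32-first-aligned-single-float-index object-address)
        (ppc32-first-aligned-single-float-index (+ object-address 8))))
```

(ppc32-alignment-after-move #x301E68) returns (1 3): the aligned element
moves from index 1 to index 3, so any alignment computed before the GC
is stale afterward. The index is also never 0, since the data always
starts 4 or 12 bytes past a 16-byte boundary, which is point (a).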

For 64-bit platforms, neither (a) nor (b) is a concern (all lisp
objects are 128-bit aligned and this never changes.)  So, we -could-
(hypothetically) use real lisp vectors to encapsulate SIMD vectors
(sort of like the way that a lisp DOUBLE-FLOAT object encapsulates
a double-float value).  Our call to #_vsqrtf might wind up something
like:

(let* ((argument-vector (make-array (+ 4 2) :element-type 'single-float)))
   ;; the vector will start with a 64-bit header; skip the first 2
   ;; 32-bit elements to wind up back on a 128-bit boundary
   (dotimes (i 4)
     (setf (aref argument-vector (+ 2 i)) (float i 1.0f0)))
   (let* ((result-vector (make-array (+ 4 2) :element-type 'single-float)))
      (external-call "_vsqrtf" :vector argument-vector (:vector result-vector))
      ;; The made-up syntax above is supposed to suggest that the result
      ;; is a SIMD vector that should be stored in the lisp object
      ;;  RESULT-VECTOR.  There might be other/better syntax for this.
      (dotimes (i 4)
        (format t "~& SQRT of ~s = ~s"
                (aref argument-vector (+ 2 i))
                (aref result-vector (+ 2 i))))))

-That- looks vaguely lisp-like (except for the slightly odd requirement
that the first 64 bits of data in the lisp object be ignored.)
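
Under those 64-bit assumptions (objects 128-bit aligned, 64-bit header),
the data always starts 8 bytes past a 16-byte boundary, so skipping
exactly 2 SINGLE-FLOATs always lands on a 128-bit boundary, wherever the
GC moves the vector. Since the :VECTOR syntax above is itself made up,
here is only a portable sketch of the layout convention; the helper
names are hypothetical.

```lisp
;;; Hypothetical helpers for the "ignore the first 64 bits" convention.

(defconstant +simd-pad-elements+ (/ (- 16 8) 4)
  "SINGLE-FLOATs to skip: (16-byte boundary - 8-byte header) / 4 bytes.")

(defun make-padded-simd-vector (values)
  "Return a SINGLE-FLOAT vector with +SIMD-PAD-ELEMENTS+ unused elements
followed by the four SINGLE-FLOATs in VALUES."
  (let ((v (make-array (+ 4 +simd-pad-elements+)
                       :element-type 'single-float
                       :initial-element 0.0f0)))
    (replace v values :start1 +simd-pad-elements+)
    v))

(defun simd-lanes (padded-vector)
  "The four SIMD lanes of PADDED-VECTOR, as a fresh vector."
  (subseq padded-vector +simd-pad-elements+ (+ 4 +simd-pad-elements+)))

(defun lisp64-lane0-offset-mod-16 (object-address)
  "Byte offset (mod 16) of lane 0, assuming OBJECT-ADDRESS is a multiple
of 16 and the data follows an 8-byte header."
  (mod (+ object-address 8 (* 4 +simd-pad-elements+)) 16))
```

(simd-lanes (make-padded-simd-vector '(0.0f0 1.0f0 2.0f0 3.0f0)))
returns the four values back, and lane 0 is at offset 0 mod 16 for
every 16-byte-aligned object address.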

I was never able to come up with an even vaguely lisp-like way of
integrating SIMD stuff with a 32-bit lisp; whenever the issue arose,
I generally suggested that it'd be better to wait for 64-bit ports
because of the alignment issues.  I'm not sure what the priority on
this should be, but I do recognize that I no longer have that excuse.

On Tue, 29 Aug 2006, Phil wrote:

> I'm finding myself longingly looking at some of the capabilities in
> vecLib (a collection of math libraries which are accelerated when
> Altivec/SSE is available) but am thinking that the required setup
> (i.e. allocating and populating the structures then reading the
> results back into Lisp structures) would at best be a wash vs.
> straight Lisp code.  Then I got to thinking that the overhead could
> be greatly minimized by implementing a limited-functionality Lisp
> vector/array type which used FFI-based memory allocation.  Any
> thoughts/experiences re: attempting this or other approaches to using
> vecLib?