[Openmcl-devel] Inline assembly?

Thu Aug 27 08:24:06 PDT 2009

On Aug 27, 2009, at 6:06 AM, Jianshi Huang wrote:

> On Thu, Aug 27, 2009 at 3:34 AM, R. Matthew Emerson <rme at clozure.com> wrote:
>>
>> We do use some mmx/sse/sse2 instructions:  scalar floating point operations
>> and some more exotic ones (like pmuldq) in bignum operations (see the source
>> for %multiply-and-add-loop on a 64-bit lisp, for example), but it's true
>> that the assembler doesn't recognize all of them.
>>
>> These instructions are defined at around line 2987 of
>> ccl:compiler;X86;x86-asm.lisp.
>>
>> Adding more instructions is not too difficult once you figure out what the
>> def-x86-opcode macro is expecting.
>>
>
> Thanks for the explanation. I'm reading the source code. It's quite
> clear what it does, I think; so far so good.
>
>> If you need 16-byte alignment (for instance, in order to use movapd or
>> whatever), then you generally have to ensure proper alignment yourself
>> somehow.
>>
>> All heap-allocated lisp objects are doublenode (8 bytes on x8632, 16 bytes
>> on x8664) aligned.  Double-floats are 8-byte aligned on both platforms.  The
>> elements of vectors have natural alignment.
>>
>> http://ccl.clozure.com/ccl-documentation.html#Tagging-scheme talks about how
>> objects are represented in memory in a little more detail.
>>
>
> I'll stick to X8664 first, since on X8664 lisp objects are already
> 16-byte aligned, so it would be easier.

I'm not sure exactly what you have in mind, but it might make sense to
start out not worrying about alignment.  You could just load with
movlpd/movhpd (or maybe movupd, just to get started).

An interesting read is found at
http://wikis.sun.com/display/BluePrints/Instruction+Selection
It talks about selecting appropriate load instructions;  it claims
that newer processors have much smaller (or even negligible) unaligned
load penalties.

> I currently only want specialized-arrays to be used with SSE
> instructions so according to the doc (and the source code) I only need
> to modify the layout of a uvector. Please correct me if I'm wrong.

Changing the layout of objects is kind of tricky (it would almost  
certainly require cross-compilation, and that's sort of a pain).  Can  
you give some examples of how you are thinking of using SSE  
instructions?  Maybe there is some other way we can come up with that  
won't involve changing the low-level data representation.

> A uvector object has a 64-bit header at the front, so the raw data is
> misaligned by 8 bytes. From what I've read in the source code, an
> element is accessed by adding an offset to the uvector pointer, e.g.
>
> (define-x8664-vinsn misc-ref-double-float  (((dest :double-float))
>                                            ((v :lisp)
>                                             (scaled-idx :imm)))
>  (movsd (:@ x8664::misc-data-offset (:%q v) (:%q scaled-idx)) (:%xmm dest)))
>
>
> So what I need to do is to change the offset of the raw data. Is there
> anything I missed? (very likely)

Well, that's basically right.  Instead of x8664::misc-data-offset,  
we'd use x8664::misc-aligned-float-offset or something, which would be  
misc-data-offset + 8.  (See the x8632 version of that vinsn, and  
you'll note that we use misc-dfloat-offset there.)

misc_set_common and misc_ref_common in x86-spentry64.s would need to  
change also.
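
To see why adding 8 would give 16-byte alignment, here is a rough sketch
in plain Common Lisp (not CCL internals).  The name
misc-aligned-float-offset is only the hypothetical one suggested above,
the address is made up, and the sketch assumes the object itself starts
on a 16-byte boundary:

```lisp
;; fulltag-misc is 13 on x8664; the header word is 8 bytes.
(defparameter *fulltag-misc* 13)
(defparameter *misc-data-offset* (+ (- *fulltag-misc*) 8))          ; -5
;; Hypothetical constant: the name suggested in this message, nothing more.
(defparameter *misc-aligned-float-offset* (+ *misc-data-offset* 8)) ; 3

(defun element-address (tagged-ptr offset index)
  "Address of double-float INDEX, given a tagged uvector pointer and a data OFFSET."
  (+ tagged-ptr offset (* 8 index)))

;; A made-up, 16-byte-aligned object at #x2000, tagged as a uvector.
(let ((ptr (+ #x2000 *fulltag-misc*)))
  (list (mod (element-address ptr *misc-data-offset* 0) 16)
        (mod (element-address ptr *misc-aligned-float-offset* 0) 16)))
;; => (8 0)
```

With the current layout, element 0 sits 8 bytes past a 16-byte boundary;
with the extra 8 bytes of padding it lands exactly on one.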

> My current problem is I couldn't understand the following definitions
> (in x8664-arch.lisp):
>
> (defconstant misc-header-offset (- fulltag-misc))
> (defconstant misc-data-offset (+ misc-header-offset node-size))
> (defconstant misc-subtag-offset misc-header-offset)
> (defconstant misc-dfloat-offset misc-data-offset)
> (defconstant misc-symbol-offset (- node-size fulltag-symbol))
> (defconstant misc-function-offset (- node-size fulltag-function))
>
>
> Why are they different, and why does misc-header-offset equal
> (- fulltag-misc), which is -13?

Consider a pointer to some sort of uvector.  The low 4 bits of the  
pointer will be #b1101.  That's x8664::fulltag-misc.

This means that the pointer will actually be pointing 13 bytes into  
the uvector.  If we subtract off the tag, then the effective address  
will be the start of the header word.  So, if we've got a (tagged)  
pointer to some uvector in %temp0, then

(movq (@ x8664::misc-header-offset (% temp0)) (% imm0))

gets the header into %imm0.

Add 8 to that effective address, and we're addressing element 0 of the  
vector.  That's misc-data-offset:  -13 + 8 = -5.
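
The same arithmetic, written out as a small sketch in portable Common
Lisp.  The constants mirror the quoted definitions; the helper
tagged-pointer and the address #x1000 are made up for illustration:

```lisp
(defparameter *fulltag-misc* #b1101)   ; 13
(defparameter *node-size* 8)
(defparameter *misc-header-offset* (- *fulltag-misc*))                  ; -13
(defparameter *misc-data-offset* (+ *misc-header-offset* *node-size*))  ; -5

(defun tagged-pointer (base)
  "Tag a 16-byte-aligned BASE address as a uvector pointer (hypothetical helper)."
  (assert (zerop (mod base 16)))
  (logior base *fulltag-misc*))

(let ((ptr (tagged-pointer #x1000)))
  (format t "pointer ~X, header at ~X, element 0 at ~X~%"
          ptr
          (+ ptr *misc-header-offset*)
          (+ ptr *misc-data-offset*)))
;; prints: pointer 100D, header at 1000, element 0 at 1008
```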

Recall that the low 8 bits of the header word are the subtag.  On a
little-endian platform, the offset from the tagged pointer to the
subtag is just misc-header-offset, so misc-subtag-offset is the same.
On a big-endian platform (like PowerPC), it would be different.
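
A quick way to convince yourself of the byte-order point, as a sketch in
plain Common Lisp.  The header value is made up, and the big-endian case
just models a hypothetical 64-bit big-endian layout:

```lisp
(defun header-bytes (header endianness)
  "The 8 bytes of a 64-bit HEADER in memory order, lowest address first."
  (let ((bytes (loop for i from 0 below 8
                     collect (ldb (byte 8 (* 8 i)) header))))
    (ecase endianness
      (:little bytes)
      (:big (reverse bytes)))))

(defun subtag-byte-offset (header endianness)
  "Byte offset, from the start of the header, of the low 8 bits (the subtag)."
  (position (ldb (byte 8 0) header) (header-bytes header endianness)))

;; Made-up header whose low byte (#xE7) occurs only once in the word.
(defparameter *example-header* #x03E7)

(list (subtag-byte-offset *example-header* :little)
      (subtag-byte-offset *example-header* :big))
;; => (0 7)
```

So on a little-endian machine the subtag byte is at the same address as
the header word, and on a big-endian one it is at the far end of it.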

On 32-bit platforms, we want double-floats to be 8-byte aligned.  So,  
there's an empty word between the header and the first element (in  
both the case of a solitary boxed double-float and a specialized  
double-float vector).  On 64-bit platforms, double-floats are always  
naturally aligned, so misc-dfloat-offset and misc-data-offset are the  
same.
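
Put another way, the first element goes at the header size rounded up to
the element alignment.  A minimal sketch (the function name is made up,
and offsets are relative to the untagged start of the object):

```lisp
(defun first-element-offset (header-size element-alignment)
  "Untagged byte offset of element 0: HEADER-SIZE rounded up to ELEMENT-ALIGNMENT."
  (* element-alignment (ceiling header-size element-alignment)))

(list (first-element-offset 4 8)    ; 32-bit: 4-byte header, 8-byte doubles
      (first-element-offset 8 8)    ; 64-bit: 8-byte header, 8-byte doubles
      (first-element-offset 8 16))  ; 64-bit, if 16-byte alignment were wanted
;; => (8 8 16)
```

The 32-bit case leaves 4 bytes of padding after the header, the 64-bit
case leaves none, and asking for 16-byte alignment on a 64-bit lisp would
need 8 bytes of padding.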

Finally, symbols and functions have their own tags on 64-bit platforms,
but they're really just uvectors underneath.  I'm not sure that
misc-symbol-offset or misc-function-offset are used much (if at all), but
the idea is that you should be able to change the tag of the pointer
to treat a symbol or function as a generic uvector.
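
As a sketch of that retagging idea in plain Common Lisp: the values 14
and 15 for fulltag-symbol and fulltag-function are assumptions here
(check x8664-arch.lisp), and the address is made up:

```lisp
(defparameter *node-size* 8)
(defparameter *fulltag-misc* 13)
(defparameter *fulltag-symbol* 14)    ; assumed x8664 value
(defparameter *fulltag-function* 15)  ; assumed x8664 value

(defun data-offset (fulltag)
  "Offset from a pointer tagged with FULLTAG to the word just past the header."
  (- *node-size* fulltag))

(defun retag (ptr old-tag new-tag)
  "Replace OLD-TAG with NEW-TAG in the low bits of PTR (hypothetical helper)."
  (+ (- ptr old-tag) new-tag))

(let* ((sym-ptr (+ #x3000 *fulltag-symbol*))   ; made-up symbol pointer
       (misc-ptr (retag sym-ptr *fulltag-symbol* *fulltag-misc*)))
  (list (+ sym-ptr (data-offset *fulltag-symbol*))
        (+ misc-ptr (data-offset *fulltag-misc*))))
;; both entries name the same address, #x3008
```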

(By the way, it's probably a good idea to cc messages to
openmcl-devel.  Maybe Gary Byers or someone else will feel like making
some comments, too.)