[Openmcl-devel] ARM immediates, subprims and NIL
gb at clozure.com
Mon Jul 11 01:54:07 PDT 2005
On Sun, 10 Jul 2005, James Bielman wrote:
> I've made a fair bit of progress on the LAP assembler for my OpenMCL
> ARM port. Now that I've started messing around with it a bit, I've
> got some more architectural problems to solve.
> The format of immediates in ARM data processing instructions is rather
> weird. Instead of an N-bit integer, an immediate is encoded as an
> 8-bit integer rotated right by an even number of bits.
I think that you can actually do several flavors of shifts as well as
rotates, and an immediate shift/rotate count can be anything between
0 and 31, inclusive.
This doesn't really affect your basic point (that 1-instruction
constant loads only work if the number of significant bits in the
constant is small).
> So, the following immediates are legal (using ARM LAP syntax):
> (mov imm0 #xff) ; #xff rotated right 0 bits
> (mov imm0 #x2000) ; #x80 rotated right 26 bits
> (mov imm0 #xf000000f) ; #xff rotated right 4 bits
> but these aren't:
> (mov imm0 #x101)
> (mov arg_z #x2015)
> (mov pc #x50bc)
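The encoding rule described above (an 8-bit value rotated right by an even amount) can be checked mechanically. What follows is a minimal sketch in Python, not code from OpenMCL; the names `rotate_left32` and `encode_arm_immediate` are made up for illustration.

```python
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF


def rotate_left32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def encode_arm_immediate(value: int) -> Optional[Tuple[int, int]]:
    """Return (imm8, rotate_right) if `value` fits the rule, else None.

    The rule: value == imm8 rotated right by an even number of bits,
    with imm8 in the range 0..255.
    """
    value &= MASK32
    for rotation in range(0, 32, 2):
        # Rotating left undoes a rotate-right by the same amount.
        imm8 = rotate_left32(value, rotation)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None


if __name__ == "__main__":
    for constant in (0xFF, 0x2000, 0xF000000F, 0x101, 0x2015):
        print(hex(constant), encode_arm_immediate(constant))
```

Under this check #xff, #x2000 and #xf000000f each yield an encoding, while #x101 and #x2015 yield None. Some constants have more than one valid encoding; the sketch simply returns the first one it finds.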
> So, assuming I want to be able to load NIL into a register in a single
> instruction (which seems like a good goal), I'll need to shuffle
> things around a bit.
> Furthermore, Windows CE helpfully declares the first 64k of memory as
> off-limits, so this wouldn't work there anyway.
NIL's basically a really popular constant (e.g., more functions
reference NIL more often than they reference PI or the symbol FOO).
I'm not sure that it's -so- frequently-used that it warrants being
kept in a dedicated register, especially if registers are in short
supply. (You'd like it to be in a register if it was referenced in a
loop or otherwise referenced frequently, but the same is true of PI.)
If you have control of the address space above #x10000 (and if you
want to keep the PPC32 tagging scheme), you can probably get NIL
into a register with 2 ALU instructions (assuming that NIL is
(mov temp0 ($ (lsl 2 16))) ; whatever the assembler syntax is
(or temp0 temp0 ($ 15))
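The two-instruction sequence above works because each half of the constant fits in an 8-bit window at an even bit position. Here is a rough Python sketch of that splitting, assuming the 15 is read as decimal and that the ARM OR mnemonic is spelled `orr`; `split_into_immediates` and `synthesize_constant` are hypothetical names, and the greedy split is naive rather than optimal.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def split_into_immediates(value: int) -> List[int]:
    """Split a 32-bit constant into chunks that each fit an 8-bit window
    starting at an even bit position (highest chunk first)."""
    value &= MASK32
    chunks = []
    while value:
        lowest = (value & -value).bit_length() - 1  # index of lowest set bit
        start = lowest & ~1                         # round down to an even bit
        chunk = value & (0xFF << start)
        chunks.append(chunk)
        value &= ~chunk
    return chunks[::-1]


def synthesize_constant(register: str, value: int) -> List[str]:
    """Emit LAP-like text: one mov, then one orr per remaining chunk."""
    chunks = split_into_immediates(value) or [0]
    lines = [f"(mov {register} #x{chunks[0]:x})"]
    lines += [f"(orr {register} {register} #x{c:x})" for c in chunks[1:]]
    return lines


if __name__ == "__main__":
    for line in synthesize_constant("temp0", (2 << 16) | 15):
        print(line)
```

Because successive windows start at least 8 bits apart, this naive split never needs more than four instructions for a 32-bit value. It does not try wrapped-around windows or other instructions, so a smarter assembler could sometimes do better.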
If NIL's going to be referenced often in a given function (and/or
in one or more loops), you'd probably have wanted to target SAVE0
instead of TEMP0. If the compiler doesn't do a good job of this,
the 2 instructions above are more than we're used to seeing, but
not the end of the world. (At least, I don't think so).
My argument that it'd be acceptable to treat NIL not-so-specially
might depend on the compiler doing a better job of both local
and global register allocation. If you disassemble:
(defun both-foo (x y)
(if (and (eq x 'foo) (eq y 'foo))
(LWZ ARG_Y 4 VSP)
(LWZ ARG_Z 'FOO FN)
(CMPW ARG_Y ARG_Z)
(LWZ ARG_Y 0 VSP)
(LWZ ARG_Z 'FOO FN)
(CMPW ARG_Y ARG_Z)
the second load of 'FOO is redundant (and it's not too hard to prove
> I'm thinking that nil will need to go in a register, since even if I
> move nil to a location representable as a rotated immediate,
> nil-relative objects probably won't be. I'm not sure which register
> to give up for this.
> So that leaves me with:
> (mov arg_z rnil) ; return nil
> (add arg_z rnil t_offset) ; return t
> (This is also a problem for loading immediate fixnums. I've read it
> can take up to three instructions to load an arbitrary 32-bit constant
> using arithmetic instructions. Maybe fixnums that can't be
> represented as immediates can go in the function's constant vector
> like other constant objects? ARM assemblers do something similar with
> "literal pools". Perhaps in the future I could do something fancier
> when optimize speed > optimize size or something...)
On PPC64, it can take as many as 5 instructions to load a 64-bit
immediate. The literature that I've seen suggests that it's almost
always faster to load large constants from memory: if there's a data
cache hit, the single LD instruction is likely to be faster than
the 5 ALU instructions.
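One way to picture the policy discussed here is: use a single ALU instruction when the boxed fixnum fits an immediate, and fall back to a load from the function's constant vector otherwise. The Python sketch below is only an illustration. It assumes fixnums are boxed by shifting left 2 bits (as in the PPC32 tagging scheme), it also tries ARM's MVN (move the bitwise complement), and `plan_fixnum_load` with its result tuples is a hypothetical name, not part of OpenMCL.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF
FIXNUM_SHIFT = 2  # assumption: PPC32-style fixnum tagging


def is_arm_immediate(value: int) -> bool:
    """True if value is an 8-bit quantity rotated right by an even amount."""
    value &= MASK32
    for rotation in range(0, 32, 2):
        rotated = ((value << rotation) | (value >> (32 - rotation))) & MASK32
        if rotated <= 0xFF:
            return True
    return False


def plan_fixnum_load(n: int, constants: List[int]) -> Tuple[str, int]:
    """Pick a one-instruction strategy for loading fixnum n.

    Returns ("mov", imm), ("mvn", imm), or ("ldr-constant", index), where
    index is the slot in `constants` (the constant vector) holding the value.
    """
    boxed = (n << FIXNUM_SHIFT) & MASK32
    if is_arm_immediate(boxed):
        return ("mov", boxed)
    inverted = ~boxed & MASK32
    if is_arm_immediate(inverted):
        return ("mvn", inverted)
    if boxed not in constants:
        constants.append(boxed)  # intern once, reuse on later references
    return ("ldr-constant", constants.index(boxed))


if __name__ == "__main__":
    pool: List[int] = []
    for n in (1, -1, 0x101, 0x101):
        print(n, plan_fixnum_load(n, pool))
```

Either way the load costs one instruction; the constant-vector path trades an ALU operation for a memory reference, which is the trade-off weighed above.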
> There is a similar problem with subprimitives. ARM doesn't have a
> branch absolute instruction, but you can use PC as a destination
> register in data processing instructions.
> Because of the immediate representation, I can't do:
> (mov lr pc)
> (mov pc #x50bc) ; this would have to be >#xffff on wince
Can you load the PC from the constants pool?
> I'm not sure what the best way to solve this is yet. Could I move the
> subprimitive jump table relative to nil as well? If this worked, a
> subprim call could look something like:
> (mov lr pc)
> (add pc rnil .SPwhatever-offset)
> ;; or maybe even (ldr pc (rnil .SPwhatever-offset))
> There's probably some sneakier, better way to do this...
The subprimitive jump table on the PPC supposedly costs nearly nothing:
unconditional branches get folded in the pipeline.
One register that I don't think that we can easily get rid of is the
TCR (if you lose track of it, you have to do something like
(#_pthread_getspecific) to recover it.) We're used to thinking of the
TCR as containing strictly thread-specific info, but (if we're willing
to live with slightly larger TCRs) we can certainly copy some global
info into it. Suppose that a given thread's TCR pointed into the
middle of a block of memory; the few-dozen words at non-negative
offsets from the TCR could be the current volatile thread-specific
stuff and the few dozen (hundred ?) words at negative offsets would
be copies of static information (like subprimitive addresses).
That'd make a subprim call something like:
(mov lr pc)
(ldr pc (rcontext .SPfuncall)) ; .SPfuncall is negative and a multiple
; of 4
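The layout suggested above can be modeled as a table of byte offsets from the TCR: static copies at negative offsets, thread-specific words at non-negative ones. The Python sketch below is only an illustration. Apart from `.SPfuncall` and `rcontext`, every name in it is hypothetical, and it assumes 4-byte words and the 12-bit offset range (at most 4095 bytes either way) of the ARM word LDR instruction.

```python
from typing import Dict, List

WORD_BYTES = 4
LDR_MAX_OFFSET = 4095  # ARM word LDR: 12-bit immediate offset magnitude


def layout_tcr_block(static_names: List[str],
                     thread_names: List[str]) -> Dict[str, int]:
    """Map each slot name to its byte offset from the TCR pointer."""
    offsets: Dict[str, int] = {}
    # Static copies grow downward from the TCR: -4, -8, -12, ...
    for index, name in enumerate(static_names, start=1):
        offsets[name] = -WORD_BYTES * index
    # Thread-specific words sit at 0, 4, 8, ...
    for index, name in enumerate(thread_names):
        offsets[name] = WORD_BYTES * index
    unreachable = [n for n, off in offsets.items() if abs(off) > LDR_MAX_OFFSET]
    if unreachable:
        raise ValueError(f"slots out of single-LDR range: {unreachable}")
    return offsets


def subprim_call(name: str, offsets: Dict[str, int]) -> List[str]:
    """Emit the two-instruction call sequence as LAP-like text."""
    return ["(mov lr pc)", f"(ldr pc (rcontext {offsets[name]}))"]


if __name__ == "__main__":
    table = layout_tcr_block(
        static_names=[".SPfuncall", ".SPhypothetical"],    # second name is made up
        thread_names=["hypothetical-thread-field"],        # made-up field name
    )
    for line in subprim_call(".SPfuncall", table):
        print(line)
```

With that offset range, roughly a thousand words are reachable on each side of the TCR with a single LDR. The `(mov lr pc)` works as a return-address setup because reading PC in ARM state yields the address of the current instruction plus 8, which is the instruction following the LDR.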