[Openmcl-devel] Why does this "cheat"/"lie" not work?...

Fri Feb 5 15:16:55 PST 2010

Hi,

I'm trying to cheat a bit in some low-level sequence copying.  The
sequences are, at base, simple arrays of (unsigned-byte 8).  But when
copying them around I use the following, which copies them in
"word"-sized chunks, where the word size is machine dependent
(basically 32 or 64):

(defmacro word-type-spec ()
  `(signed-byte ,+word-size+))

(defmacro the-wda (x)
  `(the (simple-array (signed-byte ,+word-size+) (*)) ,x))

(defmacro wdref (arr index)
  `(the (signed-byte ,+word-size+)
     (aref (the-wda ,arr) (the fixnum ,index))))

(defun replace-int-vec (vec1 vec2 start1 end1 start2)
  (declare (optimize (speed 3) (safety 0) (space 0))
           (type fixnum start1 end1 start2)
           (type (simple-array (word-type-spec) (*)) vec1)
           (type (simple-array (word-type-spec) (*)) vec2))
  (while (fix< start1 end1)
    (setf (wdref vec1 start1) (wdref vec2 start2)
          start1 (fix+ start1 1)
          start2 (fix+ start2 1)))
  vec1)

fix+ and fix< are just what you would expect and are not at issue here.
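
For anyone who wants to compile the snippet above, here is a minimal
sketch of the supporting definitions it needs.  These are illustrative
stand-ins, not the actual definitions: the package name
word-copy-sketch is hypothetical, +word-size+ is assumed to be 32 (to
match the x86 disassembly), and fix+, fix< and while are assumed to be
thin fixnum-declared wrappers and a plain loop macro.  A deftype named
word-type-spec is also assumed to exist next to the macro of the same
name, since type specifiers in declarations are not macroexpanded.

```lisp
;; Hypothetical stand-ins; the real definitions are not shown in this post.
(defpackage :word-copy-sketch
  (:use :common-lisp))
(in-package :word-copy-sketch)

(eval-when (:compile-toplevel :load-toplevel :execute)
  ;; Assumed word size in bits; 32 matches the x86 disassembly.
  (defconstant +word-size+ 32))

;; Assumed type definition so that (word-type-spec) is usable as a
;; type specifier inside DECLARE forms.
(deftype word-type-spec ()
  `(signed-byte ,+word-size+))

;; Fixnum-only addition (assumed definition).
(defmacro fix+ (a b)
  `(the fixnum (+ (the fixnum ,a) (the fixnum ,b))))

;; Fixnum-only comparison (assumed definition).
(defmacro fix< (a b)
  `(< (the fixnum ,a) (the fixnum ,b)))

;; Simple WHILE loop: run BODY for as long as TEST is true.
(defmacro while (test &body body)
  `(loop (unless ,test (return)) ,@body))
```

The definitions live in their own package so that while does not
clash with any macro of that name an implementation may already
provide.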

This seems to work, but it generates some "odd" (to me) code, and
quite a bit more of it than I expected:

? (disassemble (symbol-function 'replace-int-vec))
  [0]     (recover-fn)
  [5]     (movl (% ebp) (@ 16 (% esp)))
  [9]     (leal (@ 16 (% esp)) (% ebp))
  [13]    (popl (@ 4 (% ebp)))
  [16]    (pushl (% arg_y))
  [17]    (pushl (% arg_z))
  [18]    (jmp L133)
L20
  [20]    (movl (@ -8 (% ebp)) (% arg_y))
  [23]    (movl (@ -20 (% ebp)) (% arg_z))
  [26]    (movl (% arg_z) (% imm0))
  [28]    (movl (@ -2 (% arg_y) (% imm0)) (% imm0))
  [32]    (calll (@ .SPMAKES32))
  [39]    (recover-fn)
  [44]    (movl (% arg_z) (% temp1))
  [46]    (movl (% temp1) (% imm0))
  [48]    (sarl ($ 2) (% imm0))
  [51]    (testl ($ 3) (% temp1))
  [57]    (je L81)
  [59]    (movl (% temp1) (% imm0))
  [61]    (andl ($ 3) (% imm0))
  [64]    (cmpl ($ 2) (% imm0))
  [67]    (jne L154)
  [69]    (cmpl ($ 263) (@ -6 (% temp1)))
  [76]    (movl (@ -2 (% temp1)) (% imm0))
  [79]    (jne L154)
L81
  [81]    (movl (@ -12 (% ebp)) (% arg_y))
  [84]    (movl (@ -4 (% ebp)) (% temp0))
  [87]    (btrl ($ 2) (@ (% fs) 8))
  [97]    (movl (% arg_y) (% temp1))
  [99]    (movl (% imm0) (@ -2 (% temp0) (% temp1)))
  [103]   (xorl (% temp1) (% temp1))
  [105]   (btsl ($ 2) (@ (% fs) 8))
  [115]   (movl (@ -12 (% ebp)) (% arg_z))
  [118]   (addl ($ 4) (% arg_z))
  [121]   (movl (% arg_z) (@ -12 (% ebp)))
  [124]   (movl (@ -20 (% ebp)) (% arg_z))
  [127]   (addl ($ 4) (% arg_z))
  [130]   (movl (% arg_z) (@ -20 (% ebp)))
L133
  [133]   (movl (@ -12 (% ebp)) (% arg_y))
  [136]   (movl (@ -16 (% ebp)) (% arg_z))
  [139]   (cmpl (% arg_z) (% arg_y))
  [141]   (jl L20)
  [143]   (movl (@ -4 (% ebp)) (% arg_z))
  [146]   (leavel)
  [147]   (retl)
L154
  [154]   (uuo-error-reg-not-type (% temp1) ($ 157))

[20] through [105] are the "interesting" bits.  There seems to be a fair
amount of shifting (sarl) and bit testing (testl, andl, cmpl, btrl,
btsl, xorl) going on.  I would have thought this chunk of code would
basically be a handful of movl instructions (four to six or so) and
that's it.  I mean, I think I lied pretty well here (what with
(speed 3) (safety 0) and loads of type annotations).  Is some (or all)
of this associated with thread issues (conditional stores into arrays
or some such)?

Also, what is the (calll (@ .SPMAKES32)) for, again given the context
of all the lying?

If you change word-type-spec to be fixnum (and change accessors
accordingly) the result is shorter with less testing (though the same
call is there) but no longer works.  By that I mean it corrupts the
destination array with bogus content.
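
One thing that may bear on the fixnum variant: an array created with
element type fixnum need not share a storage layout with a
(signed-byte 32) array, and under (safety 0) nothing checks that the
declaration matches the real array.  A quick, portable way to see what
each candidate element type is upgraded to is sketched below.
show-upgraded-types is a hypothetical helper name, and the results are
implementation dependent, so no output is shown.

```lisp
;; Hypothetical helper: print the array element type that each candidate
;; type specifier is upgraded to in the running implementation.
(defun show-upgraded-types
    (&optional (types '(fixnum
                        (signed-byte 32)
                        (unsigned-byte 32)
                        (unsigned-byte 8))))
  (dolist (type types)
    ;; UPGRADED-ARRAY-ELEMENT-TYPE is standard Common Lisp; what it
    ;; returns varies between implementations and word sizes.
    (format t "~&~S -> ~S~%" type (upgraded-array-element-type type))))

(show-upgraded-types)
```

If fixnum and (signed-byte 32) upgrade to different types, then
treating an array of one kind as the other is a lie the compiler has
no obligation to honor.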

If you change word-type-spec to (UNsigned-byte 32), this also works, but
generates quite a bit more code than the signed version.  I think it was
about another 50 bytes or so in the code vector.

In Allegro, all the lying here pretty much does exactly what you (or at
least I) would expect.  No calls to anything, and just a handful of movl
instructions.  And, of course, it also works.  Actually here's what they
generate:

(disassemble (symbol-function 'replace-int-vec))
;; disassembly of #<Function REPLACE-INT-VEC>
;; formals: vec1 vec2 start1 end1 start2

;; code start: #x48851be4:
   0: 55          pushl	ebp
   1: 8b ec       movl	ebp,esp
   3: 56          pushl	esi
   4: 83 ec 2c    subl	esp,$44
   7: 83 f9 05    cmpl	ecx,$5
  10: 74 02       jz	14
  12: cd 61       int	$97             ; sys::trap-argerr
  14: 80 7f cb 00 cmpb	[edi-53],$0     ; sys::c_interrupt-pending
  18: 74 02       jz	22
  20: cd 64       int	$100            ; sys::trap-signal-hit
  22: eb 31       jmp	73
  24: 89 45 e4    movl	[ebp-28],eax    ; vec1
  27: 8b 45 18    movl	eax,[ebp+24]
  30: 8b 5c 02 f6 movl	ebx,[edx+eax-10]
  34: 8b 45 e4    movl	eax,[ebp-28]    ; vec1
  37: 8b 4d 10    movl	ecx,[ebp+16]
  40: 89 5c 08 f6 movl	[eax+ecx-10],ebx
  44: 8b 5d 10    movl	ebx,[ebp+16]
  47: 83 c3 04    addl	ebx,$4
  50: 8b 45 18    movl	eax,[ebp+24]
  53: 83 c0 04    addl	eax,$4
  56: 80 7f cb 00 cmpb	[edi-53],$0     ; sys::c_interrupt-pending
  60: 74 02       jz	64
  62: cd 64       int	$100            ; sys::trap-signal-hit
  64: 89 45 18    movl	[ebp+24],eax
  67: 8b 45 e4    movl	eax,[ebp-28]    ; vec1
  70: 89 5d 10    movl	[ebp+16],ebx
  73: 8b 5d 10    movl	ebx,[ebp+16]
  76: 3b 5d 14    cmpl	ebx,[ebp+20]
  79: 7c c7       jl	24
  81: f8          clc
  82: c9          leave
  83: 8b 75 fc    movl	esi,[ebp-4]
  86: c3          ret
  87: 90          nop

The bits corresponding to [20]-[105] in CCL are 24-40.

Aside from some edification on why the extra code and the call out to
some routine are there, I guess I'm wondering if there is any extra
lying that could get CCL to generate something closer to the ACL code.
Note that this ACL is not using native threads...

Thanks in advance!

/Jon