[Openmcl-devel] Why does this "cheat"/"lie" not work?...
Jon S. Anthony
j-anthony at comcast.net
Fri Feb 5 15:16:55 PST 2010
Hi,
I'm trying to cheat a bit in some low-level sequence copying. The
sequences are, at base, simple arrays of (unsigned-byte 8). But in copying
them around I have the following, which copies them in "word"-sized
chunks, where the word size is machine dependent (basically 32 or 64):
(defmacro word-type-spec ()
  `(signed-byte ,+word-size+))

(defmacro the-wda (x)
  `(the (simple-array (signed-byte ,+word-size+) (*)) ,x))

(defmacro wdref (arr index)
  `(the (signed-byte ,+word-size+)
        (aref (the-wda ,arr) (the fixnum ,index))))

(defun replace-int-vec (vec1 vec2 start1 end1 start2)
  (declare (optimize (speed 3) (safety 0) (space 0))
           (type fixnum start1 end1 start2)
           (type (simple-array (word-type-spec) (*)) vec1)
           (type (simple-array (word-type-spec) (*)) vec2))
  (while (fix< start1 end1)
    (setf (wdref vec1 start1) (wdref vec2 start2)
          start1 (fix+ start1 1)
          start2 (fix+ start2 1)))
  vec1)
fix+ and fix< are just what you think and are not at issue.
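The post does not show the definitions of +word-size+, while, fix+ or fix<. To compile the code above outside its original project, something like the following could be loaded first. These are hypothetical stand-ins, not the poster's versions; in particular, deriving +word-size+ from the fixnum range is an assumption. Because standard Common Lisp does not macroexpand type specifiers, the sketch also adds a DEFTYPE under the same name, so that (word-type-spec) inside the declarations denotes a type (a symbol can name both a macro and a type).

```lisp
;;; Hypothetical supporting definitions; the originals were not posted.

(eval-when (:compile-toplevel :load-toplevel :execute)
  ;; Assumption: pick the word size (32 or 64) from the fixnum range,
  ;; since fixnums are only a few bits narrower than the machine word.
  (defconstant +word-size+
    (if (> most-positive-fixnum (expt 2 32)) 64 32)))

;; Same name as the macro in the post; DEFTYPE is what lets
;; (word-type-spec) appear inside a type declaration.
(deftype word-type-spec ()
  `(signed-byte ,+word-size+))

(defmacro while (test &body body)
  "Repeat BODY as long as TEST is true; return NIL."
  `(loop (unless ,test (return nil)) ,@body))

(defmacro fix+ (a b)
  "Addition with both arguments and the result declared fixnum."
  `(the fixnum (+ (the fixnum ,a) (the fixnum ,b))))

(defmacro fix< (a b)
  "Less-than with both arguments declared fixnum."
  `(< (the fixnum ,a) (the fixnum ,b)))
```

The DEFMACRO of word-type-spec in the post can stay as it is; it simply coexists with the DEFTYPE.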
This seems to work, but it generates some code that looks "odd" (to me),
and quite a bit more of it than I expected:
? (disassemble (symbol-function 'replace-int-vec))
[0] (recover-fn)
[5] (movl (% ebp) (@ 16 (% esp)))
[9] (leal (@ 16 (% esp)) (% ebp))
[13] (popl (@ 4 (% ebp)))
[16] (pushl (% arg_y))
[17] (pushl (% arg_z))
[18] (jmp L133)
L20
[20] (movl (@ -8 (% ebp)) (% arg_y))
[23] (movl (@ -20 (% ebp)) (% arg_z))
[26] (movl (% arg_z) (% imm0))
[28] (movl (@ -2 (% arg_y) (% imm0)) (% imm0))
[32] (calll (@ .SPMAKES32))
[39] (recover-fn)
[44] (movl (% arg_z) (% temp1))
[46] (movl (% temp1) (% imm0))
[48] (sarl ($ 2) (% imm0))
[51] (testl ($ 3) (% temp1))
[57] (je L81)
[59] (movl (% temp1) (% imm0))
[61] (andl ($ 3) (% imm0))
[64] (cmpl ($ 2) (% imm0))
[67] (jne L154)
[69] (cmpl ($ 263) (@ -6 (% temp1)))
[76] (movl (@ -2 (% temp1)) (% imm0))
[79] (jne L154)
L81
[81] (movl (@ -12 (% ebp)) (% arg_y))
[84] (movl (@ -4 (% ebp)) (% temp0))
[87] (btrl ($ 2) (@ (% fs) 8))
[97] (movl (% arg_y) (% temp1))
[99] (movl (% imm0) (@ -2 (% temp0) (% temp1)))
[103] (xorl (% temp1) (% temp1))
[105] (btsl ($ 2) (@ (% fs) 8))
[115] (movl (@ -12 (% ebp)) (% arg_z))
[118] (addl ($ 4) (% arg_z))
[121] (movl (% arg_z) (@ -12 (% ebp)))
[124] (movl (@ -20 (% ebp)) (% arg_z))
[127] (addl ($ 4) (% arg_z))
[130] (movl (% arg_z) (@ -20 (% ebp)))
L133
[133] (movl (@ -12 (% ebp)) (% arg_y))
[136] (movl (@ -16 (% ebp)) (% arg_z))
[139] (cmpl (% arg_z) (% arg_y))
[141] (jl L20)
[143] (movl (@ -4 (% ebp)) (% arg_z))
[146] (leavel)
[147] (retl)
L154
[154] (uuo-error-reg-not-type (% temp1) ($ 157))
[20] through [105] are the "interesting" bits. There seems to be a fair
amount of shifting (sarl) and bit testing (testl, andl, cmpl, btrl,
btsl, xorl) going on. I would have thought this chunk of code would
basically be a handful of movl instructions (four to six or so) and
that's it. I mean, I think I lied pretty well here (what with (speed 3)
(safety 0) and loads of type annotations). Is some (or all) of this
associated with threading issues (conditional stores into arrays or some
such)?
Also, what is the (calll (@ .SPMAKES32)) for, again given all the
lying?
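One way to probe the boxing question is to check the fixnum range of the running Lisp. The (sarl ($ 2)) and (testl ($ 3)) in the listing suggest that 32-bit x86 CCL keeps two tag bits in each fixnum, which would leave 30 bits of value. If so, an arbitrary (signed-byte 32) element read from the array cannot be returned as a fixnum, and .SPMAKES32 is presumably the runtime subprimitive that boxes it (as a fixnum or a bignum), after which the code at [44] through [79] appears to unbox it again before the store. The helper name below is hypothetical, and it only reports what the current implementation supports.

```lisp
(defun word-fits-in-fixnum-p (bits)
  "True if every (signed-byte BITS) integer is a fixnum in this Lisp."
  ;; Fixnums form one contiguous integer range around zero, so checking
  ;; both extremes of the signed range is enough.
  (and (typep (- (expt 2 (1- bits))) 'fixnum)
       (typep (1- (expt 2 (1- bits))) 'fixnum)))

(format t "most-positive-fixnum = ~D (~D value bits plus sign)~%"
        most-positive-fixnum (integer-length most-positive-fixnum))
(format t "(signed-byte 32) always a fixnum here: ~A~%"
        (word-fits-in-fixnum-p 32))
```

On a 32-bit Lisp this should print NIL for the second line, while on a 64-bit Lisp a (signed-byte 32) element does fit in a fixnum.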
If you change word-type-spec to fixnum (and change the accessors
accordingly), the result is shorter and does less testing (though the
same call is still there), but it no longer works: it corrupts the
destination array with bogus content.
If you change word-type-spec to (UNsigned-byte 32), this also works, but
it generates quite a bit more code than the signed version: about another
50 bytes or so in the code vector, I think.
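To compare the signed, unsigned and fixnum variants side by side, a macro can stamp out one copier per element type, and each result can then be passed to DISASSEMBLE. This is a hypothetical helper, not code from the post: it uses standard LOOP and INCF in place of while, fix< and fix+, and it declares the real element type instead of reinterpreting byte vectors, so it isolates the cost of the element type itself.

```lisp
(defmacro define-word-copier (name element-type)
  "Define NAME as an unchecked copier over (simple-array ELEMENT-TYPE (*))."
  `(defun ,name (vec1 vec2 start1 end1 start2)
     (declare (optimize (speed 3) (safety 0) (space 0))
              (type fixnum start1 end1 start2)
              (type (simple-array ,element-type (*)) vec1 vec2))
     ;; Copy VEC2[START2 ...] into VEC1[START1, END1), one element at a time.
     (loop while (< start1 end1)
           do (setf (aref vec1 start1) (aref vec2 start2))
              (incf start1)
              (incf start2))
     vec1))

(define-word-copier copy-sb32 (signed-byte 32))
(define-word-copier copy-ub32 (unsigned-byte 32))
(define-word-copier copy-fixnum fixnum)

;; Then compare, for example:
;; (disassemble 'copy-sb32)
;; (disassemble 'copy-ub32)
;; (disassemble 'copy-fixnum)
```

Because these functions are compiled with (safety 0), they should only be called on arrays that really have the declared element type.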
In Allegro, all this lying does pretty much exactly what you (or at
least I) would expect: no calls to anything, and just a handful of movl
instructions. And, of course, it also works. Here's what it
generates:
(disassemble (symbol-function 'replace-int-vec))
;; disassembly of #<Function REPLACE-INT-VEC>
;; formals: vec1 vec2 start1 end1 start2
;; code start: #x48851be4:
0: 55 pushl ebp
1: 8b ec movl ebp,esp
3: 56 pushl esi
4: 83 ec 2c subl esp,$44
7: 83 f9 05 cmpl ecx,$5
10: 74 02 jz 14
12: cd 61 int $97 ; sys::trap-argerr
14: 80 7f cb 00 cmpb [edi-53],$0 ; sys::c_interrupt-pending
18: 74 02 jz 22
20: cd 64 int $100 ; sys::trap-signal-hit
22: eb 31 jmp 73
24: 89 45 e4 movl [ebp-28],eax ; vec1
27: 8b 45 18 movl eax,[ebp+24]
30: 8b 5c 02 f6 movl ebx,[edx+eax-10]
34: 8b 45 e4 movl eax,[ebp-28] ; vec1
37: 8b 4d 10 movl ecx,[ebp+16]
40: 89 5c 08 f6 movl [eax+ecx-10],ebx
44: 8b 5d 10 movl ebx,[ebp+16]
47: 83 c3 04 addl ebx,$4
50: 8b 45 18 movl eax,[ebp+24]
53: 83 c0 04 addl eax,$4
56: 80 7f cb 00 cmpb [edi-53],$0 ; sys::c_interrupt-pending
60: 74 02 jz 64
62: cd 64 int $100 ; sys::trap-signal-hit
64: 89 45 18 movl [ebp+24],eax
67: 8b 45 e4 movl eax,[ebp-28] ; vec1
70: 89 5d 10 movl [ebp+16],ebx
73: 8b 5d 10 movl ebx,[ebp+16]
76: 3b 5d 14 cmpl ebx,[ebp+20]
79: 7c c7 jl 24
81: f8 clc
82: c9 leave
83: 8b 75 fc movl esi,[ebp-4]
86: c3 ret
87: 90 nop
The bits corresponding to [20]-[105] in CCL are at offsets 24-40.
Aside from some edification on why there is extra code and a call out to
some routine, I'm wondering whether any additional lying could get CCL
to generate something closer to the ACL code. Note that this ACL is not
using native threads...
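As a point of comparison (nothing in the thread establishes that it is faster), the same copy can be expressed with the standard REPLACE function on the underlying (unsigned-byte 8) vectors, leaving any word-at-a-time optimization to the implementation. The name replace-bytes is hypothetical, and its offsets are in bytes, so word offsets would need to be scaled by (/ +word-size+ 8).

```lisp
(defun replace-bytes (vec1 vec2 start1 end1 start2)
  "Copy bytes of VEC2 starting at START2 into VEC1[START1, END1); return VEC1."
  (declare (optimize (speed 3) (safety 0))
           (type (simple-array (unsigned-byte 8) (*)) vec1 vec2)
           (type fixnum start1 end1 start2))
  ;; REPLACE copies the shorter of the two subsequences and returns
  ;; its first argument.
  (replace vec1 vec2 :start1 start1 :end1 end1 :start2 start2))
```

Disassembling replace-bytes in CCL would show whether REPLACE is open-coded for this case or dispatches to a runtime copy routine.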
Thanks in advance!
/Jon