[Openmcl-devel] Why does this "cheat"/"lie" not work?...
R. Matthew Emerson
rme at clozure.com
Mon Feb 8 17:44:18 PST 2010
On Feb 5, 2010, at 8:48 PM, Gary Byers wrote:
> On Fri, 5 Feb 2010, Jon S. Anthony wrote:
>
>> L20
>> [20] (movl (@ -8 (% ebp)) (% arg_y))
>> [23] (movl (@ -20 (% ebp)) (% arg_z))
>> [26] (movl (% arg_z) (% imm0))
>> [28] (movl (@ -2 (% arg_y) (% imm0)) (% imm0))
>> [32] (calll (@ .SPMAKES32))
>> [39] (recover-fn)
>> [44] (movl (% arg_z) (% temp1))
>> [46] (movl (% temp1) (% imm0))
>> [48] (sarl ($ 2) (% imm0))
>> [51] (testl ($ 3) (% temp1))
>> [57] (je L81)
>> [59] (movl (% temp1) (% imm0))
>> [61] (andl ($ 3) (% imm0))
>> [64] (cmpl ($ 2) (% imm0))
>> [67] (jne L154)
>> [69] (cmpl ($ 263) (@ -6 (% temp1)))
>> [76] (movl (@ -2 (% temp1)) (% imm0))
>> [79] (jne L154)
>> L81
>> [81] (movl (@ -12 (% ebp)) (% arg_y))
>> [84] (movl (@ -4 (% ebp)) (% temp0))
>> [87] (btrl ($ 2) (@ (% fs) 8))
>> [97] (movl (% arg_y) (% temp1))
>> [99] (movl (% imm0) (@ -2 (% temp0) (% temp1)))
>> [103] (xorl (% temp1) (% temp1))
>> [105] (btsl ($ 2) (@ (% fs) 8))
>> [115] (movl (@ -12 (% ebp)) (% arg_z))
>> [118] (addl ($ 4) (% arg_z))
>> [121] (movl (% arg_z) (@ -12 (% ebp)))
>> [124] (movl (@ -20 (% ebp)) (% arg_z))
>> [127] (addl ($ 4) (% arg_z))
>> [130] (movl (% arg_z) (@ -20 (% ebp)))
>> L133
>> [133] (movl (@ -12 (% ebp)) (% arg_y))
>> [136] (movl (@ -16 (% ebp)) (% arg_z))
>> [139] (cmpl (% arg_z) (% arg_y))
>> [141] (jl L20)
>>
>> [20] through [105] are the "interesting" bits. There seems to be a fair
>> amount of shifting (sarl) and bit testing (testl, andl, cmpl, btrl,
>> btsl, xorl) going on. I would have thought this chunk of code would
>> basically be a handful of movl (four to six or so) and that's it. I
>> mean I think I lied pretty good here (what with (speed 3) (safety 0) and
>> loads of type annotation). Is some (all) of this associated with thread
>> issues (conditional store in arrays or some such)?
>
> Matt gave a lightning talk at last year's ILC explaining what the bit-setting
> and clearing are all about.
>
> <http://www.thoughtstuff.com/rme/weblog/?p=17>
>
> used to link to his materials from that talk; we changed servers a few
> months ago, and that stuff seems to not have been copied over.
I copied over that information this past weekend, so it should now be available at http://www.clozure.com/~rme/
>> Also, what is the (calll (@ .SPMAKES32)) for? Again, given the context
>> of all the lying.
>
> Um, "lack of support for immediate operations on signed integers of
> the native word size" ?
>
> CCL's support for operations on unboxed integers that fit in a machine
> word in general is poor, but what support exists is oriented towards
> unsigned integers (there's no good reason for excluding signed integers;
> that support just isn't there.)
>
> In x86-64 CCL, the inner part of a loop that copies between two
> vectors of type (SIMPLE-ARRAY (UNSIGNED-BYTE 64) (*)) A and B looks
> like:
>
> ;;; (aref b i)
> L29
> [29] (movq (@ -5 (% save2) (% save0)) (% imm0))
>
> ;;; (setf (aref a i) (aref b i))
> [34] (movq (% imm0) (@ -5 (% save1) (% save0)))
>
> which is fairly reasonable; the analogous case with vectors
> of element type (SIGNED-BYTE 64) is considerably less so: there's
> some completely unnecessary boxing and unboxing between those two
> instructions. (I'd expect the x86-32 code to be roughly equivalent
> to the code above in the (UNSIGNED-BYTE 32) case and don't know
> why it isn't. Matt's goofing off rather than working on a Friday
> night, so we'll have to wait for the answer.)
The x86-64 version of CCL existed before the 32-bit x86 version, so I was adding 32-bit support to an existing 64-bit backend rather than the (probably more usual) other way around.
There is some compiler support for dealing with elements of type (unsigned-byte 64), which is the native word size on x86-64. The 64-bit backend doesn't do anything special with (unsigned-byte 32) elements---it just boxes the result (which is fairly cheap, since we know it will fit in a fixnum). The code that dealt with (unsigned-byte 32) elements "just worked" on 32-bit systems too (although on 32-bit x86, the boxing isn't necessarily cheap---see the calls to .SPmakeu32). So, there's no deep reason for all the boxing and unboxing on x8632. What can I say? Bad hacker, no cookie.
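To make the fixnum remark concrete, here is a rough model of why boxing an (unsigned-byte 32) is cheap on x86-64 but not necessarily on x86-32. The tag widths are assumptions: 2 tag bits on x8632 (consistent with the sarl $2 / testl $3 pair in the quoted listing) and 3 tag bits on x8664. The helper names are made up for illustration and are not part of CCL.

```lisp
;;; Hypothetical helpers, not part of CCL.
(defun max-fixnum (word-bits tag-bits)
  "Largest fixnum if TAG-BITS of a WORD-BITS word hold the tag
and one of the remaining bits holds the sign."
  (1- (expt 2 (- word-bits tag-bits 1))))

(defun u32-always-fixnum-p (word-bits tag-bits)
  "True if every (unsigned-byte 32) value fits in a fixnum."
  (<= (1- (expt 2 32)) (max-fixnum word-bits tag-bits)))

(u32-always-fixnum-p 64 3) ; assumed x8664 layout => T
(u32-always-fixnum-p 32 2) ; assumed x8632 layout => NIL
```

Under these assumptions, a 32-bit element read on x8632 may not fit in a fixnum, which is presumably why the boxing there goes through an out-of-line subprimitive call instead of a shift.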
We can do a little better:
;;; using a slightly patched compiler
(defun copy-u32-vector (src dest)
  (declare (type (simple-array (unsigned-byte 32)) src dest)
           (optimize (speed 3) (safety 0)))
  (dotimes (i (length dest))
    (setf (aref dest i) (aref src i))))
;;; (aref src i)
L33
[33] (movl (@ -4 (% ebp)) (% arg_y))
[36] (movl (@ -16 (% ebp)) (% arg_z))
[39] (movl (% arg_z) (% imm0))
[41] (movl (@ -2 (% arg_y) (% imm0)) (% imm0))
;;; (setf (aref dest i) (aref src i))
[45] (movl (@ -16 (% ebp)) (% arg_y))
[48] (movl (@ -8 (% ebp)) (% temp0))
[51] (btrl ($ 2) (@ (% fs) 8))
[61] (movl (% arg_y) (% temp1))
[63] (movl (% imm0) (@ -2 (% temp0) (% temp1)))
[67] (xorl (% temp1) (% temp1))
[69] (btsl ($ 2) (@ (% fs) 8))
;;; (dotimes (i (length dest)) (setf (aref dest i) (aref src i)))
[79] (movl (@ -16 (% ebp)) (% arg_z))
[82] (addl ($ 4) (% arg_z))
[85] (movl (% arg_z) (@ -16 (% ebp)))
L88
[88] (movl (@ -16 (% ebp)) (% arg_y))
[91] (movl (@ -12 (% ebp)) (% arg_z))
[94] (cmpl (% arg_z) (% arg_y))
[96] (jl L33)
There's a lot of stack traffic (can we have some registers here, please?), and we end up doing the mark-as-imm/mark-as-node dance on %temp1 within the loop, but it's less dreadful.
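For anyone who wants to compare against their own build, a sketch along these lines should do. It differs from the definition above in two ways that are my own choices for the example: the declaration spells out the (*) dimension, and the function returns dest so the result is easy to inspect. The helper make-u32-vector and the sample contents are likewise made up; the last element is chosen to be too big for a 32-bit fixnum.

```lisp
(defun copy-u32-vector (src dest)
  (declare (type (simple-array (unsigned-byte 32) (*)) src dest)
           (optimize (speed 3) (safety 0)))
  ;; With (safety 0) nothing checks bounds: SRC must be at least
  ;; as long as DEST.
  (dotimes (i (length dest))
    (setf (aref dest i) (aref src i)))
  dest)

;;; Hypothetical helper for building test vectors.
(defun make-u32-vector (contents)
  (make-array (length contents)
              :element-type '(unsigned-byte 32)
              :initial-contents contents))

(defparameter *src* (make-u32-vector (list 0 1 #x7FFFFFFF #xFFFFFFFF)))
(defparameter *dest* (make-u32-vector (list 0 0 0 0)))

(copy-u32-vector *src* *dest*)

;; To look at the generated code:
;; (disassemble 'copy-u32-vector)
```

The output of disassemble depends on the implementation and platform, so it will only resemble the listing above on a 32-bit x86 CCL.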
I'll clean these changes up and commit them soon.