[Openmcl-devel] Progress toward a CCL-to-ARM64 compiler
Robert Munyer
2420506348 at munyer.com
Mon Mar 4 16:06:52 PST 2024
(Responses below are not in order.)
I agree with most of what Tim McNerney wrote, especially this part:
>> perhaps we should
>> discuss as a group “conventions used by ARM64 assembly code that's
>> already recorded in the main CCL repo (e.g., register names and
>> lisp_frame layout from arm64-constants.s).”
Yes! Feel free to design new conventions and throw away the old ones.
I'm not committed to any of it; my code can be changed pretty easily.
I'll get that discussion started, with some of the questions that
caused me to write that I "don't fully understand" GB's intentions:
Why doesn't ARM64 have a register named FN?
Why does arm64-constants.s's lisp_frame include only savevsp and
savelr, when ppc-constants64.s's also includes backlink and savefn?
If anyone wants to offer a patch that replaces e.g. the DEFREGS
form in arm64-arch.lisp, don't worry about "breaking the build"
of my other lisp code; I'll adapt it to match your changes.
>> I have already spotted typos in the
>> checked-in, arm64 instruction tables.
Yes, I noticed those last summer. In fact, that file is the only
significant pre-existing code that's deleted in my ARM64 branch.
> (And just so you know where my own biases lie, I’m
> actually a big fan of high tags.)
I really wanted to implement high tags too. But when I saw how
easy low tags made re-using PPC compiler functions, I reluctantly
concluded that any high-tag attempt probably should be done only
_after_ a low-tag implementation is working and passing all tests.
High tags might be a good dissertation project for someone some day.
-- Robert Munyer
https://ccl-arm64-2023-07.srht.site
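
A minimal Python sketch of why low tags keep fixnum arithmetic simple,
offered as an illustration rather than as part of the compiler. It assumes
fixnums carry three zero tag bits, which is inferred from "mov x15, 23 << 3"
and "asr x1, x15, 3" in the quoted test program [3] and from "li save2,8"
for the constant 1 in the PPC64 listing [2]. The names FIXNUM_SHIFT, tag,
untag and tagged_fib are hypothetical.

```python
# Assumption: a fixnum is the integer shifted left by 3, with tag bits 000.
FIXNUM_SHIFT = 3


def tag(n):
    """Box a Python int as a low-tagged fixnum word."""
    return n << FIXNUM_SHIFT


def untag(word):
    """Recover the integer from a low-tagged fixnum word."""
    return word >> FIXNUM_SHIFT


def tagged_fib(n_tagged):
    """Mirror the DO loop of fixnum-fibonacci, working only on tagged words."""
    a, b, n = tag(1), tag(0), n_tagged
    while n != tag(0):
        # Tagged + tagged is already a valid tagged fixnum, so the add
        # needs no untag/retag step.
        a, b = b, a + b
        # (1- n) subtracts the tagged constant, i.e. 8.
        n -= tag(1)
    return b


print(untag(tagged_fib(tag(23))))  # 28657
```

The caller tags the argument once and untags the result once, as main does
in the test program; everything in between stays in tagged form.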
On 29 February 2024, Tim McNerney wrote:
> P.S. A friendly word of caution: Be vigilantly on the lookout for
> vestigial remains of the abandoned high-tags scheme before relying on
> any so-called “conventions” spelled out in the arm64 files checked
> into the CCL repository. I write “so-called” and use scare-quotes
> deliberately: because we are talking about details that have not been
> vetted, let alone agreed upon (and agreement is part of the definition
> of conventions). They are merely the noble beginnings of a scheme we
> won’t pursue. (And just so you know where my own biases lie, I’m
> actually a big fan of high tags.) Please forgive me for all this lack
> of subtlety. At my core I am a practical and cautious engineer when it
> comes to standing on the shoulders of giants.
>
> --Tim
>
> On Feb 29, 2024, at 09:29, Tim McNerney <mc at media.mit.edu> wrote:
>
>> Keep in mind—and this has been confirmed by RME himself—that
>> anything you find in the CCL repository related to the arm64 target
>> is an untested sketch. What Matt told me is consistent with what I found:
>> that he made a noble, aspirational stab at shifting to a high tags
>> runtime typing scheme, taking advantage of a valuable, documented
>> architectural feature: that the arm64 ignores the upper bits of
>> every 64-bit memory address. Then he abandoned this approach for
>> reasons of practicality and then, alas, had to move on to other
>> projects.
>>
>> With all due respect to any work in progress, perhaps we should
>> discuss as a group “conventions used by ARM64 assembly code that's
>> already recorded in the main CCL repo (e.g., register names and
>> lisp_frame layout from arm64-constants.s).” Yes, these might be
>> totally fine, but there may be uncaught errors, and these decisions
>> should at least be reviewed by extra eyeballs before we build a
>> large body of work on them. I have already spotted typos in the
>> checked-in, arm64 instruction tables. We plan to check in better
>> tables from a verified source plus a work-in-progress disassembler.
>>
>> As for the complexity of the existing CCL compiler, it’s prudent not
>> to be too encouraged by a few isolated experiments. Gary B. is a truly
>> brilliant software engineer. It is well known that he worked largely
>> alone and kept vast troves of undocumented, internal knowledge about
>> the compiler in his head. Common Lisp is a very complicated language
>> for compiler writers to tackle, especially when you consider the
>> myriad of existing performance optimizations, only a subset of which
>> are well-exercised “in the wild.”
>>
>> It is my goal to keep the integrity of this magnum opus largely
>> intact. It works. It is stable. Once multiple people start messing
>> with it, we will introduce stealth “corner case” bugs that might
>> remain untested and unfixed for years.
>>
>> Franz, which has a larger user community, recently fixed an obscure,
>> but actually used, interaction between the compiler and the GC that
>> caused a commercial application I am well familiar with to crash
>> about once a month. They had to pull one of their top developers out
>> of retirement to isolate and fix the problem. CCL no longer has this
>> luxury.
>>
>> We need to be methodical and risk-averse on our path forward.
>>
>> --Tim
>>
>> On Feb 28, 2024, at 20:49, Robert Munyer <2420506348 at munyer.com> wrote:
>>
>>>> Tim's right: I don't think GB ever got around to figuring out what
>>>> registers to use in ARM64.
>>>
>>> In the parts of the compiler that I've developed so far, I've just
>>> been adhering to conventions used by ARM64 assembly code that's already
>>> recorded in the main CCL repo (e.g., register names and lisp_frame
>>> layout from arm64-constants.s) but nothing's carved in stone,
>>> I can change that stuff pretty easily.
>>>
>>>> Here's a bit of information on existing ports collected in one place.
>>>>
>>>> This needs more sanity checking, but I think it's pretty close to accurate:
>>>> https://github.com/Clozure/ccl/wiki/Register-Usage-in-CCL-Implementations
>>>>
>>>> And this has been up for a while:
>>>> https://github.com/Clozure/ccl/wiki/Arch-Constant-Values-in-CCL
>>>
>>> Thanks, I will look at those.
>>>
>>>>> My current crazy plan is to write a
>>>>> specialized (*) ppc64 to arm64 translator and use it to convert all
>>>>> the subprims (once)
>>>
>>> I don't have an informed opinion about that (not having paid much
>>> attention to subprimitives yet) but my initial reaction is that it's
>>> a nice idea that's worth trying.
>>>
>>>>> and translate the ppc64 compiler output (on an
>>>>> ongoing basis). I know this isn’t what RME would do, but it seems
>>>>> less risky than doing “open heart surgery” on the compiler.
>>>
>>> I don't think that will be necessary, because the "brain surgery"
>>> is relatively easy. I have found that I can take large functions
>>> from e.g. ppc2.lisp, and make only a few small changes to get them
>>> to work on ARM64 code.
>>>
>>> -- Robert Munyer
>>> https://ccl-arm64-2023-07.srht.site
>>>
>>> On 25 February 2024, Shannon Spires wrote:
>>>
>>>> Tim's right: I don't think GB ever got around to figuring out what
>>>> registers to use in ARM64.
>>>>
>>>> Here's a bit of information on existing ports collected in one place.
>>>>
>>>> This needs more sanity checking, but I think it's pretty close to accurate:
>>>> https://github.com/Clozure/ccl/wiki/Register-Usage-in-CCL-Implementations
>>>>
>>>> And this has been up for a while:
>>>> https://github.com/Clozure/ccl/wiki/Arch-Constant-Values-in-CCL
>>>>
>>>> -SS
>>>>
>>>> On 2/25/24 3:50 PM, Tim McNerney wrote:
>>>>> Thanks for doing this experiment, Robert.
>>>>> Gary B., to the best of my knowledge, never tackled designing
>>>>> register conventions or stack usage for the arm64. This is an open
>>>>> problem for the taking. I haven’t yet searched for CCL documentation
>>>>> on register and stack usage on the PPC64. But my own strategy would
>>>>> be to try to map one into the other with very few changes, kinda
>>>>> like the spirit of your experiment. My current crazy plan is to write a
>>>>> specialized (*) ppc64 to arm64 translator and use it to convert all
>>>>> the subprims (once) and translate the ppc64 compiler output (on an
>>>>> ongoing basis). I know this isn’t what RME would do, but it seems
>>>>> less risky than doing “open heart surgery” on the compiler.
>>>>>
>>>>> (*) by specialized, I mean it is not a general translator, but
>>>>> rather designed specifically for CCL hand-written assembly language
>>>>> and compiler output, and knows how to rewrite register references
>>>>> based on knowledge of the register and stack conventions for both
>>>>> targets.
>>>>> --Tim
>>>>>
>>>>> On Feb 25, 2024, at 16:05, Robert Munyer <2420506348 at munyer.com> wrote:
>>>>>
>>>>>> I have made some progress toward a CCL-to-ARM64 compiler, by taking
>>>>>> code from the existing CCL-to-PPC64 compiler, and modifying it to emit
>>>>>> ARM64 instruction sequences that resemble ARM64 assembly code that was
>>>>>> checked-in by Gary Byers before 2013-10-22.
>>>>>>
>>>>>> It compiles the body of this function:
>>>>>>
>>>>>> (defun fixnum-fibonacci (n)
>>>>>>   (declare (type (mod 24) n)
>>>>>>            (optimize (safety 0) (speed 3)))
>>>>>>   (do ((a 1 b)
>>>>>>        (b 0 (the fixnum (+ a b)))
>>>>>>        (n n (1- n)))
>>>>>>       ((zerop n) b)
>>>>>>     (declare (fixnum a b n))))
>>>>>>
>>>>>> to this ARM64 machine code:
>>>>>>
>>>>>> aa1e03f8 a9bf7bf9 f9402f80 eb2063ff 5400004a d4207d00 f81f8f2f
>>>>>> f81f8f30 f81f8f31 f81f8f32 d2800112 d2800010 f9400f31 14000008
>>>>>> f81f8f30 8b10024f f81f8f2f d1002231 f9400732 f9400330 91004339
>>>>>> f100023f 54ffff01 aa1003ef f9400332 f9400731 f9400b30 a9407bf9
>>>>>> aa1803fe 910043ff d65f03c0
>>>>>>
>>>>>> (hand-disassembled here [1]), which, when pasted into this test
>>>>>> program [3], calculates "fibonacci(23) = 28657".
>>>>>>
>>>>>> If you have an Apple Silicon device with Linux and GCC, I think you
>>>>>> should be able to run the test program on it. (Darwin might also
>>>>>> work, with some tweaking.) Paste the program's code [3] into a text
>>>>>> file named test-fib.s, then enter "gcc test-fib.s" and "./a.out".
>>>>>>
>>>>>> For comparison, here is the result of running the existing PPC64
>>>>>> compiler on the same Fibonacci source code: [2].
>>>>>>
>>>>>> Forge resources (source code, wiki wiki, issue tracker, mailing
>>>>>> lists) are available at https://ccl-arm64-2023-07.srht.site .
>>>>>>
>>>>>> Some disclaimers...
>>>>>>
>>>>>> I have not yet made any effort to make the compiled code thread-safe
>>>>>> or signal-safe or garbage-collection-safe, so I wouldn't expect it to
>>>>>> work correctly in a real CCL kernel.
>>>>>>
>>>>>> I mostly have implemented only enough of the compiler for the Fibonacci
>>>>>> function above, so I wouldn't expect other functions to work correctly.
>>>>>>
>>>>>> I don't fully understand how GB intended ARM64 register assignments
>>>>>> and stack discipline to work, so feedback in those areas would be
>>>>>> especially welcome.
>>>>>>
>>>>>> -- Robert Munyer
>>>>>>
>>>>>> [1] --------
>>>>>>
>>>>>> fib  (mov loc-pc lr)
>>>>>>      (stp vsp lr (:-@! sp 16))
>>>>>>      (ldr imm0 (:+@ rcontext 88))
>>>>>>      (cmp sp imm0)
>>>>>>      (b.ge l24)
>>>>>>      (brk 1000)
>>>>>> l24  (str arg_z (:-@! vsp 8))
>>>>>>      (str save0 (:-@! vsp 8))
>>>>>>      (str save1 (:-@! vsp 8))
>>>>>>      (str save2 (:-@! vsp 8))
>>>>>>      (mov save2 '1)
>>>>>>      (mov save0 '0)
>>>>>>      (ldr save1 (:+@ vsp 24))
>>>>>>      (b l84)
>>>>>> l56  (str save0 (:-@! vsp 8))
>>>>>>      (add arg_z save2 save0)
>>>>>>      (str arg_z (:-@! vsp 8))
>>>>>>      (sub save1 save1 '1)
>>>>>>      (ldr save2 (:+@ vsp 8))
>>>>>>      (ldr save0 (:@ vsp))
>>>>>>      (add vsp vsp 16)
>>>>>> l84  (cmp save1 '0)
>>>>>>      (b.ne l56)
>>>>>>      (mov arg_z save0)
>>>>>>      (ldr save2 (:@ vsp))
>>>>>>      (ldr save1 (:+@ vsp 8))
>>>>>>      (ldr save0 (:+@ vsp 16))
>>>>>>      (ldp vsp lr (:@ sp))
>>>>>>      (mov lr loc-pc)
>>>>>>      (add sp sp 16)
>>>>>>      (ret)
>>>>>>
>>>>>> [2] --------
>>>>>>
>>>>>> 0000000000000000 <fib>:
>>>>>> 00: 7d c8 02 a6 mflr loc_pc
>>>>>> 04: f8 21 ff e1 stdu sp,-32(sp)
>>>>>> 08: fa 01 00 08 std fn,8(sp)
>>>>>> 0c: f9 c1 00 10 std loc_pc,16(sp)
>>>>>> 10: f9 e1 00 18 std vsp,24(sp)
>>>>>> 14: 7e 50 93 78 mr fn,nfn
>>>>>> 18: e8 62 00 58 ld imm0,88(rcontext)
>>>>>> 1c: 7c 41 18 88 tdllt sp,imm0
>>>>>> 20: fa ef ff f9 stdu arg_z,-8(vsp)
>>>>>> 24: fb ef ff f9 stdu save0,-8(vsp)
>>>>>> 28: fb cf ff f9 stdu save1,-8(vsp)
>>>>>> 2c: fb af ff f9 stdu save2,-8(vsp)
>>>>>> 30: 3b a0 00 08 li save2,8
>>>>>> 34: 3b e0 00 00 li save0,0
>>>>>> 38: eb cf 00 18 ld save1,24(vsp)
>>>>>> 3c: 48 00 00 20 b 5c <fib+0x5c>
>>>>>> 40: fb ef ff f9 stdu save0,-8(vsp)
>>>>>> 44: 7e fd fa 14 add arg_z,save2,save0
>>>>>> 48: fa ef ff f9 stdu arg_z,-8(vsp)
>>>>>> 4c: 3b de ff f8 addi save1,save1,-8
>>>>>> 50: eb af 00 08 ld save2,8(vsp)
>>>>>> 54: eb ef 00 00 ld save0,0(vsp)
>>>>>> 58: 39 ef 00 10 addi vsp,vsp,16
>>>>>> 5c: 2c 3e 00 00 cmpdi save1,0
>>>>>> 60: 40 82 ff e0 bne 40 <fib+0x40>
>>>>>> 64: 7f f7 fb 78 mr arg_z,save0
>>>>>> 68: eb af 00 00 ld save2,0(vsp)
>>>>>> 6c: eb cf 00 08 ld save1,8(vsp)
>>>>>> 70: eb ef 00 10 ld save0,16(vsp)
>>>>>> 74: e9 c1 00 10 ld loc_pc,16(sp)
>>>>>> 78: e9 e1 00 18 ld vsp,24(sp)
>>>>>> 7c: ea 01 00 08 ld fn,8(sp)
>>>>>> 80: 7d c8 03 a6 mtlr loc_pc
>>>>>> 84: 38 21 00 20 addi sp,sp,32
>>>>>> 88: 4e 80 00 20 blr
>>>>>> 8c: 83 a9 ff e0 lwz save2,-32(allocptr)
>>>>>>
>>>>>> [3] --------
>>>>>>
>>>>>>         .global main
>>>>>>         .extern printf
>>>>>>         .text
>>>>>>
>>>>>> fmt:    .asciz "fibonacci(23) = %ld\n"
>>>>>>
>>>>>>         .balign 4
>>>>>>
>>>>>> fib:    .inst 0xAA1E03F8, 0xA9BF7BF9, 0xF9402F80, 0xEB2063FF, 0x5400004A
>>>>>>         .inst 0xD4207D00, 0xF81F8F2F, 0xF81F8F30, 0xF81F8F31, 0xF81F8F32
>>>>>>         .inst 0xD2800112, 0xD2800010, 0xF9400F31, 0x14000008, 0xF81F8F30
>>>>>>         .inst 0x8B10024F, 0xF81F8F2F, 0xD1002231, 0xF9400732, 0xF9400330
>>>>>>         .inst 0x91004339, 0xF100023F, 0x54FFFF01, 0xAA1003EF, 0xF9400332
>>>>>>         .inst 0xF9400731, 0xF9400B30, 0xA9407BF9, 0xAA1803FE, 0x910043FF
>>>>>>         .inst 0xD65F03C0
>>>>>>
>>>>>> main:   mov x0, sp
>>>>>>         stp fp, lr, [sp, -64]!
>>>>>>         mov fp, sp
>>>>>>         stp x24, x25, [sp, -16]!
>>>>>>         mov x25, x0
>>>>>>         sub x0, sp, 32
>>>>>>         stp x0, x28, [sp, -16]!
>>>>>>         sub x28, sp, 88
>>>>>>         mov x15, 23 << 3
>>>>>>         bl fib
>>>>>>         asr x1, x15, 3
>>>>>>         adr x0, fmt
>>>>>>         bl printf
>>>>>>         ldp x0, x28, [sp], 16
>>>>>>         ldp x24, x25, [sp], 16
>>>>>>         ldp fp, lr, [sp], 64
>>>>>>         mov x0, 0
>>>>>>         ret
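
The labels in the hand-disassembly [1] (l24, l56, l84) appear to be byte
offsets from the start of fib. The Python sketch below is one way to check
that reading: it recomputes the branch targets from the machine words quoted
above, assuming only the standard A64 encodings of B (imm26) and B.cond
(imm19). The helper names sign_extend and branch_target are hypothetical,
and the sketch decodes nothing else.

```python
# The 31 instruction words of fib, copied from the message above.
WORDS = """
aa1e03f8 a9bf7bf9 f9402f80 eb2063ff 5400004a d4207d00 f81f8f2f
f81f8f30 f81f8f31 f81f8f32 d2800112 d2800010 f9400f31 14000008
f81f8f30 8b10024f f81f8f2f d1002231 f9400732 f9400330 91004339
f100023f 54ffff01 aa1003ef f9400332 f9400731 f9400b30 a9407bf9
aa1803fe 910043ff d65f03c0
""".split()


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(offset, word):
    """Return the byte offset targeted by a B or B.cond word, else None."""
    if (word & 0xFC000000) == 0x14000000:   # B: imm26 in bits 25:0
        return offset + 4 * sign_extend(word & 0x03FFFFFF, 26)
    if (word & 0xFF000010) == 0x54000000:   # B.cond: imm19 in bits 23:5
        return offset + 4 * sign_extend((word >> 5) & 0x7FFFF, 19)
    return None


# Map each branch's own byte offset to the byte offset it jumps to.
targets = {}
for index, text in enumerate(WORDS):
    target = branch_target(4 * index, int(text, 16))
    if target is not None:
        targets[4 * index] = target

print(targets)  # {16: 24, 52: 84, 88: 56}
```

The three targets (24, 84, 56) match the labels used by b.ge, b and b.ne in
the listing, which supports reading the labels as byte offsets.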