[Openmcl-devel] Progress toward a CCL-to-ARM64 compiler

Mon Mar 4 16:06:52 PST 2024

(responses below are not in order)

I agree with most of what Tim McNerney wrote, especially this part:

>> perhaps we should
>> discuss as a group “conventions used by ARM64 assembly code that's
>> already recorded in the main CCL repo (e.g., register names and
>> lisp_frame layout from arm64-constants.s).”

Yes!  Feel free to design new conventions and throw away old.
I'm not committed to any of it; my code can be changed pretty easily.

I'll get that discussion started, with some of the questions that
caused me to write that I "don't fully understand" GB's intentions:

Why doesn't ARM64 have a register named FN?

Why does arm64-constants.s's lisp_frame include only savevsp and
savelr, when ppc-constants64.s's also includes backlink and savefn?
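
To make the question concrete, here is a minimal sketch (in Python,
purely for illustration) of the two frame layouts as they appear in the
prologues of the listings quoted further down in this thread.  The field
order and the 8-byte slot size are read off those listings (stdu/std on
PPC64, stp on ARM64), not from the constants files themselves, so treat
them as assumptions; the helper names are hypothetical.

```python
# Hypothetical illustration: offsets are inferred from the two prologues
# quoted in this thread, not copied from the *-constants files.
WORD = 8  # bytes per 64-bit stack slot (assumption)

# PPC64 prologue: stdu sp,-32(sp); std fn,8(sp); std loc_pc,16(sp); std vsp,24(sp)
PPC64_LISP_FRAME = ["backlink", "savefn", "savelr", "savevsp"]

# ARM64 prologue: stp vsp, lr, [sp, #-16]!
ARM64_LISP_FRAME = ["savevsp", "savelr"]


def offsets(fields):
    """Map each field name to its byte offset from the frame base."""
    return {name: index * WORD for index, name in enumerate(fields)}


def frame_size(fields):
    """Total frame size in bytes."""
    return len(fields) * WORD


def missing_fields(reference, candidate):
    """Fields present in the reference layout but absent from the candidate."""
    return [name for name in reference if name not in candidate]


if __name__ == "__main__":
    print("ppc64:", offsets(PPC64_LISP_FRAME), frame_size(PPC64_LISP_FRAME))
    print("arm64:", offsets(ARM64_LISP_FRAME), frame_size(ARM64_LISP_FRAME))
    print("absent on arm64:", missing_fields(PPC64_LISP_FRAME, ARM64_LISP_FRAME))
```

Under these assumptions the ARM64 frame is half the size of the PPC64
one, and the two slots it lacks are exactly backlink and savefn.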

If anyone wants to offer a patch that replaces e.g. the DEFREGS
form in arm64-arch.lisp, don't worry about "breaking the build"
of my other lisp code; I'll adapt it to match your changes.

>> I have already spotted typos in the
>> checked-in, arm64 instruction tables.

Yes, I noticed those last summer.  In fact, that file is the only
significant pre-existing code that's deleted in my ARM64 branch.
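
One cheap way to catch that kind of typo is to decode a few known
words and compare the result with a hand disassembly.  The sketch below
(Python, for illustration only) does this for the BRK, B and B.cond
words in the Fibonacci machine code quoted later in this thread.  The
bit layouts are the standard A64 encodings; the function names are
hypothetical and this is not code from my branch.

```python
# Machine code words from the Fibonacci example quoted in this thread.
WORDS = [
    0xAA1E03F8, 0xA9BF7BF9, 0xF9402F80, 0xEB2063FF, 0x5400004A, 0xD4207D00,
    0xF81F8F2F, 0xF81F8F30, 0xF81F8F31, 0xF81F8F32, 0xD2800112, 0xD2800010,
    0xF9400F31, 0x14000008, 0xF81F8F30, 0x8B10024F, 0xF81F8F2F, 0xD1002231,
    0xF9400732, 0xF9400330, 0x91004339, 0xF100023F, 0x54FFFF01, 0xAA1003EF,
    0xF9400332, 0xF9400731, 0xF9400B30, 0xA9407BF9, 0xAA1803FE, 0x910043FF,
    0xD65F03C0,
]

COND = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"]


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_brk(word):
    """BRK #imm16 is 0xD4200000 | (imm16 << 5); return imm16."""
    assert word & 0xFFE0001F == 0xD4200000, "not a BRK"
    return (word >> 5) & 0xFFFF


def decode_b(word):
    """B label is 0x14000000 | imm26; return the byte displacement."""
    assert word & 0xFC000000 == 0x14000000, "not a B"
    return sign_extend(word & 0x03FFFFFF, 26) * 4


def decode_b_cond(word):
    """B.cond is 0x54000000 | (imm19 << 5) | cond; return (cond, displacement)."""
    assert word & 0xFF000010 == 0x54000000, "not a B.cond"
    return COND[word & 0xF], sign_extend((word >> 5) & 0x7FFFF, 19) * 4


def branch_target(index):
    """Byte offset that the branch at WORDS[index] jumps to."""
    word = WORDS[index]
    if word & 0xFC000000 == 0x14000000:
        return index * 4 + decode_b(word)
    return index * 4 + decode_b_cond(word)[1]


if __name__ == "__main__":
    print("brk", decode_brk(WORDS[5]))
    for i in (4, 13, 22):
        print("branch at", i * 4, "-> l%d" % branch_target(i))
```

The decoded targets can then be compared against the label names (l24,
l56, l84) used in the hand disassembly.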

> (And just so you know where my own biases lie, I’m
> actually a big fan of high tags.)

I really wanted to implement high tags too.  But when I saw how
easy low tags made re-using PPC compiler functions, I reluctantly
concluded that any high-tag attempt probably should be done only
_after_ a low-tag implementation is working and passing all tests.
High tags might be a good dissertation project for someone some day.
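
For readers new to the tagging discussion, here is a minimal sketch
(Python, for illustration only) of why low-tagged fixnums are so
convenient.  It assumes a 3-bit all-zero fixnum tag in the low bits,
which is what the test program quoted below implies ("mov x15, 23 << 3"
going in, "asr x1, x15, 3" coming out).  The function names are
hypothetical.

```python
FIXNUM_SHIFT = 3            # assumption: 3 low tag bits, all zero for fixnums
TAG_MASK = (1 << FIXNUM_SHIFT) - 1
MASK64 = (1 << 64) - 1      # model a 64-bit register


def box_fixnum(n):
    """Tag an integer as a fixnum word (what the listings write as 'n)."""
    return (n << FIXNUM_SHIFT) & MASK64


def unbox_fixnum(word):
    """Arithmetic shift right, like 'asr x1, x15, 3' in the test program."""
    signed = word - (1 << 64) if word >> 63 else word
    return signed >> FIXNUM_SHIFT


def is_fixnum(word):
    """A word is a fixnum when its low tag bits are all zero."""
    return word & TAG_MASK == 0


def fib_tagged(n_word):
    """Same loop as the DO form in fixnum-fibonacci, run on tagged words.

    Adding or subtracting two tagged fixnums needs no untagging, because
    (a << 3) + (b << 3) == (a + b) << 3.
    """
    a, b = box_fixnum(1), box_fixnum(0)
    while n_word != box_fixnum(0):
        a, b = b, (a + b) & MASK64
        n_word = (n_word - box_fixnum(1)) & MASK64
    return b


if __name__ == "__main__":
    result = fib_tagged(box_fixnum(23))
    print("fibonacci(23) =", unbox_fixnum(result))
```

The constant '1 appears as 8 in both the PPC64 and ARM64 listings for
the same reason: it is simply 1 shifted left by the tag width.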

-- Robert Munyer
https://ccl-arm64-2023-07.srht.site

On 29 February 2024, Tim McNerney wrote:

> P.S. A friendly word of caution: Be vigilantly on the lookout for
> vestigial remains of the abandoned high-tags scheme before relying on
> any so-called “conventions” spelled out in the arm64 files checked
> into the CCL repository. I write “so-called” and use scare-quotes
> deliberately: because we are talking about details that have not been
> vetted, let alone agreed upon (and agreement is part of the definition
> of conventions). They are merely the noble beginnings of a scheme we
> won’t pursue. (And just so you know where my own biases lie, I’m
> actually a big fan of high tags.) Please forgive me for all this lack
> of subtlety. At my core I am a practical and cautious engineer when it
> comes to standing on the shoulders of giants.
>
> --Tim
>
> On Feb 29, 2024, at 09:29, Tim McNerney <mc at media.mit.edu> wrote:
>
>> Keep in mind—and this has been confirmed by RME himself—that
>> anything you find in the CCL repository related to the arm64 target
>> is an untested sketch. What Matt told me is consistent with what I found:
>> that he made a noble, aspirational stab at shifting to a high tags
>> runtime typing scheme, taking advantage of a valuable, documented
>> architectural feature: that the arm64 ignores the upper bits of
>> every 64-bit memory address. Then he abandoned this approach for
>> reasons of practicality and then, alas, had to move on to other
>> projects.
>>
>> With all due respect to any work in progress, perhaps we should
>> discuss as a group “conventions used by ARM64 assembly code that's
>> already recorded in the main CCL repo (e.g., register names and
>> lisp_frame layout from arm64-constants.s).” Yes, these might be
>> totally fine, but there may be uncaught errors, and these decisions
>> should at least be reviewed by extra eyeballs before we build a
>> large body of work on them. I have already spotted typos in the
>> checked-in, arm64 instruction tables. We plan to check in better
>> tables from a verified source plus a work-in-progress disassembler.
>>
>> As for the complexity of the existing CCL compiler, it’s prudent not
>> to be too encouraged by a few isolated experiments. Gary B. is a truly
>> brilliant software engineer. It is well known that he worked largely
>> alone and kept vast troves of undocumented, internal knowledge about
>> the compiler in his head. Common Lisp is a very complicated language
>> for compiler writers to tackle, especially when you consider the
>> myriad of existing performance optimizations, only a subset of which
>> are well-exercised “in the wild.”
>>
>> It is my goal to keep the integrity of this magnum opus largely
>> intact. It works. It is stable. Once multiple people start messing
>> with it, we will introduce stealth “corner case” bugs that might
>> remain untested and unfixed for years.
>>
>> Franz, which has a larger user community, recently fixed an obscure,
>> but actually used, interaction between the compiler and the GC that
>> caused a commercial application I am well familiar with to crash
>> about once a month. They had to pull one of their top developers out
>> of retirement to isolate and fix the problem. CCL no longer has this
>> luxury.
>>
>> We need to be methodical and risk-averse on our path forward.
>>
>> --Tim
>>
>> On Feb 28, 2024, at 20:49, Robert Munyer <2420506348 at munyer.com> wrote:
>>
>>>> Tim's right: I don't think GB ever got around to figuring out what
>>>> registers to use in ARM64.
>>>
>>> In the parts of the compiler that I've developed so far, I've just
>>> been adhering to conventions used by ARM64 assembly code that's already
>>> recorded in the main CCL repo (e.g., register names and lisp_frame
>>> layout from arm64-constants.s), but nothing's carved in stone;
>>> I can change that stuff pretty easily.
>>>
>>>> Here's a bit of information on existing ports collected in one place.
>>>>
>>>> This needs more sanity checking, but I think it's pretty close to accurate:
>>>> https://github.com/Clozure/ccl/wiki/Register-Usage-in-CCL-Implementations
>>>>
>>>> And this has been up for a while:
>>>> https://github.com/Clozure/ccl/wiki/Arch-Constant-Values-in-CCL
>>>
>>> Thanks, I will look at those.
>>>
>>>>> My current crazy plan is to write a
>>>>> specialized (*) ppc64 to arm64 translator and use it to convert all
>>>>> the subprims (once)
>>>
>>> I don't have an informed opinion about that (not having paid much
>>> attention to subprimitives yet) but my initial reaction is that it's
>>> a nice idea that's worth trying.
>>>
>>>>> and translate the ppc64 compiler output (on an
>>>>> ongoing basis). I know this isn’t what RME would do, but it seems
>>>>> less risky than doing “open heart surgery” on the compiler.
>>>
>>> I don't think that will be necessary, because the "brain surgery"
>>> is relatively easy.  I have found that I can take large functions
>>> from e.g. ppc2.lisp, and make only a few small changes to get them
>>> to work on ARM64 code.
>>>
>>> -- Robert Munyer
>>> https://ccl-arm64-2023-07.srht.site
>>>
>>> On 25 February 2024, Shannon Spires wrote:
>>>
>>>> Tim's right: I don't think GB ever got around to figuring out what
>>>> registers to use in ARM64.
>>>>
>>>> Here's a bit of information on existing ports collected in one place.
>>>>
>>>> This needs more sanity checking, but I think it's pretty close to accurate:
>>>> https://github.com/Clozure/ccl/wiki/Register-Usage-in-CCL-Implementations
>>>>
>>>> And this has been up for a while:
>>>> https://github.com/Clozure/ccl/wiki/Arch-Constant-Values-in-CCL
>>>>
>>>> -SS
>>>>
>>>> On 2/25/24 3:50 PM, Tim McNerney wrote:
>>>>> Thanks for doing this experiment, Robert.
>>>>> Gary B., to the best of my knowledge, never tackled designing
>>>>> register conventions or stack usage for the arm64. This is an open
>>>>> problem for the taking. I haven’t yet searched for CCL documentation
>>>>> on register and stack usage on the PPC64. But my own strategy would
>>>>> be to try to map one into the other with very few changes, kinda
>>>>> like the spirit of your experiment. My current crazy plan is to write a
>>>>> specialized (*) ppc64 to arm64 translator and use it to convert all
>>>>> the subprims (once) and translate the ppc64 compiler output (on an
>>>>> ongoing basis). I know this isn’t what RME would do, but it seems
>>>>> less risky than doing “open heart surgery” on the compiler.
>>>>>
>>>>> (*) by “specialized” I mean it is not a general translator, but
>>>>> rather designed specifically for CCL hand-written assembly language
>>>>> and compiler output, and knows how to rewrite register references
>>>>> based on knowledge of the register and stack conventions for both
>>>>> targets.
>>>>> --Tim
>>>>>
>>>>> On Feb 25, 2024, at 16:05, Robert Munyer <2420506348 at munyer.com> wrote:
>>>>>
>>>>>> I have made some progress toward a CCL-to-ARM64 compiler, by taking
>>>>>> code from the existing CCL-to-PPC64 compiler, and modifying it to emit
>>>>>> ARM64 instruction sequences that resemble ARM64 assembly code that was
>>>>>> checked-in by Gary Byers before 2013-10-22.
>>>>>>
>>>>>> It compiles the body of this function:
>>>>>>
>>>>>>   (defun fixnum-fibonacci (n)
>>>>>>     (declare (type (mod 24) n)
>>>>>>              (optimize (safety 0) (speed 3)))
>>>>>>     (do ((a 1 b)
>>>>>>          (b 0 (the fixnum (+ a b)))
>>>>>>          (n n (1- n)))
>>>>>>         ((zerop n) b)
>>>>>>       (declare (fixnum a b n))))
>>>>>>
>>>>>> to this ARM64 machine code:
>>>>>>
>>>>>>   aa1e03f8 a9bf7bf9 f9402f80 eb2063ff 5400004a d4207d00 f81f8f2f
>>>>>>   f81f8f30 f81f8f31 f81f8f32 d2800112 d2800010 f9400f31 14000008
>>>>>>   f81f8f30 8b10024f f81f8f2f d1002231 f9400732 f9400330 91004339
>>>>>>   f100023f 54ffff01 aa1003ef f9400332 f9400731 f9400b30 a9407bf9
>>>>>>   aa1803fe 910043ff d65f03c0
>>>>>>
>>>>>> (hand-disassembled here [1]), which, when pasted into this test
>>>>>> program [3], calculates "fibonacci(23) = 28657".
>>>>>>
>>>>>> If you have an Apple Silicon device with Linux and GCC, I think you
>>>>>> should be able to run the test program on it.  (Darwin might also
>>>>>> work, with some tweaking.)  Paste the program's code [3] into a text
>>>>>> file named test-fib.s, then enter "gcc test-fib.s" and "./a.out".
>>>>>>
>>>>>> For comparison, here is the result of running the existing PPC64
>>>>>> compiler on the same Fibonacci source code: [2].
>>>>>>
>>>>>> Forge resources (source code, wiki, issue tracker, mailing
>>>>>> lists) are available at https://ccl-arm64-2023-07.srht.site .
>>>>>>
>>>>>> Some disclaimers...
>>>>>>
>>>>>> I have not yet made any effort to make the compiled code thread-safe
>>>>>> or signal-safe or garbage-collection-safe, so I wouldn't expect it to
>>>>>> work correctly in a real CCL kernel.
>>>>>>
>>>>>> I mostly have implemented only enough of the compiler for the Fibonacci
>>>>>> function above, so I wouldn't expect other functions to work correctly.
>>>>>>
>>>>>> I don't fully understand how GB intended ARM64 register assignments
>>>>>> and stack discipline to work, so feedback in those areas would be
>>>>>> especially welcome.
>>>>>>
>>>>>> -- Robert Munyer
>>>>>>
>>>>>> [1] --------
>>>>>>
>>>>>> fib     (mov    loc-pc lr)
>>>>>>         (stp    vsp lr (:-@! sp 16))
>>>>>>         (ldr    imm0 (:+@ rcontext 88))
>>>>>>         (cmp    sp imm0)
>>>>>>         (b.ge   l24)
>>>>>>         (brk    1000)
>>>>>> l24     (str    arg_z (:-@! vsp 8))
>>>>>>         (str    save0 (:-@! vsp 8))
>>>>>>         (str    save1 (:-@! vsp 8))
>>>>>>         (str    save2 (:-@! vsp 8))
>>>>>>         (mov    save2 '1)
>>>>>>         (mov    save0 '0)
>>>>>>         (ldr    save1 (:+@ vsp 24))
>>>>>>         (b      l84)
>>>>>> l56     (str    save0 (:-@! vsp 8))
>>>>>>         (add    arg_z save2 save0)
>>>>>>         (str    arg_z (:-@! vsp 8))
>>>>>>         (sub    save1 save1 '1)
>>>>>>         (ldr    save2 (:+@ vsp 8))
>>>>>>         (ldr    save0 (:@ vsp))
>>>>>>         (add    vsp vsp 16)
>>>>>> l84     (cmp    save1 '0)
>>>>>>         (b.ne   l56)
>>>>>>         (mov    arg_z save0)
>>>>>>         (ldr    save2 (:@ vsp))
>>>>>>         (ldr    save1 (:+@ vsp 8))
>>>>>>         (ldr    save0 (:+@ vsp 16))
>>>>>>         (ldp    vsp lr (:@ sp))
>>>>>>         (mov    lr loc-pc)
>>>>>>         (add    sp sp 16)
>>>>>>         (ret)
>>>>>>
>>>>>> [2] --------
>>>>>>
>>>>>> 0000000000000000 <fib>:
>>>>>>   00:   7d c8 02 a6     mflr    loc_pc
>>>>>>   04:   f8 21 ff e1     stdu    sp,-32(sp)
>>>>>>   08:   fa 01 00 08     std     fn,8(sp)
>>>>>>   0c:   f9 c1 00 10     std     loc_pc,16(sp)
>>>>>>   10:   f9 e1 00 18     std     vsp,24(sp)
>>>>>>   14:   7e 50 93 78     mr      fn,nfn
>>>>>>   18:   e8 62 00 58     ld      imm0,88(rcontext)
>>>>>>   1c:   7c 41 18 88     tdllt   sp,imm0
>>>>>>   20:   fa ef ff f9     stdu    arg_z,-8(vsp)
>>>>>>   24:   fb ef ff f9     stdu    save0,-8(vsp)
>>>>>>   28:   fb cf ff f9     stdu    save1,-8(vsp)
>>>>>>   2c:   fb af ff f9     stdu    save2,-8(vsp)
>>>>>>   30:   3b a0 00 08     li      save2,8
>>>>>>   34:   3b e0 00 00     li      save0,0
>>>>>>   38:   eb cf 00 18     ld      save1,24(vsp)
>>>>>>   3c:   48 00 00 20     b       5c <fib+0x5c>
>>>>>>   40:   fb ef ff f9     stdu    save0,-8(vsp)
>>>>>>   44:   7e fd fa 14     add     arg_z,save2,save0
>>>>>>   48:   fa ef ff f9     stdu    arg_z,-8(vsp)
>>>>>>   4c:   3b de ff f8     addi    save1,save1,-8
>>>>>>   50:   eb af 00 08     ld      save2,8(vsp)
>>>>>>   54:   eb ef 00 00     ld      save0,0(vsp)
>>>>>>   58:   39 ef 00 10     addi    vsp,vsp,16
>>>>>>   5c:   2c 3e 00 00     cmpdi   save1,0
>>>>>>   60:   40 82 ff e0     bne     40 <fib+0x40>
>>>>>>   64:   7f f7 fb 78     mr      arg_z,save0
>>>>>>   68:   eb af 00 00     ld      save2,0(vsp)
>>>>>>   6c:   eb cf 00 08     ld      save1,8(vsp)
>>>>>>   70:   eb ef 00 10     ld      save0,16(vsp)
>>>>>>   74:   e9 c1 00 10     ld      loc_pc,16(sp)
>>>>>>   78:   e9 e1 00 18     ld      vsp,24(sp)
>>>>>>   7c:   ea 01 00 08     ld      fn,8(sp)
>>>>>>   80:   7d c8 03 a6     mtlr    loc_pc
>>>>>>   84:   38 21 00 20     addi    sp,sp,32
>>>>>>   88:   4e 80 00 20     blr
>>>>>>   8c:   83 a9 ff e0     lwz     save2,-32(allocptr)
>>>>>>
>>>>>> [3] --------
>>>>>>
>>>>>>         .global main
>>>>>>         .extern printf
>>>>>>         .text
>>>>>>
>>>>>> fmt:    .asciz  "fibonacci(23) = %ld\n"
>>>>>>
>>>>>>         .balign 4
>>>>>>
>>>>>> fib:    .inst   0xAA1E03F8, 0xA9BF7BF9, 0xF9402F80, 0xEB2063FF, 0x5400004A
>>>>>>         .inst   0xD4207D00, 0xF81F8F2F, 0xF81F8F30, 0xF81F8F31, 0xF81F8F32
>>>>>>         .inst   0xD2800112, 0xD2800010, 0xF9400F31, 0x14000008, 0xF81F8F30
>>>>>>         .inst   0x8B10024F, 0xF81F8F2F, 0xD1002231, 0xF9400732, 0xF9400330
>>>>>>         .inst   0x91004339, 0xF100023F, 0x54FFFF01, 0xAA1003EF, 0xF9400332
>>>>>>         .inst   0xF9400731, 0xF9400B30, 0xA9407BF9, 0xAA1803FE, 0x910043FF
>>>>>>         .inst   0xD65F03C0
>>>>>>
>>>>>> main:   mov     x0, sp
>>>>>>         stp     fp, lr, [sp, -64]!
>>>>>>         mov     fp, sp
>>>>>>         stp     x24, x25, [sp, -16]!
>>>>>>         mov     x25, x0
>>>>>>         sub     x0, sp, 32
>>>>>>         stp     x0, x28, [sp, -16]!
>>>>>>         sub     x28, sp, 88
>>>>>>         mov     x15, 23 << 3
>>>>>>         bl      fib
>>>>>>         asr     x1, x15, 3
>>>>>>         adr     x0, fmt
>>>>>>         bl      printf
>>>>>>         ldp     x0, x28, [sp], 16
>>>>>>         ldp     x24, x25, [sp], 16
>>>>>>         ldp     fp, lr, [sp], 64
>>>>>>         mov     x0, 0
>>>>>>         ret