[Openmcl-devel] Progress toward a CCL-to-ARM64 compiler

Thu Feb 29 07:11:29 PST 2024

P.S. A friendly word of caution: Be vigilantly on the lookout for vestigial remains of the abandoned high-tags scheme before relying on any so-called “conventions” spelled out in the arm64 files checked into the CCL repository. I write “so-called” and use scare quotes deliberately, because we are talking about details that have not been vetted, let alone agreed upon (and agreement is part of the definition of a convention). They are merely the noble beginnings of a scheme we won’t pursue. (And just so you know where my own biases lie, I’m actually a big fan of high tags.) Please forgive the lack of subtlety. At my core I am a practical and cautious engineer when it comes to standing on the shoulders of giants.

--Tim

> On Feb 29, 2024, at 09:29, Tim McNerney <mc at media.mit.edu> wrote:
> 
> Keep in mind (and this has been confirmed by RME himself) that anything you find in the CCL repository related to the arm64 target is an untested sketch. What Matt told me is consistent with what I found: he made a noble, aspirational stab at shifting to a high-tags runtime typing scheme, taking advantage of a valuable, documented architectural feature, namely that the arm64 ignores the top byte (bits 63:56) of 64-bit virtual addresses when Top Byte Ignore (TBI) is enabled. He then abandoned this approach for practical reasons and, alas, had to move on to other projects.
> 
> With all due respect to any work in progress, perhaps we should discuss as a group the “conventions used by ARM64 assembly code that's already recorded in the main CCL repo (e.g., register names and lisp_frame layout from arm64-constants.s).” Yes, these might be totally fine, but there may be uncaught errors, and these decisions should at least be reviewed by extra eyeballs before we build a large body of work on them. I have already spotted typos in the checked-in arm64 instruction tables. We plan to check in better tables from a verified source, plus a work-in-progress disassembler.
> 
> As for the complexity of the existing CCL compiler, it’s prudent not to be too encouraged by a few isolated experiments. Gary B. is a truly brilliant software engineer. It is well known that he worked largely alone and kept vast troves of undocumented internal knowledge about the compiler in his head. Common Lisp is a very complicated language for compiler writers to tackle, especially when you consider the myriad existing performance optimizations, only a subset of which are well exercised “in the wild.”
> 
> It is my goal to keep the integrity of this magnum opus largely intact. It works. It is stable. Once multiple people start messing with it, we will introduce stealth “corner case” bugs that might remain untested and unfixed for years.
> 
> Franz, which has a larger user community, recently fixed an obscure but real-world interaction between the compiler and the GC that caused a commercial application I know well to crash about once a month. They had to pull one of their top developers out of retirement to isolate and fix the problem. CCL no longer has this luxury. 
> 
> We need to be methodical and risk-averse on our path forward. 
> 
> --Tim
> 
>> On Feb 28, 2024, at 20:49, Robert Munyer <2420506348 at munyer.com> wrote:
>> 
>>> Tim's right: I don't think GB ever got around to figuring out what
>>> registers to use in ARM64.
>> 
>> In the parts of the compiler that I've developed so far, I've just
>> been adhering to conventions used by ARM64 assembly code that's already
>> recorded in the main CCL repo (e.g., register names and lisp_frame
>> layout from arm64-constants.s), but nothing's carved in stone;
>> I can change that stuff pretty easily.
>> 
>>> Here's a bit of information on existing ports collected in one place.
>>> 
>>> This needs more sanity checking, but I think it's pretty close to accurate:
>>> https://github.com/Clozure/ccl/wiki/Register-Usage-in-CCL-Implementations
>>> 
>>> And this has been up for a while:
>>> https://github.com/Clozure/ccl/wiki/Arch-Constant-Values-in-CCL
>> 
>> Thanks, I will look at those.
>> 
>>>> My current crazy plan is to write a
>>>> specialized (*) ppc64 to arm64 translator and use it to convert all
>>>> the subprims (once)
>> 
>> I don't have an informed opinion about that (not having paid much
>> attention to subprimitives yet) but my initial reaction is that it's
>> a nice idea that's worth trying.
>> 
>>>> and translate the ppc64 compiler output (on an
>>>> ongoing basis). I know this isn’t what RME would do, but it seems
>>>> less risky than doing “open heart surgery” on the compiler.
>> 
>> I don't think that will be necessary, because the "brain surgery"
>> is relatively easy. I have found that I can take large functions
>> from e.g. ppc2.lisp and make only a few small changes to get them
>> to generate ARM64 code.
>> 
>> -- Robert Munyer
>> https://ccl-arm64-2023-07.srht.site
>> 
>>> On 25 February 2024, Shannon Spires wrote:
>>> 
>>> Tim's right: I don't think GB ever got around to figuring out what
>>> registers to use in ARM64.
>>> 
>>> Here's a bit of information on existing ports collected in one place.
>>> 
>>> This needs more sanity checking, but I think it's pretty close to accurate:
>>> https://github.com/Clozure/ccl/wiki/Register-Usage-in-CCL-Implementations
>>> 
>>> And this has been up for a while:
>>> https://github.com/Clozure/ccl/wiki/Arch-Constant-Values-in-CCL
>>> 
>>> -SS
>>> 
>>>> On 2/25/24 3:50 PM, Tim McNerney wrote:
>>>> Thanks for doing this experiment, Robert.
>>>> Gary B., to the best of my knowledge, never tackled designing
>>>> register conventions or stack usage for the arm64. This is an open
>>>> problem for the taking. I haven’t yet searched for CCL documentation
>>>> on register and stack usage on the PPC64. But my own strategy would
>>>> be to try to map one into the other with very few changes, kinda
>>>> in the spirit of your experiment. My current crazy plan is to write a
>>>> specialized (*) ppc64 to arm64 translator and use it to convert all
>>>> the subprims (once) and translate the ppc64 compiler output (on an
>>>> ongoing basis). I know this isn’t what RME would do, but it seems
>>>> less risky than doing “open heart surgery” on the compiler.
>>>> 
>>>> (*) By "specialized" I mean that it is not a general translator, but
>>>> rather one designed specifically for CCL's hand-written assembly
>>>> language and compiler output, which knows how to rewrite register
>>>> references based on knowledge of the register and stack conventions
>>>> for both targets.
>>>> --Tim
>>>> 
>>>>> On Feb 25, 2024, at 16:05, Robert Munyer <2420506348 at munyer.com> wrote:
>>>>> 
>>>>> I have made some progress toward a CCL-to-ARM64 compiler, by taking
>>>>> code from the existing CCL-to-PPC64 compiler, and modifying it to emit
>>>>> ARM64 instruction sequences that resemble ARM64 assembly code that was
>>>>> checked in by Gary Byers before 2013-10-22.
>>>>> 
>>>>> It compiles the body of this function:
>>>>> 
>>>>>  (defun fixnum-fibonacci (n)
>>>>>    (declare (type (mod 24) n)
>>>>>             (optimize (safety 0) (speed 3)))
>>>>>    (do ((a 1 b)
>>>>>         (b 0 (the fixnum (+ a b)))
>>>>>         (n n (1- n)))
>>>>>        ((zerop n) b)
>>>>>      (declare (fixnum a b n))))
>>>>> 
>>>>> to this ARM64 machine code:
>>>>> 
>>>>>  aa1e03f8 a9bf7bf9 f9402f80 eb2063ff 5400004a d4207d00 f81f8f2f
>>>>>  f81f8f30 f81f8f31 f81f8f32 d2800112 d2800010 f9400f31 14000008
>>>>>  f81f8f30 8b10024f f81f8f2f d1002231 f9400732 f9400330 91004339
>>>>>  f100023f 54ffff01 aa1003ef f9400332 f9400731 f9400b30 a9407bf9
>>>>>  aa1803fe 910043ff d65f03c0
>>>>> 
>>>>> (hand-disassembled here [1]), which, when pasted into this test
>>>>> program [3], calculates "fibonacci(23) = 28657".
>>>>> 
>>>>> If you have an Apple Silicon device with Linux and GCC, I think you
>>>>> should be able to run the test program on it.  (Darwin might also
>>>>> work, with some tweaking.)  Paste the program's code [3] into a text
>>>>> file named test-fib.s, then enter "gcc test-fib.s" and "./a.out".
>>>>> 
>>>>> For comparison, here is the result of running the existing PPC64
>>>>> compiler on the same Fibonacci source code: [2].
>>>>> 
>>>>> Forge resources (source code, wiki, issue tracker, mailing
>>>>> lists) are available at https://ccl-arm64-2023-07.srht.site .
>>>>> 
>>>>> Some disclaimers...
>>>>> 
>>>>> I have not yet made any effort to make the compiled code thread-safe
>>>>> or signal-safe or garbage-collection-safe, so I wouldn't expect it to
>>>>> work correctly in a real CCL kernel.
>>>>> 
>>>>> I mostly have implemented only enough of the compiler for the Fibonacci
>>>>> function above, so I wouldn't expect other functions to work correctly.
>>>>> 
>>>>> I don't fully understand how GB intended ARM64 register assignments
>>>>> and stack discipline to work, so feedback in those areas would be
>>>>> especially welcome.
>>>>> 
>>>>> -- Robert Munyer
>>>>> 
>>>>> [1] --------
>>>>> 
>>>>> fib     (mov    loc-pc lr)
>>>>>         (stp    vsp lr (:-@! sp 16))
>>>>>         (ldr    imm0 (:+@ rcontext 88))
>>>>>         (cmp    sp imm0)
>>>>>         (b.ge   l24)
>>>>>         (brk    1000)
>>>>> l24     (str    arg_z (:-@! vsp 8))
>>>>>         (str    save0 (:-@! vsp 8))
>>>>>         (str    save1 (:-@! vsp 8))
>>>>>         (str    save2 (:-@! vsp 8))
>>>>>         (mov    save2 '1)
>>>>>         (mov    save0 '0)
>>>>>         (ldr    save1 (:+@ vsp 24))
>>>>>         (b      l84)
>>>>> l56     (str    save0 (:-@! vsp 8))
>>>>>         (add    arg_z save2 save0)
>>>>>         (str    arg_z (:-@! vsp 8))
>>>>>         (sub    save1 save1 '1)
>>>>>         (ldr    save2 (:+@ vsp 8))
>>>>>         (ldr    save0 (:@ vsp))
>>>>>         (add    vsp vsp 16)
>>>>> l84     (cmp    save1 '0)
>>>>>         (b.ne   l56)
>>>>>         (mov    arg_z save0)
>>>>>         (ldr    save2 (:@ vsp))
>>>>>         (ldr    save1 (:+@ vsp 8))
>>>>>         (ldr    save0 (:+@ vsp 16))
>>>>>         (ldp    vsp lr (:@ sp))
>>>>>         (mov    lr loc-pc)
>>>>>         (add    sp sp 16)
>>>>>         (ret)
>>>>> 
>>>>> [2] --------
>>>>> 
>>>>> 0000000000000000 <fib>:
>>>>>  00:   7d c8 02 a6     mflr    loc_pc
>>>>>  04:   f8 21 ff e1     stdu    sp,-32(sp)
>>>>>  08:   fa 01 00 08     std     fn,8(sp)
>>>>>  0c:   f9 c1 00 10     std     loc_pc,16(sp)
>>>>>  10:   f9 e1 00 18     std     vsp,24(sp)
>>>>>  14:   7e 50 93 78     mr      fn,nfn
>>>>>  18:   e8 62 00 58     ld      imm0,88(rcontext)
>>>>>  1c:   7c 41 18 88     tdllt   sp,imm0
>>>>>  20:   fa ef ff f9     stdu    arg_z,-8(vsp)
>>>>>  24:   fb ef ff f9     stdu    save0,-8(vsp)
>>>>>  28:   fb cf ff f9     stdu    save1,-8(vsp)
>>>>>  2c:   fb af ff f9     stdu    save2,-8(vsp)
>>>>>  30:   3b a0 00 08     li      save2,8
>>>>>  34:   3b e0 00 00     li      save0,0
>>>>>  38:   eb cf 00 18     ld      save1,24(vsp)
>>>>>  3c:   48 00 00 20     b       5c <fib+0x5c>
>>>>>  40:   fb ef ff f9     stdu    save0,-8(vsp)
>>>>>  44:   7e fd fa 14     add     arg_z,save2,save0
>>>>>  48:   fa ef ff f9     stdu    arg_z,-8(vsp)
>>>>>  4c:   3b de ff f8     addi    save1,save1,-8
>>>>>  50:   eb af 00 08     ld      save2,8(vsp)
>>>>>  54:   eb ef 00 00     ld      save0,0(vsp)
>>>>>  58:   39 ef 00 10     addi    vsp,vsp,16
>>>>>  5c:   2c 3e 00 00     cmpdi   save1,0
>>>>>  60:   40 82 ff e0     bne     40 <fib+0x40>
>>>>>  64:   7f f7 fb 78     mr      arg_z,save0
>>>>>  68:   eb af 00 00     ld      save2,0(vsp)
>>>>>  6c:   eb cf 00 08     ld      save1,8(vsp)
>>>>>  70:   eb ef 00 10     ld      save0,16(vsp)
>>>>>  74:   e9 c1 00 10     ld      loc_pc,16(sp)
>>>>>  78:   e9 e1 00 18     ld      vsp,24(sp)
>>>>>  7c:   ea 01 00 08     ld      fn,8(sp)
>>>>>  80:   7d c8 03 a6     mtlr    loc_pc
>>>>>  84:   38 21 00 20     addi    sp,sp,32
>>>>>  88:   4e 80 00 20     blr
>>>>>  8c:   83 a9 ff e0     lwz     save2,-32(allocptr)
>>>>> 
>>>>> [3] --------
>>>>> 
>>>>>         .global main
>>>>>         .extern printf
>>>>>         .text
>>>>> 
>>>>> fmt:    .asciz  "fibonacci(23) = %ld\n"
>>>>> 
>>>>>         .balign 4
>>>>> 
>>>>> fib:    .inst   0xAA1E03F8, 0xA9BF7BF9, 0xF9402F80, 0xEB2063FF, 0x5400004A
>>>>>         .inst   0xD4207D00, 0xF81F8F2F, 0xF81F8F30, 0xF81F8F31, 0xF81F8F32
>>>>>         .inst   0xD2800112, 0xD2800010, 0xF9400F31, 0x14000008, 0xF81F8F30
>>>>>         .inst   0x8B10024F, 0xF81F8F2F, 0xD1002231, 0xF9400732, 0xF9400330
>>>>>         .inst   0x91004339, 0xF100023F, 0x54FFFF01, 0xAA1003EF, 0xF9400332
>>>>>         .inst   0xF9400731, 0xF9400B30, 0xA9407BF9, 0xAA1803FE, 0x910043FF
>>>>>         .inst   0xD65F03C0
>>>>> 
>>>>> main:   mov     x0, sp
>>>>>         stp     fp, lr, [sp, -64]!
>>>>>         mov     fp, sp
>>>>>         stp     x24, x25, [sp, -16]!
>>>>>         mov     x25, x0
>>>>>         sub     x0, sp, 32
>>>>>         stp     x0, x28, [sp, -16]!
>>>>>         sub     x28, sp, 88
>>>>>         mov     x15, 23 << 3
>>>>>         bl      fib
>>>>>         asr     x1, x15, 3
>>>>>         adr     x0, fmt
>>>>>         bl      printf
>>>>>         ldp     x0, x28, [sp], 16
>>>>>         ldp     x24, x25, [sp], 16
>>>>>         ldp     fp, lr, [sp], 64
>>>>>         mov     x0, 0
>>>>>         ret
>> 