[Openmcl-devel] Porting the OpenMCL Compiler

Gary Byers gb at clozure.com
Fri Jul 8 00:08:05 UTC 2005

On Thu, 7 Jul 2005, James Bielman wrote:

> Gary Byers <gb at clozure.com> writes:
>> Some PPC Linuces (I don't know about ARM Linux or WinCE) want to
>> keep thread-specific information in a register (DarwinPPC64 wants to
>> keep a pointer to the current pthread in R13), and the C runtime
>> often gets confused and upset if this convention is violated.
>> (OpenMCL's been trying to get away with violating it while lisp code
>> is running, but weird things happen during exception handling and
>> the next release will avoid Angering The TLS Gods.)  If the OS
>> supports the concept of thread-local storage (TLS), it may be
>> possible to make the lisp TCR be a thread-local variable (with a
>> known offset within the block of thread-local variables that the
>> ABI's thread-pointer points to), which would keep both the OS and
>> Lisp happy without burning a register.
> So far as I can tell, neither Windows CE nor the Linux ARM ABI require
> a dedicated thread register (I haven't found anything that looks like
> official Linux/ARM ABI documentation yet), so this sounds like a good
> plan.

In the last couple of years, ELF ABIs on various Linux platforms
have been extended to support Thread Local Storage (TLS) and (though
this is usually less visible to user code) a New Posix Threads Library
(I'm pretty sure that NPTL depends on TLS support as well as on
kernel features that were introduced in 2.6).

In the PPC world, some Linux distributions still aren't supporting
TLS, some provide parallel TLS and non-TLS libraries, and some are
entirely TLS-based.  I'm not sure what the situation is on the ARM;
I can probably find the ARM TLS ABI supplement if I can remember
where I found the PPC32/PPC64 TLS supplements ....

Ah.  Google reveals:

Main ARM ABI repository:

TLS supplement (may already be incorporated into the above.)

These documents don't tell you whether a particular ARM Linux release
is using a TLS-based ABI, or if/when it will ...
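
As a rough illustration of keeping the TCR in thread-local storage instead of a dedicated register, a C sketch might look like the one below. The struct layout, the field types, and the names tcr_t, current_tcr and get_tcr are assumptions made for this example, not OpenMCL's actual definitions. It also assumes a C11 compiler on a platform whose ABI supports _Thread_local.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced TCR.  The field names follow the
   ones mentioned in this thread (ALLOCPTR, ALLOCBASE, TSP); the
   real structure holds much more. */
typedef struct tcr {
    char *allocptr;   /* next free byte in this thread's allocation chunk */
    char *allocbase;  /* limit of that chunk */
    void *tsp;        /* temp stack pointer */
} tcr_t;

/* One instance per thread.  The compiler and linker place it at a
   fixed offset within the thread's TLS block, so no general-purpose
   register has to be reserved for it. */
static _Thread_local tcr_t current_tcr;

/* Every access goes through the ABI's thread pointer; this is the
   cost paid for not keeping the TCR in a register. */
static inline tcr_t *get_tcr(void)
{
    return &current_tcr;
}
```

Whether this is cheap enough depends on how the platform's ABI finds the thread pointer, which is exactly what the TLS supplements describe.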

>> It's nice to be able to cons inline, but I'm not sure if it's
>> that important.
>> On the PPC, "nargs" is only used in limited contexts (#args/#values),
>> but (as far as the GC is concerned) it's just an immediate register.
> Okay, here's where I am after shuffling things around:
> r0      imm0            unboxed temp reg
> r1      imm1            unboxed temp reg
> r2      imm2/nargs      unboxed temp reg, number of arguments
> r3      temp0           boxed temp reg
> r4      temp1           boxed temp reg
> r5      temp2           boxed temp reg
> r6      save0           boxed callee-save reg
> r7      save1           boxed callee-save reg
> r8      save2           boxed callee-save reg
> r9      arg_y           second to last argument
> r10     arg_z           last argument
> r11     fn              current function object
> r12     vsp             value stack pointer
> r13     sp              control stack pointer
> r14     lr              link register
> r15     pc              program counter
> With the TCR (containing ALLOCPTR, ALLOCBASE, and TSP) in thread-local
> storage.

That looks reasonable.  I think that the idea of being able to handle
exceptional cases by (carefully) setting a mask in the TCR probably
addresses the cases where you might need another immediate reg or
two, and so far I haven't thought of a problem with the TCR-mask idea.
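
To make the TCR-mask idea concrete, here is a minimal C sketch of how a GC might decide which registers to treat as roots. The default node-register set is taken from the register table quoted above (r3 through r11). The field name node_regs_as_imm and the helper names are hypothetical, and a real collector would work on a saved exception context and not on a plain array.

```c
#include <stdint.h>

enum { NUM_REGS = 16 };

/* r3..r11 (temp0-2, save0-2, arg_y, arg_z, fn) hold boxed values
   by default: bits 3 through 11 are set. */
#define DEFAULT_NODE_REGS 0x0FF8u

typedef struct {
    /* Hypothetical TCR field: bit n set means "treat rn as an
       unboxed (immediate) register for now". */
    uint32_t node_regs_as_imm;
} tcr_t;

/* Registers the GC should still treat as holding lisp objects. */
static uint32_t regs_to_scan(const tcr_t *tcr)
{
    return DEFAULT_NODE_REGS & ~tcr->node_regs_as_imm;
}

/* Copy the values of the live node registers into roots[] and
   return how many there are.  A real GC would mark or forward
   them instead of copying them out. */
static int collect_roots(const tcr_t *tcr,
                         const uintptr_t regs[NUM_REGS],
                         uintptr_t roots[NUM_REGS])
{
    uint32_t live = regs_to_scan(tcr);
    int n = 0;
    for (int r = 0; r < NUM_REGS; r++) {
        if (live & (1u << r)) {
            roots[n++] = regs[r];
        }
    }
    return n;
}
```

The "carefully" part is the ordering: the mask bits would have to be set before any unboxed value lands in a node register, and cleared only after those registers hold valid lisp objects again.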

> This leads very nicely into my next architectural question. :-)
> The ARM doesn't have the compare-and-trap instructions that the PPC
> does.  There is an undefined instruction space that looks similar to
> the UUOs on the PPC and I *think* these instructions are conditionally
> executed (testing on a StrongARM PDA confirms this but I'm not sure if
> it's required to do so, if this isn't guaranteed then I'll have to do
> something completely different).
> The undefined instruction space available for user extension has 16
> bits of space to play with, so I should be able to fit some sort of
> trap code (wrong # of arguments, wrong tag, etc), plus maybe a source
> register and an immediate or other register (so I can spill the
> expected argument count for 8191-arg functions to a register :-) into
> the UUO.
> So, the traps above might look sort of like (handwavy still):
>    CMP nargs, 0
>    MAKE_UUO(ne, check_nargs, nargs, 0)
> or
>    ANDS imm0, arg_z, #fixnum_mask    ; extract tag and set flags
>    MAKE_UUO(ne, check_lisptag, arg_z, tag_fixnum)
> Where MAKE_UUO is a macro taking an ARM condition code, a trap code
> for the kernel, and a register/immediate to use in error reporting.

I'm pretty sure that the ARM Architecture Manual (under another
pile of papers on my desk, the last time I looked) pretty strongly
suggests that the conditional execution bits are always interpreted
and the rest of the opcode isn't interpreted unless the condition is

> Alternatively, if this is getting too weird/complicated/whatever,
> maybe there could just be subprimitives for the failed traps?
> Something like:
>    ANDS  imm0, arg_z, #fixnum_mask
>    MOVNE pc, .SPargz_not_fixnum      ; or whatever...

If it turns out that an alternative is necessary, it might also be possible
to do something like

     ANDS imm0, arg_z, #fixnum_mask
     BNE not_fixnum_1

     unconditional_uuo not_fixnum,arg_z
     ;; you might be able to encode more info here.
     ;; Something else isn't a fixnum, in some other context.

I'm pretty sure that all ARM instructions are conditional (even illegal
ones), and that'd be a bit nicer.

It's good to be able to do this sort of thing via some sort of trap/uuo-like
mechanism, as long as the exception handler has a reasonable chance
of figuring out what's going on.
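
For illustration, here is one way the 16 free bits could be packed and then recovered by an exception handler, written as a C sketch. It assumes the "architecturally undefined" ARM encoding (bits 27..20 all fixed at 0111 1111, bits 7..4 fixed at 1111), which leaves bits 19..8 and 3..0 free. The split into a 4-bit trap code, an 8-bit immediate and a 4-bit register, and the names make_uuo, decode_uuo and the TRAP_ constants, are invented for this example.

```c
#include <stdint.h>

/* cond 0111 1111 xxxx xxxx xxxx 1111 xxxx */
#define UUO_FIXED_BITS 0x07F000F0u
#define UUO_FIXED_MASK 0x0FF000F0u

enum { COND_EQ = 0x0, COND_NE = 0x1, COND_AL = 0xE };

/* Hypothetical trap codes. */
enum { TRAP_CHECK_NARGS = 1, TRAP_CHECK_LISPTAG = 2 };

typedef struct {
    unsigned cond;  /* ARM condition field, bits 31..28 */
    unsigned trap;  /* trap code, bits 19..16 (assumed layout) */
    unsigned imm;   /* immediate, bits 15..8 (assumed layout) */
    unsigned reg;   /* register number, bits 3..0 (assumed layout) */
} uuo_fields;

/* Build the instruction word for a conditional UUO. */
static uint32_t make_uuo(unsigned cond, unsigned trap,
                         unsigned reg, unsigned imm)
{
    return ((uint32_t)(cond & 0xF) << 28)
         | UUO_FIXED_BITS
         | ((uint32_t)(trap & 0xF) << 16)
         | ((uint32_t)(imm & 0xFF) << 8)
         | (uint32_t)(reg & 0xF);
}

/* What the handler would do with the faulting instruction word:
   return 0 if it is not one of our UUOs, else fill in *out. */
static int decode_uuo(uint32_t insn, uuo_fields *out)
{
    if ((insn & UUO_FIXED_MASK) != UUO_FIXED_BITS) {
        return 0;
    }
    out->cond = insn >> 28;
    out->trap = (insn >> 16) & 0xF;
    out->imm  = (insn >> 8) & 0xFF;
    out->reg  = insn & 0xF;
    return 1;
}
```

With this layout, MAKE_UUO(ne, check_lisptag, arg_z, tag_fixnum) would correspond to make_uuo(COND_NE, TRAP_CHECK_LISPTAG, 10, 0), given arg_z in r10 and a fixnum tag of 0 (the tag value is also an assumption here).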

> (I guess this assumes the subprimitive can call into the kernel proper
> without invoking a trap, which probably raises other issues I haven't
> looked into...)
> I imagine an x86 port would need to do something different here too.

I imagine that whatever it is will involve much grief and gnashing of teeth.

> (So far, trap handling on Windows CE seems to be a bit of a mess.  If
> you use "structured exception handling" you have to build stack frames
> in very specific ways so the "virtual unwinder" can emulate function
> prolog instructions backwards to walk up the stack.
> It requires information about each function in a section of the
> executable as well; I'm not sure how one would build these data
> structures when compiling at run-time.
> Win32 has SetUnhandledExceptionFilter which can be used to get around
> this but it is (of course) missing on CE...)

Carl Shapiro's been working on a port of CMUCL to Win32 and was running
into this; I'd suggested using the unhandled-exception-filter mechanism,
and he said recently that he'd solved the problem another way.  I don't
know if he's released his code yet, but I wouldn't think that his solution
would involve pc-tables (he basically wanted to ensure that SEH things got
unwound when lisp did a THROW.)

>> There might also be ways of getting some flexibility (in some sense
>> of the word) and still keeping a preemptively scheduled GC happy.
>> Suppose that you were about to enter a loop where you really needed
>> a bunch of IMM regs and had no use for some node regs (in that
>> loop).  You -might- be able to do something like:
>>   [set bits in tcr to mark node regs as immediate]
> Ah cool, I'll keep that option open for when I get that far. :-)
>> In the "last N args in registers" case, the naive approach is:
> Neat, that makes sense.  I think this should be nice for ARM too since
> it should be possible to VPOP the last arguments in a single
> instruction:
>    (VPOP-YZ) ==> LDMIA vsp!, {arg_y, arg_z}

The ARM is definitely cool.
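
For anyone not used to load-multiple, here is a small C model of what LDMIA with base writeback does, as a sketch and not as part of any port. LDM always loads the lowest-numbered register in the list from the lowest address, so with arg_y in r9 and arg_z in r10 the value at vsp goes to arg_y and the next word goes to arg_z. The register numbers come from the table quoted above; the function name and the array-of-registers representation are made up for this example.

```c
#include <stdint.h>

enum { ARG_Y = 9, ARG_Z = 10, VSP = 12, NUM_REGS = 16 };

/* Model of "LDMIA base!, {reglist}": load each listed register,
   in ascending register order, from ascending addresses starting
   at regs[base], then write the final address back to base.
   (The base register is assumed not to be in the list.) */
static void ldmia_writeback(uintptr_t regs[NUM_REGS], int base,
                            unsigned reglist)
{
    const uintptr_t *addr = (const uintptr_t *)regs[base];
    for (int r = 0; r < NUM_REGS; r++) {
        if (reglist & (1u << r)) {
            regs[r] = *addr++;
        }
    }
    regs[base] = (uintptr_t)addr;
}
```

So (VPOP-YZ) would be ldmia_writeback(regs, VSP, (1u << ARG_Y) | (1u << ARG_Z)) in this model, and the order in which the two values were pushed has to agree with that register ordering.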

> James
