[Openmcl-devel] Solaris x86-64?

R. Matthew Emerson rme at clozure.com
Thu Jan 3 12:07:00 PST 2008


On Jan 3, 2008, at 1:27 AM, Gary Byers wrote:

>
>
> On Wed, 2 Jan 2008, Chris Curtis wrote:
>
>>
>> On Jan 2, 2008, at 11:39 AM, Gary Byers wrote:
>
>>> The x86-64 port needs (or at least really, really wants)
>>> some way to set the gs segment register so that it points
>>> to per-thread lisp data.  I don't remember whether Solaris
>>> offers a way to do this; if not, it might be hard to work
>>> around that.
>
>>
>> That one I'd have to spend a little more time digging into...
>> nothing off the top of my head, since I've tried really hard to
>> forget everything I ever knew about segmented memory.
>
> In ccl/lisp-kernel/thread-manager.c, there's a function called
> 'setup_tcr_extra_segment' that's called on thread startup on x86-64;
> it tries to do something OS-dependent to make the %gs register
> point at lisp per-thread data (the thread's "thread context record",
> or TCR.)
>
> Darwin ("the world's most advanced operating system!") doesn't offer
> that, so we do something horrible and ugly: the pthreads library
> uses %gs on Darwin (less advanced OSes that're compliant with
> the amd64 ABI use %fs ...) and we switch %gs to point to either
> pthreads data or lisp data every time we switch between running
> lisp code and running foreign code; see code conditionalized on
> DARWIN_GS_HACK.  If there isn't an advertised way to make %gs
> available to user code on Solaris, there might be a way to do
> SOLARIS_GS_HACK.
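
For concreteness, here's a minimal sketch (not the actual CCL code) of the
kind of OS-dependent call setup_tcr_extra_segment has to make so that %gs
ends up pointing at the thread's TCR.  The Linux and FreeBSD interfaces
shown below are real; whether Solaris exposes anything comparable to user
code is exactly the open question.

  #ifdef __linux__
  #include <unistd.h>
  #include <sys/syscall.h>   /* SYS_arch_prctl */
  #include <asm/prctl.h>     /* ARCH_SET_GS */

  /* Hypothetical helper: point the calling thread's %gs base at its TCR. */
  static int point_gs_at_tcr(void *tcr)
  {
    /* On x86-64 Linux, arch_prctl(ARCH_SET_GS, addr) sets the GSBASE
       of the calling thread. */
    return syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long)tcr);
  }
  #elif defined(__FreeBSD__)
  #include <machine/sysarch.h>

  static int point_gs_at_tcr(void *tcr)
  {
    /* FreeBSD/amd64 exposes the same functionality via sysarch(). */
    return amd64_set_gsbase(tcr);
  }
  #endif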


I stumbled across this:
http://blogs.sun.com/tpm/entry/solaris_10_on_x64_processors3

[begin excerpt]

Threads and Selectors

In previous releases of Solaris, the 32-bit threads library used the
%gs selector to allow each LWP in a process to refer to a private LDT
entry to provide the per-thread state manipulated by the internals of
the thread library.  Each LWP gets a different %gs value that selects
a different LDT entry; each LDT entry is initialized to point at
per-thread state.  On LWP context switch, the kernel loads the
per-process LDT register to virtualize all this data to the process.
Workable, yes, but the obvious inefficiency here was requiring every
process to have at least one extra locked-down page to contain a
minimal LDT.  More serious was the implied upper bound of 8192 LWPs
per process (derived from the hardware limit on LDT entries).

For the amd64 port, following the draft ABI document, we needed to use
the %fs selector for the analogous purpose in 64-bit processes too.
On the 64-bit kernel, we wanted to use the FSBASE and GSBASE MSRs to
virtualize the addresses that a specific magic %fs and magic %gs
select, and we obviously wanted to use a similar technique on 32-bit
applications, and on the 32-bit kernel too.  We did this by defining
specific %fs and %gs values that point into the GDT, and arranged that
context switches update the corresponding underlying base address from
predefined lwp-private values - either explicitly by rewriting the
relevant GDT entries on the 32-bit kernel, or implicitly via the
FSBASE and GSBASE MSRs on the 64-bit kernel.  The result of all this
work makes the code simpler, it scales cleanly, and the resulting
upper bound on the number of LWPs is derived only from available
memory (modulo resource controls, obviously).

[end excerpt]

So it sounds like the functionality is there; it's just a question of
whether and how it's exposed to user processes.
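
One cheap way to start answering that question is to look at what the
system already puts in the selector registers.  The little probe below
(a sketch, assuming GCC-style inline assembly is available on the
target) prints the current %fs and %gs selector values for a thread;
on an amd64 Solaris 10 box this should show whether %gs is already
loaded with one of the GDT-based selectors the excerpt describes.

  #include <stdio.h>

  int main(void)
  {
    unsigned short fs_sel, gs_sel;

    /* Read the raw selector values; these say nothing about the base
       addresses behind them, but do show what libc/the kernel set up. */
    __asm__("movw %%fs, %0" : "=r"(fs_sel));
    __asm__("movw %%gs, %0" : "=r"(gs_sel));

    printf("%%fs selector = 0x%04x\n", fs_sel);
    printf("%%gs selector = 0x%04x\n", gs_sel);
    return 0;
  }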
