[Openmcl-devel] Solaris x86-64?

Chris Curtis enderx12 at mac.com
Mon Jan 7 09:18:44 PST 2008


Well, after a fair bit of digging I've made a little bit of headway on  
this issue. It feels like 2 steps forward, (- 2 epsilon) steps back.

The good news is that the post you found seems to mean we  
probably don't have quite the same problem as on Darwin.  
DARWIN_GS_HACK swaps the pthread %gs data with the CCL tcr, and on  
Solaris x86-64 libpthread uses %fs instead as per the ABI (supposedly  
leaving %gs alone).

Interestingly, I can do "mov %0,%%gs" with inline assembly, but only  
as long as the value I'm trying to set is in the range 0-3. Any other  
value segfaults. (Same with %fs, BTW.)
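
One possible reading of the 0-3 observation, sketched here as an illustration rather than a diagnosis: an x86 segment selector packs a requested privilege level into bits 0-1, a table indicator into bit 2, and a descriptor index into bits 3-15. The values 0 through 3 all decode to GDT index 0, the null selector, which the processor allows to be loaded into data segment registers such as %fs and %gs. Any other value has to name a valid, accessible descriptor. The struct and function names below are hypothetical and come from neither CCL nor Solaris.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector (names are illustrative). */
struct selector_fields {
    unsigned index; /* bits 3-15: index into the descriptor table */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 0-1: requested privilege level */
};

/* Split a raw selector value into its three fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}

/* A null selector is GDT index 0 with any RPL, i.e. the values 0 through 3. */
bool is_null_selector(uint16_t sel)
{
    struct selector_fields f = decode_selector(sel);
    return f.index == 0 && f.ti == 0;
}
```

Under this reading, 4 is the first value that is not null (it selects LDT entry 0), which would be consistent with the fault seen for anything above 3.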

FWIW, the sbcl runtime (on Solaris x86) sets %fs directly, but only  
after setting up a new LDT block via SI86DSCR. It then saves the LDT  
selector in its pthread_specific block.
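
As a rough sketch of the selector arithmetic behind an LDT-based scheme like that one (this is not SBCL's code, and it leaves out the SI86DSCR call itself), the value loaded into %fs for a given LDT slot can be composed as shown below. The constant and function names are hypothetical. The 13-bit index field is also where the 8192-entry limit mentioned in the excerpt quoted below comes from.

```c
#include <stdint.h>

/* Illustrative constants; the names are assumptions, not Solaris or SBCL symbols. */
enum {
    SEL_TI_LDT      = 1u << 2, /* table indicator bit: use the LDT */
    SEL_RPL_USER    = 3u,      /* requested privilege level for user code */
    LDT_MAX_ENTRIES = 8192     /* 13-bit index field: 2^13 descriptors */
};

/* Build a user-mode selector for LDT slot `index`.
 * Returns 0 (a null selector) if the index does not fit in 13 bits. */
uint16_t make_ldt_selector(unsigned index)
{
    if (index >= LDT_MAX_ENTRIES)
        return 0;
    return (uint16_t)((index << 3) | SEL_TI_LDT | SEL_RPL_USER);
}
```

For example, LDT slot 1 yields the selector 0x0f, and the highest slot, 8191, yields 0xffff.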

I'm continuing to dig... any thoughts or suggestions would be greatly  
appreciated. :-)

--chris


On Jan 3, 2008, at 3:07 PM, R. Matthew Emerson wrote:

> I stumbled across this:
> http://blogs.sun.com/tpm/entry/solaris_10_on_x64_processors3
>
> [begin excerpt]
>
> Threads and Selectors
>
> In previous releases of Solaris, the 32-bit threads library used the  
> %gs selector to allow each LWP in a process to refer to a private  
> LDT entry to provide the per-thread state manipulated by the  
> internals of the thread library.  Each LWP gets a different %gs  
> value that selects a different LDT entry; each LDT entry is  
> initialized to point at per-thread state.  On LWP context switch,  
> the kernel loads the per-process LDT register to virtualize all this  
> data to the process.  Workable, yes, but the obvious inefficiency  
> here was requiring every process to have at least one extra  
> locked-down page to contain a minimal LDT.  More serious, was the implied  
> upper bound of 8192 LWPs per process (derived from the hardware  
> limit on LDT entries).
>
> For the amd64 port, following the draft ABI document, we needed to  
> use the %fs selector for the analogous purpose in 64-bit processes  
> too.  On the 64-bit kernel, we wanted to use the FSBASE and GSBASE  
> MSRs to virtualize the addresses that a specific magic %fs and magic  
> %gs select, and we obviously wanted to use a similar technique on  
> 32-bit applications, and on the 32-bit kernel too.  We did this by  
> defining specific %fs and %gs values that point into the GDT, and  
> arranged that context switches update the corresponding underlying  
> base address from predefined lwp-private values - either explicitly  
> by rewriting the relevant GDT entries on the 32-bit kernel, or  
> implicitly via the FSBASE and GSBASE MSRs on the 64-bit kernel.  The  
> result of all this work makes the code simpler, it scales cleanly,  
> and the resulting upper bound on the number of LWPs is derived only  
> from available memory (modulo resource controls, obviously).
>
> [end excerpt]
>
> So it sounds like the functionality is there, it's just a question  
> of whether/how it's exposed to user processes.
>



