[Openmcl-devel] Solaris x86-64 and %gs
Chris Curtis
enderx12 at mac.com
Mon Jan 7 10:26:47 PST 2008
(Replying to myself with some additional info...)
From looking at lisp-kernel/thread_manager.c, it appears that
setup_tcr_extra_segment never gets called unless both HAVE_TLS and
X8664 are defined (in new_tcr), which may or may not be the case for
Solaris. So on the one hand, it looks like it *should* be possible to
run without HAVE_TLS and just get the tcr chain calloc'd. Is that
code path actually exercised on any current platform?
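For reference, the shape of the code path in question is roughly this
(a paraphrase of thread_manager.c, not a verbatim quote; the bodies
are illustrative):

    #include <stdlib.h>

    typedef struct tcr { struct tcr *next, *prev; } TCR;  /* abridged */

    #ifdef HAVE_TLS
    static __thread TCR current_tcr;
    extern void setup_tcr_extra_segment(TCR *);
    #endif

    static TCR *new_tcr_sketch(void)
    {
    #ifdef HAVE_TLS
        TCR *tcr = &current_tcr;       /* TCR lives in __thread storage */
    #ifdef X8664
        setup_tcr_extra_segment(tcr);  /* only with HAVE_TLS && X8664 */
    #endif
    #else
        TCR *tcr = calloc(1, sizeof(TCR)); /* the path Solaris would take */
    #endif
        return tcr;
    }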
OTOH, all the asm macros refer to %rcontext (= %gs), so that path
doesn't seem likely to actually work.
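Concretely, every TCR access in the lisp kernel boils down to a
%gs-relative load or store, something like this (the offset parameter
is illustrative, not CCL's actual TCR layout):

    /* Read one TCR slot through %rcontext (= %gs on x86-64). */
    static inline void *tcr_slot(unsigned long offset)
    {
        void *val;
        __asm__ ("movq %%gs:(%1), %0" : "=r" (val) : "r" (offset));
        return val;
    }

So whatever we do on Solaris, *something* still has to make %gs (or a
substitute) point at the current thread's TCR.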
Deep knowledge helpful here. ;-)
--chris
On Jan 7, 2008, at 12:18 PM, Chris Curtis wrote:
> Well, after a fair bit of digging I've made some headway on this
> issue. It feels like 2 steps forward, (- 2 epsilon) steps back.
>
> The good news is that the post you found suggests we probably don't
> have quite the same problem as on Darwin. DARWIN_GS_HACK swaps the
> pthread %gs data with the CCL tcr, whereas on Solaris x86-64
> libpthread uses %fs, as per the ABI (supposedly leaving %gs alone).
>
> Interestingly, I can do "mov %0,%%gs" with inline assembly, but only
> as long as the value I'm trying to set is in [0-3]. Any other value
> segfaults. (Same with %fs, BTW.)
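> That makes sense if you note that 0-3 all encode the null selector
> (index 0, TI=0, any RPL), which the CPU allows to be loaded into
> %gs; anything else has to name a present GDT/LDT descriptor, or the
> mov faults with #GP, which shows up as SIGSEGV. A minimal sketch of
> the test:
>
>     /* Load an arbitrary selector into %gs; only the null-selector
>        encodings 0-3 succeed without a valid descriptor behind them. */
>     static void try_load_gs(unsigned short sel)
>     {
>         __asm__ volatile ("movw %0, %%gs" : : "r" (sel));
>     }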
>
> FWIW, the sbcl runtime (on Solaris x86) sets %fs directly, but only
> after installing a new LDT entry via sysi86(SI86DSCR). It then saves
> the LDT selector in its pthread-specific data.
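> For the archives, the incantation is roughly this (an untested
> sketch; the descriptor number and access bits are illustrative, not
> sbcl's exact values):
>
>     #include <sys/sysi86.h>
>
>     /* Install an LDT data segment over a per-thread block, then
>        point %fs at it.  32-bit Solaris x86 only. */
>     static int install_fs_segment(void *base, unsigned int limit,
>                                   unsigned int desc)
>     {
>         struct ssd ssd;
>         unsigned short sel = (desc << 3) | 4 | 3;  /* TI=LDT, RPL=3 */
>
>         ssd.sel  = sel;
>         ssd.bo   = (unsigned int)base;  /* segment base address */
>         ssd.ls   = limit;               /* segment limit */
>         ssd.acc1 = 0xf2;                /* present, DPL 3, writable data */
>         ssd.acc2 = 0x4;                 /* 32-bit default operand size */
>         if (sysi86(SI86DSCR, &ssd) < 0)
>             return -1;
>         __asm__ volatile ("movw %0, %%fs" : : "r" (sel));
>         return 0;
>     }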
>
> I'm continuing to dig... any thoughts or suggestions would be greatly
> appreciated. :-)
>
> --chris
>
>
> On Jan 3, 2008, at 3:07 PM, R. Matthew Emerson wrote:
>
>> I stumbled across this:
>> http://blogs.sun.com/tpm/entry/solaris_10_on_x64_processors3
>>
>> [begin excerpt]
>>
>> Threads and Selectors
>>
>> In previous releases of Solaris, the 32-bit threads library used the
>> %gs selector to allow each LWP in a process to refer to a private
>> LDT entry to provide the per-thread state manipulated by the
>> internals of the thread library. Each LWP gets a different %gs
>> value that selects a different LDT entry; each LDT entry is
>> initialized to point at per-thread state. On LWP context switch,
>> the kernel loads the per-process LDT register to virtualize all this
>> data to the process. Workable, yes, but the obvious inefficiency
>> here was requiring every process to have at least one extra
>> locked-down page to contain a minimal LDT. More serious was the
>> implied upper bound of 8192 LWPs per process (derived from the
>> hardware limit on LDT entries).
>>
>> For the amd64 port, following the draft ABI document, we needed to
>> use the %fs selector for the analogous purpose in 64-bit processes
>> too. On the 64-bit kernel, we wanted to use the FSBASE and GSBASE
>> MSRs to virtualize the addresses that a specific magic %fs and magic
>> %gs select, and we obviously wanted to use a similar technique on 32-
>> bit applications, and on the 32-bit kernel too. We did this by
>> defining specific %fs and %gs values that point into the GDT, and
>> arranged that context switches update the corresponding underlying
>> base address from predefined lwp-private values - either explicitly
>> by rewriting the relevant GDT entries on the 32-bit kernel, or
>> implicitly via the FSBASE and GSBASE MSRs on the 64-bit kernel. The
>> result of all this work makes the code simpler, it scales cleanly,
>> and the resulting upper bound on the number of LWPs is derived only
>> from available memory (modulo resource controls, obviously).
>>
>> [end excerpt]
>>
>> So it sounds like the functionality is there, it's just a question
>> of whether/how it's exposed to user processes.
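>> Caveat, this is an assumption on my part: if Solaris follows the
>> draft ABI's TLS layout, the user-visible contract is just that
>> %fs:0 holds the thread pointer (the TCB's first word points at
>> itself), so reading it from C looks something like:
>>
>>     static inline void *amd64_thread_pointer(void)
>>     {
>>         void *tp;
>>         __asm__ ("movq %%fs:0, %0" : "=r" (tp));
>>         return tp;
>>     }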
>>
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel