[Openmcl-devel] Solaris x86-64 and %gs
Chris Curtis
enderx12 at mac.com
Mon Jan 7 10:26:47 PST 2008
(Replying to myself with some additional info...)
From looking at lisp-kernel/thread_manager.c, it appears that
setup_tcr_extra_segment never gets called unless both HAVE_TLS and
X8664 are defined (in new_tcr), which may or may not be the case for
Solaris. So on the one hand, it looks like it *should* be possible to
run without HAVE_TLS and just get the tcr chain calloc'd. Is that
code path actually exercised on any current platform?
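For reference, the shape of the code path in question is roughly this
(a paraphrase of thread_manager.c, not a verbatim quote; the bodies
are illustrative):

    #include <stdlib.h>

    typedef struct tcr { struct tcr *next, *prev; } TCR;  /* abridged */

    #ifdef HAVE_TLS
    static __thread TCR current_tcr;
    extern void setup_tcr_extra_segment(TCR *);
    #endif

    static TCR *new_tcr_sketch(void)
    {
    #ifdef HAVE_TLS
        TCR *tcr = &current_tcr;       /* TCR lives in __thread storage */
    #ifdef X8664
        setup_tcr_extra_segment(tcr);  /* only with HAVE_TLS && X8664 */
    #endif
    #else
        TCR *tcr = calloc(1, sizeof(TCR)); /* the path Solaris would take */
    #endif
        return tcr;
    }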
OTOH, all the asm macros refer to %rcontext (= %gs), so that path
doesn't seem likely to actually work.
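Concretely, every TCR access in the lisp kernel boils down to a
%gs-relative load or store, something like this (the offset parameter
is illustrative, not CCL's actual TCR layout):

    /* Read one TCR slot through %rcontext (= %gs on x86-64). */
    static inline void *tcr_slot(unsigned long offset)
    {
        void *val;
        __asm__ ("movq %%gs:(%1), %0" : "=r" (val) : "r" (offset));
        return val;
    }

So whatever we do on Solaris, *something* still has to make %gs (or a
substitute) point at the current thread's TCR.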
Deep knowledge helpful here. ;-)
--chris
On Jan 7, 2008, at 12:18 PM, Chris Curtis wrote:
> Well, after a fair bit of digging I've made some headway on this
> issue. It feels like 2 steps forward, (- 2 epsilon) steps back.
>
> The good news is that the post you found suggests we probably don't
> have quite the same problem as on Darwin. DARWIN_GS_HACK swaps the
> pthread %gs data with the CCL tcr, whereas on Solaris x86-64
> libpthread uses %fs, as per the ABI (supposedly leaving %gs alone).
>
> Interestingly, I can do "mov %0,%%gs" with inline assembly, but only
> as long as the value I'm trying to set is in [0-3]. Any other value
> segfaults. (Same with %fs, BTW.)
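> That makes sense if you note that 0-3 all encode the null selector
> (index 0, TI=0, any RPL), which the CPU allows to be loaded into
> %gs; anything else has to name a present GDT/LDT descriptor, or the
> mov faults with #GP, which shows up as SIGSEGV. A minimal sketch of
> the test:
>
>     /* Load an arbitrary selector into %gs; only the null-selector
>        encodings 0-3 succeed without a valid descriptor behind them. */
>     static void try_load_gs(unsigned short sel)
>     {
>         __asm__ volatile ("movw %0, %%gs" : : "r" (sel));
>     }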
>
> FWIW, the sbcl runtime (on Solaris x86) sets %fs directly, but only
> after installing a new LDT entry via sysi86(SI86DSCR). It then saves
> the LDT selector in its pthread-specific data.
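> For the archives, the incantation is roughly this (an untested
> sketch; the descriptor number and access bits are illustrative, not
> sbcl's exact values):
>
>     #include <sys/sysi86.h>
>
>     /* Install an LDT data segment over a per-thread block, then
>        point %fs at it.  32-bit Solaris x86 only. */
>     static int install_fs_segment(void *base, unsigned int limit,
>                                   unsigned int desc)
>     {
>         struct ssd ssd;
>         unsigned short sel = (desc << 3) | 4 | 3;  /* TI=LDT, RPL=3 */
>
>         ssd.sel  = sel;
>         ssd.bo   = (unsigned int)base;  /* segment base address */
>         ssd.ls   = limit;               /* segment limit */
>         ssd.acc1 = 0xf2;                /* present, DPL 3, writable data */
>         ssd.acc2 = 0x4;                 /* 32-bit default operand size */
>         if (sysi86(SI86DSCR, &ssd) < 0)
>             return -1;
>         __asm__ volatile ("movw %0, %%fs" : : "r" (sel));
>         return 0;
>     }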
>
> I'm continuing to dig... any thoughts or suggestions would be greatly
> appreciated. :-)
>
> --chris
>
>
> On Jan 3, 2008, at 3:07 PM, R. Matthew Emerson wrote:
>
>> I stumbled across this:
>> http://blogs.sun.com/tpm/entry/solaris_10_on_x64_processors3
>>
>> [begin excerpt]
>>
>> Threads and Selectors
>>
>> In previous releases of Solaris, the 32-bit threads library used the
>> %gs selector to allow each LWP in a process to refer to a private
>> LDT entry to provide the per-thread state manipulated by the
>> internals of the thread library. Each LWP gets a different %gs
>> value that selects a different LDT entry; each LDT entry is
>> initialized to point at per-thread state. On LWP context switch,
>> the kernel loads the per-process LDT register to virtualize all this
>> data to the process. Workable, yes, but the obvious inefficiency
>> here was requiring every process to have at least one extra
>> locked-down page to contain a minimal LDT. More serious was the
>> implied upper bound of 8192 LWPs per process (derived from the
>> hardware limit on LDT entries).
>>
>> For the amd64 port, following the draft ABI document, we needed to
>> use the %fs selector for the analogous purpose in 64-bit processes
>> too. On the 64-bit kernel, we wanted to use the FSBASE and GSBASE
>> MSRs to virtualize the addresses that a specific magic %fs and magic
>> %gs select, and we obviously wanted to use a similar technique on 32-
>> bit applications, and on the 32-bit kernel too. We did this by
>> defining specific %fs and %gs values that point into the GDT, and
>> arranged that context switches update the corresponding underlying
>> base address from predefined lwp-private values - either explicitly
>> by rewriting the relevant GDT entries on the 32-bit kernel, or
>> implicitly via the FSBASE and GSBASE MSRs on the 64-bit kernel. The
>> result of all this work makes the code simpler, it scales cleanly,
>> and the resulting upper bound on the number of LWPs is derived only
>> from available memory (modulo resource controls, obviously).
>>
>> [end excerpt]
>>
>> So it sounds like the functionality is there, it's just a question
>> of whether/how it's exposed to user processes.
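>> Caveat, this is an assumption on my part: if Solaris follows the
>> draft ABI's TLS layout, the user-visible contract is just that
>> %fs:0 holds the thread pointer (the TCB's first word points at
>> itself), so reading it from C looks something like:
>>
>>     static inline void *amd64_thread_pointer(void)
>>     {
>>         void *tp;
>>         __asm__ ("movq %%fs:0, %0" : "=r" (tp));
>>         return tp;
>>     }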
>>
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel