[Openmcl-devel] Quick HW question...

Daniel Weinreb dlw at itasoftware.com
Mon Nov 15 14:01:05 PST 2010


I agree with what you're saying, and will even amplify it.

In fact, I was just at a talk at MIT by Nir Shavit, who
does a lot of research into concurrency control
mechanisms on real, current processors. 

http://www.math.tau.ac.il/~shanir/

He says that the cost of CAS (compare-and-swap)
instructions is very high compared to what
you might think on a multi-core system,
and gets worse as the number of cores goes
up (and the number of cache levels therefore
increases).  The effect on the caches is really bad,
and hurting the caches these days really
slows things down.
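
To make the cache effect concrete, here is a rough sketch
(mine, not from the talk) of a "test-and-test-and-set" lock
in C11.  It spins on a plain load and only issues the CAS
once the lock looks free, so waiting cores mostly read a
shared copy of the cache line instead of fighting for
exclusive ownership of it.  The names ttas_lock, SPIN_TRIES
and cpu_relax are made up for illustration, and the spin
count is an arbitrary guess, not a tuned value.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_TRIES 1000   /* hypothetical spin budget, not a tuned value */

typedef struct { atomic_int held; } ttas_lock;   /* 0 = free, 1 = held */

/* Hint to the core that we are only spinning (x86 "pause"); no-op elsewhere. */
static void cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

/* One CAS attempt: succeeds only if the lock was free. */
static bool ttas_try_acquire(ttas_lock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void ttas_acquire(ttas_lock *l) {
    for (;;) {
        for (int i = 0; i < SPIN_TRIES; i++) {
            /* Cheap read first; attempt the CAS only when the lock looks free. */
            if (atomic_load_explicit(&l->held, memory_order_relaxed) == 0 &&
                ttas_try_acquire(l))
                return;
            cpu_relax();
        }
        sched_yield();   /* spin budget used up: give up the timeslice */
    }
}

static void ttas_release(ttas_lock *l) {
    /* Debug check: releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

This does not make the CAS itself any cheaper; it just
issues fewer of them while the lock is held.  Whether that
helps on a given machine is something you would have to
measure.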

Dave Moon said to me several years ago that
the entire concept of looking at main memory
as if it were an addressable array of words
is entirely out of date if you're looking for
high performance (in certain situations).
You must think of it as a sequence of cache
lines.  And it gets more complicated once
you're dealing with both L2 and L3 caches,
which have different line sizes, and different
sets of cores accessing them.  When you have
an L3 cache, you really have a NUMA architecture,
and if you want speed, you have to write your
code accordingly, i.e., a core should not read
and write data from some other L2 cache
than its own and expect that to be fast.
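
One small, common way to write code "accordingly" is to
give each core its own data on its own cache line, so that
cores never write to a line another core is using.  A
minimal sketch in C11, assuming a 64-byte line (check your
hardware; CACHE_LINE, padded_counter, NWORKERS, bump and
total are all names I made up for illustration):

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: 64-byte lines; verify for your CPU */
#define NWORKERS   4    /* hypothetical number of worker threads */

/* Each counter is aligned to, and therefore padded out to, a full line. */
struct padded_counter {
    alignas(CACHE_LINE) unsigned long value;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each counter should occupy exactly one cache line");

/* One slot per worker; adjacent slots never share a cache line. */
static struct padded_counter counters[NWORKERS];

/* Called only by the worker that owns the slot, so no atomics are needed. */
static void bump(int worker) {
    counters[worker].value++;
}

/* Sum the per-worker counts; call this after the workers have finished. */
static unsigned long total(void) {
    unsigned long sum = 0;
    for (int i = 0; i < NWORKERS; i++)
        sum += counters[i].value;
    return sum;
}
```

Without the alignment, four unsigned longs would sit in one
line and every increment would bounce that line between
cores, even though no two cores touch the same word.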

-- Dan



Your result about getting rid of the spin locks is
less paradoxical than you might think, or even
not paradoxical at all, once you take a look at
the work that people like Shavit are doing.

Gary Byers wrote:
> Someone asked about the i7, and I remember professing ignorance (several
> paragraphs of it.)
>
> The Mac Pro (at least some models) use/have used Intel XEON processors
> which in turn use HTT; it's reasonable to assume that the OSX scheduler's
> been HTT-aware for some time.  (I don't know if it's true, but it's a
> reasonable assumption.)
>
> CCL uses spinlocks on most platforms; acquiring a spinlock involves a
> loop like:
>
> (block got-lock
>   (loop
>     (dotimes (i *spin-lock-tries*)
>       (if (%store-conditional something somewhere)
>         (return-from got-lock t)))
>     (give-up-timeslice)))
>
> where *spin-lock-tries* is 1 on a uniprocessor system and maybe a few
> thousand on a multiprocessor system.  On a system that uses
> hyperthreading, that sort of loop interferes with the hardware's 
> ability to schedule another hardware thread, and it's necessary to
> use a special "pause" instruction (a kind of NOP) in the loop to
> advise the hardware that the current thread wasn't really making
> progress.
>
> While profiling a customer's application a few years ago, we found
> that - when run on XEONs - a lot of time was reported as being spent
> looping around waiting for spinlocks.  I said "D'oh!" and added a
> "pause" instruction on the execution path, but that didn't make things
> better; on Linux, we replaced the use of spinlocks with a slightly
> more expensive mechanism, and (paradoxically) things improved, at least
> on the XEONs.
>
> That never made sense to me, and I always suspected that something was
> just wrong in the implementation (the "pause" instruction happened
> out-of-line) or that we were seeing profiling artifacts.  At the time,
> there wasn't time to explore this issue more fully, but those suspicions
> have lingered.
>
> CCL generally does a lot less locking (of hash-tables, streams, ...) 
> than it did a few years ago.  We still use spinlocks (some) on
> non-Linux platforms, so if there was a bad interaction between spinlock
> loops and hyperthreading it's likely still there but may not show up
> as often.
>
> Other than that issue, I'm not aware of any way in which HTT is directly
> visible to non-OS code.
>
>
>
> On Mon, 15 Nov 2010, Jon Anthony wrote:
>
>> Hi,
>>
>> I know the Wiki page for SysReq states no known issues with X86_64, but
>> I seem to recall something passing through here about some "issue" with
>> Intel Core i7 on Mac, or maybe generally, or am I just plain
>> misremembering?  On a related note, does MacOS understand HTT (does it
>> know the diff between physical and logical cores)?  Thanks for any info!
>>
>> /Jon
>>
>>
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> http://clozure.com/mailman/listinfo/openmcl-devel
>>
>>


