[Openmcl-devel] MacBook Pro compatibility

Gary Byers gb at clozure.com
Wed Apr 14 02:20:54 PDT 2010


Given the way that SMT/hyperthreading has worked in the past (on some
Xeons, Atoms - of all things - and other Intel machines), it's not
clear that there's much that an application can do to exploit the
technology (that's much more a question of how the OS schedules
things), but there is at least one thing that can be done to stay out
of its way (or, depending on your point of view, to keep it from
getting in its own way.)

Unlike multicore systems (where multiple CPU cores share the same
packaging), in SMT/HTT multiple "logical CPUs" (sets of
architecturally visible registers) share the same cache and
microarchitecture (including things like branch prediction/history
mechanisms, out-of-order execution logic (the Atom is in-order),
functional/execution units, etc.)  If I understand correctly, the
Nehalem machines have more of these sharable resources than earlier
implementations did, and try harder to get the logical CPUs to share
these resources.  Whether that's possible and how well it works has
a lot to do with what threads the OS schedules on what logical CPUs.
(If the OS is blissfully unaware of the issue, HTT can be slower
than single-core execution.)

HTT systems can also work with the OS to force context switches when
logical CPUs are idle or stalled.  Because of the way that this
technology works (this was true of the Xeon; I don't know if it still
is), certain kinds of busy-waiting loops could cause a thread to get
locked on a logical CPU unless certain kinds of NOPs ("pause"
instructions) occurred during the loop.  CCL had some busy-waiting
code (to implement spinlocks), and we found that code running on a
customer's Xeon machines was spending much more time in that loop than
made sense (and didn't see the same behavior on other machines.)  I
remember adding the "pause" instruction, but we changed CCL to use another
lock-contention mechanism on Linux at the time, and I don't know if adding
the "pause" had the intended effect.  (Mac Pros (some? all?) are also Xeons and spinlocks
are still used for some things on OSX; I don't remember hearing of this
problem, but am not sure that I would have unless someone was looking for it.)
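
For illustration, here is a minimal sketch of a spin-wait loop with a
"pause" hint in it.  This is not CCL's actual spinlock code: the names
(demo_spinlock, cpu_relax, run_demo) are made up for the example, it
assumes a C11 compiler that accepts GCC/Clang-style inline assembly and
a pthreads system, and on non-x86 targets the hint is simply left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Spin-wait hint.  On x86 this emits the "pause" instruction; on other
   architectures this sketch does nothing (an assumption made to keep
   the example portable). */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause" ::: "memory");
#endif
}

/* Hypothetical test-and-set spinlock built on a C11 atomic_flag. */
typedef struct { atomic_flag held; } demo_spinlock;

static void demo_lock(demo_spinlock *l)
{
    /* Busy-wait until the flag was previously clear, issuing the
       hint on every failed attempt. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        cpu_relax();
}

static void demo_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static demo_spinlock counter_lock = { ATOMIC_FLAG_INIT };
static long counter;

/* Each worker increments the shared counter n times under the lock. */
static void *worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        demo_lock(&counter_lock);
        counter++;
        demo_unlock(&counter_lock);
    }
    return NULL;
}

/* Run two contending threads and return the final counter value. */
long run_demo(long per_thread)
{
    pthread_t a, b;
    int rc;

    counter = 0;
    rc = pthread_create(&a, NULL, worker, &per_thread);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, &per_thread);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

The hint doesn't change what the loop computes; it only tells the
processor that the thread is spinning.  Whether it makes a measurable
difference on a given machine is something that would have to be
measured.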

I tend to think of HTT as being a failed experiment (and to some extent it's
seemed like Intel does, too), but Intel keeps bringing it back from the dead.
Most OSes are "HTT aware" these days (since the Mac Pro has been around for
several years, I assume that HTT awareness has been in OSX for at least as long)
so some of the worst-case thrashorama behavior is probably not as common as
it once was, and there may be cases where an HTT system performs as well as
or better than a multi-core or multi-CPU system does.  Intel has other reasons
(manufacturing-related reasons) for pushing the technology, but it's not clear
that they'd have reintroduced it into the Nehalem unless they thought that
they'd addressed some of the problems with earlier attempts (perhaps by throwing
hardware at those problems.)

The things that a thread could do to improve performance on an HTT system aren't
obviously under a compiler's control.  ("Don't have cache conflicts with the
thread on the other logical CPU(s)!")  I suppose that a compiler might try
to unroll loops to reduce branch misprediction penalties (not a bad idea in
general on current Intel machines, maybe a better idea if you're sharing the
branch prediction hardware with another partial/logical CPU); there are similar
(presumably shared) caching mechanisms that try to pair call/return, so inlining
might help a little.  I dunno; up until this point, winning on HTT systems has
mostly seemed to involve the luck of the draw in scheduling decisions made by
the OS, and nothing that I've seen about the Nehalem machines seems to change
that.
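
As a rough illustration of the loop-unrolling idea, here is a sketch in
C rather than compiled Lisp.  The function names are made up for the
example, and this only shows the shape of the transformation (fewer
loop branches executed per element); it says nothing about whether it
actually wins on any particular HTT machine, and an optimizing compiler
may already do this on its own.

```c
#include <stddef.h>

/* Straightforward loop: one loop-control branch per element. */
long sum_simple(const long *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Unrolled by 4: one loop-control branch per four elements, followed
   by a short cleanup loop for the remaining 0-3 elements. */
long sum_unrolled4(const long *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        total += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        total += v[i];
    return total;
}
```

Both functions return the same result for any input; the unrolled one
trades somewhat larger code for fewer branches for the (possibly
shared) prediction hardware to track.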


On Tue, 13 Apr 2010, Alexander Repenning wrote:

> just wondering if anybody has some good, or bad, experience, or speculation, running CCL on  Intel Core i5 (e.g., iMac) or Core i7 Macs?
>
> Any speculation on how well CCL compiled Lisp code will work with Intel Hyper Threading?
>
> Alex
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>


