[Openmcl-devel] Quick HW question...

Tue Nov 16 09:33:59 PST 2010

On Nov 16, 2010, at 10:16 AM, Jon Anthony wrote:

> This is some good information.  Thanks for the pointers.  But it also
> highlights an issue I've thought about from time to time: with modern
> processor architectures (especially pipelines, caches, and now cores)
> how does one _not_ write naive code for these things?  Sure, 90+% of the
> worry on this goes to the compiler writers, but it can be easy to
> accidentally write something that defeats their efforts.

On modern x86, I've all but given up.  I just write
naive and straightforward code, and assume (or hope) that
the hardware guys have optimized for that. In my experience,
measurements typically show that the difference in execution
time between "clever" and naive code is negligible.

Intel has an optimization guide (you should be able to
find it at http://www.intel.com/products/processor/manuals/).

Clearly you can win big by writing cache-aware (or at least
virtual memory-aware) code; I remember a fairly recent article in
ACM Queue about this.

http://queue.acm.org/detail.cfm?id=1814327
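
To make the locality point concrete, here is a minimal sketch (not taken from the article) that counts how often consecutive accesses land on a different virtual-memory page when walking a row-major 2-D array in two orders. The 4096-byte page size, the 8-byte element size, the 512x512 shape, and the function name `page_switches` are all assumptions chosen for illustration. A page switch is only a rough proxy for locality; it is not a page fault or a cache miss.

```python
# Hypothetical illustration: count page-boundary crossings between
# consecutive accesses to a 2-D array stored in row-major order.

PAGE_SIZE = 4096  # assumed page size in bytes
ELEM_SIZE = 8     # assumed element size in bytes


def page_switches(rows, cols, by_rows):
    """Count consecutive access pairs that touch different pages."""
    if by_rows:
        # Walk memory in the order it is laid out.
        order = ((r, c) for r in range(rows) for c in range(cols))
    else:
        # Walk down each column, striding a full row between accesses.
        order = ((r, c) for c in range(cols) for r in range(rows))
    switches = 0
    last_page = None
    for r, c in order:
        # Page number of element (r, c) in a row-major layout.
        page = (r * cols + c) * ELEM_SIZE // PAGE_SIZE
        if last_page is not None and page != last_page:
            switches += 1
        last_page = page
    return switches


row_major = page_switches(512, 512, by_rows=True)
col_major = page_switches(512, 512, by_rows=False)
print(row_major, col_major)
```

With these assumed sizes each row fills exactly one page, so the row-order walk changes page only when it moves to the next row, while the column-order walk changes page on every single access.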

One interesting quotation:

The speed disparity between primary and secondary storage on the Atlas Computer was on the order of 1:1,000. The Atlas drum took 2 milliseconds to deliver a sector; instructions took approximately 2 microseconds to execute. You lost around 1,000 instructions for each VM page fault.

On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions.
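
The arithmetic behind those figures can be sketched as follows. The Atlas numbers (2 milliseconds per sector, 2 microseconds per instruction) come from the quotation. The modern numbers are assumptions picked to reproduce the quoted orders of magnitude: a peak rate of 10 billion instructions per second, a 1 millisecond fault latency in the better case, and a 10 millisecond latency for a rotating disk. The function name `instructions_lost` is hypothetical.

```python
def instructions_lost(fault_latency_ns, instructions_per_second):
    """Instructions that could have executed while one page fault is serviced."""
    return fault_latency_ns * instructions_per_second // 1_000_000_000


# Atlas: 2 ms per drum sector, one instruction per 2 us (500,000 per second).
atlas = instructions_lost(2_000_000, 500_000)

# Assumed modern CPU: about 10 billion instructions per second at peak.
# Assumed fault latencies: 1 ms (better case) and 10 ms (rotating disk).
modern_fast = instructions_lost(1_000_000, 10_000_000_000)
modern_disk = instructions_lost(10_000_000, 10_000_000_000)

print(atlas, modern_fast, modern_disk)
```

Under these assumptions the cost of a fault, measured in lost instructions, is four to five orders of magnitude higher than on the Atlas.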