[Openmcl-devel] Quick HW question...

Tue Nov 16 11:53:29 PST 2010

On Tue, 2010-11-16 at 12:33 -0500, R. Matthew Emerson wrote:
> On Nov 16, 2010, at 10:16 AM, Jon Anthony wrote:
> 
> > This is some good information.  Thanks for the pointers.  But it also
> > highlights an issue I've thought about from time to time: with modern
> > processor architectures (especially pipelines, caches, and now cores)
> > how does one _not_ write naive code for these things?  Sure, 90+% of the
> > worry on this goes to the compiler writers, but it can be easy to
> > accidentally write something that defeats their efforts.
> 
> On modern x86, I've all but given up. 

That's actually an example where "LOL" was appropriate.

>  I just write
> naive and straightforward code, and assume (or hope) that
> the hardware guys have optimized for that. In my experience,
> measurements typically show that the difference in execution
> time between "clever" and naive code is negligible.
> 
> Intel has an optimization guide (you should be able to
> find it at http://www.intel.com/products/processor/manuals/).
> 
> Clearly you can win big by writing cache-aware (or at least
> virtual memory-aware) code;  I remember a fairly recent article in
> ACM Queue about this.
> 
> http://queue.acm.org/detail.cfm?id=1814327

Thanks for these pointers as well.

/Jon

> 
> One interesting quotation:
> 
> The speed disparity between primary and secondary storage on the Atlas Computer was on the order of 1:1,000. The Atlas drum took 2 milliseconds to deliver a sector; instructions took approximately 2 microseconds to execute. You lost around 1,000 instructions for each VM page fault.
> 
> On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions.
> 
>
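
The arithmetic behind the quoted figures can be sketched roughly as below. The Atlas inputs (2 ms per sector, 2 microseconds per instruction) come from the quotation itself. The modern inputs are assumptions picked only to show the order of magnitude, not measurements: a 3 GHz clock, 4 instructions issued per cycle, a 1 ms fault serviced from fast storage, and an 8 ms fault serviced from a rotating disk. The function name is made up for this sketch.

```python
NS_PER_SECOND = 1_000_000_000


def instructions_lost(fault_latency_ns: int, instructions_per_second: int) -> int:
    """Instructions that could have executed while one page fault was serviced."""
    return fault_latency_ns * instructions_per_second // NS_PER_SECOND


# Atlas, from the quotation: 2 ms drum latency, one instruction every 2 microseconds.
atlas_rate = 500_000                      # instructions per second
atlas_lost = instructions_lost(2_000_000, atlas_rate)

# Modern CPU (ASSUMED figures): 3 GHz clock, 4 instructions issued per cycle.
modern_rate = 3_000_000_000 * 4           # peak instructions per second

# ASSUMED fault latencies: 1 ms from fast storage, 8 ms from a rotating disk.
fast_storage_lost = instructions_lost(1_000_000, modern_rate)
rotating_disk_lost = instructions_lost(8_000_000, modern_rate)

print(atlas_lost)          # 1000
print(fast_storage_lost)   # 12000000
print(rotating_disk_lost)  # 96000000
```

With these assumed inputs the modern results come out around 10^7 and 10^8 instructions per fault, the same order of magnitude as the quoted estimates, against 10^3 on the Atlas.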