[Openmcl-devel] Quick HW question...

Daniel Weinreb dlw at itasoftware.com
Tue Nov 16 13:27:15 PST 2010



R. Matthew Emerson wrote:
> On Nov 16, 2010, at 10:16 AM, Jon Anthony wrote:
>
>   
>> This is some good information.  Thanks for the pointers.  But it also
>> highlights an issue I've thought about from time to time: with modern
>> processor architectures (especially pipelines, caches, and now cores)
>> how does one _not_ write naive code for these things?  Sure, 90+% of the
>> worry on this goes to the compiler writers, but it can be easy to
>> accidentally write something that defeats their efforts.
>>     
>
> On modern x86, I've all but given up.  I just write
> naive and straightforward code, and assume (or hope) that
> the hardware guys have optimized for that. In my experience,
> measurements typically show that the difference in execution
> time between "clever" and naive code is negligible.
>   
Indeed, it is extremely difficult, if not impossible,
to anticipate the speed of code running on
a modern processor.  In the Good Old Days,
we would just count the number of instructions.
These days, what's going on down in that processor
is ridiculously complicated, even before you think
about cache hits and misses.  Don't even think
about trying.  Just measure it, and measure the
alternatives too.
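
A minimal sketch of that advice, written in Python purely for illustration (the thread itself concerns Lisp on x86): time two variants of the same computation and compare the numbers, rather than guessing which one should be faster. The function names `sum_naive`, `sum_clever` and `best_time`, and the input size, are made up for this example; only `timeit` from the standard library is used. In an interpreted language the result mostly reflects interpreter overhead rather than the pipeline or cache, which is one more reason to measure instead of predict.

```python
import timeit


def sum_naive(values):
    """Straightforward loop: add each element in turn."""
    total = 0
    for v in values:
        total += v
    return total


def sum_clever(values):
    """Hand-unrolled loop, four elements per iteration (the 'clever' version)."""
    total = 0
    n = len(values)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, limit, 4):
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
    for i in range(limit, n):  # leftover elements
        total += values[i]
    return total


def best_time(fn, values, repeats=5, number=10):
    """Best-of-`repeats` wall-clock seconds for `number` calls of fn(values)."""
    return min(timeit.repeat(lambda: fn(values), repeat=repeats, number=number))


if __name__ == "__main__":
    data = list(range(100_003))
    # Check that both variants agree before comparing their speed.
    assert sum_naive(data) == sum_clever(data)
    for fn in (sum_naive, sum_clever):
        print(f"{fn.__name__}: {best_time(fn, data):.4f} s")
```

Taking the minimum over several repeats is a common way to reduce noise from other activity on the machine; the absolute numbers will differ from one system to the next.
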
> Intel has an optimization guide (you should be able to
> find it at http://www.intel.com/products/processor/manuals/).
>   
Yes, Intel tries to help people and what they
say is good to know.  Intel has an extremely
clear, and actually simple, multiprocessor
memory model (which AMD follows as well).
We had a lecture about it here at ITA from
an Intel architecture guy.
> Clearly you can win big by writing cache-aware (or at least
> virtual memory-aware) code;  I remember a fairly recent article in
> ACM Queue about this.
>
> http://queue.acm.org/detail.cfm?id=1814327
>
> One interesting quotation:
>
> The speed disparity between primary and secondary storage on the Atlas Computer was on the order of 1:1,000. The Atlas drum took 2 milliseconds to deliver a sector; instructions took approximately 2 microseconds to execute. You lost around 1,000 instructions for each VM page fault.
>
> On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions.
>   
Yeah, for high performance systems, dealing with rotating
disks is SO twentieth-century!
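
The arithmetic behind the quoted figures can be sketched as follows. The Atlas numbers (2 milliseconds per sector, 2 microseconds per instruction) come from the quotation; the modern numbers (a 3 GHz clock, 3 instructions issued per cycle, a fault serviced in 1 ms from fast storage or 10 ms from a rotating disk) are illustrative assumptions of mine, not values from the article, picked only to show how figures of that order of magnitude can arise. The function names are hypothetical.

```python
def atlas_instructions_lost(drum_ns, instruction_ns):
    """Instructions that could have run during one drum access."""
    return drum_ns // instruction_ns


def modern_instructions_lost(fault_us, cycles_per_us, instructions_per_cycle):
    """Instructions a multi-issue CPU could have issued during one page fault."""
    return fault_us * cycles_per_us * instructions_per_cycle


if __name__ == "__main__":
    # Atlas: 2 ms drum access, 2 us per instruction (from the quotation).
    print("Atlas:", atlas_instructions_lost(2_000_000, 2_000))

    # Assumed modern CPU: 3 GHz (3000 cycles per microsecond), 3 instructions per cycle.
    # Assumed fault service times: 1 ms (fast storage) and 10 ms (rotating disk).
    print("Fast storage:", modern_instructions_lost(1_000, 3_000, 3))
    print("Rotating disk:", modern_instructions_lost(10_000, 3_000, 3))
```

Under these assumptions the loss per fault has grown from about a thousand instructions to millions or tens of millions, which is the point the quotation makes.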

By the way, we use Oracle RAC. :( :( :(

-- Dan


>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>   

