Ravi Iyer, Texas A&M and Intel
Nancy Amato, Lawrence Rauchwerger*, Laxmi Bhuyan (* presenter), Dept of Computer Science, Texas A&M

As processor technology continues to advance at a rapid pace, memory access latency has become the principal performance bottleneck of shared-memory systems. To understand the effects of the cache and memory hierarchy on system latencies, performance analysts run benchmark analyses on existing state-of-the-art multiprocessors. In this study, we present a detailed comparison of two recent commercial multiprocessors, the HP V-Class and the SGI Origin 2000. Our goal is to compare and contrast the design techniques used in these machines. We present the impact of processor design, cache/memory hierarchies, and coherence protocol optimizations on the memory system performance of these multiprocessors.
We also study the effect of parallelism
overheads such as process creation and synchronization on the user-level
performance of these multiprocessors. Our experimental methodology uses
microbenchmarks as well as scientific applications to characterize the
user-level performance.
Our microbenchmark results show the impact of L1/L2 cache size and TLB size on uniprocessor load/store latencies, the effect of coherence protocol design and optimizations and of data sharing patterns on multiprocessor memory access latencies, and the overhead of parallelism. Our application-based evaluation shows the impact of problem size, dominant sharing patterns, and the number of processors used on speedup and raw execution time. Finally, we use hardware counter measurements and simple performance models to study the correlation between system-level performance metrics and application execution time.