Sharon Brunett
Multithreading is a technique often used to attain performance and scalability
for well-suited coarse-grained applications. The underlying hardware and
software supporting such an application can and do have a profound effect
on a code's performance and scaling behavior.
The Hewlett-Packard X200 and V2500 servers are symmetric multiprocessor (SMP), cache-coherent nonuniform memory access (ccNUMA) systems. The notion of a simple-to-program, large, integrated memory is appealing to many types of scientific codes. The fundamental building block of the X- and V-Class systems is the hypernode. Each hypernode is an SMP containing multiple processors, local memory, and an I/O subsystem. A crossbar switch on each node provides nonblocking access from CPUs and I/O devices to the memory subsystem. Since our application is multithreaded by virtue of hand-placed compiler directives, studying the scheduling policies applicable to threads is worthwhile. The "mpsched" command allows a user to specify a variety of options for controlling the processor or locality domain on which a specific process executes. The mpsched option "-T policy" applies a specified scheduling policy to newly created threads of a process. The launch, or scheduling, policies are straightforward: Round Robin, Least Loaded, Fill First, and Packed. Performance tests use increasingly large grid sizes, chosen either to fit in cache or to exceed it. Various scheduling options are employed to measure which policies work best under which circumstances. Investigation with cxperf, pmon, and gpm on the V-Class shows large memory latencies and significant cache misses for particular problem sizes. The application kernel simulates a highly time-variable work imbalance per grid point. The resulting scaling behavior and performance on the HP V-Class are reasonable compared to the X-Class, provided the problem size is large enough to yield adequate parallelism and a good choice of launch policy is made.
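The four launch policies named above can be illustrated with a small placement model. This is only a sketch of one plausible reading of the policy names, not the HP-UX scheduler's implementation: the function `place_threads`, its parameters, and the policy strings are hypothetical, and a "domain" here simply stands for a hypernode (locality domain) with a fixed number of CPUs.

```python
def place_threads(policy, n_threads, n_domains, cpus_per_domain, load=None):
    """Return the locality-domain index chosen for each newly created thread.

    Illustrative model only (all names are assumptions, not an HP-UX API):
      round_robin  - cycle through the domains, one thread at a time
      fill_first   - put one thread per CPU in a domain before moving on
      packed       - keep every thread in the parent's domain (domain 0 here)
      least_loaded - pick the domain with the fewest running threads
    """
    # Existing per-domain thread counts; start from an idle machine by default.
    load = list(load) if load is not None else [0] * n_domains
    placement = []
    for t in range(n_threads):
        if policy == "round_robin":
            d = t % n_domains
        elif policy == "fill_first":
            d = (t // cpus_per_domain) % n_domains
        elif policy == "packed":
            d = 0
        elif policy == "least_loaded":
            # Ties resolve to the lowest-numbered domain.
            d = min(range(n_domains), key=lambda i: load[i])
        else:
            raise ValueError(f"unknown policy: {policy}")
        load[d] += 1
        placement.append(d)
    return placement


if __name__ == "__main__":
    # Six threads on two hypothetical 4-CPU domains.
    for policy in ("round_robin", "fill_first", "packed", "least_loaded"):
        print(policy, place_threads(policy, 6, 2, 4))
```

In this model, spreading policies (round robin, least loaded) give each thread more of a domain's memory bandwidth at the cost of more cross-domain traffic, while fill first and packed keep threads close together. This is the kind of trade-off the scheduling experiments are designed to measure.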