B. N. Cheng
Jet Propulsion Laboratory

A Technique for Improving Performance of Global Collective Operations on the HP Exemplar

We describe a technique for speeding up global collective operations on clusters of symmetric multiprocessor (SMP) parallel computers such as the HP Exemplar.
Global collective operations are inherently faster within an SMP computer than between such computers. The technique takes advantage of this fact by performing each global collective operation first within each SMP machine and then completing it between the machines. This significantly improves global collective performance: in some cases the operations run almost twice as fast as conventional MPI global reduction calls.
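
The two-stage idea can be illustrated with a small simulation. The sketch below is not the implementation described here and does not use MPI. It assumes a hypothetical layout in which each SMP node is a non-empty list of values, one per process, and it uses a toy cost model that only counts messages crossing between nodes. The function names `hierarchical_reduce` and `flat_reduce` are made up for illustration, and the flat baseline is a deliberately naive scheme, not a model of how any real MPI library performs a reduction.

```python
from functools import reduce
import operator


def hierarchical_reduce(nodes, op=operator.add):
    """Reduce values grouped by SMP node in two stages.

    nodes: list of non-empty lists; nodes[i] holds the values contributed
    by the processes on SMP node i (hypothetical layout, illustration only).
    Returns (result, inter_node_messages).
    """
    # Stage 1: each node combines its own values locally (shared memory).
    partials = [reduce(op, values) for values in nodes]
    # Stage 2: one leader per node sends its partial result to node 0.
    result = reduce(op, partials)
    inter_node_messages = len(nodes) - 1
    return result, inter_node_messages


def flat_reduce(nodes, op=operator.add):
    """Naive baseline: every process sends its value to a root on node 0."""
    all_values = [v for values in nodes for v in values]
    result = reduce(op, all_values)
    # Every process outside node 0 crosses the interconnect (toy cost model).
    inter_node_messages = sum(len(values) for values in nodes[1:])
    return result, inter_node_messages


if __name__ == "__main__":
    nodes = [[1, 2, 3, 4], [5, 6, 7, 8]]  # two nodes, four processes each
    print(hierarchical_reduce(nodes))
    print(flat_reduce(nodes))
```

Both functions return the same reduced value for an associative operation. Under this toy model, the difference lies only in how many values travel over the slower inter-node path.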