Peter Junglas
University of Technology Hamburg-Harburg

Experiences with Scalapack and MPI-2 on V- and N-class machines

1. Introduction

Scalapack is a large package for dense linear algebra on distributed-memory architectures, based on the well-known Lapack library. It assumes that matrices are distributed block-cyclically over a two-dimensional grid of processes. Since Scalapack's communication layer (the BLACS) can run on top of MPI, the package is available on supercomputers as well as on PC clusters.
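To make the block-cyclic layout concrete, here is the standard index mapping for one dimension. It assumes 0-based indices, block size n_b, P processes along that dimension, and the first block placed on process 0 (Scalapack itself uses 1-based Fortran indices and allows the first block to start on any process). In two dimensions the same mapping is applied independently to rows (block size m_b over P_r process rows) and columns (block size n_b over P_c process columns).

```latex
\begin{aligned}
p(i)    &= \left\lfloor i / n_b \right\rfloor \bmod P
          && \text{(owning process of global index } i\text{)} \\
\ell(i) &= \left\lfloor i / (n_b P) \right\rfloor n_b + (i \bmod n_b)
          && \text{(local index on that process)} \\
i       &= \left( \left\lfloor \ell / n_b \right\rfloor P + p \right) n_b + (\ell \bmod n_b)
          && \text{(inverse: local index } \ell \text{ on process } p\text{)}
\end{aligned}
```

For example, with n_b = 2 and P = 3, global index 7 lies in block 3, so it is owned by process 0 at local index 3.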

MPI-2 is an extension of the well-known MPI standard, describing routines mainly for the following areas:
  • Process Creation and Management
  • One-sided Communication
  • Extended Collective Operations
  • I/O
  • Language Bindings
Unlike MPI-1, there are no free implementations of MPI-2 yet, since parts of it are very hard to implement in a hardware-independent way. Unfortunately, even most hardware vendors have not yet produced a complete MPI-2 library. HP includes additional MPI-2 routines with each release of HP MPI, currently concentrating mainly on one-sided communication and parallel I/O.

2. Projects

We ran some simple tests and benchmarks on our V- and N-class machines to see whether these tools can be used reliably and with reasonable performance. Although some of the Scalapack self-tests did not work properly and a few very helpful MPI-2 functions were missing, the results were promising enough to use these libraries in the following projects:

- Simulation of electromagnetic scattering and radiation with CONCEPT

CONCEPT is a package mainly used for electromagnetic compatibility computations. It solves the Maxwell equations with a method of moments.
This leads to large dense matrices that are non-Hermitian and often ill-conditioned. The basic kernel is a dense LU decomposition. The main problem in using Scalapack for the solver was rewriting all data structures to fit the block-cyclic distribution. The reward was an almost linear speedup on both V- and N-class machines.
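Adapting the data structures mostly means working out which process stores a given matrix entry, and where. The following is a minimal sketch in C under stated assumptions: real double-precision entries (CONCEPT's matrices are complex), 0-based column-major storage, and the first block on process (0,0). Both function names are hypothetical; local_extent only mirrors the logic of Scalapack's NUMROC tool routine for that case.

```c
#include <stddef.h>

/* Hypothetical helper: how many rows (or columns) of a dimension of length n
   are stored on process iproc out of nprocs, for block size nb, with the first
   block on process 0. Follows the logic of Scalapack's NUMROC for that case. */
int local_extent(int n, int nb, int iproc, int nprocs)
{
    int nblocks = n / nb;                 /* number of complete blocks      */
    int extent = (nblocks / nprocs) * nb; /* blocks from full cycles        */
    int extra = nblocks % nprocs;         /* complete blocks left over      */
    if (iproc < extra)
        extent += nb;                     /* one more complete block        */
    else if (iproc == extra)
        extent += n % nb;                 /* the trailing partial block     */
    return extent;
}

/* Hypothetical helper: copy the entries of the global m x n column-major
   matrix A (leading dimension lda) that are owned by process (myrow, mycol)
   of a prow x pcol grid into its local column-major array loc (leading
   dimension lld >= local_extent(m, mb, myrow, prow)). */
void pack_block_cyclic(const double *A, int m, int n, int lda,
                       int mb, int nb, int prow, int pcol,
                       int myrow, int mycol, double *loc, int lld)
{
    for (int j = 0; j < n; ++j) {
        if ((j / nb) % pcol != mycol)
            continue;                     /* column owned by another process column */
        int lj = (j / (nb * pcol)) * nb + j % nb;   /* local column index */
        for (int i = 0; i < m; ++i) {
            if ((i / mb) % prow != myrow)
                continue;                 /* row owned by another process row */
            int li = (i / (mb * prow)) * mb + i % mb; /* local row index */
            loc[(size_t)li + (size_t)lj * lld] = A[(size_t)i + (size_t)j * lda];
        }
    }
}
```

In a real solver each process would rather generate only its own entries directly instead of copying them from a global matrix; that is precisely why existing data structures have to be rewritten. The copy loop just makes the ownership test explicit.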

- Solving the generalized eigenvalue problem for very large matrices

So-called condensation methods are used to cope with the very large number of degrees of freedom that arise, for instance, from finite element computations. The basic idea is to split the degrees of freedom into the (hopefully few) most relevant ones and the rest, and to invest more computing time in the former. This leads to rather involved hierarchical data structures. The Scalapack library is used to implement the basic matrix operations.
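For orientation, the simplest member of this family is classic static (Guyan) condensation. It is shown here only as an illustration and is not necessarily the variant used in this project. For the generalized eigenproblem K x = λ M x with symmetric K and M, the degrees of freedom are partitioned into a master set m (the relevant ones) and a slave set s (the rest), and the inertia terms in the slave equations are neglected:

```latex
\begin{aligned}
\begin{pmatrix} K_{mm} & K_{ms} \\ K_{sm} & K_{ss} \end{pmatrix}
\begin{pmatrix} x_m \\ x_s \end{pmatrix}
&= \lambda
\begin{pmatrix} M_{mm} & M_{ms} \\ M_{sm} & M_{ss} \end{pmatrix}
\begin{pmatrix} x_m \\ x_s \end{pmatrix}, \\
K_{sm} x_m + K_{ss} x_s \approx 0
\;&\Rightarrow\;
x_s \approx -K_{ss}^{-1} K_{sm}\, x_m,
\qquad
T = \begin{pmatrix} I \\ -K_{ss}^{-1} K_{sm} \end{pmatrix}, \\
\hat K &= T^{\mathsf T} K T = K_{mm} - K_{ms} K_{ss}^{-1} K_{sm},
\qquad
\hat M = T^{\mathsf T} M T, \\
\hat K\, x_m &= \lambda\, \hat M\, x_m .
\end{aligned}
```

The reduced problem only has the dimension of the master set. The costly steps are factoring K_ss and forming the products with its inverse, which are the kind of matrix operations one might delegate to a library such as Scalapack.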

- Parallel solution of reaction-diffusion equations with a stabilization method

In chemical process engineering, systems of highly nonlinear partial differential equations have to be solved. Fast standard solvers fail because a few of the modes are unstable. By decomposing the solution space, one can use different (slower) methods for the unstable part while still applying the fast methods to the larger stable part. To synchronize the two solvers, the data have to be combined dynamically while the computations are running, which makes a traditional message-passing approach difficult. Here the new one-sided communication routines of MPI-2 come in handy.

3. Experiences and Conclusions

We will present some of the problems we encountered and the pitfalls we fell into before we could use both libraries. Finally, we will suggest how HP could help make them standard tools for practical parallel programming.