Thuc Hoang, David Luginbuhl, Paul Messina
Abstract
The mission of the Department of Energy's Accelerated Strategic Computing Initiative (ASCI) is to provide the modeling and computer simulation capabilities required for maintaining the safety and reliability of the nuclear weapons stockpile in the absence of underground testing. The ASCI strategy is to build future high-end computing (HEC) systems by scaling commercially viable building blocks, both hardware and software, to 30 teraOPS and beyond. The goal of ASCI's PathForward Program is to develop and accelerate critical technologies required for a balanced and reliable scientific computing environment for 30-100 teraOPS systems in the 2001-2004 time frame. ASCI is executing PathForward by entering into partnerships with U.S. industry to develop these technologies for integration into the ASCI computing environment. An essential aspect of the strategy is that the PathForward-funded R&D lead to a standard product, not a custom or one-of-a-kind development. Consequently, the technologies involved either are already in a company's business plan, but were not previously planned to be available in the time frame or at the scale required by ASCI, or are expected to be added to the company's business plan. Currently, the PathForward Program is investing in the development of technology in three critical areas: Interconnects, Storage, and Software.
A key obstacle to delivering and efficiently exploiting ultra-large systems (30 and 100 teraOPS) on the schedule envisioned by ASCI is the availability of interconnect mechanisms (hardware and software) for reliable communication with the required performance and scalability characteristics.
Five PathForward interconnect partnerships were formed in FY98. Three of these were for system interconnects, one was for signaling technology, and one was for development of HiPPI 6400 network interface cards.
ASCI terascale computing will require multi-petabyte archives in the 2000-2001 time frame. In this time frame, archival storage data rates need to be in the 1 Gigabyte/s range to allow for efficient use of the ASCI computing environment. Extrapolating archival storage sizes and data rates from planned ASCI platform growth to the 2004 time frame yields archive sizes near 50-100 Petabytes and data rates in the 10-20 Gigabytes/s range. This outlook is why a 5-10x improvement in capacity, bandwidth, and density over current technology trends is needed. PathForward projects pursuing these storage goals include partnerships to develop high-speed Redundant Array of Independent Tapes (RAIT) and very high density optical tape systems.
ASCI has accelerated the development of high-fidelity, three-dimensional, multitasking physics simulations. A single ASCI simulation must be able to make effective use of the entire system. A parallel runtime system and a software development environment are needed to enable such simulations. The runtime system is the middleware that allows applications to use the cluster of hardware as a single system, enabling parallelism on ASCI platforms. The software development environment allows ASCI simulation developers to write parallel code that runs on these unique systems. One PathForward project has developed an initial version of a parallel debugging tool, which is operational on several platforms and has proven useful even in its current state. Several other projects are under negotiation to address the software goals mentioned above.
A new area for PathForward investment is scalable input/output (I/O) for ASCI applications. Unless I/O transfer rates are in balance with increasing computation speeds, ASCI applications will spend far too much time storing and retrieving data. A global, parallel, scalable file system is required to deliver high-speed, reliable I/O despite differences in the underlying low-level hardware and software. PathForward is just beginning to explore partnerships in this area.
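The archival storage figures quoted above can be put in perspective with a rough calculation of how long it would take to stream an entire archive once at the stated data rate. The following is only a back-of-envelope sketch: the "full sweep" metric, the function name `sweep_days`, the use of decimal units, and the choice of 1 Petabyte as a stand-in for "multi-petabyte" are all illustrative assumptions, not figures or methods from the ASCI program.

```python
# Back-of-envelope sketch (illustrative assumptions, not ASCI methodology):
# time to read or write a whole archive once at a sustained data rate.

PETABYTE = 10**15        # bytes; decimal units are an assumption
GIGABYTE = 10**9         # bytes; decimal units are an assumption
SECONDS_PER_DAY = 86_400


def sweep_days(archive_pb: float, rate_gb_per_s: float) -> float:
    """Days needed to stream the full archive once at the given rate."""
    seconds = (archive_pb * PETABYTE) / (rate_gb_per_s * GIGABYTE)
    return seconds / SECONDS_PER_DAY


# Sizes and rates taken from the ranges in the text; 1 PB is an assumed
# stand-in for "multi-petabyte" in the 2000-2001 time frame.
scenarios = [
    ("2000-2001: 1 PB at 1 GB/s", 1, 1),
    ("2004 low end: 50 PB at 10 GB/s", 50, 10),
    ("2004 high end: 100 PB at 20 GB/s", 100, 20),
]

for label, size_pb, rate in scenarios:
    print(f"{label}: {sweep_days(size_pb, rate):.1f} days")
```

Under these assumptions a 1 Petabyte archive at 1 Gigabyte/s takes roughly 11.6 days to stream, while both ends of the 2004 range come out near 58 days, because the projected archive size grows about five times faster than the projected data rate.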