Advances in both laboratory- and field-data acquisition, as well as computing, present us with an embarrassment of riches; it is difficult to store and archive, process and analyze, and visualize and absorb the large amounts of data required for progress in many areas of research. In many disciplines, including fluid turbulence, astronomy, global-climate modeling, biology and neuroscience, and high-energy physics, the nature of the research requires dealing with very large amounts of data. Often, the capability of acquiring and generating the requisite data sets is at hand. The ability and infrastructure to handle such data sets are, however, seriously lagging. The Distributed Teravoxel Data System: Acquisition, Networking, Archiving, Analysis, and Visualization project proposes to address this deficiency through the development of generic acquisition and processing capabilities to be hosted at Caltech and made available to both Caltech and outside collaborators.

Driven by investigations of flow turbulence, the Teravoxel project is designed to handle both laboratory and numerical-simulation data. As part of the laboratory support, we will extend the capabilities of the recently developed, 1024²-pixel KFS digital-imaging system to higher speeds (10³ frames/s; 10⁹ voxels/s) and field-data storage (1.7 Tbytes). We will also develop the networking, archiving, analysis, and visualization infrastructure necessary to analyze turbulence data of the type acquired by the KFS imaging system, as well as data from numerical simulations and from other applications. The infrastructure will support analysis and visualization of terascale experiment or simulation datasets and their comparison to validate theories and simulation results. The infrastructure will, in particular, support transfer of terascale datasets across campus from the experimental facilities of the Graduate Aeronautical Laboratories (GALCIT) to the computational, storage, and visualization facilities of the Center for Advanced Computing Research (CACR). This infrastructure will leverage and be integrated with CACR's resources. Present and anticipated CACR resources include high-performance networking, large-scale data archives, and successive generations of both Beowulf-class clusters and Hewlett-Packard's high-performance shared-memory systems. These resources, combined with the proposed Teravoxel system, will build on CACR's tradition of harnessing new technologies to create innovative large-scale computing environments.
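The acquisition figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it confirms that a 1024 × 1024 frame at 10³ frames/s gives roughly 10⁹ voxels/s, and estimates how long 1.7 Tbytes of storage would last at that rate. The sample depth (`bytes_per_voxel`) is not stated in this abstract and is a hypothetical parameter, as are the function names; 1 Tbyte is assumed to mean 10¹² bytes.

```python
def voxel_rate(side_pixels: int, frames_per_s: float) -> float:
    """Voxels per second for a square sensor of side_pixels x side_pixels."""
    return side_pixels ** 2 * frames_per_s


def recording_seconds(storage_bytes: float,
                      rate_voxels_per_s: float,
                      bytes_per_voxel: float) -> float:
    """Seconds of continuous recording that fit in storage_bytes."""
    return storage_bytes / (rate_voxels_per_s * bytes_per_voxel)


# Figures taken from the text: 1024^2 pixels, 10^3 frames/s, 1.7 Tbytes.
rate = voxel_rate(1024, 1e3)

# ASSUMPTION: 2 bytes per voxel and 1 Tbyte = 1e12 bytes; neither is
# specified in the abstract.
duration = recording_seconds(1.7e12, rate, bytes_per_voxel=2)

print(f"voxel rate: {rate:.3e} voxels/s")
print(f"recording time at 2 bytes/voxel: {duration:.0f} s")
```

Under these assumptions the storage holds on the order of ten minutes of continuous acquisition; a different sample depth scales the duration inversely.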

The anticipated impact of the proposed project is considerable. For turbulence, the primary research focus, the project will permit large data sets to be analyzed, enabling an assessment of available turbulence theories and validation of numerical simulations. Absent the proposed capability, substantial progress in turbulence is unlikely. For astronomical observations, it will extend the power of ground-based telescopes and enable a new atmospheric turbulence probe. Other research, ranging from the life sciences to radio astronomy, will also benefit. The proposed resources will serve as a nucleus for researchers from many areas, facilitating technology transfer and scientific collaborations. Educationally, the impact will be greater yet. To develop intuition, experimentalists and computational analysts must participate in research and data analysis, yet the tools now available are imperfect, incomplete, and generally not up to the task. The proposed infrastructure, in conjunction with the NSF-sponsored Multiresolution Visualization Tools of Large Datasets project, will permit porting the necessary information to classroom environments, revolutionizing teaching.

The project brings together and builds on resources of major research activities at Caltech, within the Graduate Aeronautical Laboratories (GALCIT), the Center for Advanced Computing Research (CACR), Computer Science's Computer Graphics Group, the Beckman Institute (Chemistry and Life-Science centers), the Lee Center for Advanced Networking, Chemistry, and Applied Mathematics. These organizations have a record of collaborative work, exemplified by the Center for Simulation of the Dynamic Response of Materials, sponsored by the Academic Strategic Alliance Program of the Department of Energy's Accelerated Strategic Computing Initiative (ASCI). The organizations also have an exemplary record of collaboration with their colleagues at Caltech's Jet Propulsion Laboratory (JPL) and other academic, Federal, and industrial research centers. This project initiates collaboration with Compaq's Tandem Laboratories and Mitsubishi's RT Viz Group. In addition to continuing collaborations with JPL, it builds on existing collaborations with Argonne National Laboratory, Lawrence Livermore National Laboratory, the University of Illinois at Urbana-Champaign, Princeton, Stanford, Intel, and Hewlett-Packard.

This material is based upon work supported by the National Science Foundation under Grant No. EIA-0079871. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).