
Center for Advanced Computing Research,
Caltech,
Pasadena, California.
Scientific data is being created at a great rate -- remote-sensing images of the Earth, high-resolution 3D images of the brain, astrophysical, social science, and ecological data -- but it seems to be available only to a limited priesthood of professionals in each field. Broadening access will provide a clearer view of the world to those who are interested, whether they are research scientists or schoolchildren. Hence my interest in databases, user interfaces, public access to Government-funded data, and scientific visualization.
If it is raw data that is available, there must be software to mine it: to search for small signals in noisy data, to classify and count, to extract information from the data. Simple access is therefore not sufficient; computing services must be there too, services that let users think about the task at hand to the greatest possible extent, rather than about translating their requirements into a language the machine can understand.
Often, however, new discoveries come about through cross-fertilization: comparison and cross-correlation between very different information sources. If this kind of activity is to happen with scientific databases, we must tackle the difficult activity that has come to be known as "federation": making a sophisticated, stand-alone software system interoperate with another such system. This means taking a known, evolved, trusted database or program, with its own ways of communicating with the outside world, and changing and supplementing those enough that the software can work with another database or program.