Imagine that we could fabricate small sensors for temperature, pressure and position for ten cents each. The applications for such technology in scientific investigation would be vast: spray them onto trees in a forest to study microclimates; dump a bucket of them into the ocean to map currents in detail; strew them behind a planetary explorer to track diurnal and seasonal changes. Now imagine the data stream produced by, say, a few million deployed sensors. The average cost of acquiring a byte of scientific data is on a steep downward trend. It would be foolhardy to imagine that we can manage all scientific data in the future by the traditional approach of capturing it all on storage media and retrieving it later for analysis. We should think about ways to process, partition, fuse and disseminate such data to a widely distributed body of investigators, without it first having to cross a disk platter or tape surface. We should be thinking about alternative models in which such data is routed in near real time to interested clients, which can aggregate it, comb it for particular patterns or detect events that trigger capture of portions of it. To move in this direction, we need to think about data architectures that are "net centric" rather than "disk centric," and where the emphasis is on data movement rather than data storage. Such a shift raises numerous issues about how data management systems should be constructed, including:
One project that is investigating net-centric data management, though not specifically focused on scientific data, is the NIAGARA project underway at the University of Wisconsin (David DeWitt, Jeffrey Naughton) and the Oregon Graduate Institute (David Maier).