Response to Arie Shoshani
Fabrizio Gagliardi, CERN
August 1999

I fundamentally agree with Arie's paper and its conclusions: for large scientific data sets, the area where it makes sense to concentrate research and development efforts is data management and data access.

In HEP, for instance, the OO model is becoming more and more popular because it makes code sharing and code reuse easy. This is very important for a community that is spread world-wide. On the other hand, this model normally requires direct access to the entire data set.

This requirement is a major obstacle to scalability. We are already managing data sets of several hundred TB at CERN for the current generation of HEP experiments. These data sets are predicted to grow at several PB/year once the LHC is running (2005 and beyond).

Hardware for both secondary (disk) and tertiary (tape) storage is becoming more and more affordable.

We are buying disks at less than 60 USD/GB and tapes at less than 1 USD/GB.
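To put these prices against the multi-PB volumes mentioned above, the arithmetic can be sketched as follows. This is a rough illustration, not a budget: it assumes decimal units (1 PB = 1,000,000 GB), treats the quoted prices as upper bounds, and counts media only (no drives, robotics, servers or operations). The function name is illustrative.

```python
# Media cost of a data set at the per-GB prices quoted in the text.
GB_PER_PB = 1_000_000  # assumption: decimal units

DISK_USD_PER_GB = 60   # quoted upper bound ("less than 60 USD/GB")
TAPE_USD_PER_GB = 1    # quoted upper bound ("less than 1 USD/GB")


def storage_cost_usd(petabytes, usd_per_gb):
    """Return the media cost in USD for a data set of the given size."""
    return petabytes * GB_PER_PB * usd_per_gb


if __name__ == "__main__":
    for label, price in (("disk", DISK_USD_PER_GB), ("tape", TAPE_USD_PER_GB)):
        cost = storage_cost_usd(1, price)
        print(f"1 PB on {label}: up to {cost:,} USD")
```

At these upper bounds, one PB of media costs up to 60 million USD on disk and up to 1 million USD on tape.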

Unfortunately, there are no commercial products that offer appropriate tools to manage and access data sets larger than a few TB. Scalability of the tools and environments is one of the main issues.

HPSS is (has been…) a promising system which could scale up to the required levels, but it is still far from being a commercial product and the largest installations are below 100 TB.

A popular OODBMS in HEP is Objectivity, but once again there is no example of an installation above a few tens of TB.

These two systems are also based on data models that are basically incompatible. One interesting issue to pursue is the development of data models integrated with mass storage models, so as to preserve performance (in terms of access speed) while scaling to large sizes.

Another area that has been underestimated up to now is the overall issue of resilience and reliability.

Tertiary storage and cheap secondary storage are intrinsically unreliable. Tapes, which are good enough for conventional back-up and restore applications, are hardly adequate when the system becomes much larger. The present CERN system, with over 20,000 tape mounts per week and over 4 TB of data movement per day, requires constant service by both CERN and manufacturer experts. There are several failures per day, mostly recovered with no data loss but at the expense of heavy and intense labour.
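The operational load behind these figures is easier to see as average rates. The sketch below converts the two quoted lower bounds into mounts per hour and sustained throughput; it assumes decimal units (1 TB = 1,000,000 MB) and a load spread evenly over time, which a real system would not have. The function names are illustrative.

```python
# Convert the quoted weekly and daily totals into average rates.
HOURS_PER_WEEK = 7 * 24
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # assumption: decimal units


def mounts_per_hour(mounts_per_week):
    """Average tape mounts per hour, assuming an even load."""
    return mounts_per_week / HOURS_PER_WEEK


def average_mb_per_second(tb_per_day):
    """Average sustained throughput in MB/s, assuming an even load."""
    return tb_per_day * MB_PER_TB / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{mounts_per_hour(20_000):.0f} mounts/hour")
    print(f"{average_mb_per_second(4):.0f} MB/s")
```

This works out to about 119 mounts per hour (roughly two per minute, around the clock) and about 46 MB/s sustained, both as lower bounds.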

Given the niche nature of the tertiary storage market, technology is evolving very slowly and I am afraid the situation will be very much the same in 2005. Therefore smarter software needs to be developed to cope with mass storage hardware that is not 100% reliable. The system of the future must stress resilience to hardware faults and scalability.

The latter needs to be achieved through parallelism rather than high-performance components. The mass storage requirements are so large that only commodity-based systems will be financially affordable; at least this will be true for HEP.
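As one illustration of what software resilience to hardware faults could look like, the sketch below retries a failed read and then falls back to another copy of the same data. It is a minimal sketch under stated assumptions, not a description of any existing CERN, HPSS or Objectivity mechanism: the names `ReadError`, `read_with_failover` and `make_flaky_reader` are hypothetical, and each "reader" is simply a callable standing in for a read from one replica on commodity hardware.

```python
class ReadError(Exception):
    """Raised by a (hypothetical) replica reader when the medium fails."""


def read_with_failover(readers, max_attempts_per_replica=2):
    """Try each replica in turn and return the first successful read.

    `readers` is a list of zero-argument callables, one per replica.
    """
    errors = []
    for reader in readers:
        for _ in range(max_attempts_per_replica):
            try:
                return reader()
            except ReadError as exc:
                errors.append(exc)  # remember the fault, then retry or move on
    raise ReadError(f"all replicas failed after {len(errors)} attempts")


def make_flaky_reader(payload, failures):
    """Build a simulated reader that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def reader():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ReadError("simulated media fault")
        return payload

    return reader


if __name__ == "__main__":
    bad_tape = make_flaky_reader(b"event data", failures=5)
    good_disk = make_flaky_reader(b"event data", failures=1)
    print(read_with_failover([bad_tape, good_disk]))
```

The point of the design is that a single faulty unit costs a retry rather than an operator intervention, which is what makes cheap, less reliable components tolerable at scale.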

Another issue is the intrinsic difference between today's secondary storage (disks) and tertiary storage (tapes). The former is random access and its price/capacity ratio is improving by 40%/year. The latter is strictly sequential and hardly improving (less than 10%/year).
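The effect of these two rates compounds quickly. The sketch below projects the prices quoted earlier (60 USD/GB for disk, 1 USD/GB for tape) forward to 2005. It rests on assumptions that are not in the text: that "improving by 40%/year" means the price per GB falls by 40% each year, that the rates stay constant, and that the horizon is six years (1999 to 2005). The function name is illustrative.

```python
# Compound the quoted yearly improvement rates over a fixed horizon.

def projected_price(price_now, yearly_improvement, years):
    """Price per GB after `years`, if it falls by `yearly_improvement` each year."""
    return price_now * (1.0 - yearly_improvement) ** years


if __name__ == "__main__":
    years = 6  # assumption: 1999 to 2005
    disk = projected_price(60.0, 0.40, years)  # disk: 40%/year
    tape = projected_price(1.0, 0.10, years)   # tape: at most 10%/year
    print(f"disk: {disk:.2f} USD/GB, tape: {tape:.2f} USD/GB, "
          f"ratio: {disk / tape:.1f}")
```

Under that reading, disk would fall from 60 to roughly 2.8 USD/GB by 2005 while tape would fall from 1 to roughly 0.53 USD/GB at best, so the price gap would shrink from 60:1 to about 5:1.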

If no technology breakthrough happens (and there is none in sight today), then maybe the entire model of data management should be revised, with large data sets maintained on RAID 5 disks and tapes used only for back-ups. This model could also tolerate less reliable but cheaper tapes.

Concerning the statement that HEP experiments are limited to collecting only 1% of the data because of the cost of storage, I think that experiments are in fact designed to capture only the interesting data. This is a small fraction of the data that the detectors are capable of recording, but a very complex system of data selection is built into the whole data acquisition chain to reduce the data stored to the interesting part only.

Even if the storage cost were dramatically reduced, it would still make no sense to record 100% of the data, since the uninteresting part would be discarded later on anyway.