Preamble

My position on
research issues for large scientific databases is based on the following four
(simplified) observations:

Observation 1: "Déjà vu"

Looking back, I notice that the research challenges proposed for discussion during this workshop have been formulated repeatedly in one form or another during the past ten years. Yet there is little evidence that any of the work that has been carried out is being considered as a basis for new developments. This tells me that great care must be taken in the choice of realistic and relevant research topics and that these must be maintained and monitored over extended time periods (> 6 years). I think the topics that Arie discussed in his paper are essentially a good starting point.

Observation 2: Pass the buck

Domain scientists, the customers for scientific databases, typically regard data management as a "technical" problem that is best delegated to computer scientists, the computer industry, software providers, or any data processing personnel for that matter. (Before they had computers on their desks, however, domain scientists were very creative and productive in inventing methods to deal with their data.) Computer scientists, on the other hand, often operate in a "l'art pour l'art" mode and avoid tests with real-life data or real-life problem settings. Often, however, this is because domain scientists are unable or unwilling to formulate the necessary requirements.

Observation 3: Back to school

There are no guidelines as to what every domain
scientist should know about IT. Hence we cannot, as a rule, reasonably expect
them to substantially contribute during the definition of data representations
or the creation of metadata. This is a serious deficiency as the modelling of
scientific data should be the overall responsibility of the domain scientist
(she or he knows best for what purpose the data are collected).

Observation 4: Don't rock the boat

Domain scientists typically consider their work
done once they have developed (or adopted) models to generate data. As a rule,
they will not make the additional effort required to model the data when left on
their own. Consequently they are happy to just work with files, not databases.
Any database that is developed for them after the data start to pour in
will not be used, no matter how much research went into it.

Where research on large scientific databases is needed

Where EU/US collaboration is necessary

For the definition of
IT skills required by domain scientists, a collaboration between national
standardization bodies is essential. I could imagine a
degree program in which part of the instruction is provided in the US and another
part in the EU (e.g. a US part emphasizing technology aspects and an EU part
concentrating on data-analytical questions).

How to further cooperative research

Establish a (possibly
virtual) coordinating body to facilitate information exchange and to provide the
functions of a broker (to allow linking of research projects across national
boundaries).

Zurich, 31 August 1999