Don't Forget the Students!
Dr. H. Hinterberger, Institute of Scientific Computing, ETH Zürich

Preamble

My position on research issues for large scientific databases is based on the following four (simplified) observations:

Observation 1: "Déjà vu"

Looking back, I notice that the research challenges proposed for discussion during this workshop have been formulated repeatedly, in one form or another, during the past ten years. Yet there is little evidence that any of the work that has been carried out is being considered as a basis for new developments. This tells me that great care must be taken in choosing realistic and relevant research topics, and that these must be maintained and monitored over extended time periods (more than 6 years). I think the topics that Arie discussed in his paper are essentially a good starting point.

Observation 2: Pass the buck

Domain scientists, the customers for scientific databases, typically regard data management as a "technical" problem that is best delegated to computer scientists, the computer industry, software providers, or any data processing personnel for that matter. (Before they had computers on their desks, however, domain scientists were very creative and productive in inventing methods to deal with their data.) Computer scientists, on the other hand, often operate in a "l'art pour l'art" mode and avoid tests with real-life data or real-life problem settings. Often, however, this is because domain scientists are unable or unwilling to formulate the necessary requirements.

Observation 3: Back to school

There are no guidelines as to what every domain scientist should know about IT. Hence we cannot, as a rule, reasonably expect them to contribute substantially to the definition of data representations or the creation of metadata. This is a serious deficiency, as the modelling of scientific data should be the overall responsibility of the domain scientist (she or he knows best for what purpose the data are collected).

Observation 4: Don't rock the boat

Domain scientists typically consider their work done once they have developed (or adopted) models to generate data. As a rule, when left on their own, they will not make the additional effort required to model the data. Consequently, they are happy to just work with files, not databases. Any database that is developed for them after the data start to pour in will not be used, no matter how much research went into it.

Where research on large scientific databases is needed

Investigate ways to standardize requirements for IT courses in the curricula of the domain sciences, with emphasis on data modelling and the use of databases. Develop, implement and evaluate pilot courses at the undergraduate and graduate levels (extending them to include continuing education).
In order to find profitable research topics, it might be worthwhile to investigate which efforts would get domain scientists to use databases as a rule rather than as an exception. (Successful projects, and the qualification profiles of domain scientists who routinely apply databases, might be a starting point.) In other words, user-oriented research projects (with tightly coupled interdisciplinarity as a prerequisite) should, in my opinion, be given a high priority.
In the "traditional" research agenda I would push the development of efficient data management structures that support flexible, multidimensional access. By "efficient" I mean several characteristics: they must be fast in overall performance, they must take advantage of multiprocessor architectures (not highly parallel machines, but regular workstations with more than one processor), and they should support feature extraction.
If R&D in data visualization is left to commercial interests, I am afraid that little will be done with respect to the visualization of data structures. This type of visualization, however, can contribute substantially to data mining techniques and also support the selection of subsets of large data collections. I therefore consider visualization techniques to be an important research topic.

Where EU/US collaboration is necessary

For the definition of the IT skills required by domain scientists, collaboration between national standardization bodies is essential.

I could imagine a degree program in which part of the instruction is provided in the US and another part in the EU (e.g. a US part emphasizing technology aspects and an EU part concentrating on data-analytical questions).

How to further cooperative research

Establish a (possibly virtual) coordinating body to facilitate information exchange and to provide the functions of a broker (to allow linking of research projects across national boundaries).

 

 

Zurich, 31. August 1999