Thrust Areas for EU/US collaboration

Federation of Data Collections

Science proceeds through the unification of experimental data into a coherent whole. In the same way, collaboration on large scientific databases should emphasize the federation of multiple databases. We should encourage projects with the following ingredients:
Authentication

In academia, authentication is treated as a sticky, difficult problem that is often left to the end of the project and then ignored; yet a successful authentication framework is one that is built into the system from the beginning. Without authentication, we cannot rise above "toy data": we cannot ingest, process, or deliver the data that real scientists are interested in, because such data are generally not public. Clearly, in this age of hackers, security through obscurity is insufficient. We should encourage projects that demonstrate easy-to-use, yet strong, authentication schemes, including ways to issue usage permissions to valid users, ways to log usage, and ways to provide different levels of authentication. A most important facet of this problem is authentication in distributed systems, so that a user needs to log in only once, yet multiple, heterogeneous services can be instantiated as a result. In other words, we need models by which one private service, accessed by a user of a given authentication level, can access another private service at the same level.

Standard Objects and Services for Science

In the past there has been much work on building standard file formats for representing scientific data, for example HDF and netCDF, as well as proprietary formats such as those of Matlab, Excel, IDL and others. We can think of these as serializations of data objects. There are also new ways to encapsulate such files and to extend them with metadata, using MIME and XML technologies. At the same time, distributed object systems such as CORBA, Java RMI, and Voyager allow trusted systems to exchange objects directly. We should encourage projects which define these objects for particular disciplines, and which provide the software to create, transform, and combine them.
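
To make the idea of a file as a serialization of a data object concrete, here is a minimal Python sketch that writes a small collection (a few parameters plus an array) to XML and reads it back. The element names (Collection, Param, Array) and the functions to_xml and from_xml are hypothetical, chosen only for illustration; they are not taken from HDF, netCDF, or any other existing format.

```python
import xml.etree.ElementTree as ET


def to_xml(name, params, array):
    """Serialize a named collection (parameters + 1-D array) to an XML string.

    The element names used here are hypothetical, not part of any standard.
    """
    root = ET.Element("Collection", {"name": name})
    # Each parameter becomes a metadata element carrying its own name.
    for key, value in params.items():
        param = ET.SubElement(root, "Param", {"name": key})
        param.text = str(value)
    # The array is stored as whitespace-separated numbers, with its length as metadata.
    arr = ET.SubElement(root, "Array", {"type": "float", "length": str(len(array))})
    arr.text = " ".join(repr(float(x)) for x in array)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild (name, params, array) from the XML produced by to_xml."""
    root = ET.fromstring(text)
    params = {p.get("name"): p.text for p in root.findall("Param")}
    arr = root.find("Array")
    # An empty array serializes to an element with no text, hence the `or ""`.
    array = [float(token) for token in (arr.text or "").split()]
    return root.get("name"), params, array


if __name__ == "__main__":
    document = to_xml("temperature_run", {"units": "K", "site": "A"}, [271.5, 272.0])
    print(document)
    print(from_xml(document))
```

The point of the sketch is only that the metadata travels with the data: a receiving program needs nothing but the document itself to recover both the numbers and their description.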
Such hierarchical collections of objects include arrays, parameters, relational tables, human-readable documents, code fragments and agents, authentication certificates, and query objects. Once an object and its meaning are defined, it is always important to
In a distributed system, emphasis shifts from objects to services. A request object is sent to the service, and a response object is returned. Potential users of the service need to know that it exists, presumably through a discovery service, and they need to know how to use it: how to construct a request and what kinds of response objects are available. Services should be designed for use by either a human or a machine, meaning that the response may be cast as a structured document that the (machine) client can interpret.

E-commerce tools

In the academic community, we must not insulate ourselves from the enormous Internet industry and what it can offer us. Business software, at its best, is cheap (compared to a graduate student or a supercomputer), well-documented, and robust. Unlike with home-made software, new versions with new features appear regularly, with no personal coding effort. All we need to do, for long-term projects, is to insulate ourselves from reliance on a single vendor by using open interfaces. We should consider supporting academic projects that are closely partnered with industry, especially when the partners are on opposite sides of the Atlantic. The support is definitely not a subsidy to the business plan of the industrial partner, but rather should fund the insulation of the academic enterprise from the industrial partner! Specifically, we should fund the development of an open interface and the corresponding broker software, by which the two can effectively collaborate, and so that others can also join the enterprise.

Multilingual Interfaces

Obviously there are many languages in Europe, but this is also true in the US. We should support projects that allow multilingual interfaces, perhaps through translation, or even by simple mechanisms such as different words written on the GUI components. XML is a technology designed for flexible presentation of structured data: we can use this flexibility to provide a language-specific interface.
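
The "different words written on the GUI components" mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the XML layout (labels, label, text, and the lang attribute), the sample strings, and the functions load_labels and label_for are all hypothetical, not part of any existing interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: one entry per GUI component, one text per language.
CATALOG = """
<labels>
  <label key="search">
    <text lang="en">Search</text>
    <text lang="fr">Rechercher</text>
    <text lang="de">Suchen</text>
  </label>
  <label key="submit">
    <text lang="en">Submit</text>
    <text lang="fr">Envoyer</text>
  </label>
</labels>
"""


def load_labels(xml_text):
    """Parse the catalog into {component key: {language code: label text}}."""
    table = {}
    for label in ET.fromstring(xml_text).findall("label"):
        table[label.get("key")] = {
            text.get("lang"): text.text for text in label.findall("text")
        }
    return table


def label_for(table, key, lang, fallback="en"):
    """Return the label in the requested language, else the fallback language,
    else the raw key so that the GUI never shows an empty component."""
    texts = table.get(key, {})
    return texts.get(lang) or texts.get(fallback) or key


if __name__ == "__main__":
    labels = load_labels(CATALOG)
    for language in ("en", "fr", "de"):
        print(language, label_for(labels, "search", language), label_for(labels, "submit", language))
```

Because the catalog is plain structured data, adding a language means adding text entries rather than changing the program, which is the kind of flexibility the paragraph above has in mind.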
We could also consider projects that can utilize automatic, perhaps private, translation services, thus allowing outsourcing of the translation. We could encourage projects that mark up text for translation or that define open interfaces for the exchange of the knowledge bases and ontologies that are used in machine translation.

Scalability

Computing infrastructure is like a food pyramid. PCs and workstations with business software are the base layer, like rice and pasta; installed, specialized software brings us to the next level (fruits and vegetables); remote machines and servers are at the third level (meat, eggs, milk); and at the tip are supercomputers and tape robots (chocolate). We should be interested in projects that address the concerns of all levels of the infrastructure, so that a user can learn at the lowest level, then move up if necessary.
In each case, we must be careful not to address only the high end, the chocolate of the food pyramid; the emphasis must be on balance.

References

Extensible Scientific Interchange Language (XSIL): http://www.cacr.caltech.edu/XSIL
Interfaces to Scientific Data Archives, an NSF Workshop: http://www.cacr.caltech.edu/isda