Distributed Integration Testbed

The Distributed Integration Testbed Project was a three-year effort that began in July 1998. It embraced a set of inter-related activities intended to enhance the effectiveness of information management research and accelerate interoperability between independent digital libraries. A description of the project is provided below.

The D-Lib Test-Suite

The Test-Suite is a distributed testbed that will incorporate a set of existing testbeds and be made available to all DARPA-approved researchers. Since most problems in information management research are scale-related, a number of DARPA-sponsored projects have devoted major resources to building testbeds. These testbeds have made significant contributions to the research of the groups that developed them, but they are not yet being fully utilized. The availability of the Test-Suite will accelerate the research cycle by enabling research groups to carry out their work without needing to develop new testbeds, and hence will reduce the size of the research group needed to carry out effective research in this field.

Metrics for Information Management Research

The distributed testbed will be used for quantitative evaluation of research. Information management research is still at an exploratory stage in its evolution, rarely going beyond informal demonstration of methods and results. Frequently, different groups are carrying out research on closely related topics. Work in related areas, such as speech recognition and text retrieval (through the TREC conferences), has shown the benefits of quantitative evaluation. Following these examples of success in other fields, the approach will be to provide researchers with a set of criteria and an evaluation package, which includes data from the Test-Suite, to measure the effectiveness of their efforts. To permit systematic measurement and comparison of varying approaches to interoperability, the datasets in the integration testbed will be stable, with statistics about the information they contain carefully recorded.
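As an illustration of what such an evaluation package might compute, the following is a minimal sketch in Python. It assumes precision and recall as the criteria, in the style of TREC evaluations; the project description does not name specific metrics. The names DatasetStats, precision_recall, and evaluate, as well as the sample queries and document identifiers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStats:
    """Recorded statistics for a stable test collection (hypothetical fields)."""
    name: str
    num_documents: int
    num_queries: int


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query's result list."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate(runs: dict[str, list[str]],
             judgments: dict[str, set[str]]) -> dict[str, tuple[float, float]]:
    """Score every judged query; a query with no submitted results scores as empty."""
    return {q: precision_recall(runs.get(q, []), rel) for q, rel in judgments.items()}


# Hypothetical collection statistics, relevance judgments, and one system's results.
stats = DatasetStats(name="example-collection", num_documents=6, num_queries=2)
judgments = {"q1": {"d1", "d2", "d3"}, "q2": {"d4"}}
system_a = {"q1": ["d1", "d5", "d2"], "q2": ["d6"]}

scores = evaluate(system_a, judgments)
for query, (p, r) in sorted(scores.items()):
    print(f"{stats.name} {query}: precision={p:.2f} recall={r:.2f}")
```

Because the judgments and the collection statistics are fixed, two groups that run different systems against the same data obtain directly comparable numbers, which is the point of keeping the datasets stable.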

Interoperability Research

Interoperation at both the syntactic and semantic level is a major challenge in building very large-scale, decentralized, networked applications. Such systems have two characteristics that inhibit interoperation. The first is decentralized control: various system components are created for different purposes, at different times, by groups with differing needs and technical approaches. The second is that the systems have widely diverse levels of maturity. There is no steady state, and a wide variety of legacy, state-of-the-art, and experimental systems must coexist.

In traditional distributed systems, interoperation is achieved through standardization. This requires extensive agreement on protocols, naming schemes, formats, APIs, class hierarchies, and similar interfaces. Such tight standardization among all components is rarely obtained. This project is taking an alternative approach to interoperation, based on a minimal level of prior agreements. In a large-scale, decentralized system, the set of capabilities that are common to all system components is small, but any specific pair of components may share a significant set of interfaces and capabilities. Therefore, the approach followed by this effort is to negotiate
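The observation above, that the capabilities common to all components are few while any specific pair may share many, can be made concrete with a small Python sketch. The component names and interface labels below are invented for illustration and are not taken from the project; the sketch shows only the set arithmetic, not a negotiation protocol.

```python
from itertools import combinations

# Hypothetical components and the interfaces each one supports.
capabilities = {
    "legacy_catalog": {"http", "query_protocol_a"},
    "production_library": {"http", "query_protocol_a", "xml_metadata", "query_protocol_b"},
    "research_prototype": {"http", "xml_metadata", "query_protocol_b"},
}


def common_to_all(caps: dict[str, set[str]]) -> set[str]:
    """Interfaces that every component supports."""
    return set.intersection(*caps.values())


def pairwise_shared(caps: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Interfaces shared by each pair of components."""
    return {(a, b): caps[a] & caps[b] for a, b in combinations(sorted(caps), 2)}


print("common to all:", sorted(common_to_all(capabilities)))
for pair, shared in pairwise_shared(capabilities).items():
    print(pair, "share", sorted(shared))
```

In this invented example only one interface is common to all three components, yet two of the pairs share more than that, so a pair of components that compares capabilities directly can interoperate at a richer level than a system-wide standard would allow.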

Information Capture of On-Line Research Information

This research is addressing two problems that impact research. The rapid growth of on-line information has enabled DARPA-funded researchers to disseminate their research much more rapidly than in the past. For example, all six projects of the NSF/DARPA/NASA Digital Libraries Initiative (DLI) maintain comprehensive Web sites. Such sources of information provide many benefits, but little thought has been given to preserving these valuable resources beyond the end of the projects that created them. The typical fate of a research Web site is to be frozen at the end of a project. Links then break and key resources move to other locations; eventually the server itself is taken down and the information is lost. This effort is developing new tools and processes for creating, maintaining, and retaining such on-line sites, building on recent research in information management. Capturing such information will have a significant impact on the ability of others to build on DARPA-funded research, either through technology transfer or as a basis for further research.