An Introduction to the CS-TR Project

The Computer Science Technical Reports (CS-TR) project, begun in 1992, embodies a vision of seamless access to distributed information over a network. With support from the Advanced Research Projects Agency (ARPA), the project has begun to provide network access to archives of technical information in computer science. It has produced an architecture, an archive, and a series of experimental tools that bring that vision more clearly into focus.

The initial focus of this effort, organized and led by CNRI, was to develop a corpus of digitized material from the computer science technical report collections held by five universities with leading programs in computer science: Carnegie Mellon University, Cornell University, Massachusetts Institute of Technology, Stanford University, and the University of California at Berkeley. Under agreements reached early in the project, these reports are being made available in page-image and other formats to CS-TR project members and to outside users. Equally important, the effort was intended to serve as a template for similar efforts in related fields. In addition, CNRI and each of the universities were funded to carry out a program of research and development on digital libraries that could make use of this corpus and that would help advance knowledge of information storage, search, and retrieval.

During the project, approximately 5,000 technical reports were digitized and made available on the network. Significant effort went into issues of concern to users, developers, and librarians that are implicit in the transition from a paper-based system to an interactive, electronic one. Extensive consideration was given to intellectual property rights, including copyright, and to the management of rights; a brief description of these intellectual property issues will be available shortly. RFC 1807 describes a format for bibliographic records, which is used to inform network servers of the existence of CS-TR material. Sharing of these reports is facilitated by Dienst, a system developed by Cornell University. Other examples of research to which CS-TR contributed include Carnegie Mellon University's Lycos and Informedia projects; MIT's work on large-scale, high-resolution scanning workflow, canned document descriptions, and automatic linking; Stanford's SIFT, GLOSS, and SCAM; and UC Berkeley's "textiles" and "tilebars", the Lassen and Tioga interfaces, and work on "multi-valent documents".

The Corporation for National Research Initiatives (CNRI) participated technically in the CS-TR project in two principal ways:

(1) By developing architectural ideas for linking heterogeneous libraries and by interacting with other participants (particularly UC Berkeley) in further defining and refining the architecture; and

(2) By designing and developing a copyright registration and recordation system in coordination with the Copyright Office of the Library of Congress. This system accepts deposits of copyrighted works in digital form over the Internet and is based on the architectural framework described above.

As part of the architectural framework effort, several key concepts emerged: (1) digital objects as the basic elements of the system; (2) handles as unique, location-independent identifiers for digital objects; and (3) repositories that provide persistent storage for digital objects. CNRI has deployed a Global Handle Registry on the Internet to map handles to repositories, and the Dienst system at Cornell is being re-engineered into a modular system compatible with this architectural framework. Several groups are collaborating on the design of interoperable, secure repositories.
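The relationship among digital objects, handles, and repositories can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not CNRI's implementation: the classes `DigitalObject`, `Repository`, and `HandleRegistry`, the functions `retrieve` and `relocate`, and any handle strings used with them are hypothetical. It shows the property the framework relies on: a client first asks the registry which repository currently holds a handle, then fetches the object from that repository, so an object can move between repositories without its handle changing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical digital object: a handle plus opaque content."""
    handle: str
    data: bytes


@dataclass
class Repository:
    """Hypothetical repository giving persistent storage keyed by handle."""
    name: str
    objects: dict[str, DigitalObject] = field(default_factory=dict)

    def deposit(self, obj: DigitalObject) -> None:
        self.objects[obj.handle] = obj

    def access(self, handle: str) -> DigitalObject:
        return self.objects[handle]


class HandleRegistry:
    """Hypothetical registry mapping location-independent handles to repositories."""

    def __init__(self) -> None:
        self._locations: dict[str, Repository] = {}

    def register(self, handle: str, repo: Repository) -> None:
        # Registering an existing handle again simply updates its location.
        self._locations[handle] = repo

    def resolve(self, handle: str) -> Repository:
        # Raises KeyError for a handle that was never registered.
        return self._locations[handle]


def retrieve(registry: HandleRegistry, handle: str) -> DigitalObject:
    """Resolve a handle to its repository, then fetch the object from it."""
    return registry.resolve(handle).access(handle)


def relocate(registry: HandleRegistry, handle: str,
             src: Repository, dst: Repository) -> None:
    """Move an object to another repository; the handle itself is unchanged."""
    obj = src.objects.pop(handle)
    dst.deposit(obj)
    registry.register(handle, dst)
```

Because relocation touches only the registry entry, references that clients hold (the handles) stay valid. A deployed system would also have to address distribution and replication of the registry and access control on repositories, which this sketch omits.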

The results of the research effort will be made available for use in a variety of settings. One of these is the NCSTRL project, which will seek to broaden its base of participation to include other institutions with significant collections of computer science technical reports.

Robert E. Kahn
Corporation for National Research Initiatives
December 11, 1995