2 The Starting Point

2.1 A Brief History

Computer networking dates from the late 1960s, when affordable minicomputer technology enabled the implementation of wide-area packet switching networks. The Arpanet, begun in 1969 as a research project by DARPA, provided a focal point within the U.S. for packet network technology development. In the 1970s, parallel development by DARPA of radio and satellite-based packet networks and TCP/IP internetworking technology resulted in the establishment of the Internet. The subsequent introduction and widespread use of ethernet, token ring and other LAN technologies in the 1980s, coupled with the expansion of the Internet by NSF to a broader user base, led to increasing growth and a transition of the Internet to a self-supporting operational status in the 1990s.

Wide-area packet switching technology has from its inception made use of the telephone infrastructure for its terrestrial links, with the packet switches forming a network overlay on the underlying carrier transmission system. The links were initially 50 Kbps leased lines in the original Arpanet, progressing to 1.5 Mbps T1 lines in the NSFNET circa 1988 and 45 Mbps T3 lines by about 1992. Thus, at the time the gigabit testbed project began, Internet backbone speeds and large-user access lines were in the 50 Kbps to 1.5 Mbps range and local-area aggregate speeds were typically 10 Mbps or less. Individual peak user speeds ranged from about 1 Mbps for high-end workstations to 9.6 Kbps or less for PC modem connections.

The dominant application which emerged on the Arpanet once the network became usable was not what had been expected when the network was planned. Although the network was conceived as a vehicle for resource sharing among the host computers connected to it, people-to-people communication in the form of email quickly came to dominate its use. The ability to have extended conversations without requiring both parties to be available at the same time, to send a single message to an arbitrarily large set of recipients, and to have a copy of every message automatically stored in a computer for future reference proved to be a powerful stimulus to the network's use. It is an excellent example of the unforeseen consequences of making a new technology available for experimental exploration.

The computer resource sharing which did take hold was dominated by two applications, namely file transfer and remote login. Applications which distributed a problem's computation among computers connected to the network were also attempted and in some cases demonstrated, but they did not become a significant part of the original Arpanet's use. Packetized voice experiments were demonstrated over the Arpanet in the 1970s, but with limited applicability due to limited bandwidth and long store-and-forward transmission delays at the switches.

The connection of the NSF-sponsored supercomputer centers to the Internet in the late 1980s provided a new impetus for networked resource sharing and resulted in an increase of activity in this application area, but multi-computer explorations were severely limited by network speeds.

2.2 State of Very High-Speed Networking in 1989-90

Prior to the time the testbeds were being formed in 1990, very little hands-on research in gigabit networking was taking place. Work by carriers and equipment vendors focused primarily on higher transmission speeds rather than on networking. There was a good deal of interest in high-speed networking within the research community, consisting mostly of paper studies and simulations, along with laboratory work at the device level. Interest was stimulated in the telecommunications industry by ongoing work on the standardization of Broadband ISDN (B-ISDN), which was intended to eventually address user data rates from about 50 Mbps upwards to the gigabit/s region. Within the scientific community, interest in remote data visualization and multi-processor supercomputer-related activities was high.

A few high speed technologies had emerged by 1989, most notably HIPPI and Ultranet for local connections between computers and peripherals. HIPPI, developed at Los Alamos National Laboratory (LANL), was in the process of standardization at the time by an ANSI subcommittee and had been demonstrated with laboratory prototypes. Ultranet was based on proprietary protocols, and Ultranet products were in use at a small number of supercomputer centers and other installations. Both technologies provided point-to-point links between hosts at data rates of 800 Mbps to 1 Gbps.

In wide-area networking, SONET (Synchronous Optical Network) was being defined as the underlying transmission technology for the U.S. portion of B-ISDN by ANSI, and its European counterpart SDH (Synchronous Digital Hierarchy) was undergoing standardization by the CCITT. SONET and SDH were designed to provide wide-area carrier transport at speeds from approximately 50 Mbps to 10 Gbps and higher, along with the associated monitoring and control functions required for reliable carrier operation. While non-standard trunks were already in operation at speeds on the order of a gigabit/s, the introduction of SONET/SDH offered carriers the use of a scalable, all-digital standard with both flexible multiplexing and the prospect of ready interoperability among equipment developed by different vendors.

A number of high-speed switch designs were underway at the time, most focused on ATM cell switching. Examples of ATM switch efforts included the Sunshine switch design at Bellcore and the Knockout switch design at AT&T Bell Labs. Exploration of variable length packet switching at gigabit speeds was also taking place, most notably by the PARIS (later renamed Planet) switch effort at IBM. These efforts were focused on wide-area switching environments - investigation of ATM for local area networking had not yet begun.

Computing performance in 1990 was dominated by the vector supercomputer, with highly parallel supercomputers still in the development stage. The fastest supercomputer, the CRAY Y-MP, achieved on the order of 1-2 gigaflops in 1990, while the only commercial parallel computer available was the Thinking Machines Corporation CM-2. Workstations had peak speeds in the 100 MIPS range, with PCs in about the 10 MIPS range. I/O interfaces for these machines consisted mainly of 10 Mbps ethernet and other LAN technologies with similar speeds, with some instances of 100 Mbps FDDI beginning to appear.

Optical researchers were making significant laboratory advances by 1990 in the development of optical devices to exploit the high bandwidth inherent in optical fibers, but this area was still in a very early stage with respect to practical networking components. Star couplers, multiplexors, and dynamic tuners were some of the key optical components being explored, along with several all-optical local area network designs.

The data networking research community had begun to focus on high-speed networking by the late 1980s, particularly on questions concerning protocol performance and flow/congestion control. New transport protocols such as XTP and various lightweight protocol approaches were being investigated through analysis, simulation, and prototyping, and a growing number of conference and journal papers were focusing on high-speed networking problems.

The regulatory environment which existed in 1990, at the time the Gigabit Testbed Initiative was formed, was quite different from that which is now evolving. A regulated local carrier environment existed consisting of the seven regional Bell operating companies (RBOCs) along with some non-Bell companies such as GTE, which provided tariffed local telephone services throughout the U.S. Long distance services were being provided by AT&T, MCI, and Sprint in competition with each other. Cable television companies had not yet begun to expand their services beyond simple residential television delivery, and direct broadcast satellite services had not yet been successfully established. And while some independent research and development activities had been established within some of the RBOCs, the seven regional carriers continued to fund Bellcore as their common R&D laboratory.

With the passage of the Telecommunications Act of 1996, a more competitive telecommunications industry now seems likely. Mergers and buy-outs among the RBOCs are taking place, cable companies have begun to offer Internet access, and provisions for Internet telephony have begun to be accommodated by Internet service providers.

2.3 Gigabit Networking Research Issues

When the initiative began in 1990, many questions concerning high-speed networking technology were being considered by the research community. At the same time, telephone carriers were struggling with the question of how big the market, if any, might be for carrier services which would provide a gigabit/s service to the end-user. Cost was a major concern here. Research issues existed in most, if not all, areas of networking, including host I/O, switching, flow/congestion and other aspects of network control, operating systems, and application software. Two major questions underlie most of these technical issues: (1) could host I/O and other hardware and software operate at the high speeds involved? and (2) would speed of light delays in WANs degrade application and protocol performance?

These issues can be grouped into three general sets, which are discussed separately below:

· network issues

· platform issues

· application issues

Network Issues

A basic issue was whether existing conceptual approaches developed for lower speed networking would operate satisfactorily at gigabit speeds. Implementation issues were also uppermost in mind. For example, would a radically different protocol design allow otherwise unachievable low-cost implementations? However, most of the conceptual issues were driven by the fact that speed-of-light propagation delay across networks is constant, while data transmission times are a function of the transmission speed.

At a data rate of 1 Gbps, it takes only one nanosecond to transmit one bit, resulting in a link transmission time of 10 microseconds for a 10 kilobit packet. In contrast, for the 50 Kbps link speeds in use when the Arpanet was first designed, the same 10 kilobit packet has a transmission time of 200 milliseconds. The speed-of-light propagation delay across a 1000-mile link for either case, on the other hand, is on the order of 10 milliseconds. The result is that, whereas in the Arpanet case propagation delay is more than an order of magnitude smaller than the transmission time, in the gigabit network the propagation time is about three orders of magnitude larger than the transmission time!
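
The arithmetic in the preceding paragraph can be reproduced with a short calculation. The following is a minimal sketch only: the function names are hypothetical, and the 10 millisecond propagation delay is the round figure quoted in the text rather than a measured value.

```python
import math

PACKET_BITS = 10_000      # 10 kilobit packet, as in the text
PROPAGATION_S = 10e-3     # ~10 ms across a 1000-mile link (round figure from the text)


def transmission_time(packet_bits: float, rate_bps: float) -> float:
    """Time in seconds to clock a packet onto a link of the given rate."""
    return packet_bits / rate_bps


def orders_of_magnitude(ratio: float) -> float:
    """Base-10 logarithm of a ratio, i.e. how many powers of ten it spans."""
    return math.log10(ratio)


for label, rate in (("Arpanet, 50 Kbps", 50e3), ("Gigabit, 1 Gbps", 1e9)):
    t_tx = transmission_time(PACKET_BITS, rate)
    ratio = PROPAGATION_S / t_tx
    print(f"{label}: transmission {t_tx * 1e3:.3f} ms, "
          f"propagation/transmission = {ratio:g} "
          f"({orders_of_magnitude(ratio):+.1f} orders of magnitude)")
```

A negative figure in the last column means that transmission time dominates, and a positive figure means that propagation delay dominates.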

This difference has both positive and negative consequences. On the positive side, store-and-forward delays introduced by packet switches and routers along an end-to-end path are directly related to transmission time, causing them to become very small at gigabit speeds (barring unusual queuing situations). This removes a major problem inherent in the early Arpanet for packetized voice and other traffic having low delay requirements, since at gigabit speeds the resulting cumulative transmission delays effectively disappear relative to the propagation delay over wide-area distances.

On the negative side, the very small packet transmission time means that information sent to the originating node for feedback control purposes may no longer be useful, since the feedback is still subject to the same propagation delay across the network. Most networks in place in 1990, and particularly the Internet, relied on window-based end-to-end feedback mechanisms for flow/congestion control, for example that used by the TCP protocol. At 50 Kbps, a 200 millisecond packet transmission time meant that feedback from a destination node on a cross-country link could be returned to the sender before it had completed the transmission, causing further transmissions to be suppressed if necessary. At 1 Gbps, this type of short-term feedback control is clearly impossible for link distances of a few miles or more.
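
One way to see why short-term feedback loses its value is to count how many packets a sender can emit before the earliest possible feedback returns. The sketch below is illustrative only: it assumes the 10 millisecond one-way delay and 10 kilobit packet used earlier, it ignores processing and queuing time, and the function name is hypothetical.

```python
ONE_WAY_PROPAGATION_S = 10e-3   # round figure for a 1000-mile link (from the text)
PACKET_BITS = 10_000            # 10 kilobit packet (from the text)


def packets_sent_before_feedback(rate_bps: float,
                                 one_way_s: float = ONE_WAY_PROPAGATION_S,
                                 packet_bits: int = PACKET_BITS) -> float:
    """Packets a sender can emit during one round trip, that is, before the
    earliest possible feedback about its first bit can arrive."""
    round_trip_s = 2 * one_way_s
    return rate_bps * round_trip_s / packet_bits


for label, rate in (("50 Kbps", 50e3), ("1 Gbps", 1e9)):
    print(f"{label}: {packets_sent_before_feedback(rate):g} packets per round trip")
```

A value below one means that feedback can arrive while the first packet is still being sent; a large value means that many packets are already committed to the network before the sender can learn anything.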

The impact of this feedback delay on performance is strongly related to the statistical properties of user traffic. If the peak and average bandwidth requirements of individual data streams are predictable over a time interval which is large relative to the network's roundtrip propagation delay, then one might expect roundtrip feedback mechanisms to continue to work well. On the other hand, if the traffic associated with a user 'session', such as a file transfer, persists only for a duration comparable to or less than the roundtrip propagation time, then end-to-end feedback will be ineffective in controlling that stream relative to events occurring within the network while the stream is in progress. (And while we might look to the aggregation of large numbers of users to provide statistical predictability, the phenomenon of self-similar data traffic behavior has brought the prospect of aggregate data traffic predictability into question.)

Another control function impacted by the transmission/propagation time ratio is that of call setup in wide-area networks using virtual circuit (VC) mechanisms, for example in ATM networks. The propagation factor in this case can result in a significant delay before the first packet can be sent relative to what would otherwise be experienced. Moreover, for cases in which the elapsed time from the first to last packet sent is less than the VC setup time, inefficient resource utilization will typically result.
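
The effect of setup delay on short transfers can be illustrated with a deliberately simplified model. The sketch below assumes that VC setup costs exactly one round trip (20 milliseconds for the 1000-mile link discussed earlier) and that no data moves during setup; both are assumptions made for illustration, not properties of any particular ATM signaling scheme, and the function name is hypothetical.

```python
ROUND_TRIP_S = 20e-3   # assumed setup cost: one round trip on a 1000-mile link


def sending_fraction(transfer_bits: float, rate_bps: float,
                     setup_s: float = ROUND_TRIP_S) -> float:
    """Fraction of the session's elapsed time actually spent sending data,
    assuming the circuit carries nothing while it is being set up."""
    transfer_s = transfer_bits / rate_bps
    return transfer_s / (setup_s + transfer_s)


# A 1 megabit transfer at the two link speeds discussed in the text.
for label, rate in (("50 Kbps", 50e3), ("1 Gbps", 1e9)):
    print(f"{label}: {sending_fraction(1e6, rate):.1%} of elapsed time spent sending")
```

Under these assumptions, a transfer that lasts much less than the setup time spends most of the session waiting, which is the inefficiency described above.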

The transmission/propagation time ratio also impacts local area technologies. The performance of random access networks such as ethernet is premised on this ratio being much greater than one, so that collisions occurring over the maximum physical extent of the network can be detected at all nodes in much less than one packet transmission time. A factor of 100 increase from the original ethernet design rate of 10 Mbps to 1 Gbps implies that the maximum physical extent must be correspondingly reduced or the minimum packet size correspondingly increased, or some combination of the two, in order to use the original ethernet design without change.
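
The tradeoff can be made concrete using the collision window of the original design. The sketch below assumes the 512-bit (64-byte) minimum frame of 10 Mbps ethernet, a figure that is not given in the text, and uses hypothetical function names. It treats the minimum-frame transmission time as the bound on round-trip propagation and ignores repeater and interface delays.

```python
ORIGINAL_RATE_BPS = 10e6    # original ethernet design rate (from the text)
GIGABIT_RATE_BPS = 1e9
MIN_FRAME_BITS = 512        # 64-byte minimum frame (assumption, not from the text)


def slot_time_s(min_frame_bits: float, rate_bps: float) -> float:
    """Time to transmit a minimum-size frame. A collision must be detectable
    everywhere within this window, so it bounds round-trip propagation."""
    return min_frame_bits / rate_bps


def min_frame_bits_for_slot(slot_s: float, rate_bps: float) -> float:
    """Minimum frame size needed to preserve a given collision window."""
    return slot_s * rate_bps


original_slot = slot_time_s(MIN_FRAME_BITS, ORIGINAL_RATE_BPS)
shrunk_slot = slot_time_s(MIN_FRAME_BITS, GIGABIT_RATE_BPS)
grown_frame = min_frame_bits_for_slot(original_slot, GIGABIT_RATE_BPS)

print(f"10 Mbps collision window: {original_slot * 1e6:.2f} microseconds")
print(f"1 Gbps, same frame size:  {shrunk_slot * 1e6:.3f} microseconds "
      f"(extent shrinks by {original_slot / shrunk_slot:.0f}x)")
print(f"1 Gbps, same extent:      minimum frame of {grown_frame:.0f} bits")
```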

More generally, as new competing technologies such as HIPPI or all-optical networks are introduced to deal explicitly with gigabit speeds, and with the prospect of still higher data rates in the future, issues of scalability and interoperability become increasingly important. The push to gigabit networking and beyond raises several such questions: whether ATM and SONET can scale independently of data rate or are in fact constrained by factors such as propagation delay, whether single-channel transmission at ever higher bit rates or striping over lower bit-rate multiple channels will prove more cost-effective, and how interoperability should best be achieved.

Along a somewhat different dimension, the proposed use of distributed shared memory (DSM) as a wide-area high speed communication paradigm instead of explicit message passing raised a number of issues. DSM attempts to make communication among a set of networked processors appear the same as if they were on a single machine using shared physical memory. A high bandwidth is required between the machines to allow successful DSM operation, and this had been achieved for local area networking environments. Issues concerning the application of DSM to a wide-area gigabit environment included how to hide speed-of-light latency so that processors do not have to stop and wait for remote memory updates and how far DSM could/should extend into the network; for example, should DSM be supported within network switches? Or, at the other extreme, should it exist only above the transport layer to provide a shared memory API for application programmers?

Platform Issues

A second set of issues concerns the ability of available computer and other technologies to support protocol processing, switching, and other networking functions at gigabit speeds. We use platform here very generally to mean the host computers, switching nodes internal to a network, routers or gateways which may be used for network interconnection, and specialized devices such as low level interfacing equipment.

For host computers the dominant question is the amount of resources required to carry out host-to-host and host-to-network protocol processing -- in particular, could the computers available in 1990 support application I/O at gigabit rates, and if not at what future point might they be expected to?

Because of the dominance of TCP/IP in wide-area data networking by 1990, a question frequently asked was whether TCP implementations would scale to gigabit/s operation on workstation-class hosts. Some researchers claimed it would not scale and would have to be replaced by a new protocol explicitly designed for efficient high speed operation, in some cases using special hardware protocol engines. Others did not go to this extreme, but argued that outboard processing devices would be required to offload the protocol processing burden from the host, with the outboard processing taking place either on a special host I/O board or on an external device. Still others held that internal TCP processing at gigabit rates was not a problem if care was taken in its implementation, or that hardware trends would soon provide sufficient processing power.

For network switching nodes, a key question in 1990 was whether hardware switching was required or software-based packet switching could be scaled up to handle gigabit port rates and multi-gigabit aggregate throughputs. Another important question was how much control processing could reasonably be provided at each switch for flow/congestion control and quality-of-service algorithms that require per-packet or per-cell operations. Routers and gateways were subject to much the same questions as internal network switches.
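
The scale of the per-packet and per-cell processing question can be estimated from the time a single unit occupies a port. The sketch below is a rough illustration with hypothetical function names; it uses the 53-byte ATM cell size (a fact not stated in the text) and the 10 kilobit packet from the earlier discussion, and it assumes a port running continuously at the full line rate.

```python
ATM_CELL_BITS = 53 * 8     # 53-byte ATM cell: 5-byte header plus 48-byte payload
PACKET_BITS = 10_000       # 10 kilobit packet used earlier in the text
PORT_RATE_BPS = 1e9        # one gigabit/s port


def time_budget_ns(unit_bits: float, port_rate_bps: float) -> float:
    """Nanoseconds available to process one unit before the next arrives."""
    return unit_bits / port_rate_bps * 1e9


def units_per_second(unit_bits: float, port_rate_bps: float) -> float:
    """Units arriving per second on a fully loaded port."""
    return port_rate_bps / unit_bits


for label, bits in (("ATM cell", ATM_CELL_BITS), ("10 kilobit packet", PACKET_BITS)):
    print(f"{label}: {time_budget_ns(bits, PORT_RATE_BPS):.0f} ns per unit, "
          f"{units_per_second(bits, PORT_RATE_BPS) / 1e6:.2f} million units/s")
```

Any control algorithm that must touch every cell has to complete within the per-cell budget on each port, which is one reason the hardware-versus-software question arose.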

Switching investigations were largely focused on detailed architectural choices for fixed-size ATM cell switching using a hardware paradigm, with the view that the fixed size allowed cost-effective and scalable hardware solutions. Issues concerned whether a sophisticated Batcher-Banyan design was necessary or relatively simple crossbar approaches could be used, how much cell buffering was needed to avoid excessive cell loss, whether the buffers should be at the input ports, output ports, intermediate points within the switch structure, or some combination of these choices, and whether input and output port controller designs should be simple or complex.

For variable-length packet transfer mode (PTM) switching, issues concerned how to develop new software/hardware architectures to distribute per-port processing at gigabit rates while efficiently moving packets between ports, and how to implement network control functions within the new architectures. A key question was how much, if any, specialized hardware was necessary to move packets at these rates.

Other platform issues concerned the cost of achieving gigabit/s processing in specialized devices such as those needed for interworking different transmission technologies or for SONET crossconnect switching, and whether it was reasonable to accomplish these functions by processing data streams at the full desired end-to-end rate or alternatively to stripe the aggregate rate over multiple lower speed channels.

Software issues also existed within host platforms over and above transport and lower layer protocol processing. One set of issues concerned the operating system software used by each vendor, which like most platform hardware was designed primarily to support internal computation with little, if any, priority given to supporting efficient networking. In addition to questions concerning the environment provided by the operating system for general protocol transactions, an important issue concerned the introduction of multimedia services by external networks and whether sufficiently fast software response times could be achieved for passing real-time traffic between an application and the network interface.

Another host platform software issue concerned the presentation layer processing required to translate between data formats used by different platforms, for example different floating point formats -- because the translation must in general be applied to each word of data being transferred, it had the potential for being a major bottleneck.
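
The per-word nature of this translation can be sketched as follows. The example uses byte-order conversion of 32-bit IEEE floating point values as a stand-in for a format translation; this choice, like the function names, is an assumption made for illustration, since the text does not name specific formats.

```python
import struct

WORD_BITS = 32


def translate_words(big_endian_payload: bytes) -> bytes:
    """Convert a buffer of 32-bit big-endian IEEE floats to little-endian,
    visiting every word in the buffer exactly once."""
    if len(big_endian_payload) % 4:
        raise ValueError("payload must be a whole number of 32-bit words")
    out = bytearray()
    for (value,) in struct.iter_unpack(">f", big_endian_payload):
        out += struct.pack("<f", value)
    return bytes(out)


def per_word_budget_ns(word_bits: int, rate_bps: float) -> float:
    """Nanoseconds available per word if translation must keep up with the link."""
    return word_bits / rate_bps * 1e9


payload = struct.pack(">3f", 1.5, -2.0, 0.25)
print(translate_words(payload) == struct.pack("<3f", 1.5, -2.0, 0.25))
print(f"{per_word_budget_ns(WORD_BITS, 1e9):.0f} ns per word at 1 Gbps")
```

Because the loop body runs once per word, its cost is multiplied by the word arrival rate, which is why the translation step was seen as a potential bottleneck.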

Highly parallel distributed memory computer architectures which were coming into use in 1990 presented still another set of software issues for gigabit I/O. These architectures consisted of hundreds or thousands of individual computing nodes, each with its own local memory, which communicated with each other and the external world through a hardware interconnection structure within the computer. This gave rise to a number of questions, for example whether TCP and other protocol processing should be done by each node or by a dedicated I/O node or both, how data should be gathered and disseminated between the machine I/O interfaces and each internal node, and how well the different hardware interconnect architectures being used could support gigabit I/O data rates.

Application Issues

The overriding application concern for host-to-host gigabit networking was what classes of applications could benefit from such high data rates and what kind of performance gains or new functionality could be realized.

Prior to the Initiative, many people claimed to have applications needing gigabit/s rates, but most could not substantiate those claims quantitatively. It was the competition for participation in the Initiative that led to ideas for applications that required rates on the order of a gigabit/s to the end user. Essentially all the applications which were selected had in common the need for supercomputer-class processing power, and these fell into two categories: 'grand challenge' applications in which the wall-clock time required to compute the desired results on a single 1990 supercomputer typically ranged from days to years, and interactive computations in which one or more users at remote locations desired to interact with a supercomputer modeling or other computation in order to visually explore a large data space.

The main issue for grand challenge applications was whether significant reductions in wall-clock solution time could be achieved by distributing the problem among multiple computers connected over a wide-area gigabit network. Here again, speed-of-light propagation delay loomed large -- could remote processors exchange data over paths involving delays orders of magnitude larger than those experienced within a single multiprocessor computer and still maintain high processor utilization?

While circumventing latency appeared to be a major challenge, another approach offered the promise of major improvements for distributed computing in spite of this problem. This was the prospect of partitioning an application among heterogeneous computer architectures so that different parts of the problem were solved on a machine best matched to its solution. For example, computations such as matrix diagonalizations were typically fastest on vector architectures, while computations such as matrix additions or multiplications were fastest on highly parallel scalar architectures. Depending on the amount of computation time required for the different parts on a single computer architecture, a heterogeneous distribution offered the possibility of superlinear speedups. (One definition of superlinear speedup is "an increase by more than a factor of N in effective computation speed, using N machines over a network, over that speed which the fastest of the N machines could have achieved by itself.")
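
A small numerical illustration shows how the definition of superlinear speedup can be satisfied. The run times below are hypothetical values chosen for illustration, not measurements, and the function names are assumptions. The sketch runs the parts one after another, each on the machine best suited to it, and ignores network transfer time.

```python
# Hypothetical run times in hours for each part of a problem on each machine type.
HOURS = {
    "vector":   {"diagonalization": 10.0,  "multiplication": 100.0},
    "parallel": {"diagonalization": 100.0, "multiplication": 10.0},
}


def best_single_machine_time(hours: dict) -> float:
    """Total time on the fastest single machine running every part itself."""
    return min(sum(parts.values()) for parts in hours.values())


def heterogeneous_time(hours: dict) -> float:
    """Total time when each part runs on whichever machine is fastest for it.
    Parts run one after another; network transfer time is ignored."""
    part_names = next(iter(hours.values())).keys()
    return sum(min(machine[part] for machine in hours.values())
               for part in part_names)


def speedup(hours: dict) -> float:
    """Speedup of the heterogeneous split over the fastest single machine."""
    return best_single_machine_time(hours) / heterogeneous_time(hours)


n_machines = len(HOURS)
s = speedup(HOURS)
print(f"speedup {s:.1f} with {n_machines} machines; superlinear: {s > n_machines}")
```

With these numbers the speedup exceeds the number of machines because each machine avoids the part of the problem for which it is poorly suited, not because of any added parallelism.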

Thus issues for this application domain included how to partition application software so as to maximize the resulting speedup for a given set of computers, which types of computers should be used for a particular solution, what computation granularities should be used and what constraints are imposed by the application on the granularities, and how to manage the overall distributed problem execution. The last question required that new software tools be developed to assist programmers in the application distribution, provide run-time execution control, and allow monitoring of solution progress.

The second class of applications, interactive computations, can range from a single user interacting with a remote supercomputer to a large number of collaborators sharing interactive visualization and control of a computation, which is itself distributed over a set of computing resources as described above and which may include very large distributed datasets. An important issue for this application class is determining acceptable user response times, for example 100 milliseconds or perhaps one second elapsed time to receive a full screen display in response to a control input. This should in general provide more relaxed user communication delay constraints than the first application class, since these times are large enough to not be significantly impacted by propagation delay, and will also remain constant as future computation times decrease due to increased computing power.
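
The data rate implied by a given response time can be estimated directly. The sketch below assumes an uncompressed 1024 x 1024 display at 24 bits per pixel, which is a hypothetical configuration not taken from the text, and it attributes the whole response time to network transmission, ignoring computation, rendering, and propagation delay.

```python
def required_rate_bps(width_px: int, height_px: int,
                      bits_per_pixel: int, response_s: float) -> float:
    """Data rate needed to deliver one full uncompressed screen within the
    response time, if the whole interval is spent on transmission."""
    frame_bits = width_px * height_px * bits_per_pixel
    return frame_bits / response_s


# The two response times mentioned in the text.
for response_s in (0.1, 1.0):
    rate = required_rate_bps(1024, 1024, 24, response_s)
    print(f"{response_s:g} s response: {rate / 1e6:.1f} Mbps")
```

Under these assumptions, the shorter response time already calls for a substantial fraction of a gigabit/s for a single display.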

Other issues for remote visualization include where to generate the rendering, what form the data interface should take between the data generation output and the renderer, how best to provide platform-independent interactive control, and how to integrate multiple heterogeneous display devices. For large datasets, an important issue is how to best distribute the datasets and associated computational resources, for example performing preprocessing on a computer in close proximity to the dataset and moving the results across the network versus moving the unprocessed data to remote computation points.

Each of the above issues was examined in a variety of networking and application contexts and is described more fully in the referenced testbed reports. The investigations and findings are summarized in Section 4.