Gigabit Testbeds Final Report

4.6.3 General Support Tools

In addition to the specific application area investigations discussed above, the testbed work also included the development of software tools to support general distributed applications. This activity constituted an important part of the overall testbed effort, with the resulting tools providing ongoing utility to other projects.

DOME

Researchers at CMU in the Nectar testbed developed a system called DOME (Distributed Object Migration Environment) to perform dynamic load balancing and checkpointing in a distributed heterogeneous computing environment. Application programmers incorporate objects from the DOME object library into their High Performance FORTRAN or C++ programs using a single-program-multiple-data (SPMD) execution model, with special DOME data object classes defined to ease the work of programming. As a measure of the latter, DOME programs are typically much shorter than those using PVM for achieving distributed execution.

At runtime DOME distributes the program over the given set of machines and initiates execution, performing dynamic load balancing and checkpointing as the application proceeds. Time is divided into repeated cycles consisting of a work phase and a load balancing/checkpointing phase. Data on task execution times is collected on each machine during the work phase, and is then exchanged between neighbor machines during the balancing phase, with portions of DOME objects migrated to neighbors exhibiting better performance for a particular task. Checkpointing is accomplished in a machine-independent manner, so that a task can be restarted on a different machine architecture if necessary in the event of a machine or network failure.

To evaluate the network overhead costs introduced by DOME in carrying out the load balancing and checkpointing, experiments were run using a particular application on a workstation cluster. For a phase interval resulting in good load balancing, the total volume of traffic exchanged by the program increased 80%, with the traffic due to load balancing migration very bursty. Thus higher network data rates are required when using DOME to prevent the increased data exchanges from becoming a bottleneck and slowing overall execution.

Express

The Express effort in the Casa testbed had as its goal the development of a comprehensive software system for supporting applications running in distributed heterogeneous environments. Its starting point was existing software which had evolved out of parallel processing research at Caltech for supporting applications running on a single machine. Work during the testbed effort expanded on the original package to provide support for wide area metacomputing application algorithm design, code implementation and debugging, execution control, and performance analysis.

Communication among machines was accomplished using a message passing model, with both TCP/IP and raw HIPPI transfers supported by Express. A unified computation model was used in which the set of machines were seen by applications programmers as a single machine with multiple processors. Runtime control was centralized in an Express software "console" module which was run on one of the machines used for the application. Load balancing was provided by pre-assigning tasks based on user-supplied data about individual machine processing power, or by running a special test mode to obtain the data for the pre-execution assignments.

Express was implemented on all major machines used in Casa testbed application experiments: the Cray YMP, Cray C90, Intel Delta, Intel Paragon, TMC CM-5, and on selected workstations. In addition to its testbed use, the results of this work were incorporated into a new commercial version of Express for use by the general application community.

DICE/DTM

DICE, a Distributed Interactive Collaboration Environment, was developed by NCSA as part of their Blanca testbed work. It consists of a highly modular set of software objects which provide multiple-user control and data visualization for a simulation or other application running in a distributed environment. DICE separates the transmission of control and data streams as part of its communication management, applying the most appropriate handling to each stream, for example minimum latency or maximum bandwidth.

DTM, Data Transfer Mechanism, was developed by NCSA to provide a general communication environment for distributed heterogeneous applications. It uses a message-passing model consisting of block data transfers and a simple handshaking protocol for flow control, with TCP used to provide transmission reliability. DTM provides a simplified API for application programmers, while at the same time optimizing application communications by managing multiple TCP streams through a single DTM user socket.

Data transfers are optimized by DTM for the large data exchanges typical of high-end applications, and for the use of high speed networks. Messages are transferred directly in and out of application buffers without additional buffering by DTM, minimizing latencies which would otherwise be introduced due to copying activities. Data format conversions required between different machine architectures are handled directly by DTM, which contains special code to minimize conversion processing overhead. As noted in the section on Host I/O, this can be a significant source of processing overhead if standard vendor code is used.

BEE

To provide a general distributed application monitoring capability, CMU researchers in the Nectar testbed developed BEE (Basis for Distributed Event Environments). In large computationally intensive distributed environments, collecting and processing information about events can add large processing overheads to the program. BEE attacks this problem by removing event interpretation from the monitored program and performing it instead on a different node, minimizing local event processing. In addition, BEE provides a uniform platform for three kinds of event analysis: runtime monitoring, runtime analysis, and post-mortem analysis.

BEE allows event data to be collected in realtime from the different execution points of a distributed application and displayed using visualization techniques. Past events during the run can be included in the current visualization display, and more generally the visualization of a program's execution can be browsed at past time points without losing current visualization information through the use of caching and synchronization mechanisms.

A second version of BEE, called BEE++, was also created during the project. BEE++ is an object-oriented framework which allows application programmers to customize BEE's performance analysis tools. In addition, BEE++ allows users to dynamically control the monitoring process during execution.

HERMES

The HERMES near-realtime data acquisition tool mentioned earlier in the discussion of the global climate modeling work was also applied by SDSC to general application support. It provides support for programs written in FORTRAN and C and is designed to minimize the processing overhead introduced to applications running in distributed heterogeneous environments. HERMES provides the needed program hooks and data transmission and collection mechanisms, and uses the AVS software system for visualization processing and display. A second version of HERMES added functionality to allow interactive steering of distributed simulations and other features, and was demonstrated at Supercomputing 95.