High-Performance Programming with Distributed Shared Data
by Simon Dobson
Many problems in distributed computing may be stated in terms of
shared data for example a large matrix being generated by several
parallel processes, or a document being written co-operatively by several
people. Addressing these applications using low-level message passing places
a unacceptable burden of complexity on the programmer, but allows highly
efficient solutions to be developed; environments which provide distributed
shared memory often improve the programming abstraction only at the expense
of performance.
The goal of the TallShiP project a joint UK-funded research action
between CLRC and the University of Leeds was to advance the state
of the art in building distributed applications based on shared data by
addressing the properties of such systems which lead to inefficiencies.
Shared Abstract Data Types
Most distributed shared memory systems adopt an object model in which
the representation of objects is chosen from one of a small set of possibilities.
These choices limit the scope for optimisation due to stylised patterns
in the use of objects and enforce a single model of strong coherence on
all objects. In many cases applications may tolerate weaker degrees of
coherence, so that changes in a structure may not be immediately visible
to all objects sharing it.
TallShiP has adopted a model of Shared Abstract Data Types (SADTs) which
allows an object to be represented using any number of distributed implementation
strategies. The programmer accesses the SADT through an abstract interface,
and the exact representation of the SADT may be changed transparently to
accommodate different usage patterns or coherence guarantees. This allows
an SADT to deploy a number of important optimisations for example
to improve the proportion of operations which can occur without network
access without affecting the application-level code.


Some simple examples of this technique include moving data from 'mostly
producer' nodes to 'mostly consumer' nodes in the background using additional
processes (latency hiding), and applying a number of updates en masse to
reduce network traffic. More complex examples create local caches of objects
which are periodically refreshed and combination updates created using
the known algebraic properties of operations. When used carefully, these
optimisations allow shared data applications to approach the efficiency
of explicit message passing algorithms.
An important aspect of the SADT approach is the use of a modified process
algebra as a specification framework. This allows us to explore the effects
that different representation strategies have on the detailed behaviour
of a type.
Evaluation
We have used the language Modula-3 to implement a prototype library
of SADTs, each with a number of different representations. The effect of
weakening and usage optimisation is to reduce the costs in maintaining
the shared data abstraction. Essentially the weaker types are better suited
to distributed representation than their stronger variants.

For example in a parallel solver for the Travelling Salesman Problem
(TSP) a naïve implementation using centralised objects spends most
of the processors' time in the shared data structure rather than in actually
calculating the lengths of tours. By weakening the coherence model and
tailoring the type representations to the actual pattern of use without
changing the application code at all the cost of maintaining the
SADTs is reduced dramatically.
This reduction in costs translates to speed-ups in the execution of
such applications. For TSP on networks of workstations we have obtained
speed-ups of 8 on 14 processors very acceptable for such a fine-grained
problem and of 106 on 128 processors on a Cray T3D. More information
on TallShiP at http://www.dci.clrc.ac.uk/Activity.asp?TallShiP
Please contact:
Simon Dobson - CLRC
Tel: +44 1235 445867
E-mail: s.dobson@rl.ac.uk