`gossip`: Efficient Communication Primitives for Multi-GPU Systems

Gossip supports scatter, gather and all-to-all communication. Each communication primitive is executed according to a transfer plan. Use the provided scripts to generate optimized plans for your specific NVLink topology. The plans directory contains optimized plans for typical 4-GPU configurations (P100 and V100) as well as for the 8-GPU DGX-1 Volta. If no transfer plan is provided, gossip falls back to the default strategy of direct transfers between GPUs.
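To make the fallback concrete, the sketch below lists the point-to-point copies that a purely direct strategy implies for each primitive. It is an illustration under assumptions: the `Transfer` struct and the `direct_*` functions are hypothetical and are neither gossip's API nor its plan file format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation of a single point-to-point copy
// (illustration only, not gossip's plan file format).
struct Transfer {
    std::size_t src;  // sending GPU
    std::size_t dst;  // receiving GPU
};

// Direct all-to-all: every GPU sends its chunk for every other GPU
// straight to that GPU; the chunk destined for itself stays local.
std::vector<Transfer> direct_all2all(std::size_t num_gpus) {
    std::vector<Transfer> plan;
    for (std::size_t src = 0; src < num_gpus; ++src)
        for (std::size_t dst = 0; dst < num_gpus; ++dst)
            if (src != dst) plan.push_back({src, dst});
    return plan;
}

// Direct scatter: the root sends one chunk to each other GPU.
std::vector<Transfer> direct_scatter(std::size_t root, std::size_t num_gpus) {
    std::vector<Transfer> plan;
    for (std::size_t dst = 0; dst < num_gpus; ++dst)
        if (dst != root) plan.push_back({root, dst});
    return plan;
}

// Direct gather: each other GPU sends its chunk to the root.
std::vector<Transfer> direct_gather(std::size_t root, std::size_t num_gpus) {
    std::vector<Transfer> plan;
    for (std::size_t src = 0; src < num_gpus; ++src)
        if (src != root) plan.push_back({src, root});
    return plan;
}
```

For 4 GPUs, a direct all-to-all issues 4 × 3 = 12 copies; for 8 GPUs it issues 56. On topologies where not every GPU pair shares an NVLink (the DGX-1 is one example), some of these direct copies cannot use NVLink, which is presumably where topology-specific plans that route chunks over intermediate GPUs gain their advantage.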

Gossip was presented at ICPP '19.

Using gossip

To use gossip, clone this repository and check out the hpc_helpers submodule by calling `git submodule update --init include/hpc_helpers`. Include the header `gossip.cuh` in your project; it provides all communication primitives. To parse transfer plans, use the plan parser, which can be compiled as a separate unit as in the example Makefile.

Examples

The example `execute.cu` runs gossip's communication primitives on uniformly distributed random numbers. The data is first split into as many chunks as there are GPUs (multisplit). The chunk sizes are displayed as a partition table (row = source GPU, column = target GPU). The data is then transferred between the GPUs according to the provided transfer plan. Finally, the example validates that all data reached the correct destinations.
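The workflow of the example (multisplit, partition table, transfer, validation) can be sketched on the host with one vector per GPU. This is a minimal sketch under assumptions: the rule that value `v` belongs to GPU `v % num_gpus` is hypothetical (`execute.cu` may assign targets differently), the transfer step uses the direct strategy rather than a transfer plan, and all function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using Data  = std::vector<std::vector<std::uint64_t>>;  // one vector per (simulated) GPU
using Table = std::vector<std::vector<std::size_t>>;    // partition table

// Hypothetical target rule: value v belongs to GPU v % num_gpus.
std::size_t target_of(std::uint64_t v, std::size_t num_gpus) {
    return static_cast<std::size_t>(v % num_gpus);
}

// Uniformly distributed random input with n elements per GPU.
Data random_input(std::size_t num_gpus, std::size_t n, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    Data per_gpu(num_gpus, std::vector<std::uint64_t>(n));
    for (auto& gpu : per_gpu)
        for (auto& v : gpu) v = rng();
    return per_gpu;
}

// Multisplit: split one GPU's data into num_gpus chunks, one per target GPU.
Data multisplit(const std::vector<std::uint64_t>& data, std::size_t num_gpus) {
    Data chunks(num_gpus);
    for (std::uint64_t v : data) chunks[target_of(v, num_gpus)].push_back(v);
    return chunks;
}

// Partition table: row = source GPU, column = target GPU, entry = chunk size.
Table partition_table(const Data& per_gpu, std::size_t num_gpus) {
    Table table(num_gpus, std::vector<std::size_t>(num_gpus, 0));
    for (std::size_t src = 0; src < num_gpus; ++src) {
        const Data chunks = multisplit(per_gpu[src], num_gpus);
        for (std::size_t dst = 0; dst < num_gpus; ++dst) table[src][dst] = chunks[dst].size();
    }
    return table;
}

// Direct all-to-all: chunk (src, dst) is appended to dst's receive buffer.
Data exchange(const Data& per_gpu, std::size_t num_gpus) {
    Data received(num_gpus);
    for (std::size_t src = 0; src < num_gpus; ++src) {
        const Data chunks = multisplit(per_gpu[src], num_gpus);
        for (std::size_t dst = 0; dst < num_gpus; ++dst)
            received[dst].insert(received[dst].end(), chunks[dst].begin(), chunks[dst].end());
    }
    return received;
}

// Validation: GPU dst must hold exactly the column sum of the table,
// and every element it holds must belong to dst.
bool validate(const Data& received, const Table& table, std::size_t num_gpus) {
    for (std::size_t dst = 0; dst < num_gpus; ++dst) {
        std::size_t expected = 0;
        for (std::size_t src = 0; src < num_gpus; ++src) expected += table[src][dst];
        if (received[dst].size() != expected) return false;
        for (std::uint64_t v : received[dst])
            if (target_of(v, num_gpus) != dst) return false;
    }
    return true;
}
```

Each row of the partition table sums to the number of elements on that source GPU, while each column sum is what the target GPU must end up holding. With random input the column sums vary from GPU to GPU, which is why receive buffers need some headroom.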

The example `simulate.cu` runs the multi-GPU example above in simulation on a single GPU.

Build example

Compile the example with the provided Makefile by calling `git submodule update --init && make`.

Requirements:

- CUDA >= 9.2
- GNU g++ >= 5.5 (compatible with your CUDA version)
- Python >= 3.0 with
  - Matplotlib
  - NumPy

Run example

./execute (all2all|all2all_async) <transfer plan> [--size <size>] [--memory-factor <factor>]

./execute scatter_gather <scatter plan> <gather plan> [--size <size>] [--memory-factor <factor>]

Use ./simulate instead of ./execute if you want to simulate the example on a single GPU.

Mandatory:

- Choose the mode: all2all (double buffered), all2all_async or scatter_gather
- Provide the path(s) to the transfer plan(s): one for all2all or all2all_async, two for scatter_gather (the scatter plan, then the gather plan)

Optional:

- --size: data size, 2^<size> 64-bit elements per GPU (default: 28)
- --memory-factor: memory factor to account for random transfer sizes (default: 1.5)
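The arithmetic behind the two optional parameters can be checked with a few helpers. This sketch rests on an assumption not stated above: that each per-GPU buffer is sized for `memory_factor × 2^size` elements, so that a GPU receiving more than its average share of the random data does not overflow. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Elements per GPU for --size s: 2^s 64-bit values.
std::uint64_t elements_per_gpu(unsigned size_exp) {
    return std::uint64_t{1} << size_exp;
}

// Bytes of input data per GPU (8 bytes per 64-bit element).
std::uint64_t input_bytes(unsigned size_exp) {
    return elements_per_gpu(size_exp) * sizeof(std::uint64_t);
}

// Assumed buffer capacity in elements: input size scaled by the memory factor,
// rounded up so the buffer is never smaller than the scaled size.
std::uint64_t buffer_elements(unsigned size_exp, double memory_factor) {
    return static_cast<std::uint64_t>(
        std::ceil(static_cast<double>(elements_per_gpu(size_exp)) * memory_factor));
}
```

With the defaults, `--size 28` gives 2^28 = 268,435,456 elements, i.e. 2 GiB of input per GPU, and a memory factor of 1.5 would, under the assumption above, give 402,653,184 elements (3 GiB) of capacity per buffer.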

Benchmark

For benchmark scripts and results see the benchmark directory.
