gossip: Efficient Communication Primitives for Multi-GPU Systems

Gossip supports scatter, gather, and all-to-all communication. To execute one of the communication primitives, a transfer plan is needed. Use the provided scripts to generate optimized plans for your specific NVLink topology. The plans directory contains optimized plans for typical 4-GPU configurations (P100 and V100) as well as for the 8-GPU DGX-1 Volta. If no transfer plan is provided, gossip falls back to the default strategy of direct transfers between GPUs.

Gossip was presented at ICPP '19.

Using gossip

To use gossip, clone this repository and check out the submodule hpc_helpers by calling git submodule update --init include/hpc_helpers. Include the header gossip.cuh in your project; it provides all communication primitives. To parse transfer plans, use the plan parser, which can be compiled as a separate unit as shown in the example Makefile.
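
The sketch below illustrates how these pieces could fit together for an all-to-all. It is not the authoritative usage: the class and function names (parse_plan, gossip::context_t, gossip::all2all_t, execAsync, sync) are assumptions based on the example code and may differ in your version, and the plan path is a placeholder. Consult execute.cu for the exact API.

    // sketch.cu -- illustrative only; names and signatures are assumptions,
    // see execute.cu for the authoritative usage.
    #include <cstdint>
    #include <vector>
    #include <cuda_runtime.h>
    #include "include/gossip.cuh"
    #include "include/plan_parser.hpp"   // plan parser, built as a separate unit

    int main() {
        const uint32_t num_gpus = 4;

        // Parse a transfer plan matching the local NVLink topology
        // (placeholder path; pick a plan from the plans directory).
        auto transfer_plan = parse_plan("plans/all2all_plan.json");

        // Context over the participating GPUs and the all-to-all executor.
        gossip::context_t context(num_gpus);
        gossip::all2all_t all2all(context, transfer_plan);

        // One source and destination buffer per GPU, their capacities (in
        // elements), and the partition table:
        // table[src][trg] = number of elements GPU src sends to GPU trg.
        std::vector<uint64_t*> srcs(num_gpus), dsts(num_gpus);
        std::vector<size_t> srcs_lens(num_gpus), dsts_lens(num_gpus);
        std::vector<std::vector<size_t>> table(num_gpus, std::vector<size_t>(num_gpus));

        const size_t capacity = size_t(1) << 20;        // elements per GPU (example)
        for (uint32_t gpu = 0; gpu < num_gpus; ++gpu) {
            cudaSetDevice(gpu);
            cudaMalloc(&srcs[gpu], sizeof(uint64_t) * capacity);
            cudaMalloc(&dsts[gpu], sizeof(uint64_t) * capacity);
            srcs_lens[gpu] = dsts_lens[gpu] = capacity;
            for (uint32_t trg = 0; trg < num_gpus; ++trg)
                table[gpu][trg] = capacity / num_gpus;  // even split for simplicity
        }

        // ... fill the source buffers with data here ...

        all2all.execAsync(srcs, srcs_lens, dsts, dsts_lens, table);
        all2all.sync();                                 // wait for all transfers

        return 0;
    }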

Examples

The example execute.cu executes gossip's communication primitives on uniformly distributed random numbers. The data is first split into a number of chunks corresponding to the number of GPUs (multisplit). The chunk sizes are displayed as a partition table (row = source GPU, column = target GPU). Then the data is transferred between the GPUs according to the provided transfer plan. Finally, the example validates whether all data reached the correct destinations.
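
For illustration only, a partition table for 4 GPUs could look like the following, where the entry in row i and column j is the number of elements GPU i sends to GPU j (the values here are made up; the actual chunk sizes result from the random split):

              trg 0   trg 1   trg 2   trg 3
      src 0   65321   66010   64987   65730
      src 1   65904   65112   66233   64799
      src 2   64750   66401   65088   65809
      src 3   66120   64933   65517   65478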

The example simulate.cu runs the multi-GPU example above simulated on a single GPU.

Build example

Compile the example using the provided Makefile by calling git submodule update --init && make.

Requirements:

  • CUDA >= 9.2
  • GNU g++ >= 5.5 compatible with your CUDA version
  • Python >= 3.0 including
    • Matplotlib
    • NumPy

Run example

./execute (all2all|all2all_async) <transfer plan> [--size <size>] [--memory-factor <factor>]

./execute scatter_gather <scatter plan> <gather plan> [--size <size>] [--memory-factor <factor>]

Use ./simulate instead of ./execute if you want to simulate the example on a single GPU.

Mandatory:

  • Choose the mode: all2all (double buffered), all2all_async, or scatter_gather
  • Provide path(s) to transfer plan(s) (one for all2all, two for scatter+gather)

Optional:

  • Choose the data size: 2^<size> 64-bit elements per GPU (default: 28)
  • Choose the memory factor to account for randomly varying transfer sizes (default: 1.5)
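
For example, to run the asynchronous all-to-all with 2^27 elements per GPU (the plan path is a placeholder; use a plan from the plans directory that matches your topology):

./execute all2all_async plans/<your_plan>.json --size 27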

Benchmark

For benchmark scripts and results, see the benchmark directory.
