- Numba parallel - a way to write threaded parallel code in Python without the GIL
- joblib - a library of multiprocessing primitives similar to mp.Pool, but with some extra conveniences
- BytePS paper - https://www.usenix.org/system/files/osdi20-jiang.pdf
- Alternative lecture: Parameter servers from CMU 10-605 - here
- Alternative seminar: Python multiprocessing - playlist
- Python multiprocessing docs (pay attention to fork vs spawn!)
- PyTorch Distributed tutorial
- Collective communication protocols in NCCL
- There's a ton of links on the slides, please check the PDF.