A100 cluster test: 2-node 16-GPU throughput is lower than single-node 8-GPU throughput #206

Open
iamweizhi opened this issue Jul 6, 2021 · 1 comment

Comments

@iamweizhi

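In an A100 cluster test, the 2-node 16-GPU run has a speedup ratio of 0.35; its measured throughput is lower than that of a single node with 8 GPUs.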
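
The issue does not say how the speedup of 0.35 was computed. As a minimal sketch, assuming speedup means multi-node throughput divided by single-node throughput, and scaling efficiency means that speedup divided by the ratio of GPU counts, the arithmetic looks like this. The function names and the throughput values are hypothetical and only illustrate the calculation.

```python
def speedup(multi_throughput: float, single_throughput: float) -> float:
    """Throughput of the multi-node run relative to the single-node run."""
    return multi_throughput / single_throughput


def scaling_efficiency(multi_throughput: float, multi_gpus: int,
                       single_throughput: float, single_gpus: int) -> float:
    """Measured speedup divided by the ideal speedup (ratio of GPU counts)."""
    ideal = multi_gpus / single_gpus
    return speedup(multi_throughput, single_throughput) / ideal


if __name__ == "__main__":
    # Hypothetical throughputs in samples/s, not measurements from this issue.
    single = 10000.0  # 1 node, 8 GPUs
    multi = 3500.0    # 2 nodes, 16 GPUs
    print(f"speedup: {speedup(multi, single):.2f}")
    print(f"scaling efficiency: {scaling_efficiency(multi, 16, single, 8):.3f}")
```

Under this assumed definition, any speedup below 1.0 means the 16-GPU run is slower in absolute terms than the 8-GPU run, which matches the symptom reported here.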

@strint

strint commented Jul 6, 2021

I suggest first using NVIDIA's profiling tool to find the bottleneck.

NVIDIA profiling tool

Link

https://developer.nvidia.com/nsight-systems

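Using nsys

nsys can be used in several ways. To measure timing statistics for instrumented ranges, a simple and direct approach is to run the following command: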

nsys profile --stats=true \
python3 cnn_benchmark/of_cnn_benchmarks.py \
 --gpu_num_per_node=1 \
 --model="alexnet" \
 --batch_size_per_device=8 \
 --iter_num=20 \
 --learning_rate=0.01 \
 --optimizer="sgd" \
 --loss_print_every_n_iter=1 \
 --data_dir="/dataset/imagenet_227/train/32"

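Note that the number of training steps should be kept small, for example only 20 steps; otherwise the output file becomes too large to open afterwards.
nsys outputs two files, one with the qdrep extension and one with the sqlite extension. The qdrep file can be opened with NVIDIA Nsight Systems.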
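
The sqlite file can also be inspected without the GUI. Below is a minimal sketch using only Python's standard `sqlite3` module; it lists every table in the file with its row count, which is a quick way to see what was recorded. The helper name `list_tables` and the file name in the comment are hypothetical, and the sketch deliberately assumes nothing about the table names nsys writes.

```python
import sqlite3


def list_tables(sqlite_path: str) -> dict:
    """Return {table_name: row_count} for every table in the SQLite file."""
    conn = sqlite3.connect(sqlite_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


# Example call (hypothetical file name; use the file nsys actually wrote):
# for name, count in list_tables("report1.sqlite").items():
#     print(f"{count:>10}  {name}")
```

Tables with large row counts are a reasonable starting point when looking for where time is spent.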

nccl tests

nccl tests is a tool for testing the cluster's network speed. You generally need to compile and run it yourself. The git address is: https://github.com/NVIDIA/nccl-tests
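
When reading nccl-tests results, it helps to know how the two reported bandwidths relate. As I understand the nccl-tests performance notes, algorithm bandwidth is the message size divided by the elapsed time, and for all-reduce the bus bandwidth is that value multiplied by 2(n-1)/n, where n is the number of ranks. The sketch below reproduces that arithmetic; the function name and the input values are hypothetical.

```python
def allreduce_busbw(size_bytes: float, time_s: float, n_ranks: int) -> float:
    """Bus bandwidth in GB/s for an all-reduce of size_bytes taking time_s."""
    if n_ranks < 1 or time_s <= 0:
        raise ValueError("n_ranks must be >= 1 and time_s must be > 0")
    algbw = size_bytes / time_s / 1e9            # algorithm bandwidth, GB/s
    return algbw * 2 * (n_ranks - 1) / n_ranks   # all-reduce correction factor


if __name__ == "__main__":
    # Hypothetical example: a 1 GB all-reduce across 16 ranks taking 0.5 s.
    print(f"{allreduce_busbw(1e9, 0.5, 16):.3f} GB/s")
```

Because the correction factor removes the dependence on rank count, comparing bus bandwidth between the 8-GPU and 16-GPU runs gives a more direct view of whether the inter-node link is the bottleneck.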
