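A100 cluster test: 2 nodes with 16 GPUs give a speedup ratio of 0.35, and the measured throughput is lower than a single node with 8 GPUs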
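For readers who want to reproduce the figure, here is a minimal sketch of how such a speedup ratio could be computed. It assumes the ratio is defined as multi-node throughput divided by single-node throughput, which the report does not spell out. The function names and the throughput numbers are hypothetical placeholders, not measurements from this issue.

```python
def speedup_ratio(multi_node_throughput, single_node_throughput):
    """Throughput of the multi-node run relative to the single-node run.

    Assumption: this is the definition behind the reported 0.35;
    the issue itself does not state how the ratio was computed.
    """
    return multi_node_throughput / single_node_throughput


def scaling_efficiency(multi_node_throughput, single_node_throughput,
                       multi_gpus, single_gpus):
    """Speedup divided by the ideal speedup (the ratio of GPU counts)."""
    ideal = multi_gpus / single_gpus
    return speedup_ratio(multi_node_throughput, single_node_throughput) / ideal


# Hypothetical throughputs in samples/s, chosen only to illustrate the math.
single = 1000.0   # 1 node, 8 GPUs
multi = 350.0     # 2 nodes, 16 GPUs

print(f"speedup ratio:      {speedup_ratio(multi, single):.3f}")
print(f"scaling efficiency: {scaling_efficiency(multi, single, 16, 8):.3f}")
```

Under this definition, any ratio below 1.0 means that adding the second node made training slower overall, which usually points at inter-node communication rather than compute.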
I suggest first using NVIDIA's profiling tool to find the bottleneck:
https://developer.nvidia.com/nsight-systems
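nsys can be used in several ways. To measure and summarize the time spent in instrumented ranges, the simplest and most direct approach is to run the following command: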
    nsys profile --stats=true \
        python3 cnn_benchmark/of_cnn_benchmarks.py \
        --gpu_num_per_node=1 \
        --model="alexnet" \
        --batch_size_per_device=8 \
        --iter_num=20 \
        --learning_rate=0.01 \
        --optimizer="sgd" \
        --loss_print_every_n_iter=1 \
        --data_dir="/dataset/imagenet_227/train/32"
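Note that the number of training steps should be kept small (for example, train for only 20 steps); otherwise the output file becomes too large to open afterwards. nsys writes two output files, with the extensions qdrep and sqlite. The qdrep file can be opened with NVIDIA Nsight Systems.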
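The sqlite file can also be inspected without the GUI. The following is a minimal sketch using only Python's standard `sqlite3` module: it lists the tables in the file and how many rows each holds, which is a quick way to check what the profile captured. The helper names are hypothetical, and no particular table names are assumed because they depend on the nsys version and on what was traced.

```python
import sqlite3
import sys


def list_tables(sqlite_path):
    """Return the names of all tables in an SQLite file, sorted by name."""
    conn = sqlite3.connect(sqlite_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def row_counts(sqlite_path):
    """Return a dict mapping each table name to its number of rows."""
    counts = {}
    conn = sqlite3.connect(sqlite_path)
    try:
        for name in list_tables(sqlite_path):
            # Table names come from sqlite_master, so quoting them is enough.
            (n,) = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
            counts[name] = n
    finally:
        conn.close()
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the .sqlite file written by nsys as the first argument.
    for table, n in row_counts(sys.argv[1]).items():
        print(f"{table}: {n} rows")
```

Tables with a large row count are a reasonable place to start looking, since they show which kinds of events dominated the short run.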
nccl-tests is a tool for testing the cluster's network speed. You generally need to build and run it yourself. The git repository is: https://github.com/NVIDIA/nccl-tests
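When reading nccl-tests output for an all-reduce run, two bandwidth figures are reported. As a rough sketch of how they relate, based on the formula described in the nccl-tests performance notes: algorithm bandwidth is the data size divided by the elapsed time, and bus bandwidth scales it by 2(n-1)/n for all-reduce, where n is the number of ranks. The function name and the example numbers below are hypothetical and are not results from this cluster.

```python
def allreduce_bandwidth(size_bytes, time_s, n_ranks):
    """Return (algbw, busbw) in GB/s for an all-reduce.

    algbw = size / time
    busbw = algbw * 2 * (n - 1) / n   (all-reduce correction factor)
    """
    if n_ranks < 1 or time_s <= 0:
        raise ValueError("n_ranks must be >= 1 and time_s must be > 0")
    algbw = size_bytes / time_s / 1e9
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return algbw, busbw


# Hypothetical example: 1 GB reduced across 16 ranks in 0.1 s.
algbw, busbw = allreduce_bandwidth(1e9, 0.1, 16)
print(f"algbw = {algbw:.2f} GB/s, busbw = {busbw:.2f} GB/s")
```

Bus bandwidth is the figure to compare against the hardware's link speed. If the 2-node value is far below the single-node value, the inter-node network is a likely bottleneck.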