This is my software environment:
ubuntu 22.04.1
torch 1.14.0.dev20221027+cu117 pypi_0 pypi
torchaudio 0.14.0.dev20221027+cu117 pypi_0 pypi
torchvision 0.15.0.dev20221027+cu117
mmselfsup 0.10.0
cuda 11.7
nvidia driver 520.06
Whenever I run "bash tools/dist_train.sh configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-100e_in1k.py 4", the machine restarts after 1 to 2 minutes.
When I run with 3 GPUs instead of 4, the machine does not restart.
I first suspected a hardware failure, but I ran stress tests such as stress-ng and gpu_burn, which pushed the machine's IO, GPUs, and CPU to maximum load, and the machine did not restart.
So I would like to ask for help: could this be caused by a problem in the code?