This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

NNI stops running new trials, without any error message, after CPU usage reaches 100% #5775

Open
BitCalSaul opened this issue May 2, 2024 · 0 comments

@BitCalSaul

Describe the issue:
Once CPU utilization reaches 100%, NNI finishes the trials that are already running but never starts the remaining trials. No error message is reported.

Environment:

  • NNI version: 3.0
  • Training service (local|remote|pai|aml|etc): local
  • Client OS: Ubuntu 22.04, 20.04
  • Server OS (for remote mode only):
  • Python version: 3.8
  • PyTorch/TensorFlow version: 2.1.0
  • Is conda/virtualenv/venv used?: conda
  • Is running in Docker?: no

How to reproduce it?:
Run several trials concurrently, each performing CPU-intensive work, so that total CPU utilization reaches 100%. NNI completes the running trials and then stops launching new ones.

I think this issue is the same as #965.
