The model is loaded repeatedly #692
Comments
linux, tinygrad
same problem!
In AWS
These look like two different models.
In my experiment, it happened with every model I tried on a Linux instance. cluster with
I just started seeing this today as well (on a Jetson Orin Nano), no idea why. Trying to load the llama 3.2 1b model took up all the memory (8 GB, with about 7.4 GB available) and locked up the device each time I attempted it. After the first lock-up, I watched the loading output and kept seeing it load all the way through, then just start doing it again. I noticed the model it was trying to use started with
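Watching memory while the model loads, as described above, can be scripted. This is a hypothetical helper (not part of exo or tinygrad) for Linux devices such as the Jetson; it reads `MemAvailable` from `/proc/meminfo`, which needs no external dependencies:

```python
# Hypothetical helper to watch free memory during model loading.
# Linux-only (including Jetson): parses /proc/meminfo.
def available_memory_gb() -> float:
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                kb = int(line.split()[1])  # value is reported in kB
                return kb / 1e6
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

if __name__ == "__main__":
    # Run this in a loop in a second terminal while loading the model;
    # a second drop of roughly the model's size suggests a double load.
    print(f"available: {available_memory_gb():.2f} GB")
```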
I'm experiencing the same issue with the model being loaded twice while running on tinygrad on Linux (testing at a605e23). Below is the error log from when it runs out of memory: This commit (a174c78) is the latest one that loaded the model once. However, after this change (5460529), I started getting a different error related to mlx. But I believe the double model load might have already existed before that.
I also created a log of this occurring with the llama3.2 1b model on a Jetson Orin Nano Super 8 GB device. If I do not have any debug flags on, ex. neither

If I do have debug enabled, as I did in the log below, it took a good bit longer to load, but the device didn't lock up during the subsequent loading, which I found interesting. You can see on line 928 that it reaches 100%, then at line 1546 it reaches 100% again. I also noticed that after the initial model loading, there are these warnings a bit further down:

```
WARNING: not loading output.weight
WARNING: not loading freqs_cis
loaded weights in 5348.72 ms, 2.47 GB loaded at 0.46 GB/s
```

Then it starts loading the model again right after that. Here is the full log: https://gist.github.com/MostHated/9aa7845d95aba47baf27ab3a9080e184
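The reports above all describe the weight-loading routine running to completion and then starting over. As a generic illustration only (not exo's or tinygrad's actual code, and `load_model` is a made-up name), one common cause is two call sites each invoking the loader; memoizing the loader by model id makes the second call a cache hit instead of a second read:

```python
import functools

# Hypothetical sketch: if two code paths (say, engine setup and a warm-up
# pass) each call load_model(), the weights are read from disk twice.
# Caching by model id returns the same object on repeated calls.
@functools.lru_cache(maxsize=None)
def load_model(model_id: str) -> dict:
    # Stand-in for the real weight-loading routine.
    print(f"loading weights for {model_id}")
    return {"model_id": model_id, "weights": object()}

a = load_model("llama-3.2-1b")
b = load_model("llama-3.2-1b")
assert a is b  # second call is a cache hit; weights were read only once
```

Whether exo's loader is actually being invoked from two places, or the first load is being discarded and retried, is exactly what the logs above should help pin down.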
The model was loaded twice, and the 1B llama model took up more than 10 GB of RAM. Is this normal?
新建 文本文档.txt ("New Text Document.txt", attached log)