JukeBox Augmentation Triggers CUDNN Error #27
Hello. I'm not 100% sure what's causing this. Containerizing Jukebox GPU support via Docker has unfortunately always been quite brittle. What version of CUDA / cuDNN are you running on the host machine? One possible thing to try is upgrading (or downgrading?) these packages. I believe I was using CUDA 11 and cuDNN 8 on the host machine the last time I ran feature extraction with Jukebox.
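When comparing setups, it can help to report the versions PyTorch itself was built against, since those are what actually run inside the container and they can differ from the CUDA/cuDNN installed on the host. A minimal sketch, assuming it is run with the container's Python (e.g. via `docker run ... python`); `format_versions` and `collect_versions` are hypothetical helper names, while `torch.version.cuda`, `torch.backends.cudnn.version()` and `torch.cuda.get_device_name()` are standard PyTorch attributes/functions:

```python
def format_versions(torch_version, cuda_version, cudnn_version, device_name=None):
    """Build a one-line summary; CUDA/cuDNN are None on CPU-only PyTorch builds."""
    cuda = cuda_version or "none (CPU-only build)"
    cudnn = str(cudnn_version) if cudnn_version is not None else "none"
    line = f"torch {torch_version} | CUDA {cuda} | cuDNN {cudnn}"
    if device_name:
        line += f" | GPU {device_name}"
    return line


def collect_versions():
    """Query the installed PyTorch (if any) for the versions it was built with."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    device = torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
    return format_versions(
        torch.__version__,
        torch.version.cuda,               # CUDA version PyTorch was compiled with
        torch.backends.cudnn.version(),   # bundled cuDNN as an integer, or None
        device,
    )


if __name__ == "__main__":
    print(collect_versions())
```

Pasting this one line from both the host and the container into the thread would make it easier to tell whether a mismatch is involved.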
I'm getting the exact same error message, and I have CUDA 11 and cuDNN 8 on the host machine (Ubuntu 22), as you can see in the terminal output below. FWIW, other folks seem to have run into this issue too. I'd be happy to make a PR with an updated README if we can figure this error out. I think we're close 🤞 and I think it would help a lot of other folks find a solution.
cc @elloza and @tanchihpin0517 from the other post so we can consolidate the problem and solution in one thread! As an aside, @chrisdonahue, thank you SO much for making this code open source 😄 I can tell you put a lot of time into polishing the code (the Dockerization, comprehensive README, scripts, etc.). But I did have a high-level question. I read over the paper, and I understood Jukebox to be used only as part of the training step; I wasn't aware of Jukebox being used in the inference step (I very well could be missing something here). Did I miss that aspect in the paper?
Hi @matthewliuswims. Sorry this is still causing problems; I'm not sure how to replicate or debug it. I'll happily review a PR if someone is able to resolve it. Re: your high-level question. Our best model takes features computed from Jukebox (intermediate layer activations) as inputs, as opposed to common features like Mel spectrograms. This behavior is enabled with
Hi @chrisdonahue. I inspected the Docker image sheetsage-dev you provided and found this line:
Hi @XaryLee - I'm using
I'm curious: from your own experience, how much better (qualitatively) were the results compared to running the model without the
Thanks for the information, @matthewliuswims. To my knowledge, using Jukebox features as the input representation in Sheet Sage significantly improves the quality of results compared to the non-Jukebox version. The improvement is visible in several aspects, including pitch and rhythm. So I think using Jukebox is necessary for high-quality transcription. I also attempted to rebuild the Docker image using the Dockerfile provided in the source code, but the build failed. I found that the outdated package versions specified in the Dockerfile may cause compatibility issues with recent GPU architectures. For example, A100 GPUs require CUDA >= 11.0 and torch >= 1.7, whereas the Dockerfile specifies PyTorch 1.4 with CUDA 10.4. So older GPUs may work; alternatively, I would appreciate it if the code could be updated to ensure compatibility with the latest architectures.
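The compatibility argument above can be turned into a quick check: given a GPU's compute capability and the CUDA version a PyTorch build reports, is that build new enough? The sketch below is a rough heuristic, not an exhaustive rule. The table is a partial list limited to GPUs mentioned in this thread, with minimum CUDA versions as I understand them from NVIDIA's release notes; `MIN_CUDA_FOR_CAPABILITY`, `parse_major_minor` and `cuda_supports` are hypothetical names:

```python
# Minimum CUDA toolkit version that can target a given compute capability
# (major, minor). Partial list, limited to GPUs discussed in this thread.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing, e.g. Tesla T4 (AWS g4dn instances)
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX A4000
}


def parse_major_minor(version):
    """Turn a version string such as "10.1" or "11.0.3" into (major, minor)."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cuda_supports(capability, cuda_version):
    """Return True/False if the capability is in the table, else None (unknown)."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_major_minor(cuda_version) >= required
```

For example, `cuda_supports((8, 0), "10.1")` is `False`, matching the observation that a CUDA 10.x build of PyTorch is too old for an A100. Here `capability` could come from `torch.cuda.get_device_capability()` and `cuda_version` from `torch.version.cuda`.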
As per the above, I was able to get further by running this repo on a g4dn.xlarge instance, which has an NVIDIA Tesla T4 (Turing architecture) GPU 😄. This gets rid of the error from the original post. However, the command produces no output indicating success (nor does it give any indication that there was an error).
It seems like the script never gets past this step. The odd part is that it doesn't actually seem to hang for very long 🤔, and the audio file I gave it is only 7 seconds long. Sorry to bother you again, @XaryLee, but since you have actually run the augmentation successfully, I'm curious whether you ran into this same issue.
Hi @matthewliuswims. Hmm, the program exiting without reporting any error is indeed unusual; I have never encountered this before. But in my experience, the initial run of the program downloads the Jukebox and Sheet Sage models from the cloud. Together they are approximately 10 GB, so this may take some time depending on your server's Internet speed; I recall it took me about 10 minutes. This download time is independent of the length of the input song, so with some patience the program may run successfully. Additionally, I am curious about the number of CPU cores on your server. I am running on an eight-core CPU, and according to one of my research partners, the program cannot run on a four-core CPU, although I have not personally tested this. I hope this helps with the problem.
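The two resource hints above (roughly 10 GB of model downloads on first run, and eight cores working where four reportedly did not) could be checked before starting a long run. A minimal preflight sketch; the thresholds are anecdotal values from this thread, not documented requirements, and `preflight`, `MIN_CPU_CORES` and `MIN_FREE_GB` are hypothetical names:

```python
import os
import shutil

# Anecdotal thresholds taken from this thread, NOT documented requirements.
MIN_CPU_CORES = 8   # eight cores reported to work; four reportedly did not
MIN_FREE_GB = 10    # first run downloads roughly 10 GB of model weights


def preflight(path=".", cpu_count=None, free_bytes=None):
    """Return a list of warnings; an empty list means no obvious problems.

    cpu_count and free_bytes can be passed explicitly (e.g. for testing);
    otherwise they are read from the current machine.
    """
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 0)
    free = free_bytes if free_bytes is not None else shutil.disk_usage(path).free
    warnings = []
    if cores < MIN_CPU_CORES:
        warnings.append(
            f"only {cores} CPU cores (fewer than the {MIN_CPU_CORES} reported to work)"
        )
    free_gb = free / 1e9
    if free_gb < MIN_FREE_GB:
        warnings.append(
            f"only {free_gb:.1f} GB free (model downloads are roughly {MIN_FREE_GB} GB)"
        )
    return warnings


if __name__ == "__main__":
    for warning in preflight():
        print("warning:", warning)
```

Pointing `path` at the directory where models are cached would make the disk check more meaningful than the current working directory.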
Hi @matthewliuswims. I hope your issue has been resolved. I recently conducted a thorough examination of the Dockerfile and the environment that Sheet Sage depends on. I then updated the Dockerfile, rebuilt the Docker image, and successfully ran the Jukebox version of Sheet Sage on my A100 machine. In the refreshed Dockerfile, I upgraded several outdated libraries and resolved some version conflicts present in the open-source code. I also revised the shell scripts to accommodate these changes. I have now updated my forked repository with these modifications and submitted a pull request. I would greatly appreciate it if the maintainer, @chrisdonahue, could review it.
Greetings!
Thank you for releasing this repo. We were trying to run inference with the GPU (Jukebox) version on an EDM dataset of ours. We rented a bare-metal machine on Featurize with an RTX A4000 (16 GB memory). However, it produced the following error, which seems to be related to cuDNN.
We would really appreciate it if you could provide any advice. Thanks again :)