Significant Runtime Increase with Medaka v2.0.1 CPU compared to GPU #547
Comments
Hi @wshropshire, You are not the first person to report this discrepancy. I was not able to reproduce it until I noticed that users were installing medaka through conda. It seems likely that the pytorch packages coming through conda are not as optimised as those installed through Python's pip package manager. I'm currently running an Arabidopsis assembly in parallel, having installed medaka 2.0.1 with both conda and pip. The pip-installed setup is running around 1.6x faster. I have not noticed as large a discrepancy between versions 1.21.1 and 2.0.1 when installing with pip.
Additionally I'm not convinced the pytorch package available from conda-forge will correctly use a GPU. Certainly for me medaka's logging reports:
on startup, whereas with a pip installation I have:
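For anyone wanting to compare their own conda and pip environments, here is a minimal sketch of how the PyTorch build in each one could be inspected. It is not medaka's own startup check and does not reproduce medaka's log messages; the function name `describe_torch_backend` is made up for illustration. It only uses standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.get_num_threads()`).

```python
import importlib.util


def describe_torch_backend():
    """Summarise the PyTorch build visible in the current environment.

    Hypothetical helper for illustration; not part of medaka.
    """
    # Avoid an ImportError in environments where torch is not installed.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version the package was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # True only if a CUDA build is installed and a usable GPU is visible.
        "cuda_available": torch.cuda.is_available(),
        # Number of threads PyTorch uses for intra-op CPU parallelism.
        "cpu_threads": torch.get_num_threads(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_backend().items():
        print(f"{key}: {value}")
```

Running this once inside the conda environment and once inside the pip environment should show whether either one has a CPU-only build (`cuda_build` is `None`) or cannot see the GPU (`cuda_available` is `False`).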
Hey Chris, I confirmed that the pip3 install of medaka v2.0.1 runs significantly faster on both CPU and GPU than the conda install. You may want to discourage users from using the conda install in the README until the compute time issue is resolved. While I have you, I see that dorado polish v0.9.1 now supports bacterial genome polishing. Would you suggest moving over to this software for polishing?
Using Dorado isn't quite our official recommendation, only because we haven't switched our Nextflow workflows over to using it (which causes people to question why we don't ourselves use the recommended tool 😬🤣). As a user of standalone medaka you are welcome to test Dorado and provide feedback. It will be the official recommendation in the future.
Okay, I'm certain I'll be testing it out soon.
Describe the bug
A clear and concise description of what the bug is including the command that you have run.
Logging
Please attach any relevant logging messages. (Use ``` before and after code blocks).
Environment (if you do not have a GPU, write No GPU):
Installation method [from github source, pypi (pip install), conda]: Conda
OS: RHEL 7.9 operating system, Bright Cluster Manager, IBM Spectrum LSF (job scheduler)
medaka version (can be found by running medaka --version): v2.0.1
GPU model: NVIDIA V100
Nvidia driver version:
Additional context
Add any other context about the problem here.
I have noticed that running medaka_consensus on flye output has taken significantly longer since I upgraded Medaka from v1.8.0 to v2.0.1. Runs that used to take 30-60 minutes now take well over 5 hours. I tested a sample with ~50X coverage depth on a CPU node and a GPU node on our HPC and found that this stage took 2.5 hours on the GPU compared to 7 hours on the CPU (screenshot attached). Judging by older stdout logs, this step took much less time with the tensorflow version of medaka than with the pytorch version.
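To make comparisons like the one above reproducible, a small timing wrapper can help. This is only a sketch: `time_command` and `speedup` are hypothetical helper names, and the command passed in would be whatever medaka_consensus invocation is being tested (the harmless Python no-op below is just a placeholder).

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run a command (list of arguments) and return its wall-clock time in seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the fast run was than the slow run."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    # Placeholder command; substitute the real polishing command being tested.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"placeholder command took {elapsed:.2f} s")

    # Figures reported above: 7 hours on CPU versus 2.5 hours on GPU.
    print(f"GPU speedup over CPU: {speedup(7 * 3600, 2.5 * 3600):.1f}x")
```

With the runtimes reported here, the GPU node was 2.8x faster than the CPU node for this stage.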