Significant Runtime Increase with Medaka v2.0.1 CPU compared to GPU #547

Open
wshropshire opened this issue Dec 23, 2024 · 2 comments

@wshropshire

Describe the bug
A clear and concise description of what the bug is including the command that you have run.

Logging
Please attach any relevant logging messages. (Use ``` before and after code blocks).

Environment (if you do not have a GPU, write No GPU):

  • Installation method [from github source, pypi (pip install), conda]
    Conda

  • OS: [e.g. Ubuntu 16.04]
    RHEL 7.9 operating system, Bright Cluster Manager, IBM Spectrum LSF (job scheduler)

  • medaka version (can be found by running medaka --version)
    v2.0.1

  • GPU model
    NVIDIA V100

  • Nvidia driver version

Additional context
Add any other context about the problem here.
I have noticed that running medaka_consensus on Flye output has taken significantly longer since I upgraded medaka from v1.8.0 to v2.0.1. Runs that previously took 30-60 minutes now take well over 5 hours. I tested a sample with ~50X coverage depth on both a CPU node and a GPU node of our HPC: at this particular stage, the GPU node took 2.5 hours compared to 7 hours on the CPU node, about 2.8x longer on CPU (screenshots attached). Older stdout logs of this process suggest it ran much faster with the TensorFlow-based versions of medaka than with the current PyTorch-based version.

[Screenshots: MB9846_medaka_cpu, MB9846_medaka_gpu]
@cjw85
Member

cjw85 commented Jan 6, 2025

Hi @wshropshire,

You are not the first person to report this discrepancy. I was not able to reproduce this until noticing that users were installing medaka through conda. It seems likely that the pytorch packages coming through conda are not as optimised as those that get installed through Python's pip package manager.

I'm currently running an Arabidopsis assembly in parallel, having installed medaka 2.0.1 with both conda and pip. The pip-installed setup is running around 1.6x faster.

I have not noticed as large a discrepancy between versions 1.21.1 and 2.0.1 when installing with pip.
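To compare a conda install with a pip install side by side, one rough approach is to print what each environment's PyTorch build reports about itself and diff the two outputs. The sketch below is not a medaka feature, and it is only an assumption that any of these fields explains the speed difference; it uses standard PyTorch attributes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.get_num_threads()`, `torch.backends.mkl.is_available()`, `torch.backends.mkldnn.is_available()`), and the function name `torch_build_info` is hypothetical.

```python
"""Summarise the PyTorch build in the currently active Python environment.

Hypothetical helper: run it once in the conda environment and once in the
pip environment, then compare the printed output.
"""
import importlib.util


def torch_build_info() -> dict:
    """Return a few build/runtime facts about the installed PyTorch, if any."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not importable here, so this is probably not the medaka env
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "version": torch.__version__,
        # None for CPU-only builds; a CUDA version string for CUDA builds
        "built_with_cuda": torch.version.cuda,
        # True only if the build supports CUDA *and* a GPU is visible on this node
        "cuda_available": torch.cuda.is_available(),
        # Number of intra-op CPU threads PyTorch uses by default
        "cpu_threads": torch.get_num_threads(),
        "mkl_available": torch.backends.mkl.is_available(),
        "mkldnn_available": torch.backends.mkldnn.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_build_info().items():
        print(f"{key}: {value}")
```

On a cluster, running this inside the same job script (and node type) as medaka matters, since `cuda_available` and `cpu_threads` depend on the node and scheduler allocation as well as on the package.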

@cjw85
Member

cjw85 commented Jan 6, 2025

Additionally, I'm not convinced the PyTorch package available from conda-forge will correctly use a GPU. Certainly, for me, medaka's logging reports:

[17:10:25 - Predict] Model device: cpu

on startup, whereas with a pip installation I have:

[17:02:07 - Predict] Model device: cuda:0
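When many medaka jobs run through a scheduler, it can help to scan their logs for this startup line to confirm which device each job actually used. The sketch below is a hypothetical helper, not part of medaka; its regular expression is based only on the two log lines quoted above, and it is an assumption that other medaka versions word this line the same way.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Matches lines such as "[17:02:07 - Predict] Model device: cuda:0".
# Based only on the log lines quoted in this thread; other medaka
# versions may format this line differently.
MODEL_DEVICE_RE = re.compile(r"\[\d{2}:\d{2}:\d{2} - Predict\] Model device: (\S+)")


def model_device_from_log(log_text: str) -> Optional[str]:
    """Return the first model device medaka reported, or None if absent."""
    match = MODEL_DEVICE_RE.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_medaka_device.py job1.log job2.log
    for path in sys.argv[1:]:
        device = model_device_from_log(Path(path).read_text(errors="replace"))
        print(f"{path}: {device or 'no Model device line found'}")
```

A GPU job whose log reports `cpu` here would match the conda-forge behaviour described above.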
