Significant Runtime Increase with Medaka v2.0.1 CPU compared to GPU #547

wshropshire · 2024-12-23T18:16:11Z

Describe the bug
A clear and concise description of what the bug is including the command that you have run.

Logging
Please attach any relevant logging messages. (Use ``` before and after code blocks).

Environment (if you do not have a GPU, write No GPU):

Installation method [from github source, pypi (pip install), conda]
Conda
OS: [e.g. Ubuntu 16.04]
RHEL 7.9 operating system, Bright Cluster Manager, IBM Spectrum LSF (job scheduler)
medaka version (can be found by running medaka --version)
v2.0.1
GPU model
Nvidia driver version
NVIDIA V100

Additional context
Add any other context about the problem here.
I have noticed that running medaka_consensus on flye output takes significantly longer since I upgraded Medaka from v1.8.0 to v2.0.1. Runs that used to take 30-60 minutes now take well over 5 hours. I tested a sample with ~50X coverage depth on a CPU node versus a GPU node on our HPC and found that the GPU took 2.5 hours compared to 7 hours on the CPU at this particular stage (screenshot attached). Judging by older stdout, this process took much less time with the tensorflow version of medaka than with the pytorch version.


cjw85 · 2025-01-06T15:30:09Z

Hi @wshropshire,

You are not the first person to report this discrepancy. I was not able to reproduce this until noticing that users were installing medaka through conda. It seems likely that the pytorch packages coming through conda are not as optimised as those that get installed through Python's pip package manager.

I'm currently running through an Arabidopsis assembly in parallel, having installed medaka 2.0.1 with both conda and pip. The pip-installed setup is running around 1.6x faster.

I have not noticed as large a discrepancy between versions 1.21.1 and 2.0.1 when installing with pip.
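For anyone wanting to repeat this kind of side-by-side comparison, a minimal timing sketch follows. It is only an illustration: `time_command` and `speedup` are hypothetical helper names, and the two placeholder commands simply launch the Python interpreter so the script runs anywhere. Swap in your own medaka_consensus invocation from each environment.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) to completion and return wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero,
    # so a failed run is never mistaken for a fast one.
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the second run was than the first."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    # Placeholder commands (assumption): replace each list with the real
    # command line for the conda-installed and pip-installed medaka.
    conda_cmd = [sys.executable, "-c", "pass"]
    pip_cmd = [sys.executable, "-c", "pass"]

    t_conda = time_command(conda_cmd)
    t_pip = time_command(pip_cmd)
    print(f"conda: {t_conda:.2f}s  pip: {t_pip:.2f}s  "
          f"ratio: {speedup(t_conda, t_pip):.2f}x")
```

Both runs should use the same input data, thread count, and node type, otherwise the ratio says little about the install method.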

cjw85 · 2025-01-06T17:13:25Z

Additionally I'm not convinced the pytorch package available from conda-forge will correctly use a GPU. Certainly for me medaka's logging reports:

[17:10:25 - Predict] Model device: cpu

on startup, whereas with a pip installation I have:

[17:02:07 - Predict] Model device: cuda:0
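A quick way to check the same thing in your own environment is sketched below. It assumes the log line keeps the exact "Model device:" wording shown above. `model_device_from_log` and `torch_cuda_report` are hypothetical helper names, not part of medaka. The second helper only queries standard PyTorch attributes and reports nothing more if PyTorch is not installed.

```python
import importlib.util
import re

# Matches the device name in lines such as
# "[17:10:25 - Predict] Model device: cpu" (format taken from the logs above).
DEVICE_PATTERN = re.compile(r"Model device:\s*(\S+)")


def model_device_from_log(log_text):
    """Return the first device named in medaka's log text, or None if absent."""
    match = DEVICE_PATTERN.search(log_text)
    return match.group(1) if match else None


def torch_cuda_report():
    """Summarise whether the installed PyTorch build can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch  # imported lazily so the script also runs without PyTorch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.cuda is None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(model_device_from_log("[17:10:25 - Predict] Model device: cpu"))
    print(torch_cuda_report())
```

If `built_with_cuda` is `None`, the installed PyTorch is a CPU-only build and medaka cannot use the GPU, whatever hardware the node has.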

wshropshire · 2025-01-29T17:29:03Z

Hey Chris,

I confirmed that the pip3 install of medaka v2.0.1 runs significantly faster on both CPU and GPU than the conda install. You may want to discourage people from using the conda install in the README until the compute time issue is resolved.

While I have you, I see that dorado polish v0.9.1 supports bacterial genome polishing now. Would you suggest moving over to this software for polishing?

cjw85 · 2025-01-29T17:37:37Z

Using Dorado isn't quite our official recommendation, only because we haven't switched our Nextflow workflows over to using it (which causes people to question why we don't ourselves use the recommended tool 😬🤣).

As a user of standalone medaka you are welcome to test Dorado and provide feedback. It will be the official recommendation in the future.

wshropshire · 2025-01-29T21:37:43Z

Okay, I'm certain I'll be testing it out soon

wshropshire added the bug label Dec 23, 2024

wshropshire closed this as completed Jan 29, 2025
