High number of bacterial genes in vRhyme bins #19

Open
ShailNair opened this issue Jan 11, 2023 · 6 comments

Comments

@ShailNair

Hi,

I used vRhyme with the default settings on my assembled contigs. I concatenated contigs from the same bins into a single fasta file using the provided bin sequences.py script. Later, I ran CheckV (with the prodigal -m option enabled) on the concatenated fasta file. Strangely, CheckV revealed that a large number of the bins contained an extremely high number of host (bacterial) genes, accounting for more than 50% (more than 90% for many contigs) of the total number of genes. Surprisingly, CheckV indicates that many of these bins are complete and without contamination. However, the contig/genome size (many of them in the 500 kb to 4 Mb range) is too large to be considered a virus/phage. Is it normal to get this kind of result? I have attached the CheckV quality summary file for your reference.
quality_summary.txt

@KrisKieft
Member

Hi,

Did you input predicted virus sequences or VLP sequencing contigs, or a whole assembly including microbes?

@ShailNair
Author

Whole assembly including microbes

@KrisKieft
Member

KrisKieft commented Jan 17, 2023

vRhyme does not identify viral sequences; it expects the input to already be viral. Because microbes were included in the input, they were binned too, and this is the source of the microbial contamination. Please see the "important note" in the program description section of the README.

@ShailNair
Author

ShailNair commented Jan 17, 2023

I see. I probably misunderstood the Description section, which says that "vRhyme can take an entire metagenome as input, but the performance for a whole metagenome has not been fully evaluated." I will re-run it with identified viral contigs.

Thank you.

@KrisKieft
Member

You can bin sequences as you did; the next step would then be filtering out non-viral bins. This approach can help recruit into bins viral fragments that could not otherwise be predicted as viral. But keep in mind that microbes will be binned too, leading to the CheckV results and large bin sizes you reported.
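
As a rough illustration of that filtering step, the sketch below flags bins whose CheckV summary suggests a microbial origin. It is not part of vRhyme or CheckV. It assumes the summary is tab-separated with the columns `contig_id`, `contig_length`, `gene_count` and `host_genes`, and the function name `flag_nonviral_bins` and both thresholds (50% host genes, 500 kb) are hypothetical choices taken from the numbers mentioned in this thread, not recommended cutoffs. The example rows are made up for demonstration.

```python
import csv
import io


def flag_nonviral_bins(tsv_text, max_host_fraction=0.5, max_length=500_000):
    """Split bin IDs into (kept, flagged) using a CheckV-style summary table.

    A bin is flagged when its fraction of host genes exceeds
    max_host_fraction or its length exceeds max_length.
    Both thresholds are illustrative assumptions.
    """
    kept, flagged = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        gene_count = int(row["gene_count"])
        host_genes = int(row["host_genes"])
        length = int(row["contig_length"])
        # Guard against bins with no predicted genes.
        host_fraction = host_genes / gene_count if gene_count else 0.0
        if host_fraction > max_host_fraction or length > max_length:
            flagged.append(row["contig_id"])
        else:
            kept.append(row["contig_id"])
    return kept, flagged


# Made-up rows, for illustration only.
rows = [
    ["contig_id", "contig_length", "gene_count", "host_genes"],
    ["bin_1", "45000", "60", "2"],
    ["bin_2", "2100000", "2000", "1900"],
    ["bin_3", "80000", "100", "70"],
]
example = "\n".join("\t".join(r) for r in rows) + "\n"
kept, flagged = flag_nonviral_bins(example)
print("kept:", kept)
print("flagged:", flagged)
```

Flagged bins would still need manual inspection, since a high host-gene count alone does not prove that a bin is microbial.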

@ShailNair
Author

Yes, got it. I will predict viral contigs first and then run vRhyme.
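
For the "predict first, then bin" order, one way to prepare the vRhyme input is to subset the assembly to the contigs that a virus prediction tool reported. The sketch below is a plain-Python illustration, not a vRhyme utility. The function name `filter_fasta`, the contig names and the toy sequences are hypothetical, and it assumes the prediction tool's IDs match the first word of each FASTA header.

```python
def filter_fasta(fasta_lines, keep_ids):
    """Yield the FASTA lines of records whose ID is in keep_ids.

    The record ID is taken to be the first whitespace-delimited word
    of the header line (an assumption about how the IDs were written).
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            seq_id = header.split()[0] if header else ""
            keep = seq_id in keep_ids
        if keep:
            yield line


# Toy assembly and ID list, for illustration only.
assembly = [
    ">contig_1 length=12\n",
    "ACGTACGTACGT\n",
    ">contig_2 length=8\n",
    "GGGGCCCC\n",
    ">contig_3 length=4\n",
    "TTAA\n",
]
viral_ids = {"contig_1", "contig_3"}
viral_only = list(filter_fasta(assembly, viral_ids))
print("".join(viral_only), end="")
```

With real files, the same generator can be fed an open file handle for the assembly and its output written to a new fasta file that is then passed to vRhyme.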
