[Question] Is this for binning contigs from the same viral genome or clustering viral genomes? #27

Open
jolespin opened this issue Oct 18, 2023 · 3 comments

@jolespin

The reason I'm asking is that VirFinder, geNomad, VirSorter(2), etc. operate on individual contigs, and their results are then usually fed into CheckV to estimate how complete or contaminated each virus is (similar to CheckM and BUSCO). Suppose there are 3 contigs that CheckV determines are all 100% complete and 0% contaminated. If those are binned together, would that bin be considered a metagenome-assembled genome, or would it be a pangenome, since the contamination would be high based on the notes above?

@cody-mar10
Member

vRhyme is for binning contigs sequenced from the same genome but assembled in various fragments. It is not a clustering tool.

In your example with 3 "complete" viral genome fragments, if vRhyme were to bin them together, the result would be a viral MAG, not a pangenome. As for what that means for CheckV's quality metrics, it would probably take a more in-depth analysis of your data to understand why CheckV produces those numbers.

You can also feed CheckV viral MAGs, as long as you set up the genes-to-genome file so that all genes from every scaffold in a viral MAG point back to a single identifier. That could help you compare how CheckV views the individual scaffolds versus the binned genome.
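As a rough illustration of the genes-to-genome idea, here is a small Python sketch that maps each gene to the bin its scaffold belongs to. It assumes Prodigal-style gene IDs (`<scaffold>_<n>`) and a bin membership list of `(scaffold, bin)` pairs; the helper names, the bin names, the two-column TSV output, and the choice to let unbinned scaffolds keep their own name are all hypothetical, so check the CheckV documentation for the exact input layout it expects.

```python
import csv
import re
from typing import Dict, Iterable, Tuple

# Prodigal-style gene IDs are "<scaffold>_<index>", so stripping the trailing
# "_<digits>" recovers the scaffold a gene was predicted on (assumption).
_GENE_SUFFIX = re.compile(r"_\d+$")


def gene_to_scaffold(gene_id: str) -> str:
    """Return the scaffold name for a Prodigal-style gene ID."""
    return _GENE_SUFFIX.sub("", gene_id)


def build_gene_to_genome(membership: Iterable[Tuple[str, str]],
                         gene_ids: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: map every gene to a single ID per viral MAG.

    membership: (scaffold, bin_name) pairs describing the bins.
    Genes on scaffolds that are not in any bin keep their scaffold name.
    """
    scaffold_to_bin = dict(membership)
    mapping = {}
    for gene in gene_ids:
        scaffold = gene_to_scaffold(gene)
        mapping[gene] = scaffold_to_bin.get(scaffold, scaffold)
    return mapping


def write_tsv(mapping: Dict[str, str], path: str) -> None:
    """Write gene<TAB>genome rows (hypothetical output layout)."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        for gene, genome in mapping.items():
            writer.writerow([gene, genome])
```

For example, with `contig_A` and `contig_B` in a bin called `bin_1`, the genes `contig_A_1`, `contig_A_2`, and `contig_B_1` all map to `bin_1`, while `contig_C_1` (unbinned) maps to `contig_C`.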

@KrisKieft
Member

To add to that, vRhyme isn't without error (neither is CheckV). If CheckV is wrong, then vRhyme may be correctly binning 3 contigs of a single genome into a vMAG. If vRhyme is wrong, then the bin contains 3 different genomes and is contaminated (or a pangenome). As @cody-mar10 mentioned, it may require a more in-depth analysis of your data to resolve this.

@jolespin
Author

jolespin commented Dec 5, 2023

@KrisKieft is there a way to calculate coverage separately? I see cov_table_convert.py, but it isn't installed with the conda package.

Is it possible to use coverm instead? https://github.com/wwood/CoverM
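For computing coverage outside vRhyme, one generic approach is to take per-base depth from `samtools depth -a` (tab-separated contig, position, depth for a single BAM) and average it per contig, as in the Python sketch below. Whether vRhyme accepts a precomputed coverage table, and in what column layout, is an assumption here and should be checked against the vRhyme README. CoverM's `contig` subcommand reports a comparable mean-coverage statistic, though its defaults may differ from a plain average.

```python
from collections import defaultdict
from typing import Dict, Iterable


def mean_depth(lines: Iterable[str]) -> Dict[str, float]:
    """Mean per-base depth per contig from `samtools depth -a` output.

    Each line is contig<TAB>position<TAB>depth (one BAM file). With -a every
    position is reported, including zero-depth ones, so dividing the summed
    depth by the number of rows gives the mean over the whole contig.
    """
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        contig, _pos, depth = line.rstrip("\n").split("\t")[:3]
        totals[contig] += int(depth)
        counts[contig] += 1
    return {contig: totals[contig] / counts[contig] for contig in totals}
```

Usage: run `samtools depth -a sample.bam > depth.tsv`, then call `mean_depth(open("depth.tsv"))` and write the resulting contig-to-depth pairs in whatever table format your binning step expects.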
