
Interpretation of results #5

Open
michoug opened this issue Mar 23, 2022 · 3 comments
Labels
question Further information is requested

Comments

@michoug

michoug commented Mar 23, 2022

Hi,
I tried your tool on one of my datasets: I identified viral contigs with VIBRANT, then ran vRhyme and compared the dereplicated results before and after generating vMAGs.

Here are the results before generating vMAGs:

checkv_quality   n     mean (bp)   sum (bp)    max (bp)
Complete         557   46179.5     25721993    373392
High-quality     413   44008.8     18175622    275626

And here are the results after generating vMAGs:

checkv_quality   n     mean (bp)   sum (bp)    max (bp)
Complete         437   48556.5     21219180    373392
High-quality     514   46641.6     23973794    387939

I checked quality with CheckV and kept only the best-quality "viruses" (Complete and High-quality).
Here, mean is the mean contig length, sum is the total length of all contigs, and max is the length of the largest "virus".
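For anyone wanting to reproduce a summary like the tables above, here is a minimal sketch. It assumes the input is a CheckV-style quality_summary.tsv with at least the columns contig_length and checkv_quality; the function name summarize_by_quality is hypothetical, and the toy rows are illustrative only, not real data.

```python
import csv
import io
from collections import defaultdict

def summarize_by_quality(handle, keep=("Complete", "High-quality")):
    """Return {quality: {"n", "mean", "sum", "max"}} for the kept CheckV tiers.

    Assumes a CheckV-style quality_summary.tsv with at least the columns
    'contig_length' and 'checkv_quality'.
    """
    lengths = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["checkv_quality"] in keep:  # only the best-quality tiers
            lengths[row["checkv_quality"]].append(int(row["contig_length"]))
    summary = {}
    for quality in keep:
        values = lengths.get(quality)  # .get() does not create empty entries
        if values:
            summary[quality] = {
                "n": len(values),
                "mean": round(sum(values) / len(values), 1),
                "sum": sum(values),
                "max": max(values),
            }
    return summary

# Toy input for illustration only (not real data).
toy = io.StringIO(
    "contig_id\tcontig_length\tcheckv_quality\n"
    "c1\t40000\tComplete\n"
    "c2\t50000\tComplete\n"
    "c3\t30000\tHigh-quality\n"
    "c4\t5000\tLow-quality\n"
)
for quality, s in summarize_by_quality(toy).items():
    print(quality, s["n"], s["mean"], s["sum"], s["max"])
# Complete 2 45000.0 90000 50000
# High-quality 1 30000.0 30000 30000
```

On a real run you would pass an open file handle for your own CheckV output instead of the toy StringIO.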

So the average length is higher, but is the "contamination" also higher?
Any input on these results?
Best
Greg
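The before/after difference can be made concrete with simple arithmetic on the n and sum columns from the two tables above. This sketch only restates those numbers; it cannot by itself tell whether the extra sequence in bins is contamination, which would need CheckV's contamination estimates or a per-bin inspection.

```python
# Summary rows copied from the two tables above: (n, sum of lengths in bp)
before = {"Complete": (557, 25721993), "High-quality": (413, 18175622)}
after = {"Complete": (437, 21219180), "High-quality": (514, 23973794)}

def totals(table):
    """Combined sequence count and combined length across quality tiers."""
    n = sum(count for count, _ in table.values())
    bp = sum(total for _, total in table.values())
    return n, bp

n_before, bp_before = totals(before)
n_after, bp_after = totals(after)
for quality in before:
    dn = after[quality][0] - before[quality][0]
    print(f"{quality}: {dn:+d} sequences")
print(f"combined: {n_before} -> {n_after} sequences, {bp_before} -> {bp_after} bp")
# Complete: -120 sequences
# High-quality: +101 sequences
# combined: 970 -> 951 sequences, 43897615 -> 45192974 bp
```

So binning trades 120 Complete entries for 101 more High-quality ones, with 19 fewer sequences overall but about 1.3 Mb more total length.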

@KrisKieft
Member

Hi,

I have a couple questions before I can give more thoughts on this.

  • Are you estimating higher contamination due to the drop in complete genomes after binning, assuming the drop is due to complete genomes being incorrectly binned with other scaffolds?
  • Are you running CheckV after binning on both the bins and the unbinned contigs?
  • Is this an aggregation of data from multiple samples binned, or binning multiple samples combined at one time? 557 complete genomes is a lot to get from one sample.

@michoug
Author

michoug commented Mar 23, 2022

Hi,

Out of the 557 "complete" contigs, 137 were clustered with others in vRhyme.
Yes, I'm running CheckV after binning on both the bins and the unbinned contigs.
And yes, it's an aggregation of data from multiple samples, each binned separately.

Greg
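A count like "137 of the 557 complete contigs were clustered with others" could be reproduced roughly as follows. This is a hypothetical sketch: it assumes a CheckV quality_summary.tsv (columns contig_id, checkv_quality) from before binning and a vRhyme membership table with columns named scaffold and bin. Those membership column names are an assumption, so check the header of your own vRhyme output. The toy rows are illustrative only.

```python
import csv
import io
from collections import Counter

def complete_in_bins(checkv_handle, membership_handle):
    """List CheckV 'Complete' contigs that were placed in a multi-scaffold bin.

    Assumed inputs: a CheckV quality_summary.tsv (columns 'contig_id',
    'checkv_quality') and a membership TSV with columns 'scaffold' and 'bin'.
    """
    complete = {
        row["contig_id"]
        for row in csv.DictReader(checkv_handle, delimiter="\t")
        if row["checkv_quality"] == "Complete"
    }
    membership = list(csv.DictReader(membership_handle, delimiter="\t"))
    bin_sizes = Counter(row["bin"] for row in membership)
    # Only count a contig if its bin also holds at least one other scaffold.
    return sorted(
        row["scaffold"]
        for row in membership
        if row["scaffold"] in complete and bin_sizes[row["bin"]] > 1
    )

# Toy inputs for illustration only; c2 is left unbinned.
checkv = io.StringIO(
    "contig_id\tcheckv_quality\n"
    "c1\tComplete\n"
    "c2\tComplete\n"
    "c3\tLow-quality\n"
)
membership = io.StringIO("scaffold\tbin\nc1\t1\nc3\t1\n")
print(complete_in_bins(checkv, membership))  # ['c1']
```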

@KrisKieft
Member

Are these complex virome samples? I'm currently working on an update, v1.1.0, that should address some of these issues. In addition to changes that improve precision, I implemented a step to remove complete (circular) sequences before binning. The update should be available within the next 1-2 weeks.
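Until such a step is available, a similar effect could be approximated manually by setting complete sequences aside before binning. The sketch below is a hypothetical workaround, not vRhyme's own implementation: it uses CheckV's "Complete" label as a stand-in for "complete (circular)", assumes CheckV's quality_summary.tsv columns contig_id and checkv_quality, and the function names are made up. You would then write the kept records to a new FASTA and bin that file.

```python
import csv
import io

def complete_ids(checkv_handle):
    """IDs that CheckV labelled 'Complete' (used here as a proxy for circular)."""
    return {
        row["contig_id"]
        for row in csv.DictReader(checkv_handle, delimiter="\t")
        if row["checkv_quality"] == "Complete"
    }

def split_fasta(lines, exclude_ids):
    """Split FASTA records into (kept, excluded) by the first word of the header."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    kept, excluded = [], []
    for header, seq in records:
        record_id = (header.split() or [""])[0]
        if record_id in exclude_ids:
            excluded.append((header, seq))
        else:
            kept.append((header, seq))
    return kept, excluded

# Toy inputs for illustration only.
checkv = io.StringIO("contig_id\tcheckv_quality\nc1\tComplete\nc2\tHigh-quality\n")
fasta = io.StringIO(">c1 circular\nACGT\nACGT\n>c2\nGGCC\n")
kept, excluded = split_fasta(fasta, complete_ids(checkv))
print([h for h, _ in kept], [h for h, _ in excluded])  # ['c2'] ['c1 circular']
```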

KrisKieft added the question label (Further information is requested) on May 26, 2022.