vcf2vcf genotyping using bam input #202

Open
fpbarthel opened this issue Nov 6, 2018 · 1 comment

fpbarthel commented Nov 6, 2018

One of the really nice features of vcf2vcf is its genotyping capability; I don't know of any other software that really offers it. I've tried freebayes force-calling, which is very fast but doesn't work as expected (it skips a handful of variants), and bcftools/samtools don't seem to offer this without additional post-processing (which is essentially what vcf2vcf does).

However, I've found vcf2vcf too slow when genotyping a large number of variants. Could this be made more efficient by running samtools mpileup once over a BED file covering all of the variants in the VCF and post-processing the results, rather than calling samtools mpileup separately for each variant?
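A minimal sketch of the first half of that idea, assuming an uncompressed VCF read line by line: collapse every variant site into one merged BED file that a single `samtools mpileup -l` call could take. The function names and the `pad` parameter are hypothetical and not part of vcf2vcf.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) in BED convention: 0-based start, exclusive end.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch on the same contig."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def vcf_to_bed_intervals(vcf_lines: Iterable[str], pad: int = 0) -> List[Interval]:
    """Turn VCF records into merged BED intervals covering each REF allele.

    `pad` (hypothetical) optionally widens each site by that many bp on both sides.
    """
    intervals: List[Interval] = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and meta/header lines
        chrom, pos, _vid, ref = line.rstrip("\n").split("\t")[:4]
        start = max(0, int(pos) - 1 - pad)   # VCF POS is 1-based
        end = int(pos) - 1 + len(ref) + pad  # cover the full REF allele
        intervals.append((chrom, start, end))
    return merge_intervals(intervals)


def to_bed_text(intervals: Iterable[Interval]) -> str:
    """Render intervals as tab-separated BED lines."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in intervals)
```

The resulting text could be written to e.g. `sites.bed` and passed as `samtools mpileup -f ref.fa -l sites.bed sample.bam`, after which the per-site pileup columns would still need to be parsed into allele counts. Check your samtools version's documentation for how `-l` distinguishes BED files from 1-based position lists.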

Unrelated, but why was the DP tag chosen over DP4 to report read depth?

ckandoth self-assigned this Mar 18, 2019
@ckandoth (Collaborator)

Thanks. I agree: vcf2vcf's genotyping feature should be sped up with a BED file. I use a similar strategy to speed up samtools faidx when pulling flanking bp. But that was easy, since samtools faidx accepts many regions on the command line.
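A sketch of that faidx approach, assuming variants are available as (chrom, 1-based POS, REF) tuples: build every flanking region string up front and pass them all to one `samtools faidx` invocation instead of one call per variant. The helper names and the `flank` parameter are hypothetical; this is not vcf2vcf's actual code.

```python
from typing import List, Sequence, Tuple

Variant = Tuple[str, int, str]  # (chrom, 1-based POS, REF allele)


def flanking_regions(variants: Sequence[Variant], flank: int = 1) -> List[str]:
    """Region strings (1-based, inclusive) spanning REF plus `flank` bp per side."""
    regions = []
    for chrom, pos, ref in variants:
        start = max(1, pos - flank)          # don't run off the contig start
        end = pos + len(ref) - 1 + flank
        regions.append(f"{chrom}:{start}-{end}")
    return regions


def faidx_command(fasta: str, regions: Sequence[str]) -> List[str]:
    """One samtools faidx argument list covering every region."""
    return ["samtools", "faidx", fasta, *regions]
```

Running it with `subprocess.run(faidx_command("ref.fa", regions), capture_output=True, text=True, check=True)` would return one FASTA record per region on stdout. With very large variant lists, the argument list may exceed the OS command-line length limit, so batching the regions may be necessary.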

Speeding up vcf2vcf genotyping will need to stay in my backlog; it will be a while until I can get to it. I'll leave this issue open. In the meantime, take a look at GetBaseCountsMultiSample: it accepts either VCF or MAF as input and produces a MAF-like output file.

In vcf2vcf output, DP is the total depth. In other VCF specs, DP4 lists four values (forward and reverse read counts for REF and for ALT), but it cannot represent multi-allelic sites, where there is more than one ALT allele. So mpileup uses ADF and ADR instead to represent forward/reverse read counts for all alleles.
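To illustrate the DP4 limitation, a minimal sketch assuming ADF and ADR are per-allele lists ordered REF first, then each ALT (as bcftools mpileup reports them). The helper names are hypothetical. Summing ADF and ADR gives per-allele depth, while collapsing to DP4 lumps all ALT alleles together, so two different multi-allelic sites can produce the same DP4.

```python
from typing import List, Sequence, Tuple


def allele_depths(adf: Sequence[int], adr: Sequence[int]) -> List[int]:
    """Per-allele depth (AD-style): forward + reverse reads for each allele."""
    if len(adf) != len(adr):
        raise ValueError("ADF and ADR must have one value per allele")
    return [f + r for f, r in zip(adf, adr)]


def collapse_to_dp4(adf: Sequence[int], adr: Sequence[int]) -> Tuple[int, int, int, int]:
    """DP4-style (ref_fwd, ref_rev, alt_fwd, alt_rev); all ALT alleles are summed."""
    if len(adf) != len(adr) or len(adf) < 2:
        raise ValueError("need matching ADF/ADR with REF plus at least one ALT")
    return adf[0], adr[0], sum(adf[1:]), sum(adr[1:])


# Two tri-allelic sites (REF, ALT1, ALT2) with different ALT support.
site_a = ([10, 3, 2], [8, 4, 1])
site_b = ([10, 5, 0], [8, 5, 0])
print(allele_depths(*site_a), allele_depths(*site_b))      # [18, 7, 3] [18, 10, 0]
print(collapse_to_dp4(*site_a), collapse_to_dp4(*site_b))  # (10, 8, 5, 5) (10, 8, 5, 5)
```

The per-allele depths tell the two sites apart, but their DP4 values are identical, which is why per-allele fields like ADF/ADR are needed once a site has more than one ALT.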
