Variant allele frequency (VAF), allele depth (AD), and depth (DP) are fundamental to interpreting NGS data, but unfortunately they are not readily available in the output from Strelka.
If there is no plan to add these fields to the output, could you provide a bash script that extracts them into separate columns, or that filters variants directly on the values of VAF, AD and DP?
bcftools can filter on such values directly when they are present in the INFO or FORMAT field,
e.g. bcftools filter -i 'FORMAT/AF[1] > 0.05' input.vcf.gz
Unfortunately, extracting this information from Strelka output is complicated, as the manual describes:
refCounts = Value of FORMAT column $REF + "U" (e.g. if REF="A" then use the value in FORMAT/AU)
altCounts = Value of FORMAT column $ALT + "U" (e.g. if ALT="T" then use the value in FORMAT/TU)
tier1RefCounts = First comma-delimited value from $refCounts
tier1AltCounts = First comma-delimited value from $altCounts
Somatic allele frequency is $tier1AltCounts / ($tier1AltCounts + $tier1RefCounts)
How exactly can I implement the above pseudocode in a bash script with bcftools or other tools?
I have searched hundreds of web pages, and no one offers a solution or even discusses it!
@maximus3219 I don't know if you are still interested, but I had the same problem last week. I wrote a Python script to calculate VAF for indels and SNVs from the somatic VCF files. I couldn't get it done with bcftools either, but here is the script. It calculates the VAF for each variant and adds this information for the normal and tumour samples to the final output VCF. Usage instructions are in the README.md.
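For readers who only need the SNV case, the manual's pseudocode quoted in the question can be sketched in a few lines of plain Python. This is not the script mentioned above, just a minimal illustration under some assumptions: the function names (`tier1_count`, `snv_vaf`, `vaf_from_line`) are made up for this example, the record has a single ALT allele, the sample columns are ordered NORMAL then TUMOR, and the example record is invented rather than taken from real data.

```python
def tier1_count(fields, base):
    """First comma-delimited value of FORMAT/<base>U, e.g. FORMAT/AU for base 'A'."""
    return int(fields[base + "U"].split(",")[0])


def snv_vaf(ref, alt, fmt_keys, sample_values):
    """Tier-1 allele frequency for one sample column of an SNV record.

    Returns None when the depth is zero or the record is not a simple SNV.
    """
    fields = dict(zip(fmt_keys.split(":"), sample_values.split(":")))
    if ref + "U" not in fields or alt + "U" not in fields:
        return None  # e.g. an indel record, or more than one ALT allele
    tier1_ref = tier1_count(fields, ref)
    tier1_alt = tier1_count(fields, alt)
    total = tier1_ref + tier1_alt
    return tier1_alt / total if total else None


def vaf_from_line(line):
    """Parse one tab-separated VCF data line.

    Assumption: column 10 is the NORMAL sample and column 11 is the TUMOR sample.
    """
    cols = line.rstrip("\n").split("\t")
    ref, alt, fmt = cols[3], cols[4], cols[8]
    return {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "normal_vaf": snv_vaf(ref, alt, fmt, cols[9]),
        "tumor_vaf": snv_vaf(ref, alt, fmt, cols[10]),
    }


# Invented example record (not real data): REF=A, ALT=T.
EXAMPLE = "\t".join([
    "chr1", "12345", ".", "A", "T", ".", "PASS", "SOMATIC",
    "DP:FDP:SDP:SUBDP:AU:CU:GU:TU",
    "30:0:0:0:30,31:0,0:0,0:0,0",     # NORMAL: AU tier 1 = 30, TU tier 1 = 0
    "40:0:0:0:30,32:0,0:0,0:10,11",   # TUMOR:  AU tier 1 = 30, TU tier 1 = 10
])

if __name__ == "__main__":
    print(vaf_from_line(EXAMPLE))
```

To process a whole file, one would skip header lines starting with `#` and call `vaf_from_line` on each remaining line; filtering on a VAF threshold is then an ordinary comparison on `tumor_vaf`.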