I tried running a modified version of the README suggestion for HLA-TAPAS, which gave the following example output for one individual for HLA-A (the code to reproduce it is below, as well as the log). Can you provide guidance on how to interpret this, if I'm specifically interested in 4-digit HLA types?
So these are the VCF entries for the first imputed sample for HLA-A
My intuition is to (a) disregard the 2-digit rows, (b) collapse alleles with more than 4 digits to their 4-digit types, and (c) sum within each 4-digit type. This gives something like the following, if I then sum by the second entry of the VCF genotype field (DS, the dosage)
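As an illustration of steps (a) to (c), and not the actual output referred to above, here is a minimal Python sketch. It assumes allele IDs of the form `HLA_A*01:01:01`, with colon-separated fields after the `*`. The function name `collapse_to_four_digit` and the dosage values are hypothetical. It also assumes that each higher-resolution row should be added to its 4-digit parent; if the VCF already carries a separate 4-digit row for the same allele, summing both would double count.

```python
from collections import defaultdict


def collapse_to_four_digit(dosages):
    """Sum per-allele dosages at 4-digit (two-field) resolution.

    `dosages` maps allele IDs such as "HLA_A*01:01:01" to the DS value
    of one sample.
    """
    collapsed = defaultdict(float)
    for allele, ds in dosages.items():
        gene, _, fields = allele.partition("*")
        parts = fields.split(":")
        if len(parts) < 2:
            continue  # (a) 2-digit row such as HLA_A*01: disregarded
        # (b) keep only the first two fields, (c) sum within that type
        collapsed[f"{gene}*{parts[0]}:{parts[1]}"] += ds
    return dict(collapsed)


# Hypothetical dosages for one sample (not taken from the real output).
example = {
    "HLA_A*01": 1.0,
    "HLA_A*01:01:01": 0.7,
    "HLA_A*01:01:02": 0.3,
    "HLA_A*24": 0.6,
    "HLA_A*24:02:01": 0.6,
    "HLA_A*02:01:01": 0.4,
}
four_digit = collapse_to_four_digit(example)
print(four_digit)
```

Keying on the first two fields means that 6-digit and 8-digit rows fall into the same bucket without any special casing.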
My interpretation is that the most likely genotype for this sample would be HLA_A*01:01, HLA_A*24:02, with some sort of confidence (here perhaps confidence = (1.04 + 0.62) / (sum of the rest) = 0.83?). If this is right, is there a recommended default confidence threshold (0.5? 0.9?)?
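The genotype call and confidence described here can be sketched as follows. This is only one possible reading: the denominator is assumed to be the sum of all 4-digit dosages for the gene, which reproduces 0.83 when that total is 2. The function name `call_genotype` is hypothetical, and so are the dosages for `HLA_A*02:01` and `HLA_A*03:01`; only 1.04 and 0.62 come from the post.

```python
def call_genotype(four_digit_dosages):
    """Return the two highest-dosage alleles and a crude confidence score.

    Confidence is the share of the total dosage carried by the two
    called alleles (an assumption, not an HLA-TAPAS definition).
    """
    ranked = sorted(
        four_digit_dosages.items(), key=lambda kv: kv[1], reverse=True
    )
    top_two = ranked[:2]
    total = sum(four_digit_dosages.values())
    confidence = sum(ds for _, ds in top_two) / total if total > 0 else 0.0
    return [allele for allele, _ in top_two], confidence


dosages = {
    "HLA_A*01:01": 1.04,  # from the post
    "HLA_A*24:02": 0.62,  # from the post
    "HLA_A*02:01": 0.20,  # hypothetical
    "HLA_A*03:01": 0.14,  # hypothetical
}
alleles, confidence = call_genotype(dosages)
print(alleles, round(confidence, 2))
```

A homozygous sample, where one allele has a dosage near 2, would need separate handling, since this sketch always reports two different alleles.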
Can you also confirm, for the imputation, that the only thing that matters from the reference samples is the *.phased.vcf.gz file? This is what this code seems to show
If there is some sort of parser that already does this that you can point me to that'd be great (e.g. takes as input the phased output file, and generates something like one row per sample, one column per HLA-region, with entry the most likely single genotype, and a confidence measure)
Hi @rwdavies, sorry about the slow response. I only recently realised that I need to turn the watch on to be notified of all open issues. You are right in your interpretation of the result. As for filtering, we usually filter out variants with imputation r2 < 0.5 and minor allele frequency < 1% (this depends on the size of your dataset).
All required information for downstream association analysis is included in the *.phased.vcf.gz file.
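The filtering rule mentioned above could be applied to a VCF INFO string roughly as in this sketch. It assumes the INFO field carries an imputation r2 and an allele frequency under keys such as `DR2` and `AF`; the key names, the function names, and the choice to drop a variant when either threshold fails are assumptions, so check them against the header of the actual output file.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict (flags map to True)."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def passes_filter(info, r2_key="DR2", af_key="AF", min_r2=0.5, min_maf=0.01):
    """Keep a variant only if r2 >= min_r2 and MAF >= min_maf."""
    fields = parse_info(info)
    r2 = float(fields[r2_key])
    af = float(fields[af_key])
    maf = min(af, 1.0 - af)  # minor allele frequency from allele frequency
    return r2 >= min_r2 and maf >= min_maf


# Hypothetical INFO strings, for illustration only.
print(passes_filter("AR2=0.93;DR2=0.95;AF=0.12"))
print(passes_filter("AR2=0.28;DR2=0.30;AF=0.12"))
print(passes_filter("AR2=0.90;DR2=0.90;AF=0.995"))
```

The thresholds are keyword arguments because, as the reply notes, the appropriate frequency cutoff depends on the size of the dataset.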
Thanks,
Robbie
Code I used
Output of code run