What if the ref allele is a common SNP? #215
Comments
What distinguishes a somatic from a germline variant in the GT field? A variant with both should look something like:
The GT field is actually very important for differentiating between samples when there is more than one, so it should be added to the …
We tried to put only required/non-optional fields on the …
Oh, @joaoe, I might have misunderstood. Do you want Varcode to parse alts such as "A,T" and then use the GT to determine each sample's value for the alt field?
I don't know how that would be implemented, since varcode currently provides no way to know which sample a variant applies to when the VCF file has multiple samples. Varcode just splits the ALT field and returns one variant per ALT allele, with the information from CHROM, POS, REF and ALT.

VCF files produced from a germline and a cancer sample will typically contain both samples, so if you want varcode to be useful for these cases, there should be a way to tell which sample each variant applies to. So, for each column after … Having the GT is actually not directly useful, as it still has to be parsed. Example:
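A minimal sketch of the per-sample parsing described above, in plain Python and independent of varcode. The function name `variants_by_sample`, the sample names `NORMAL` and `TUMOR`, and the record itself are made up for illustration. It assumes a tab-separated VCF with a `#CHROM` header line and treats allele index 0 as REF and index i as the i-th ALT allele.

```python
def variants_by_sample(header_line, record_line):
    """Map each sample name to the (chrom, pos, ref, alt) tuples its GT points at.

    Hypothetical helper, not part of varcode.
    """
    # Sample names are the header columns after the nine fixed VCF columns.
    sample_names = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    alts = fields[4].split(",")
    format_keys = fields[8].split(":")

    result = {}
    for name, column in zip(sample_names, fields[9:]):
        values = dict(zip(format_keys, column.split(":")))
        gt = values.get("GT")
        if gt is None:
            # No GT for this record: fall back to one variant per ALT allele.
            indices = list(range(1, len(alts) + 1))
        else:
            # Accept both unphased ("/") and phased ("|") separators,
            # and skip the reference allele ("0") and missing calls (".").
            alleles = gt.replace("|", "/").split("/")
            indices = sorted({int(a) for a in alleles if a not in (".", "0")})
        result[name] = [(chrom, pos, ref, alts[i - 1]) for i in indices]
    return result


# Illustrative data only: a germline/tumor pair with two ALT alleles.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR"
record = "1\t1000\t.\tG\tA,T\t.\tPASS\t.\tGT\t0/1\t1/2"
per_sample = variants_by_sample(header, record)
```

Here the `NORMAL` column only carries the first ALT allele, while `TUMOR` carries both, so the two samples end up with different variant lists from the same record.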
If the GT field is not there, then parse as before. This would change the behavior of the API, so it should be behind an option to load_vcf. It can get more complicated: if the VCF file is phased, the GT field is split by a pipe ("|") rather than a slash ("/").

PS: An example containing multiple variants grouped by mouse strain: ftp://ftp.ncbi.nih.gov/snp/organisms/mouse_10090/VCF/genotype/SC_MOUSE_GENOMES.genotype.vcf.gz
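The opt-in behavior and the phased case could look roughly like the sketch below. Both `parse_gt` and `alts_for_sample` are hypothetical helpers, and the `use_genotypes` flag is an assumed name for the proposed option, not an existing `load_vcf` parameter.

```python
def parse_gt(gt):
    """Return (allele_indices, phased) for a GT string such as "0/1" or "1|2".

    Missing alleles (".") are returned as None.
    """
    phased = "|" in gt
    parts = gt.replace("|", "/").split("/")
    indices = tuple(None if p == "." else int(p) for p in parts)
    return indices, phased


def alts_for_sample(alts, gt=None, use_genotypes=False):
    """Select the ALT alleles that apply to one sample.

    With use_genotypes=False (the assumed default) or no GT available,
    every ALT allele is returned, which matches the existing behavior
    of splitting the ALT field.
    """
    if not use_genotypes or gt is None:
        return list(alts)
    indices, _phased = parse_gt(gt)
    # Index 0 is REF and None is a missing call; neither yields a variant.
    wanted = sorted({i for i in indices if i})
    return [alts[i - 1] for i in wanted]
```

Keeping the default at `use_genotypes=False` means existing callers see no change, while callers with multi-sample files can opt in per sample.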
I was wondering how SNPs should be handled. If one uses the germline allele (a common SNP in this case) as ref, the variant effect prediction runs into the error below. How is one supposed to encode such a somatic variant? Using two distinct variants, one for the SNP and one for the somatic variant?