Calculate polygenic risk score

1. Preprocessing data

1.1 Weights files

Check PRS scoring files (for example, files from the PGS Catalog) for errors
with the script preprocess pgs/preprocess_prs_file.py.
Run it with --help to see the available arguments.
An example scoring file, 'PGS001833.txt', is provided in data/.
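
The kind of checks such a preprocessing step might run can be sketched as follows. This is a minimal illustration, not the contents of preprocess_prs_file.py: the function name `check_scoring_file` is hypothetical, and the column names (`rsID`, `effect_allele`, `effect_weight`) and the `#`-prefixed metadata lines are assumptions based on the tab-separated PGS Catalog scoring file layout.

```python
import io

# Assumed column names, following the PGS Catalog scoring file layout.
REQUIRED_COLUMNS = ("rsID", "effect_allele", "effect_weight")


def check_scoring_file(handle):
    """Return a list of (line_number, message) problems found in a scoring file."""
    # Drop blank lines and '#' metadata lines, but remember original line numbers.
    rows = [(n, line.rstrip("\r\n")) for n, line in enumerate(handle, start=1)
            if line.strip() and not line.startswith("#")]
    if not rows:
        return [(0, "no data lines found")]

    header_no, header_line = rows[0]
    header = header_line.split("\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [(header_no, "missing columns: " + ", ".join(missing))]

    problems = []
    seen = set()
    for line_no, line in rows[1:]:
        fields = dict(zip(header, line.split("\t")))
        rsid = fields.get("rsID", "")
        if not rsid.startswith("rs"):
            problems.append((line_no, f"unexpected variant ID: {rsid!r}"))
        elif rsid in seen:
            problems.append((line_no, f"duplicate variant ID: {rsid}"))
        seen.add(rsid)
        if not fields.get("effect_allele", ""):
            problems.append((line_no, "empty effect_allele"))
        try:
            float(fields.get("effect_weight", ""))
        except ValueError:
            problems.append((line_no, "effect_weight is not a number"))
    return problems


# Made-up example data, for illustration only.
example = io.StringIO(
    "#pgs_id=EXAMPLE\n"
    "rsID\teffect_allele\teffect_weight\n"
    "rs1\tA\t0.5\n"
    "rs1\tG\t0.1\n"
    "chr1:5\tT\tabc\n"
)
for line_no, message in check_scoring_file(example):
    print(f"line {line_no}: {message}")
```

Reporting the original line number with each problem makes it easier to fix the scoring file by hand before the score is calculated.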

1.2 VCF files

Check that the variants in your VCF file have SNP IDs (rsIDs). To add this information to your VCF file:

  • VCF files can be annotated with oakvar
  • VCF files can be annotated with bcftools; the annotation source can be selected from the NCBI Human Variation Sets in VCF Format (ClinVar, dbSNP).
    See the launch example in preprocess/annotate_vcf.sh.
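
Before annotating, it can help to see how many records already carry an rsID. The sketch below is an illustration only: the helper name `count_rsids` is hypothetical and not part of this repository. It relies on the VCF convention that the third column (ID) holds a semicolon-separated list of identifiers, or `.` when none is known.

```python
def count_rsids(vcf_lines):
    """Return (records_with_rsid, total_records) for an iterable of VCF lines."""
    with_rsid = 0
    total = 0
    for line in vcf_lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        total += 1
        # ID is the third column; it may list several IDs separated by ';'.
        ids = fields[2].split(";") if len(fields) > 2 else []
        if any(identifier.startswith("rs") for identifier in ids):
            with_rsid += 1
    return with_rsid, total


# Made-up example records, for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\trs123\tA\tG\t.\tPASS\t.\n",
    "1\t2000\t.\tC\tT\t.\tPASS\t.\n",
]
print(count_rsids(example))  # (1, 2)
```

For a real file, pass an open file handle (or a `gzip.open(path, "rt")` handle for a compressed VCF) instead of the list.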

2. Calculate polygenic risk score

Check the header of the script calc_and_visualize/calc_prs.py: the file paths to the VCF and the PRS scoring file must be specified there. To use the optional lists that enable or disable specific SNPs, set the paths to those lists as well.
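
For orientation, a polygenic risk score is commonly computed as the sum, over the scored SNPs, of the effect weight multiplied by the number of copies of the effect allele in the genotype. The sketch below illustrates that formula under stated assumptions and is not the logic of calc_prs.py: the function names are hypothetical, only the first sample column of the VCF is read, GT is taken as the first FORMAT field, and missing calls contribute nothing.

```python
def effect_allele_dosage(ref, alt, genotype, effect_allele):
    """Count copies of effect_allele in a VCF GT string such as '0/1' or '1|1'."""
    alleles = [ref] + alt.split(",")
    dosage = 0
    for index in genotype.replace("|", "/").split("/"):
        if index == ".":
            continue  # missing call: contributes nothing in this sketch
        if alleles[int(index)] == effect_allele:
            dosage += 1
    return dosage


def calc_prs(vcf_lines, weights):
    """Return (score, snps_used); weights maps rsID -> (effect_allele, weight)."""
    score = 0.0
    used = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 10:
            continue  # no genotype column to read
        rsid, ref, alt = fields[2], fields[3], fields[4]
        if rsid not in weights:
            continue
        effect_allele, weight = weights[rsid]
        genotype = fields[9].split(":")[0]  # assumes GT is the first FORMAT field
        score += weight * effect_allele_dosage(ref, alt, genotype, effect_allele)
        used += 1
    return score, used


# Made-up example data, for illustration only.
vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n",
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0/1\n",
    "1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t1/1\n",
    "1\t300\trs3\tG\tA\t.\tPASS\t.\tGT\t0/0\n",
]
weights = {"rs1": ("G", 0.5), "rs2": ("C", 0.25), "rs3": ("G", -0.125)}
print(calc_prs(vcf, weights))  # (0.25, 3)
```

In the example, rs1 contributes 0.5 × 1, rs2 contributes 0.25 × 0 (the effect allele C is the reference allele and the sample is homozygous for the alternate), and rs3 contributes -0.125 × 2. Returning the number of SNPs actually used alongside the score shows how much of the scoring file was found in the VCF.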

3. Visualize

To draw a bullet plot, fill in the data (the custom_snp_number and overall_snp_number variables) in the header of calc_and_visualize/draw_bullet_plot.py and run the script.

4. Calculate PRS with plink

We can also perform an additional check with plink.

4.1 Plink

  • Get plink binary files by running the script preprocess/plink convert/get_plink_fileset_bin.sh.
    If you run into trouble getting the binary fileset, some intermediate steps may help: see plink/get_plink_fileset_pgen.sh and plink/pgen_to_bed.sh, or try to fix the initial .vcf file.
  • Run calc_and_visualize/run_plink_prs.sh.

4.2 Plink2

  • Get plink pfiles by running the script preprocess/plink convert/get_plink_fileset_pgen.sh.
  • Run calc_and_visualize/run_plink2_prs.sh plink/run_plink_prs.sh
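
As a rough idea of what a plink 1.9 scoring call looks like, the sketch below builds (but does not run) a command line that uses plink's `--score` option with explicit column numbers and the `header` modifier. It is not the contents of run_plink_prs.sh: the function name, file names, and column positions are placeholders, and the column numbers must match the layout of your own scoring file.

```python
import shlex


def build_plink_score_command(bfile_prefix, score_file, id_col, allele_col,
                              weight_col, out_prefix):
    """Build a plink 1.9 --score command as an argument list (does not run it)."""
    return [
        "plink",
        "--bfile", bfile_prefix,   # prefix of the .bed/.bim/.fam fileset
        "--score", score_file,
        str(id_col),               # 1-based column with variant IDs
        str(allele_col),           # 1-based column with effect alleles
        str(weight_col),           # 1-based column with effect weights
        "header",                  # skip the first line of the scoring file
        "--out", out_prefix,
    ]


# Placeholder names, for illustration only.
command = build_plink_score_command("sample", "scores.txt", 1, 2, 3, "sample_prs")
print(shlex.join(command))
```

Building the command as a list keeps it safe to pass to `subprocess.run(command, check=True)` once plink is installed and the paths point at real files.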