Reference data refactor (#991)
* begin reference dataset refactor

* hgmd

* basewritetask

* PR comments

* Reference data refactor feature branch

* remove utils for now

* cadd

* hgmd selects

* import

* minor things

* config enum attribute

* config out of enum, get_ht, for_reference_genome_dataset_type

* return table

* kwargs

* tiny changes

* frozenset

* cadd filtering

* changes to the cadd script that will be moot?

* add some gnomad datasets

* hacking on clinvar

* ruff

* add 38 dbnsfp config

* get cadd from dbnsfp

* get primate ai and mpc from dbnsfp

* Cleanup

* cleanup

* Update misc.py

* Update clinvar.py

* Update clinvar.py

* Update clinvar_test.py

* poach some files from bens pr

* Update definitions.py

* first pass enums

* use liftover for 37 data instead of old version

* remove cadd

* Add clinvar path (#961)

* Add clinvar path

* Fix missing requires bug

* remove dataset type from filter contigs

* Move filter_contigs to "get_ht" so it's generalizable

* gnomad_exomes unit tests

* all enum selects helper

* gnomad_genomes tests

* clean up

* Generalize enum annotation

* fix tempdir usage

* add topmed

* Benb/clinvar refactor (#960)

* hacking on clinvar

* ruff

* Cleanup

* cleanup

* Update misc.py

* Update clinvar.py

* Update clinvar.py

* Update clinvar_test.py

* Update definitions.py

* Add clinvar path (#961)

* Add clinvar path

* Fix missing requires bug

* remove dataset type from filter contigs

* Move filter_contigs to "get_ht" so it's generalizable

* Generalize enum annotation

* Add back enum select fields

* remove unnecessary line

* clean up

* ruff

* wip hgmd test

* ruff

* share enum transmute

* done

* notebook

* ruff

* linter for now

* first pass splice ai

* Mitimpact

* Add the enum 🤦

* bad typo

* gnomad_mito, gnomad_non_coding_constraint, local_constraint_mito, screen

* gnomad_qc typo

* module_file_name

* gnomad_genomes CONFIG deduplication

* zipfile helper

* MITIMPACT (#965)

* Mitimpact

* Add the enum 🤦

* bad typo

* use helper for zip download

* pr feedback

* ruff

* ruff

* ruff

* ruff

* unshare extracted filename

* clean up transmute

* ruff

* trailing comma

* maybe clearer gnomad

* fix property syntax

* gnomad_mito selects

* use hanas enum notation

* shared import vcf helper

* proper splice ai parsing

* valid paths

* ruff

* ruff

* mitomap

* add comment

* merge

* screenums

* explicit handling for already mapped enums

* add tests

* ruff

* ruff

* ruff

* min_partitions

* simplify mitomap

* jupyter

* hmtvar reference dataset (#971)

* hmtvar reference dataset

* ruff

* eigen reference dataset (#970)

* eigen reference dataset

* Fix typo

---------

Co-authored-by: Benjamin Blankenmeister <[email protected]>

* Exac reference dataset (#969)

* add exac reference dataset

* use vcf

* remove comment

---------

Co-authored-by: Benjamin Blankenmeister <[email protected]>

* helix mito (#972)

* split genomes and exomes again

* fix screen

* screen and gnomad non coding

* unzip local_constraint_mito

* Fix bugs related to nested fields/split_multi (#973)

* helix mito

* Fix split_multi and select bugs

* fixme

* ruff

* Add test for exac

* Add test for split multi check

* Add test for `UpdatedReferenceDataset` and `UpdatedReferenceDatasetQuery` (#974)

* helix mito

* Fix split_multi and select bugs

* fixme

* ruff

* get test working

* fix bugs

* bug fixes

* Bugfixes

* Refactor tests

* Add comment

* quixotic

* missed one

* Add test for exac

* Add test for split multi check

* fix zip write

* Benb/add missing queries (#977)

* Add missing datasets

* Fix reference

* Add test

* lint

* remove complete() (#979)

* remove complete()

* ruff

* Fix mock

* Benb/update gnomad qc crdq with updated format (#980)

* remove complete()

* ruff

* Fix mock

* Replace the gnomad_qc crdq

* Fix test

* format

* Remove ht and tests (#981)

* remove complete()

* ruff

* Fix mock

* Replace the gnomad_qc crdq

* Fix test

* format

* Remove ht and tests

* Updated `gnomad_coding_and_noncoding` test table. (#982)

* remove complete()

* ruff

* Fix mock

* Replace the gnomad_qc crdq

* Fix test

* format

* Remove ht and tests

* Change validation table reference

* Update README.txt

* remove crdq reference

* Update mock

* ruff

* Fix imports

* remove mock

* fixme

* Change rsync to new path (#983)

* Remove `version` from reference dataset query path (#984)

* Change rsync to new path

* Remove version from reference dataset query path

* Make rdq dataset type specific (#985)

* Make rdq dataset type specific

* Add test for mito

* Add pathogenicities to clinvar

* tweak

* update annotations with updated reference datasets refactor (#978)

* first pass update vat

* merge feature

* fix the diff for now

* include_queries

* interval ht

* tests

* exclude

* nicer

* fix interval test

* split fn

* eigen test

* clinvar wip

* hgmd

* clinvar

* gnomad genomes and exomes

* delete

* 38 snv_indel done

* mito tests

* done with tests?

* custom_select

* fields test

* disable write new samples tests for now

* working on tests

* update update vat with new samples tests

* extra file

* other skipped test

* make select and filter similar

* tweak

* rename path and locus/interval filtering

* make select and filter similar (#988)

* make select and filter similar

* tweak

* Cleanest set diff

* Finish off

* Tests passing!

* ruff

* ruff

* Change the params

* Fix params

* params

* More clinvar mocking

* hardcode these

---------

Co-authored-by: Benjamin Blankenmeister <[email protected]>

* delete old reference data code 😝  (#990)

* first pass update vat

* merge feature

* fix the diff for now

* include_queries

* interval ht

* tests

* exclude

* nicer

* fix interval test

* split fn

* eigen test

* clinvar wip

* hgmd

* clinvar

* gnomad genomes and exomes

* delete

* 38 snv_indel done

* mito tests

* done with tests?

* custom_select

* fields test

* disable write new samples tests for now

* working on tests

* update update vat with new samples tests

* extra file

* other skipped test

* make select and filter similar

* tweak

* rename path and locus/interval filtering

* make select and filter similar (#988)

* make select and filter similar

* tweak

* Cleanest set diff

* Finish off

* Tests passing!

* ruff

* ruff

* Change the params

* Fix params

* params

* More clinvar mocking

* hardcode these

* delete a bunch of stuff

* ruff

* remove rdc and crdq

* delete v02

* remove comment references to deleted file

* last test

---------

Co-authored-by: Benjamin Blankenmeister <[email protected]>

---------

Co-authored-by: Julia Klugherz <[email protected]>
Co-authored-by: Hana Snow <[email protected]>
3 people authored Nov 25, 2024
1 parent b3e996a commit f89a6d3
Showing 895 changed files with 4,008 additions and 6,890 deletions.
8 changes: 0 additions & 8 deletions download_and_create_reference_datasets/v02/create_ht__cadd.py

This file was deleted.

14 changes: 0 additions & 14 deletions download_and_create_reference_datasets/v02/create_ht__eigen.py

This file was deleted.

14 changes: 0 additions & 14 deletions download_and_create_reference_datasets/v02/create_ht__mpc.py

This file was deleted.

14 changes: 0 additions & 14 deletions download_and_create_reference_datasets/v02/create_ht__topmed.py

This file was deleted.