delete old reference data code 😝 #990

Merged on Nov 25, 2024: 57 commits
Changes shown from 53 commits

Commits
5f4a7fd  first pass update vat (jklugherz, Nov 20, 2024)
4305efe  Merge remote-tracking branch 'origin/reference-data-refactor' into re… (jklugherz, Nov 20, 2024)
7ef58ba  merge feature (jklugherz, Nov 20, 2024)
1ffa3f5  fix the diff for now (jklugherz, Nov 20, 2024)
c1dd30c  include_queries (jklugherz, Nov 20, 2024)
045bfcf  interval ht (jklugherz, Nov 20, 2024)
fcd9785  tests (jklugherz, Nov 20, 2024)
de3a81e  Merge remote-tracking branch 'origin/reference-data-refactor' into re… (jklugherz, Nov 20, 2024)
3fbf6ea  exclude (jklugherz, Nov 20, 2024)
33d63ff  nicer (jklugherz, Nov 20, 2024)
eb7fae4  fix inteval test (jklugherz, Nov 21, 2024)
914a95c  split fn (jklugherz, Nov 21, 2024)
e5a62f3  eigen test (jklugherz, Nov 21, 2024)
4647f78  clinvar wip (jklugherz, Nov 21, 2024)
e23384b  hgmd (jklugherz, Nov 21, 2024)
1987cf6  clinvar (jklugherz, Nov 21, 2024)
b7a7081  gnomad genomes and exomes (jklugherz, Nov 21, 2024)
e496c4d  delete (jklugherz, Nov 21, 2024)
60adacd  38 snv_indel done (jklugherz, Nov 21, 2024)
572d6d2  mito tests (jklugherz, Nov 21, 2024)
9ce1c3c  done with tests? (jklugherz, Nov 21, 2024)
95f467c  merge feature (jklugherz, Nov 21, 2024)
13d6c72  custom_select (jklugherz, Nov 22, 2024)
d1abbfb  fields test (jklugherz, Nov 22, 2024)
1ef92f4  disable write new samples tests for now (jklugherz, Nov 22, 2024)
d536157  Merge remote-tracking branch 'origin/reference-data-refactor' into re… (jklugherz, Nov 22, 2024)
bb8c15c  working on tests (jklugherz, Nov 22, 2024)
09ef322  merge feature (jklugherz, Nov 22, 2024)
849f6fb  Merge remote-tracking branch 'origin/reference-dataset-update-vat' in… (jklugherz, Nov 22, 2024)
d7bd77b  update update vat with new samples tests (jklugherz, Nov 22, 2024)
b825bdb  extra file (jklugherz, Nov 22, 2024)
2909ec0  other skipped test (jklugherz, Nov 22, 2024)
725e878  make select and filter similar (bpblanken, Nov 22, 2024)
9bc0f64  tweak (bpblanken, Nov 22, 2024)
b9a7049  rename path and locus/interval filtering (bpblanken, Nov 23, 2024)
2f90b76  make select and filter similar (#988) (bpblanken, Nov 23, 2024)
5cee9f2  merge (bpblanken, Nov 23, 2024)
832d6c9  Cleanest set diff (bpblanken, Nov 23, 2024)
b123572  Finish off (bpblanken, Nov 24, 2024)
e3f4528  merge (bpblanken, Nov 24, 2024)
cba27ef  Tests passing! (bpblanken, Nov 24, 2024)
2382a42  ruff (bpblanken, Nov 24, 2024)
3092c84  ruff (bpblanken, Nov 24, 2024)
4d4df48  Change the params (bpblanken, Nov 25, 2024)
8bac0e0  Fix params (bpblanken, Nov 25, 2024)
9db1d3c  params (bpblanken, Nov 25, 2024)
526e62b  More clinvar mocking (bpblanken, Nov 25, 2024)
a5ebfa1  hardcode these (bpblanken, Nov 25, 2024)
fddba67  Merge pull request #989 from broadinstitute/benb/other_cleanup (jklugherz, Nov 25, 2024)
d7f8580  Merge pull request #986 from broadinstitute/reference-data-new-variants (jklugherz, Nov 25, 2024)
3a89d33  delete a bunch of stuff (jklugherz, Nov 25, 2024)
f2b0529  ruff (jklugherz, Nov 25, 2024)
18c9d63  remove rdc and crdq (jklugherz, Nov 25, 2024)
21be5f8  merge (jklugherz, Nov 25, 2024)
2bd006c  delete v02 (jklugherz, Nov 25, 2024)
1e29b6d  remove comment references to deleted file (jklugherz, Nov 25, 2024)
250256f  last test (jklugherz, Nov 25, 2024)
@@ -3,18 +3,15 @@
 import hail as hl

Collaborator: I think this can just be deleted!

Contributor Author: what about the rest of the stuff in the v02 directory?

Collaborator: I think it can go now, everything should have an appropriate analog in v3!

 from v03_pipeline.lib.model import ReferenceGenome
-from v03_pipeline.lib.reference_data.clinvar import (
-    download_and_import_latest_clinvar_vcf,
-    CLINVAR_GOLD_STARS_LOOKUP,
-)
 from hail_scripts.utils.hail_utils import write_ht
+from v03_pipeline.lib.reference_datasets.clinvar import CLINVAR_GOLD_STARS_LOOKUP, get_ht

 CLINVAR_PATH = 'ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_{reference_genome}/clinvar.vcf.gz'
 CLINVAR_HT_PATH = 'gs://seqr-reference-data/{reference_genome}/clinvar/clinvar.{reference_genome}.ht'

 for reference_genome in ReferenceGenome:
     clinvar_url = CLINVAR_PATH.format(reference_genome=reference_genome.value)
-    ht = download_and_import_latest_clinvar_vcf(clinvar_url, reference_genome)
+    ht = get_ht(clinvar_url, reference_genome)
     timestamp = hl.eval(ht.version)
     ht = ht.annotate(
         gold_stars=CLINVAR_GOLD_STARS_LOOKUP.get(hl.delimit(ht.info.CLNREVSTAT))

This file was deleted.

This file was deleted.

This file was deleted.

32 changes: 19 additions & 13 deletions v03_pipeline/lib/annotations/fields_test.py
@@ -6,32 +6,39 @@
 from v03_pipeline.lib.annotations.fields import get_fields
 from v03_pipeline.lib.model import (
     DatasetType,
-    ReferenceDatasetCollection,
     ReferenceGenome,
 )
-from v03_pipeline.lib.paths import valid_reference_dataset_collection_path
+from v03_pipeline.lib.paths import valid_reference_dataset_path
+from v03_pipeline.lib.reference_datasets.reference_dataset import ReferenceDataset
 from v03_pipeline.lib.test.mocked_dataroot_testcase import MockedDatarootTestCase
 from v03_pipeline.lib.vep import run_vep
 from v03_pipeline.var.test.vep.mock_vep_data import MOCK_37_VEP_DATA, MOCK_38_VEP_DATA

-TEST_INTERVAL_1 = 'v03_pipeline/var/test/reference_data/test_interval_1.ht'
 GRCH37_TO_GRCH38_LIFTOVER_REF_PATH = (
     'v03_pipeline/var/test/liftover/grch37_to_grch38.over.chain.gz'
 )
 GRCH38_TO_GRCH37_LIFTOVER_REF_PATH = (
     'v03_pipeline/var/test/liftover/grch38_to_grch37.over.chain.gz'
 )
+TEST_GNOMAD_NONCODING_CONSTRAINT_38_HT = 'v03_pipeline/var/test/reference_datasets/GRCh38/gnomad_non_coding_constraint/1.0.ht'
+TEST_SCREEN_38_HT = 'v03_pipeline/var/test/reference_datasets/GRCh38/screen/1.0.ht'


 class FieldsTest(MockedDatarootTestCase):
     def setUp(self) -> None:
         super().setUp()
         shutil.copytree(
-            TEST_INTERVAL_1,
-            valid_reference_dataset_collection_path(
+            TEST_GNOMAD_NONCODING_CONSTRAINT_38_HT,
+            valid_reference_dataset_path(
                 ReferenceGenome.GRCh38,
-                DatasetType.SNV_INDEL,
-                ReferenceDatasetCollection.INTERVAL,
+                ReferenceDataset.gnomad_non_coding_constraint,
             ),
         )
+        shutil.copytree(
+            TEST_SCREEN_38_HT,
+            valid_reference_dataset_path(
+                ReferenceGenome.GRCh38,
+                ReferenceDataset.screen,
+            ),
+        )

@@ -120,18 +127,17 @@ def test_get_formatting_fields(self, mock_vep: Mock) -> None:
                         reference_genome,
                     ),
                     **{
-                        f'{rdc.value}_ht': hl.read_table(
-                            valid_reference_dataset_collection_path(
+                        f'{reference_dataset}_ht': hl.read_table(
+                            valid_reference_dataset_path(
                                 reference_genome,
-                                DatasetType.SNV_INDEL,
-                                rdc,
+                                reference_dataset,
                             ),
                         )
-                        for rdc in ReferenceDatasetCollection.for_reference_genome_dataset_type(
+                        for reference_dataset in ReferenceDataset.for_reference_genome_dataset_type_annotations(
                             reference_genome,
                             DatasetType.SNV_INDEL,
                         )
-                        if rdc.requires_annotation
+                        if reference_dataset.is_keyed_by_interval
                     },
                     **(
                         {
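The comprehension in the hunk above builds one keyword argument per interval-keyed reference dataset, named `<dataset>_ht`. A minimal plain-Python sketch of that naming pattern follows. The `Dataset` enum, the choice of which members count as interval-keyed, and `load_table` are hypothetical stand-ins for `ReferenceDataset` and `hl.read_table`, not the pipeline's real definitions. The diff interpolates the enum member directly; the sketch uses `.value` so it does not depend on the enum's base class.

```python
from enum import Enum


class Dataset(Enum):
    """Hypothetical stand-in for ReferenceDataset; members are illustrative."""

    gnomad_non_coding_constraint = 'gnomad_non_coding_constraint'
    screen = 'screen'
    clinvar = 'clinvar'

    @property
    def is_keyed_by_interval(self) -> bool:
        # Assumption for this sketch: only these two are keyed by interval.
        return self in {Dataset.gnomad_non_coding_constraint, Dataset.screen}


def load_table(dataset: Dataset) -> str:
    """Hypothetical loader; the real test reads a Hail table from a path."""
    return f'<table for {dataset.value}>'


def interval_table_kwargs() -> dict[str, str]:
    # One '<dataset>_ht' entry per interval-keyed dataset, mirroring the diff.
    return {
        f'{dataset.value}_ht': load_table(dataset)
        for dataset in Dataset
        if dataset.is_keyed_by_interval
    }
```

Under these assumptions the result has exactly two keys, `gnomad_non_coding_constraint_ht` and `screen_ht`, which match the parameter names introduced in `snv_indel.py`.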
8 changes: 0 additions & 8 deletions v03_pipeline/lib/annotations/mito.py
@@ -47,14 +47,6 @@ def HL(mt: hl.MatrixTable, **_: Any) -> hl.Expression:  # noqa: N802
     return hl.if_else(is_called, mt.HL, 0)


-def high_constraint_region_mito(
-    ht: hl.Table,
-    interval_ht: hl.Table,
-    **_: Any,
-) -> hl.Expression:
-    return hl.is_defined(interval_ht[ht.locus])
-
-
 def mito_cn(mt: hl.MatrixTable, **_: Any) -> hl.Expression:
     return hl.int32(mt.mito_cn)

29 changes: 0 additions & 29 deletions v03_pipeline/lib/annotations/rdc_dependencies.py

This file was deleted.

14 changes: 7 additions & 7 deletions v03_pipeline/lib/annotations/snv_indel.py
@@ -73,16 +73,16 @@ def gt_stats(

 def gnomad_non_coding_constraint(
     ht: hl.Table,
-    interval_ht: hl.Table,
+    gnomad_non_coding_constraint_ht: hl.Table,
     **_: Any,
 ) -> hl.Expression:
     return hl.Struct(
         z_score=(
-            interval_ht.index(ht.locus, all_matches=True)
+            gnomad_non_coding_constraint_ht.index(ht.locus, all_matches=True)
             .filter(
-                lambda x: hl.is_defined(x.gnomad_non_coding_constraint['z_score']),
+                lambda x: hl.is_defined(x['z_score']),
             )
-            .gnomad_non_coding_constraint.z_score.first()
+            .z_score.first()
         ),
     )

@@ -98,16 +98,16 @@ def rg38_locus(

 def screen(
     ht: hl.Table,
-    interval_ht: hl.Table,
+    screen_ht: hl.Table,
     **_: Any,
 ) -> hl.Expression:
     return hl.Struct(
         region_type_ids=(
-            interval_ht.index(
+            screen_ht.index(
                 ht.locus,
                 all_matches=True,
             ).flatmap(
-                lambda x: x.screen['region_type_ids'],
+                lambda x: x['region_type_ids'],
             )
         ),
     )
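Both functions in `snv_indel.py` index an interval-keyed table by `ht.locus` with `all_matches=True`, which yields every row whose interval contains the locus rather than a single row. The plain-Python sketch below mimics that lookup for illustration only. `Row`, `rows_containing`, and the helper names are hypothetical, and half-open `[start, end)` intervals are an assumption of the sketch, not something the diff states.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Row:
    """Hypothetical interval-keyed row carrying both datasets' fields."""

    start: int
    end: int
    z_score: Optional[float] = None
    region_type_ids: list[int] = field(default_factory=list)


def rows_containing(rows: list[Row], position: int) -> list[Row]:
    # Analog of table.index(locus, all_matches=True): keep every overlap.
    return [r for r in rows if r.start <= position < r.end]


def first_defined_z_score(rows: list[Row], position: int) -> Optional[float]:
    # Analog of .filter(is_defined(z_score)).z_score.first();
    # this sketch returns None when no matching row has a z_score.
    defined = [
        r.z_score for r in rows_containing(rows, position) if r.z_score is not None
    ]
    return defined[0] if defined else None


def all_region_type_ids(rows: list[Row], position: int) -> list[int]:
    # Analog of .flatmap(lambda x: x['region_type_ids']): concatenate per-row lists.
    return [i for r in rows_containing(rows, position) for i in r.region_type_ids]
```

With one table per dataset, each row exposes its fields directly, which is why the diff can replace `x.screen['region_type_ids']` with `x['region_type_ids']`.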
8 changes: 0 additions & 8 deletions v03_pipeline/lib/model/__init__.py
@@ -1,6 +1,3 @@
-from v03_pipeline.lib.model.cached_reference_dataset_query import (
-    CachedReferenceDatasetQuery,
-)
 from v03_pipeline.lib.model.dataset_type import DatasetType
 from v03_pipeline.lib.model.definitions import (
     AccessControl,
@@ -10,18 +7,13 @@
     Sex,
 )
 from v03_pipeline.lib.model.environment import Env
-from v03_pipeline.lib.model.reference_dataset_collection import (
-    ReferenceDatasetCollection,
-)

 __all__ = [
     'AccessControl',
-    'CachedReferenceDatasetQuery',
     'DatasetType',
     'Env',
     'Sex',
     'PipelineVersion',
-    'ReferenceDatasetCollection',
     'ReferenceGenome',
     'SampleType',
 ]
65 changes: 0 additions & 65 deletions v03_pipeline/lib/model/cached_reference_dataset_query.py

This file was deleted.

1 change: 0 additions & 1 deletion v03_pipeline/lib/model/dataset_type.py
@@ -246,7 +246,6 @@ def formatting_annotation_fns(
             DatasetType.MITO: [
                 mito.common_low_heteroplasmy,
                 mito.haplogroup,
-                mito.high_constraint_region_mito,
                 mito.mitotip,
                 mito.rsid,
                 shared.variant_id,
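The annotation functions in this PR take the tables they need as named keyword arguments and swallow everything else with `**_: Any`. That convention presumably lets a caller hand the same bundle of tables to every function in a list like `formatting_annotation_fns`. A rough sketch of the calling convention follows, with simplified stand-in functions and plain strings in place of Hail tables; `run_annotations` is a hypothetical helper, not part of the pipeline.

```python
from typing import Any, Callable


def screen(ht: str, screen_ht: str, **_: Any) -> str:
    # Uses only screen_ht; any other keyword arguments are ignored.
    return f'{ht} annotated with {screen_ht}'


def variant_id(ht: str, **_: Any) -> str:
    # Needs no reference table at all.
    return f'{ht} id'


def run_annotations(
    fns: list[Callable[..., str]],
    ht: str,
    **tables: Any,
) -> dict[str, str]:
    # Every function receives the full bundle and picks out what it declares.
    return {fn.__name__: fn(ht, **tables) for fn in fns}
```

Because unused keyword arguments are ignored, renaming `interval_ht` to a per-dataset name only affects the functions that declare that parameter.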