A Python client integration of Gen3 and Terra.
For Python developers who need to access both the Terra and Gen3 platforms, pyAnVIL is an integration module that provides SSO (single sign-on) using Terra as the IDP (identity provider) and manages the distribution of dependencies. Instead of juggling multiple credentials and installs, developers get a single, developer-friendly experience.
Terra users can skip this step: Terra installs these dependencies in each VM.
- gcloud CLI tools installed and configured.
- Google ID provisioned in both Terra and Gen3:
  - One-time account linking:
    - Prerequisite: the Google account is provisioned in both Gen3 and Terra.
    - Log into https://gen3.theanvil.io/
    - Log into https://anvil.terra.bio
    - In Terra, navigate to your profile.
    - "Unlink" NHGRI AnVIL Data Commons Framework Services from https://anvil.terra.bio/#profile
    - Open a new window to gen3.theanvil.io and log in using your Google ID.
    - Return to the Terra profile screen and "renew" the identity.
- Per-instance Terra API setup:
  - Use the Google account and billing project to set up credentials for the Terra API:
    gcloud auth login <google-account>
    gcloud auth application-default set-quota-project <billing-project-id>
- Validation:
  gcloud auth print-access-token
  >>> ya29.a0AfH6SMBSPFSt252qQNl.......
  fissfc config
  >>> .... root_url https://broad-bond-prod.appspot.com/
- Setup
pip install pyAnVIL
from anvil.gen3_auth import Gen3TerraAuth
from gen3.submission import Gen3Submission
auth = Gen3TerraAuth()
gen3_endpoint = "https://gen3.theanvil.io"
submission_client = Gen3Submission(gen3_endpoint, auth)
query = '{project(first:0) {code, subjects {submitter_id}, programs {name} }}'
results = submission_client.query(query)
[p['code'] for p in results['data']['project']]
>>> ['GTEx', '1000Genomes']
from anvil.terra import FAPI
FAPI.whoami()
>>> '[email protected]'
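The GraphQL call returns a plain dict. A minimal sketch for pulling the project codes out of it, assuming the response follows the usual GraphQL convention of reporting failures under an "errors" key (project_codes is a hypothetical helper, not part of pyAnVIL or the gen3 SDK):

```python
from typing import Any, Dict, List


def project_codes(results: Dict[str, Any]) -> List[str]:
    """Return project codes from a Gen3 GraphQL response dict.

    Assumes the standard GraphQL response shape: results under "data",
    failures (if any) under "errors".
    """
    errors = results.get("errors")
    if errors:
        # surface query errors instead of failing later with a KeyError
        raise RuntimeError(f"GraphQL query failed: {errors}")
    projects = (results.get("data") or {}).get("project") or []
    return [p["code"] for p in projects]
```

Usage: codes = project_codes(submission_client.query(query)).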
At the time of writing, AnVIL's Terra workspaces contain data from five consortia spread across 446 workspaces. These workspaces represent study entities in a wide variety of ways:
Distinct schemas of major entities:
- Patient: (participant) 9, (subject) 32
- Specimen: (sample) 27
- FamilyRelationship: (family) 3
- DocumentReference: (blob) 8
- Task: (sequencing) 24
Note that this breakdown does not account for diversity in vocabularies, entity linking, etc.
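One way to read counts like these: two workspaces share a schema for an entity when they use the same entity name with the same set of attribute names. A minimal sketch of that tally, assuming each workspace's schema has already been reduced to a mapping of entity name to attribute names (both the input shape and count_distinct_schemas are hypothetical, not pyAnVIL's implementation):

```python
from collections import defaultdict
from typing import Dict, Iterable


def count_distinct_schemas(
    workspaces: Dict[str, Dict[str, Iterable[str]]],
) -> Dict[str, int]:
    """Count distinct attribute sets per entity name across workspaces.

    workspaces maps workspace name -> {entity name -> attribute names}.
    """
    seen = defaultdict(set)
    for entities in workspaces.values():
        for entity, attributes in entities.items():
            # a frozenset ignores column order, so reordered attributes count as one schema
            seen[entity].add(frozenset(attributes))
    return {entity: len(schemas) for entity, schemas in seen.items()}
```

Comparing attribute names only is a deliberate simplification; as noted above, it says nothing about vocabularies or entity linking.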
# create a working directory for our data
mkdir -p ./DATA
# manually maintained data tracking
anvil_etl extract spreadsheet
# gen3 DRS identifiers
anvil_etl extract gen3
# terra workspaces, (takes several minutes)
anvil_etl extract terra extract 2> /tmp/extract_terra.log
# review /tmp/extract_terra.log
# should end in `INFO Indexing`
tail /tmp/extract_terra.log
# google blob
anvil_etl extract google --user_project $WORKSPACE_NAMESPACE 2> /tmp/extract_google.log
# review /tmp/extract_google.log
# should end in `INFO Indexing`
tail /tmp/extract_google.log
2022-04-12 01:49:30,144 data_ingestion_tracker.py INFO Read 485 projects from https://raw.githubusercontent.com/anvilproject/anvil-portal/main/plugins/utils/dashboard-source-anvil.tsv. Wrote to ./DATA/data_ingestion_tracker.json
2022-04-12 01:50:03,672 gen3.py INFO Created ./DATA/drs_file.sqlite
2022-04-12 01:50:03,822 gen3.py INFO
Extracted File Counts
gen3_project_id                                    anvil_project_id     file_count
-------------------------------------------------- ------------------ ------------
CCDG-phs001259-DS-CARD-MDS-GSO                                                4318
CCDG-phs001398-GRU                                                             992
CCDG-phs001487-DS-MULTIPLE_DISEASES-IRB-COL-NPU-RD                            1663
CCDG-phs001569-GRU                                                            2272
CCDG-phs001642-DS-GID                                                          166
CCDG-phs001642-DS-IBD                                                         1462
CCDG-phs001642-GRU                                                            2757
CCDG-phs001642-HMB                                                            1810
CF-GTEx                                                                     122990
CMG-Broad-DS-KRD-RD                                Hildebrandt                2444
CMG-Broad-DS-NIC-EMP-LENF                          KNC                         116
CMG-Broad-GRU                                      Bonnemann                   234
CMG-Broad-GRU                                      Manton                      990
CMG-Broad-GRU                                      Pierce                     1274
CMG-Broad-HMB-MDS                                  Gleeson                    2362
CMG-Broad-HMB-MDS                                  Laing                        62
CMG-Broad-HMB-MDS                                  VCGS-White                 1026
CMG-Broad-pre-release-DS-BFD-MDS                   Sankaran                    554
CMG-Broad-pre-release-DS-CSD-MDS                   Seidman                     258
CMG-Broad-pre-release-DS-CVD-MDS                   Ware                         20
CMG-Broad-pre-release-DS-NEURO-GSO-MDS             Beggs                       218
CMG-Broad-pre-release-DS-NEURO-MDS                 Walsh                      1534
CMG-Broad-pre-release-GRU                          Estonia-Ounap               292
CMG-Broad-pre-release-GRU                          OGrady                      146
CMG-Broad-pre-release-HMB-MDS                      Myoseq                     2580
CMG-Broad-pre-release-HMB-MDS                      Ravenscroft                  70
open_access-1000Genomes                                                      13008
tutorial-synthetic_data_set_1                                                10060
2022-04-12 02:03:48,500 terra.py INFO ('NIMH', 'AnVIL_NIMH_Broad_WGSPD1_McCarroll_Pato_GRU_WGS')
2022-04-12 02:03:56,872 terra.py INFO ('NIMH', 'AnVIL_NIMH_Broad_WGSPD1_McCarroll_Escamilla_DS_WGS')
2022-04-12 02:03:57,274 terra.py INFO ('NIMH', 'AnVIL_NIMH_Broad_WGSPD1_McCarroll_Pato_GRU_10XLRGenomes')
2022-04-12 02:03:58,069 terra.py INFO ('NIMH', 'AnVIL_NIMH_Broad_ConvergentNeuro_McCarroll_Eggan_CIRM_GRU_WGS')
2022-04-12 02:03:58,784 terra.py INFO ('NIMH', 'AnVIL_NIMH_Broad_WGSPD1_McCarroll_Braff_DS_WGS')
2022-04-12 02:03:59,978 terra.py INFO ('NIMH', 'AnVIL_NIMH_Broad_ConvergentNeuro_McCarroll_Eggan_Finkel_SMA_DS_WGS')
2022-04-12 02:04:00,221 terra.py INFO ('NIMH', 'AnVIL_NIMH_Broad_WGSPD1_McCarroll_Braff_DS_10XLRGenomes')
2022-04-12 02:04:00,779 terra.py INFO ('NIMH', 'AnVIL_NIMH_Broad_ConvergentNeuro_McCarroll_Eggan_CIRM_GRU_VillageData')
2022-04-12 02:04:01,959 terra.py INFO ('Public', '1000G-high-coverage-2019')
2022-04-12 02:04:06,578 entities.py INFO Indexing
2022-04-12 14:50:34,050 google.py INFO ('CCDG', 'anvil_ccdg_broad_ai_ibd_daly_xavier_share_wes', 'fc-secure-abc7f058-0260-4e82-a911-abfec3dcb676')
2022-04-12 14:50:34,425 google.py INFO ('CCDG', 'anvil_ccdg_broad_ai_ibd_niddk_daly_brant_wes', 'fc-secure-29cd113f-7eca-4526-aa52-dde1b8cb41d0')
2022-04-12 14:50:34,978 google.py INFO ('CCDG', 'anvil_ccdg_broad_ai_ibd_niddk_daly_duerr_wes', 'fc-secure-877e6c8c-72ef-46d0-b3f3-37dd175771fe')
2022-04-12 14:50:35,865 google.py INFO ('CCDG', 'anvil_ccdg_broad_ai_ibd_niddk_daly_silverberg_wes', 'fc-secure-0eba3dae-89be-4642-8982-9a80a7428cd2')
2022-04-12 14:50:37,030 google.py INFO ('CCDG', 'anvil_ccdg_broad_daly_igsr_1kg_twist_gsa', 'fc-secure-752e48e6-1e66-4f85-9194-456562e87b90')
2022-04-12 14:50:37,594 google.py INFO ('CCDG', 'anvil_ccdg_broad_daly_igsr_1kg_twist_wes', 'fc-secure-b41964ad-0c8a-47da-8504-f8636ff3d318')
2022-04-12 14:50:37,980 google.py INFO ('NHGRI', 'anvil_nhgri_broad_ibd_daly_kugathasan_wes', 'fc-secure-0ca0c5e6-26ca-47ea-b509-ec4eaa058fc6')
2022-04-12 14:50:38,277 google.py INFO ('NHGRI', 'anvil_nhgri_broad_ibd_daly_turner_wes', 'fc-secure-bee7792c-ef35-478d-a9bb-c8f2054c335c')
2022-04-12 14:50:38,398 google.py INFO ('NHGRI', 'anvil_nhgri_broad_ibd_daly_winter_wes', 'fc-secure-72a949c5-0b7d-45c9-96c3-ff4d25815ed5')
2022-04-12 14:50:38,675 entities.py INFO Indexing
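Each extract step logs to stderr, and a successful run is expected to end with "INFO Indexing". A minimal sketch that automates the tail check and tallies the gen3 "Extracted File Counts" table, assuming log lines and table rows have exactly the layout of the sample output above (extract_finished and parse_file_counts are hypothetical helpers):

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple


def extract_finished(log_path: str) -> bool:
    """True if the last non-blank line of an extract log ends with 'INFO Indexing'."""
    lines = [line for line in Path(log_path).read_text().splitlines() if line.strip()]
    return bool(lines) and lines[-1].rstrip().endswith("INFO Indexing")


# gen3 project id, optional anvil project id (may contain hyphens), trailing file count
ROW = re.compile(r"^(\S+)\s+(?:(\S+)\s+)?(\d+)$")


def parse_file_counts(rows: Iterable[str]) -> Dict[str, Tuple[Optional[str], int]]:
    """Map gen3_project_id -> (anvil_project_id or None, file_count); other lines are skipped."""
    counts = {}
    for row in rows:
        match = ROW.match(row.strip())
        if match:
            gen3_id, anvil_id, file_count = match.groups()
            counts[gen3_id] = (anvil_id, int(file_count))
    return counts
```

Header, separator, and timestamped log lines do not end in a bare integer, so the regex skips them.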
# setup environmental values
source /dev/stdin <<< `anvil_etl utility env`
# normalize the data
anvil_etl transform normalize 2> /tmp/normalize.log
# log should list workspaces, with any warnings or errors logged without exception stack traces.
tail /tmp/normalize.log
# gather statistics
anvil_etl transform analyze 2> /tmp/analyze.log
tail /tmp/analyze.log
# render the qa-report
anvil_etl utility qa > ./DATA/qa-report.md
# render the qa-report in a notebook
from IPython.display import Markdown, display
display(Markdown(filename="./DATA/qa-report.md"))
export FHIR_PROJECT_NAME=xxx
export GOOGLE_LOCATION=xxx
export FHIR_PROJECT=xxx
export GOOGLE_DATASET=xxx
export TOKEN=xxx
export GOOGLE_DATASTORES=xxx
export GOOGLE_DATASTORE=xxx
export GOOGLE_BUCKET=xxx
export OUTPUT_PATH=xxx
export IMPLEMENTATION_GUIDE_PATH=xxx
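The source /dev/stdin <<< `anvil_etl utility env` idiom evaluates export statements like the ones above in the current shell. From Python (for example in a notebook), a minimal sketch that applies the same KEY=value lines to os.environ, assuming one plain export per line whose values need no shell expansion (load_exports is a hypothetical helper):

```python
import os
import shlex
from typing import Dict, MutableMapping, Optional


def load_exports(
    text: str, environ: Optional[MutableMapping[str, str]] = None
) -> Dict[str, str]:
    """Parse 'export KEY=value' lines and apply them to environ (default: os.environ).

    Only literal values are supported; $(...) and $VAR are NOT expanded.
    """
    target = os.environ if environ is None else environ
    parsed = {}
    for line in text.splitlines():
        # shlex handles quoting and drops '#' comments
        tokens = shlex.split(line, comments=True)
        if len(tokens) == 2 and tokens[0] == "export" and "=" in tokens[1]:
            key, value = tokens[1].split("=", 1)
            parsed[key] = value
    target.update(parsed)
    return parsed
```

Usage, assuming anvil_etl is on the PATH: load_exports(subprocess.run(["anvil_etl", "utility", "env"], capture_output=True, text=True, check=True).stdout).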
2022-04-12 15:05:34,105 normalizer.py INFO ('CCDG', 'anvil_ccdg_broad_ai_ibd_niddk_daly_silverberg_wes')
2022-04-12 15:05:36,168 normalizer.py INFO ('CCDG', 'anvil_ccdg_broad_daly_igsr_1kg_twist_gsa')
2022-04-12 15:05:36,899 normalizer.py INFO ('CCDG', 'anvil_ccdg_broad_daly_igsr_1kg_twist_wes')
2022-04-12 15:05:37,483 normalizer.py INFO ('NHGRI', 'anvil_nhgri_broad_ibd_daly_kugathasan_wes')
2022-04-12 15:05:37,939 normalizer.py INFO ('NHGRI', 'anvil_nhgri_broad_ibd_daly_turner_wes')
2022-04-12 15:05:38,096 normalizer.py INFO ('NHGRI', 'anvil_nhgri_broad_ibd_daly_winter_wes')
2022-04-19 00:31:47,216 transform.py INFO working on anvil_nhgri_broad_ibd_daly_kugathasan_wes
2022-04-19 00:31:47,237 transform.py INFO working on anvil_nhgri_broad_ibd_daly_turner_wes
2022-04-19 00:31:47,290 transform.py INFO working on anvil_nhgri_broad_ibd_daly_winter_wes
consortium | workspace | patients | specimens | tasks | documents | vcf | tbi | cram | qa_grade | drs_grade | md5 | crai | idat | gtc | NA | bam | bai | bedpe | loupe | csv | txt | nan | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Public | 1000G-high-coverage-2019 | 3202 | 3202 | 3202 | 9609 | 3205 | 3202 | 3202 | 99.9994 | 99.0006 | ||||||||||||
1 | CMG | ANVIL_CMG_BROAD_BRAIN_ENGLE_WES | 946 | 946 | 946 | 1419 | 473 | 100 | 0 | 473 | 473 | ||||||||||||
2 | CMG | ANVIL_CMG_BROAD_BRAIN_SHERR_WGS | 6 | 6 | 6 | 9 | 3 | 100 | 0 | 3 | 3 | ||||||||||||
3 | CMG | ANVIL_CMG_BROAD_ORPHAN_SCOTT_WGS | 30 | 30 | 30 | 45 | 15 | 100 | 0 | 15 | 15 | ||||||||||||
4 | CMG | ANVIL_CMG_Broad_Muscle_Laing_WES | 31 | 31 | 31 | 62 | 31 | 99.9355 | 99 | 31 | |||||||||||||
5 | CMG | ANVIL_CMG_Broad_Orphan_Jueppner_WES | 20 | 20 | 20 | 30 | 10 | 100 | 0 | 10 | 10 | ||||||||||||
6 | CMG | ANVIL_CMG_UWASH_DS-BAV-IRB-PUB-RD | 177 | 177 | 0 | 0 | 99.9774 | 0 | |||||||||||||||
... | |||||||||||||||||||||||
445 | NHGRI | anvil_nhgri_broad_ibd_daly_winter_wes | 823 | 823 | 823 | 1236 | 412 | 100 | 0 | 412 | 412 |
./DATA
├── analysis.ndjson
├── google_entities.sqlite
├── fhir
│ ├── IG
│ │ ├── ImplementationGuide-NCPI-FHIR-Implementation-Guide.json
│ │ ├── ...
│ └── public
│ └── Public
│ └── 1000G-high-coverage-2019
│ ├── protected
│ │ ├── DocumentReference.ndjson
│ │ ├── Patient.ndjson
│ │ ├── ResearchSubject.ndjson
│ │ ├── Specimen.ndjson
│ │ └── Task.ndjson
│ └── public
│ ├── Organization.ndjson
│ ├── Practitioner.ndjson
│ ├── PractitionerRole.ndjson
│ ├── ResearchStudy.ndjson
│ └── ResearchStudyObservationSummary.ndjson
│ ├── pending
│ │ ├── CCDG
│ │ │ ├── AnVIL_ccdg_asc_ndd_daly_talkowski_ac-boston_asd_exome
│ │ │ │ ├── protected
│ │ │ │ └── public
...
│ │ ├── CMG
│ │ │ ├── AnVIL_CMG_Broad_Brain_Engle_WGS
│ │ │ │ ├── protected
│ │ │ │ └── public
│ ├── phs000160-Consortia_Access_Only
│ │ └── CCDG
│ │ └── AnVIL_CCDG_NYGC_NP_Alz_LOAD_WGS
│ │ ├── protected
│ │ └── public
│ ├── phs000298
│ │ └── CCDG
│ │ ├── AnVIL_ccdg_asc_ndd_daly_talkowski_CDCSEED_asd_exome
│ │ │ ├── protected
│ │ │ └── public
...
├── sample
│ ├── CCDG
│ │ ├── AnVIL_CCDG_Baylor_CVD_AFib_BioVU_WGS
│ │ │ ├── blob.ndjson
│ │ │ ├── qc_result_sample.ndjson
│ │ │ ├── schema.ndjson
│ │ │ ├── sequencing.ndjson
│ │ │ └── subject.ndjson
│ │ ├── AnVIL_CCDG_Baylor_CVD_AFib_Groningen_WGS
│ │ │ └── ...
│ │ ├── ...
│ ├── CMG
│ │ ├── ANVIL_CMG_BROAD_BRAIN_ENGLE_WES
│ │ │ ├── blob.ndjson
│ │ │ ├── participant.ndjson
│ │ │ ├── sample.ndjson
│ │ │ └── schema.ndjson
│ │ ├── ANVIL_CMG_BROAD_BRAIN_SHERR_WGS
│ │ │ └── ...
│ │ ├── ...
│ ├── GTEx
│ │ └── AnVIL_GTEx_V8_hg38
│ │ └── ...
│ ├── NHGRI
│ │ ├── anvil_nhgri_broad_ibd_daly_kugathasan_wes
│ │ │ └── ...
│ │ └── ...
│ ├── NIMH
│ │ ├── AnVIL_NIMH_Broad_ConvergentNeuro_McCarroll_Eggan_CIRM_GRU_WGS
│ │ │ └── ...
│ │ └── ...
│ └── Public
│ ├── 1000G-high-coverage-2019
│ │ │ └── ...
│ │ └── ...
└── workspaces
├── CCDG
│ ├── AnVIL_CCDG_Baylor_CVD_AFib_BioVU_WGS.pickle
│ └── ...
├── CMG
│ ├── ANVIL_CMG_BROAD_BRAIN_ENGLE_WES.pickle
│ └── ...
├── GTEx
│ └── ...
├── NHGRI
│ └── ...
├── NIMH
│ └── ...
└── Public
└── 1000G-high-coverage-2019-DEV_ONLY.pickle
468 directories, 2386 files
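Each .ndjson file in the tree holds one JSON resource per line, so a quick sanity check before loading is to count resources per directory. A minimal standard-library sketch, assuming the ./DATA/fhir layout shown above (count_resources is a hypothetical helper):

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def count_resources(root: str) -> Dict[str, Counter]:
    """Count resources by resourceType for each directory under root holding .ndjson files."""
    root_path = Path(root)
    totals: Dict[str, Counter] = {}
    for path in sorted(root_path.rglob("*.ndjson")):
        key = path.parent.relative_to(root_path).as_posix()
        counter = totals.setdefault(key, Counter())
        with path.open() as fh:
            for line in fh:
                if line.strip():
                    # each non-blank line is one JSON-encoded FHIR resource
                    counter[json.loads(line).get("resourceType", "unknown")] += 1
    return totals
```

Usage: for directory, counts in count_resources("./DATA/fhir").items(): print(directory, dict(counts)).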
- See the load options for anvil_etl load fhir:
$ anvil_etl load fhir --help
Usage: anvil_etl load fhir [OPTIONS] COMMAND [ARGS]...
Commands to setup and load fhir server.
Options:
--help Show this message and exit.
Commands:
IG Commands to create and delete implementation guide.
data-set Commands to create and delete data_set.
data-store Commands to create and delete data_store.
# create the IG, FHIR's "schema"
anvil_etl load fhir IG create
# create the data set and data store containers
anvil_etl load fhir data-set create
anvil_etl load fhir data-store create
# load the data to respective stores
anvil_etl load fhir data-store load
# load all public resources into the public store
anvil_etl load fhir data-store load-public
- You can view progress at https://console.cloud.google.com/healthcare/browser/locations/us-west2/datasets/anvil-test/operations?project=fhir-test-16-342800
Roles are assigned at the data set level and are inherited by child data stores.
- See FHIR's Search API.
- See Google's Healthcare API conformance statement.
- The base URL is:
  https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores
- Append the data store to this for a complete URL, e.g. /public/fhir/
- Complete data store list:
- pending
- phs000160-Consortia_Access_Only
- phs000298
- phs000298-DS-ASD
- phs000298-DS-MH
- phs000298-GRU
- phs000298-GRU-NPU
- phs000298-HMB
- phs000356-HMB-NPU
- phs000424-GRU
- phs000496-Consortia_Access_Only
- phs000693-DS-BAV-IRB-PUB-PD
- phs000693-DS-BDIS
- phs000693-DS-EP
- phs000693-DS-HFA
- phs000693-DS-NBIA
- phs000693-GRU
- phs000693-GRU-IRB
- phs000693-HMB
- phs000693-HMB-IRB
- phs000711-HMB-IRB-NPU
- phs000711-HMB-NPU
- phs000744-DS-MC
- phs000744-DS-RARED
- phs000744-GRU
- phs000744-HMB
- phs000744-HMB-GSO
- phs000920-DS-LD-RD
- phs000997
- phs000997-Consortia_Access_Only
- phs001062
- phs001062-Consortia_Access_Only
- phs001155-GRU
- phs001211-HMB-IRB
- phs001222-DS-DRC-IRB-NPU
- phs001227-DS-ATHSCL-IRB-MDS
- phs001227-GRU-IRB
- phs001259-DS-CARD-MDS-GSO
- phs001272
- phs001272-Consortia_Access_Only
- phs001272-DS-BFD-MDS
- phs001272-DS-CSD-MDS
- phs001272-DS-NEURO-GSO-MDS
- phs001272-GRU
- phs001272-HMB-MDS
- phs001395-HMB-NPU
- phs001398-GRU
- phs001487-DS-CVD-IRB-COL-MDS
- phs001489
- phs001489-DS-CARDI_NEURO
- phs001489-DS-EAED-MDS
- phs001489-DS-EARET-MDS
- phs001489-DS-EP
- phs001489-DS-EP-ETIOLOGY-MDS
- phs001489-DS-EP-MDS
- phs001489-DS-EP-NPU
- phs001489-DS-EPI-ASZ-MED-MDS
- phs001489-DS-EPI-MUL-CON-MDS
- phs001489-DS-EPI-NPU-MDS
- phs001489-DS-EPIL-BC-ID-MDS
- phs001489-DS-NEURO-AD-NPU
- phs001489-DS-NEURO-MDS
- phs001489-DS-NPD-IRB-NPU
- phs001489-DS-SEIZD
- phs001489-EPIL_BRAINAB_CONVUL_INTELCT_DIS_MDS
- phs001489-EPIL_BRAIN_AB_INTEL_DIS_MDS
- phs001489-EPIL_BRAIN_AB_MDS
- phs001489-EPIL_CO_MORBIDI_MDS
- phs001489-GRU
- phs001489-GRU-IRB
- phs001489-GRU-NPU
- phs001489-HMB
- phs001489-HMB-IRB-MDS
- phs001489-HMB-MDS
- phs001489-HMB-NPU
- phs001489-HMB-NPU-MDS
- phs001498
- phs001502
- phs001502-Consortia_Access_Only
- phs001502-HMB-IRB-PUB
- phs001506-DS-CVD-IRB
- phs001543-Consortia_Access_Only
- phs001544-Consortia_Access_Only
- phs001545-Consortia_Access_Only
- phs001546
- phs001547-Consortia_Access_Only
- phs001569-Consortia_Access_Only
- phs001569-GRU
- phs001579-GRU-IRB-NPU
- phs001592-DS-CVD
- phs001598-Consortia_Access_Only
- phs001600
- phs001600-Consortia_Access_Only
- phs001624
- phs001624-Consortia_Access_Only
- phs001624-HMB-GSO
- phs001642
- phs001642-DS-GID
- phs001642-DS-IBD
- phs001642-GRU
- phs001642-HMB
- phs001644
- phs001676-DS-AONDD-IRB
- phs001725
- phs001740-DS-ASD-IRB
- phs001741-DS-ASD-IRB
- phs001766-DS-ASD
- phs001766-DS-ASD-IRB
- phs001871-DS-CAD-IRB-COL-NPU
- phs001873-HMB-GSO
- phs001880-GRU-NPU
- phs001894-DS-EAC-PUB-GSO
- phs001913-GRU-IRB
- phs001933
- phs001933-Consortia_Access_Only
- phs002004-DS-ASD
- phs002018-Consortia_Access_Only
- phs002018-HMB
- phs002032-GRU
- phs002041-DS-MLHLTH-MDS
- phs002041-DS-SZRD-MDS
- phs002041-GRU
- phs002042-DS-ASD-MDS-PUB
- phs002042-GRU-MDS-PUB
- phs002043-DS-ASD
- phs002043-GRU
- phs002044-DS-ASD-IRB
- phs002163-GRU
- phs002236
- phs002242
- phs002243-HMB
- phs002282-DS-CVDRF
- phs002325-DS-CVD
- phs002726
- phs002774
- public
- The anvil_curl command dispatches a query to every data store, for example to discover ResearchStudy entities:
export TOKEN=$(gcloud auth application-default print-access-token)
export GOOGLE_DATASTORES=$(gcloud beta healthcare fhir-stores list --dataset=$GOOGLE_DATASET --location=$GOOGLE_LOCATION | awk '(NR>1){print $1}' | sed 's/$/,/g' | tr -d "\n")
anvil_curl '/ResearchStudy?_elements=id&_count=1000' | jq -rc '.entry[] | .fullUrl' | sort
https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/pending/fhir/ResearchStudy/AnVIL-ccdg-asc-ndd-daly-talkowski-ac-boston-asd-exome
https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/pending/fhir/ResearchStudy/AnVIL-ccdg-asc-ndd-daly-talkowski-AGRE-FEMF-asd-exome
https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/pending/fhir/ResearchStudy/AnVIL-ccdg-asc-ndd-daly-talkowski-domenici-asd-exome
https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/pending/fhir/ResearchStudy/AnVIL-ccdg-asc-ndd-daly-talkowski-herman-asd-exome
https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/pending/fhir/ResearchStudy/AnVIL-ccdg-asc-ndd-daly-talkowski-lattig-asd-exome
https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/pending/fhir/ResearchStudy/AnVIL-ccdg-asc-ndd-daly-talkowski-mcpartland-asd-exome
https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/pending/fhir/ResearchStudy/AnVIL-ccdg-asc-ndd-daly-talkowski-minshew-asd-exome
https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/pending/fhir/ResearchStudy/AnVIL-ccdg-asc-ndd-daly-talkowski-palotie-asd-exome
https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/pending/fhir/ResearchStudy/AnVIL-ccdg-asc-ndd-daly-talkowski-puura-asd-exome
https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/pending/fhir/ResearchStudy/AnVIL-ccdg-asc-ndd-daly-talkowski-TASC-asd-exome
...
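Each full URL above is the base URL plus a data store name, /fhir, and the resource path. A minimal sketch of how a command like anvil_curl could fan one query out to every store in GOOGLE_DATASTORES, assuming the comma-separated format (trailing comma included) built by the export above; store_urls is hypothetical and is not anvil_curl's actual implementation:

```python
from typing import List

BASE_URL = (
    "https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800"
    "/locations/us-west2/datasets/anvil-test/fhirStores"
)


def store_urls(datastores: str, query: str, base_url: str = BASE_URL) -> List[str]:
    """Build one request URL per data store.

    datastores is the comma-separated GOOGLE_DATASTORES value; empty entries
    (e.g. from the trailing comma) are skipped. query is a path such as
    '/ResearchStudy?_elements=id&_count=1000'.
    """
    stores = [s.strip() for s in datastores.split(",") if s.strip()]
    return [f"{base_url.rstrip('/')}/{store}/fhir/{query.lstrip('/')}" for store in stores]
```

Each request would then be sent with the exported TOKEN as an "Authorization: Bearer" header.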
# setup env variables
source /dev/stdin <<< `anvil_etl utility env`
# show count of ResearchStudy in each data store
anvil_curl '/ResearchStudy?_count=1000&_elements=id' | jq -c '{"data_store": (.link[2].url | match(".*/fhirStores/(.*)/fhir.*").captures[0].string), "total":.total}'
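The jq filter above pulls the data store name out of a Bundle link URL. The same extraction in Python, as a minimal sketch that scans every link instead of relying on the position .link[2] and captures a single path segment rather than jq's greedy (.*); summarize_bundles is hypothetical, and its input is assumed to be the parsed JSON Bundles returned by each store:

```python
import re
from typing import Dict, Iterable, Optional

# same idea as the jq regex: capture the path segment after /fhirStores/
STORE = re.compile(r".*/fhirStores/([^/]+)/fhir")


def data_store(url: str) -> Optional[str]:
    """Return the data store name embedded in a FHIR store URL, or None."""
    match = STORE.match(url)
    return match.group(1) if match else None


def summarize_bundles(bundles: Iterable[dict]) -> Dict[str, int]:
    """Map data store name -> Bundle.total for each search Bundle."""
    summary = {}
    for bundle in bundles:
        for link in bundle.get("link", []):
            store = data_store(link.get("url", ""))
            if store:
                summary[store] = bundle.get("total", 0)
                break
    return summary
```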
- Setup Environment
Set environment variables by calling fhir_env.
Provide a project name and region. Note: ensure the Healthcare API is available in that region; see https://cloud.google.com/healthcare-api/docs/concepts/regions
The script sets reasonable values for the other environment variables; you may override them on the command line.
# setup environmental values
$ source /dev/stdin <<< `anvil_etl utility env`
Google project setup:
Bash snippets to set up the Google project:
export FHIR_PROJECT=$(gcloud projects list --filter=name=$FHIR_PROJECT_NAME --format="value(projectId)" )
if [ -z "$FHIR_PROJECT" ]; then
echo "Need to create project"
unset MISSING
[ -z "$GOOGLE_BILLING_ACCOUNT" ] && echo "missing: GOOGLE_BILLING_ACCOUNT billing account for project" && MISSING="Y"
[ -z "$GOOGLE_LOCATION" ] && echo "missing: GOOGLE_LOCATION the google region for the project & service" && MISSING="Y"
[ ! -z "$MISSING" ] && echo "please set required env variables" && exit 1
# create the project
gcloud projects create --name=$FHIR_PROJECT_NAME --quiet
# capture that ID and assign it to an environmental variable
export FHIR_PROJECT=$(gcloud projects list --filter=name=$FHIR_PROJECT_NAME --format="value(projectId)" )
[ -z "$FHIR_PROJECT" ] && echo "Could not create FHIR_PROJECT" && exit 1
# attach a billing to the project
gcloud beta billing projects link $FHIR_PROJECT --billing-account=$GOOGLE_BILLING_ACCOUNT
# use this project by default
gcloud config set project $FHIR_PROJECT
# enable the Cloud Healthcare API for the current project
gcloud services enable healthcare.googleapis.com
fi
# use this project by default
gcloud config set project $FHIR_PROJECT
# get service account
export GOOGLE_SERVICE_ACCOUNT=$(gcloud projects get-iam-policy $FHIR_PROJECT --format="value(bindings.members)" --flatten="bindings[]" | grep serviceAccount | sed s/serviceAccount:// | head -1)
[ -z "$GOOGLE_SERVICE_ACCOUNT" ] && echo "Unable to set GOOGLE_SERVICE_ACCOUNT ??" && exit 1
# grant read access to buckets in the project; roles/storage.objectViewer includes the storage.objects.get and storage.objects.list permissions
gcloud projects add-iam-policy-binding $FHIR_PROJECT --member=serviceAccount:$GOOGLE_SERVICE_ACCOUNT --role=roles/storage.objectViewer
[ $? -ne 0 ] && echo "Unable to set roles/storage.objectViewer" && exit 1
echo Granted roles/storage.objectViewer to $GOOGLE_SERVICE_ACCOUNT on project $FHIR_PROJECT
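The bash snippet stops early when GOOGLE_BILLING_ACCOUNT or GOOGLE_LOCATION is unset. When scripting the same setup from Python, an equivalent pre-flight check is a few lines; this is a minimal sketch (missing_env is hypothetical, and the variable list simply mirrors the bash checks):

```python
import os
from typing import List, Mapping, Optional, Sequence

# the two variables the bash snippet requires before creating a project
REQUIRED_FOR_NEW_PROJECT = ("GOOGLE_BILLING_ACCOUNT", "GOOGLE_LOCATION")


def missing_env(
    names: Sequence[str] = REQUIRED_FOR_NEW_PROJECT,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return names that are unset or empty, matching bash's [ -z "$VAR" ] test."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]
```

Usage: missing = missing_env(); if missing: raise SystemExit(f"please set required env variables: {missing}").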
We incorporated fhirclient, a flexible Python client for FHIR servers supporting the SMART on FHIR protocol.
Example
from anvil.clients.fhir_client import FHIRClient
from anvil.clients.smart_auth import GoogleFHIRAuth
settings = {
'app_id': 'my_web_app',
'api_base': 'https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800/locations/us-west2/datasets/anvil-test/fhirStores/public/fhir'
}
# optionally pass token e.g. GoogleFHIRAuth(access_token='ya29.abcd...')
smart = FHIRClient(settings=settings, auth=GoogleFHIRAuth())
smart.prepare()
assert smart.ready, "server should be ready"
# search for all ResearchStudy
import fhirclient.models.researchstudy as rs
[s.id for s in rs.ResearchStudy.where(struct={}).perform_resources(smart.server)]
>>>
['AnVIL-CMG-Broad-Muscle-Myoseq-WES', 'AnVIL-CMG-Broad-Orphan-Estonia-Ounap-WGS', ... ]
For more information on usage, see smart-on-fhir/client-py.
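fhirclient handles authentication and paging for you. For a quick check without it, the same ResearchStudy search is a plain FHIR REST GET with a bearer token. A minimal standard-library sketch, assuming the token is the output of gcloud auth application-default print-access-token (build_search and search_ids are hypothetical helpers; search_ids reads only the first page of results):

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

API_BASE = (
    "https://healthcare.googleapis.com/v1beta1/projects/fhir-test-16-342800"
    "/locations/us-west2/datasets/anvil-test/fhirStores/public/fhir"
)


def build_search(
    resource: str, params: Dict[str, str], token: str, api_base: str = API_BASE
) -> urllib.request.Request:
    """Build (but do not send) a FHIR search request: GET [base]/[resource]?[params]."""
    url = f"{api_base}/{resource}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/fhir+json"},
    )


def search_ids(resource: str, token: str) -> List[str]:
    """Send the search and return resource ids from the first Bundle page (no paging)."""
    request = build_search(resource, {"_elements": "id", "_count": "1000"}, token)
    with urllib.request.urlopen(request) as response:
        bundle = json.load(response)
    return [entry["resource"]["id"] for entry in bundle.get("entry", [])]
```

Usage: search_ids("ResearchStudy", os.environ["TOKEN"]).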
Local testing
Test JSON files using the FHIR reference validator:
java -jar validator_cli.jar /tmp/invalid_body_no_subject.json -ig ~/client-apis/pyAnVIL/DATA/fhir/IG/
- Set up a virtual environment:
  python3 -m venv venv
  source venv/bin/activate
  python3 -m pip install -r requirements.txt
  python3 -m pip install -r requirements-dev.txt
  python3 -m pip install -e .
- Tests:
  # set up environment values
  source /dev/stdin <<< `anvil_etl utility env`
  pytest tests/integration/
- Exploratory tests investigate Google FHIR conformance, KF and dbGaP FHIR:
  - They are designed to find corner cases in the way the data was encoded.
  - As such, they are expected to fail.
tests/exploratory/
├── google
│ └── test_validation.py
└── ncpi
├── test_condition.py
└── test_ncpi_conformance.py
- PyPI
# update PyPI
export TWINE_USERNAME= # the username to use for authentication to the repository.
export TWINE_PASSWORD= # the password to use for authentication to the repository.
rm -rf dist/
python3 setup.py sdist bdist_wheel
twine upload dist/*
- Read The Docs
https://readthedocs.org/projects/pyanvil/