Merge branch 'master' into public_release_fixes
rmadupuri authored Dec 27, 2023
2 parents 2956f5c + 396f37d commit 3dc0f1f
Showing 33 changed files with 227 additions and 2 deletions.
1 change: 1 addition & 0 deletions public/difg_glass/LICENSE
@@ -0,0 +1 @@
The data are available under the ODC Open Database License (ODbL)(http://opendatacommons.org/licenses/odbl/1.0/) (summary available here: http://www.opendatacommons.org/licenses/odbl/1-0/summary/): you are free to share and modify the data so long as you attribute any public use of the database, or works produced from the database; keep the resulting data-sets open; and offer your shared or adapted version of the data-set under the same ODbL license.
89 changes: 89 additions & 0 deletions public/difg_glass/README.md
@@ -0,0 +1,89 @@
# Curation and transformation of the GLASS dataset

## General Information
- Data Source: [GLASS data on Synapse](https://www.synapse.org/#!Synapse:syn17038081/wiki/585622)
- Reference publication: [PubMed Reference](https://pubmed.ncbi.nlm.nih.gov/35649412/)
- The data here represents the GLASS `2022-05-31` release version.
- Reference genome used: GRCh37

## Sample size selection
- Silver set samples were used for the study.
- Files used: `analysis_silver_set` and `analysis_rna_silver_set` (if a sample had multiple aliquots, the silver set represents a single aliquot per sample)
- `analysis_silver_set`: Contains DNA pairs that pass fingerprinting and coverage QC, representing a single aliquot per patient (maximal timepoint for patients with multiple recurrences).
- `rna_silver_set`: Lists RNA pairs passing fingerprinting and clinical QC, with the maximal timepoint taken for patients with multiple recurrences.
- If a patient underwent more than two surgical procedures, additional longitudinal aliquots were chosen. However, for patients with only one identical sample, the aliquots were excluded. In cases where samples had multi-sector information, one aliquot per silver set was selected (rather than merging aliquots) to better align with the Synapse instance.
- Overall Cohort Size: 694 samples (629 DNA and 355 RNA samples) from 329 patients.

## Clinical data
- Patient-Level Data: `clinical_cases`
- Sample-Level Data: `clinical_surgeries`, `biospecimen_sample_types`, `biospecimen_samples`, `biospecimen_aliquots`, `analysis_estimate_purity`
- Patient and sample files were subset to silver set samples.
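
The subsetting step above can be sketched as follows. This is a minimal illustration, not the curation code actually used: the function name, the choice of `aliquot_barcode` as the join key, and the barcodes themselves are assumptions made for the example.

```python
def subset_to_silver_set(rows, silver_barcodes, key):
    """Keep only the rows whose `key` value appears in the silver set."""
    keep = set(silver_barcodes)
    return [row for row in rows if row[key] in keep]


# Hypothetical barcodes, for illustration only.
silver_set = ["ALIQUOT-A", "ALIQUOT-C"]
aliquots = [
    {"aliquot_barcode": "ALIQUOT-A", "aliquot_batch": "batch1"},
    {"aliquot_barcode": "ALIQUOT-B", "aliquot_batch": "batch1"},
    {"aliquot_barcode": "ALIQUOT-C", "aliquot_batch": "batch2"},
]

subset = subset_to_silver_set(aliquots, silver_set, key="aliquot_barcode")
print([row["aliquot_barcode"] for row in subset])
```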

## Timeline Data
- Cumulative time elapsed in months between surgeries is provided. For the initial surgery, t0 is consistently set to 0, and subsequent relative events are added into the timeline.
- File used: `clinical_surgeries`
- Surgery and treatment (chemotherapy, radiation, and targeted therapy) events are added to the timeline relative to the first surgery.
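
A minimal sketch of the timeline construction, assuming each surgery comes with a cumulative time in months and that the earliest surgery defines t0. The function name, the event labels, and the output keys are hypothetical and do not reflect the actual timeline file layout.

```python
def timeline_events(surgeries):
    """Return events ordered in time, with the first surgery at 0 months.

    `surgeries` is a list of (label, cumulative_months) tuples for one patient.
    """
    ordered = sorted(surgeries, key=lambda item: item[1])
    t0 = ordered[0][1]  # the initial surgery anchors the timeline
    return [
        {"event": label, "start_months": months - t0}
        for label, months in ordered
    ]


# Hypothetical patient with three surgeries, given out of order.
events = timeline_events(
    [("Surgery 2", 14.0), ("Surgery 1", 0.0), ("Surgery 3", 20.5)]
)
for event in events:
    print(event["event"], event["start_months"])
```

Treatment events could be placed on the same axis by expressing their times relative to the same t0.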

## Mutation Data
- Files used: `variants_passgeno_20220531.csv.gz` and `variants_anno_20220531.csv.gz`
- Mutation data were processed and filtered to retain variants that passed filters in single-sample Mutect2 mode.
- Each row in variants_passgeno table represents a single variant detected using multi-sample Mutect2 and is reported for a given sample. Thus, for each patient the variant information is listed for all samples (including normal blood). Single-sample mutation calls were overlaid on the multi-sample calls to infer whether variants were called in individual samples.
- The `ssm2_pass_call` column is a flag indicating whether the variant was called and passed filters in single-sample Mutect2 mode. Only variants that were called and passed filters in single-sample Mutect2 mode were selected.
- The file is processed as shown in: https://gist.github.com/rmadupuri/5e78309792181dbb1cdec88475f5afb5
- Each row in the variants_anno file indicates further annotations for the variants in the passgeno file.
- The anno file is merged with the passgeno file as shown in: https://gist.github.com/rmadupuri/757197f0d0cef4871254e5b2ffd51d4c
- IDH1, IDH2, and TERTp variants were force-filtered based on clinical tests and alt_counts.
- IDH1 and IDH2 variants are kept when the sample has IDH_status = 'IDHmut' and the variant has alt_count > 0; TERT variants are kept when alt_count > 0.
- https://gist.github.com/rmadupuri/92bbb19859b74d057088dbb4943e7a67
- Duplicate variants are checked for and removed: https://gist.github.com/rmadupuri/570782777e3e4e44ac31aae3f94cdf4a
- The variants were annotated using Genome Nexus.
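
The selection logic described above can be summarised in a short sketch. This is not the code from the linked gists: the function name and the `sample`, `variant_id`, and `gene_symbol` fields are assumptions for illustration, as is the 'IDHwt' value; only `ssm2_pass_call`, `alt_count`, and the 'IDHmut' status come from the description.

```python
def select_variants(passgeno_rows, anno_by_variant, idh_status_by_sample):
    """Filter, annotate, and de-duplicate variant rows.

    A row is kept if it passed single-sample Mutect2 filters, or if it is
    force-included: alt_count > 0 and either a TERT variant or an IDH1/IDH2
    variant in a sample whose IDH status is 'IDHmut'.
    """
    selected = []
    seen = set()
    for row in passgeno_rows:
        anno = anno_by_variant.get(row["variant_id"], {})
        gene = anno.get("gene_symbol")
        forced = row["alt_count"] > 0 and (
            gene == "TERT"
            or (
                gene in {"IDH1", "IDH2"}
                and idh_status_by_sample.get(row["sample"]) == "IDHmut"
            )
        )
        if not (row["ssm2_pass_call"] or forced):
            continue
        key = (row["sample"], row["variant_id"])
        if key in seen:  # drop duplicate sample/variant pairs
            continue
        seen.add(key)
        selected.append({**row, **anno})  # attach the annotation columns
    return selected


# Hypothetical inputs, for illustration only.
anno = {
    "v1": {"gene_symbol": "TP53"},
    "v2": {"gene_symbol": "IDH1"},
    "v3": {"gene_symbol": "TERT"},
}
idh_status = {"S1": "IDHmut", "S2": "IDHwt"}
passgeno = [
    {"sample": "S1", "variant_id": "v1", "alt_count": 10, "ssm2_pass_call": True},
    {"sample": "S1", "variant_id": "v2", "alt_count": 3, "ssm2_pass_call": False},
    {"sample": "S1", "variant_id": "v3", "alt_count": 0, "ssm2_pass_call": False},
    {"sample": "S1", "variant_id": "v1", "alt_count": 10, "ssm2_pass_call": True},
    {"sample": "S2", "variant_id": "v2", "alt_count": 4, "ssm2_pass_call": False},
]

kept = select_variants(passgeno, anno, idh_status)
print([(row["sample"], row["variant_id"]) for row in kept])
```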

## RNA-seq Expression
- File used: `gene_tpm_matrix_all_samples.tsv`
- Samples were filtered based on the rna_silver_set
- Expression data were log-transformed, and z-scores were calculated using https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/zscores/zscores_relative_allsamples (with the `-l` and `-e` flags).
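
For orientation, the general idea of a log transform followed by per-gene z-scores across all samples can be sketched as below. This is an assumption-laden illustration, not the linked tool: the `log2(x + 1)` transform, the use of the population standard deviation, and the handling of zero-variance genes are choices made for the example, and the tool's actual behaviour may differ.

```python
import math
import statistics


def zscores_for_gene(tpm_values):
    """Log-transform one gene's TPM values and z-score them across samples."""
    logged = [math.log2(value + 1) for value in tpm_values]
    mean = statistics.mean(logged)
    sd = statistics.pstdev(logged)
    if sd == 0:
        # No variation across samples: a z-score is undefined.
        return [None] * len(logged)
    return [(x - mean) / sd for x in logged]


# Hypothetical TPM values for one gene in four samples.
z = zscores_for_gene([0, 1, 3, 7])
print([round(value, 3) for value in z])
```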

## Methylation Data
- Files used: `betas.450.tsv`, `betas.epic.tsv`, `betas.merged.tsv` (methylation beta-values are given per CpG probe ID (row) and aliquot (column))
- Added methylation profiles for the Illumina 450K array, EPIC array, and merged 450K and EPIC arrays.

## Clinical data remapping
- Original clinical data columns were remapped to new column names for patients and samples, as listed in the table below.

Original column | Renamed column | Patient/sample
-- | -- | --
case_barcode | PATIENT_ID | Patient
case_project | CASE_PROJECT | Patient
case_source | TISSUE_SOURCE | Patient
case_sex | SEX | Patient
case_age_diagnosis_years | AGE | Patient
case_vital_status | OS_STATUS | Patient
case_overall_survival_mo | OS_MONTHS | Patient
case_barcode | PATIENT_ID | Sample
sample_barcode | SAMPLE_ID | Sample
grade | TUMOR_GRADE | Sample
who_classification | TUMOR_CLASSIFICATION | Sample
histology | HISTOLOGY | Sample
idh_status | IDH_STATUS | Sample
codel_status | CODEL_STATUS | Sample
surgery_type | SURGERY_TYPE | Sample
surgery_indication | SURGERY_INDICATION | Sample
surgery_extent_of_resection | SURGERY_EXTENT_OF_RESECTION | Sample
surgery_laterality | SURGERY_LATERALITY | Sample
surgery_location | SURGERY_LOCATION | Sample
treatment_tmz | TREATMENT_TMZ | Sample
treatment_tmz_cycles | TREATMENT_TMZ_CYCLES | Sample
treatment_tmz_cycles_6 | TREATMENT_TMZ_CYCLES_6 | Sample
treatment_concurrent_tmz | TREATMENT_CONCURRENT_TMZ | Sample
treatment_radiotherapy | TREATMENT_RADIOTHERAPY | Sample
treatment_radiation_dose_gy | TREATMENT_RADIATION_DOSE_GY | Sample
idh_codel_subtype | IDH_CODEL_STATUS | Sample
treatment_alkylating_agent | TREATMENT_ALKYLATING_AGENT | Sample
mgmt_methylation_method | MGMT_METHYLATION_METHOD | Sample
aliquot_barcode | ALIQUOT_BARCODE | Sample
aliquot_analysis_type | ALIQUOT_ANALYSIS_TYPE | Sample
aliquot_portion | ALIQUOT_PORTION_ID | Sample
aliquot_batch | ALIQUOT_BATCH | Sample
sample_type_description | SAMPLE_TYPE | Sample
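
A remapping like the one in the table could be applied with a simple lookup. The sketch below uses a few patient-level entries from the table; the function name, the decision to leave unmapped columns unchanged, and the example row values are assumptions for illustration.

```python
# A few patient-level entries taken from the table above.
PATIENT_COLUMNS = {
    "case_barcode": "PATIENT_ID",
    "case_sex": "SEX",
    "case_age_diagnosis_years": "AGE",
    "case_vital_status": "OS_STATUS",
    "case_overall_survival_mo": "OS_MONTHS",
}


def rename_columns(row, mapping):
    """Return a copy of `row` with keys renamed; unmapped keys are kept as-is."""
    return {mapping.get(name, name): value for name, value in row.items()}


# Hypothetical patient record, for illustration only.
patient = {
    "case_barcode": "PATIENT-0001",
    "case_sex": "female",
    "case_age_diagnosis_years": 45,
}

renamed = rename_columns(patient, PATIENT_COLUMNS)
print(renamed)
```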


6 changes: 6 additions & 0 deletions public/difg_glass/case_lists/cases_RNA_Seq_mRNA.txt
@@ -0,0 +1,6 @@
cancer_study_identifier: difg_glass
stable_id: difg_glass_rna_seq_mrna
case_list_name: Samples with mRNA data (RNA Seq)
case_list_description: Samples with mRNA expression data (355 samples)
case_list_category: all_cases_with_mrna_rnaseq_data
case_list_ids: GLSS-19-0267-R1 GLSS-19-0267-TP GLSS-19-0268-R1 GLSS-19-0268-TP GLSS-19-0269-R1 GLSS-19-0269-TP GLSS-19-0271-R1 GLSS-19-0271-TP GLSS-19-0272-R1 GLSS-19-0272-TP GLSS-19-0273-R1 GLSS-19-0273-TP GLSS-19-0274-R1 GLSS-19-0274-TP GLSS-19-0275-R1 GLSS-19-0275-TP GLSS-19-0278-R1 GLSS-19-0278-TP GLSS-19-0279-R1 GLSS-19-0279-R2 GLSS-19-0279-TP GLSS-19-0280-R1 GLSS-19-0280-TP GLSS-CU-P003-R1 GLSS-CU-P003-TP GLSS-CU-P013-R1 GLSS-CU-P013-R2 GLSS-CU-P013-R3 GLSS-CU-P020-R1 GLSS-CU-P020-R2 GLSS-CU-P020-R3 GLSS-CU-P021-R2 GLSS-CU-P021-TP GLSS-CU-P046-R1 GLSS-CU-P046-TP GLSS-CU-P053-R2 GLSS-CU-P053-TP GLSS-CU-P055-R1 GLSS-CU-P055-TP GLSS-CU-P056-R1 GLSS-CU-P056-R2 GLSS-CU-P069-R1 GLSS-CU-P069-TP GLSS-CU-P100-R1 GLSS-CU-P100-R2 GLSS-CU-P101-R1 GLSS-CU-P101-TP GLSS-CU-P102-R1 GLSS-CU-P102-TP GLSS-CU-P104-R1 GLSS-CU-P104-R2 GLSS-CU-R001-R1 GLSS-CU-R001-TP GLSS-CU-R002-R1 GLSS-CU-R002-TP GLSS-CU-R003-R1 GLSS-CU-R003-TP GLSS-CU-R004-R1 GLSS-CU-R004-TP GLSS-CU-R005-R1 GLSS-CU-R005-TP GLSS-CU-R006-R1 GLSS-CU-R006-TP GLSS-CU-R007-R1 GLSS-CU-R007-TP GLSS-CU-R008-R1 GLSS-CU-R008-TP GLSS-CU-R009-R1 GLSS-CU-R009-TP GLSS-CU-R010-R1 GLSS-CU-R010-TP GLSS-CU-R012-R1 GLSS-CU-R012-TP GLSS-CU-R019-R1 GLSS-CU-R019-TP GLSS-HF-1DDC-R1 GLSS-HF-1DDC-TP GLSS-HF-2548-R1 GLSS-HF-2548-TP GLSS-HF-2829-R1 GLSS-HF-2829-TP GLSS-HF-2869-R3 GLSS-HF-2869-TP GLSS-HF-2919-R1 GLSS-HF-2919-TP GLSS-HF-2934-R1 GLSS-HF-2934-TP GLSS-HF-3050-R1 GLSS-HF-3050-TP GLSS-HF-3081-R2 GLSS-HF-3081-TP GLSS-HF-3162-R1 GLSS-HF-3162-TP GLSS-HF-4B46-R1 GLSS-HF-4B46-TP GLSS-HF-4D9A-R1 GLSS-HF-4D9A-TP GLSS-HF-4F0A-R1 GLSS-HF-4F0A-R2 GLSS-HF-4F0A-TP GLSS-HF-4FBF-R1 GLSS-HF-4FBF-R2 GLSS-HF-50F3-R2 GLSS-HF-50F3-TP GLSS-HF-57AE-R1 GLSS-HF-57AE-R2 GLSS-HF-57AE-TP GLSS-HF-6504-R1 GLSS-HF-6504-TP GLSS-HF-6658-R1 GLSS-HF-6658-TP GLSS-HF-8FCD-R1 GLSS-HF-8FCD-TP GLSS-HF-9A7A-R1 GLSS-HF-9A7A-R2 GLSS-HF-9A7A-TP GLSS-HF-B92C-R2 GLSS-HF-B92C-TP GLSS-HF-B972-R1 GLSS-HF-B972-TP GLSS-HF-BCDE-R3 GLSS-HF-BCDE-TP GLSS-HF-CB66-R1 GLSS-HF-CB66-R2 
GLSS-HF-CB66-TP GLSS-HF-DE05-R1 GLSS-HF-DE05-TP GLSS-HF-DF35-R1 GLSS-HF-DF35-TP GLSS-HF-EE74-R1 GLSS-HF-EE74-TP GLSS-HF-EE77-R1 GLSS-HF-EE77-R2 GLSS-HF-EE77-R3 GLSS-HK-0001-R1 GLSS-HK-0001-TP GLSS-HK-0002-R1 GLSS-HK-0002-TP GLSS-HK-0003-R1 GLSS-HK-0003-TP GLSS-HK-0004-R1 GLSS-HK-0004-TP GLSS-HK-0005-R1 GLSS-HK-0005-TP GLSS-LU-00B9-R1 GLSS-LU-00B9-TP GLSS-LU-00C1-R1 GLSS-LU-00C1-TP GLSS-LU-00C2-R1 GLSS-LU-00C2-TP GLSS-LU-0B10-R1 GLSS-LU-0B10-TP GLSS-LU-0B12-R1 GLSS-LU-0B12-TP GLSS-LU-0B13-R1 GLSS-LU-0B13-TP GLSS-LX-0083-R1 GLSS-LX-0083-R2 GLSS-LX-0267-R1 GLSS-LX-0267-TP GLSS-LX-0304-R1 GLSS-LX-0304-TP GLSS-LX-0357-R1 GLSS-LX-0357-R2 GLSS-LX-0357-R3 GLSS-LX-0561-R1 GLSS-LX-0561-TP GLSS-MD-0003-R2 GLSS-MD-0003-R3 GLSS-MD-0006-R1 GLSS-MD-0006-TP GLSS-MD-0011-R1 GLSS-MD-0011-TP GLSS-MD-0012-R1 GLSS-MD-0012-TP GLSS-MD-0017-R1 GLSS-MD-0017-R2 GLSS-MD-0020-R1 GLSS-MD-0020-TP GLSS-MD-0022-R1 GLSS-MD-0022-TP GLSS-MD-0026-R1 GLSS-MD-0026-TP GLSS-MD-0027-R1 GLSS-MD-0027-R2 GLSS-MD-0027-TP GLSS-MD-0032-R1 GLSS-MD-0032-TP GLSS-MD-0035-R1 GLSS-MD-0035-TP GLSS-MD-0042-R1 GLSS-MD-0042-TP GLSS-MD-0049-R1 GLSS-MD-0049-TP GLSS-MD-LP04-R1 GLSS-MD-LP04-TP GLSS-SF-0003-R1 GLSS-SF-0003-TP GLSS-SF-0014-R1 GLSS-SF-0014-TP GLSS-SF-0022-R1 GLSS-SF-0022-TP GLSS-SF-0037-R1 GLSS-SF-0037-TP GLSS-SF-0038-R1 GLSS-SF-0038-TP GLSS-SM-R056-R2 GLSS-SM-R056-R3 GLSS-SM-R056-TP GLSS-SM-R060-R1 GLSS-SM-R060-R3 GLSS-SM-R060-TP GLSS-SM-R061-R1 GLSS-SM-R061-TP GLSS-SM-R063-R1 GLSS-SM-R063-TP GLSS-SM-R064-R1 GLSS-SM-R064-R2 GLSS-SM-R064-TP GLSS-SM-R065-R1 GLSS-SM-R065-TP GLSS-SM-R066-R1 GLSS-SM-R066-TP GLSS-SM-R067-R1 GLSS-SM-R067-TP GLSS-SM-R068-R1 GLSS-SM-R068-TP GLSS-SM-R070-R1 GLSS-SM-R070-TP GLSS-SM-R071-R1 GLSS-SM-R071-TP GLSS-SM-R072-R1 GLSS-SM-R072-TP GLSS-SM-R076-R1 GLSS-SM-R076-R3 GLSS-SM-R078-R1 GLSS-SM-R078-R2 GLSS-SM-R080-R1 GLSS-SM-R080-TP GLSS-SM-R081-R1 GLSS-SM-R081-TP GLSS-SM-R082-R1 GLSS-SM-R082-TP GLSS-SM-R083-R1 GLSS-SM-R083-TP GLSS-SM-R085-R1 GLSS-SM-R085-TP GLSS-SM-R087-R1 GLSS-SM-R087-TP 
GLSS-SM-R088-R1 GLSS-SM-R088-TP GLSS-SM-R091-R1 GLSS-SM-R091-TP GLSS-SM-R093-R1 GLSS-SM-R093-TP GLSS-SM-R095-R1 GLSS-SM-R095-TP GLSS-SM-R099-R1 GLSS-SM-R099-TP GLSS-SM-R100-R1 GLSS-SM-R100-TP GLSS-SM-R101-R1 GLSS-SM-R101-TP GLSS-SM-R102-R1 GLSS-SM-R102-TP GLSS-SM-R103-R1 GLSS-SM-R103-TP GLSS-SM-R104-R1 GLSS-SM-R104-TP GLSS-SM-R106-R1 GLSS-SM-R106-TP GLSS-SM-R107-R1 GLSS-SM-R107-TP GLSS-SM-R108-R1 GLSS-SM-R108-TP GLSS-SM-R109-R1 GLSS-SM-R109-TP GLSS-SM-R110-R1 GLSS-SM-R110-TP GLSS-SM-R111-R1 GLSS-SM-R111-TP GLSS-SM-R112-R1 GLSS-SM-R112-TP GLSS-SN-0001-R1 GLSS-SN-0001-TP GLSS-SN-0002-R1 GLSS-SN-0002-TP GLSS-SN-0003-R1 GLSS-SN-0003-TP GLSS-SN-0004-R1 GLSS-SN-0004-TP GLSS-SN-0006-R1 GLSS-SN-0006-TP GLSS-SN-0007-R1 GLSS-SN-0007-R2 GLSS-SN-0008-R1 GLSS-SN-0008-TP GLSS-SN-0009-R1 GLSS-SN-0009-TP GLSS-SN-0010-R1 GLSS-SN-0010-TP GLSS-SN-0013-R2 GLSS-SN-0013-TP GLSS-SN-0015-R2 GLSS-SN-0015-TP GLSS-SN-0016-R1 GLSS-SN-0016-TP GLSS-SN-0017-R1 GLSS-SN-0017-TP TCGA-06-0125-R1 TCGA-06-0125-TP TCGA-06-0190-R1 TCGA-06-0190-TP TCGA-06-0210-R1 TCGA-06-0210-TP TCGA-06-0211-R1 TCGA-06-0211-TP TCGA-14-1034-R1 TCGA-14-1034-TP TCGA-19-0957-R1 TCGA-19-0957-TP TCGA-19-4065-R1 TCGA-19-4065-TP TCGA-DH-A669-R1 TCGA-DH-A669-TP TCGA-DU-5870-R1 TCGA-DU-5870-TP TCGA-DU-5872-R1 TCGA-DU-5872-TP TCGA-DU-6397-R1 TCGA-DU-6397-TP TCGA-DU-6404-R1 TCGA-DU-6404-R2 TCGA-DU-6404-TP TCGA-DU-6407-R1 TCGA-DU-6407-R2 TCGA-DU-6407-TP TCGA-DU-7304-R1 TCGA-DU-7304-TP TCGA-FG-5963-R1 TCGA-FG-5963-TP TCGA-FG-5965-R1 TCGA-FG-5965-R2 TCGA-FG-5965-TP TCGA-FG-A4MT-R1 TCGA-FG-A4MT-TP TCGA-TM-A7CF-R1 TCGA-TM-A7CF-TP TCGA-TQ-A7RK-R1 TCGA-TQ-A7RK-R2 TCGA-TQ-A7RK-TP TCGA-TQ-A7RV-R1 TCGA-TQ-A7RV-TP TCGA-TQ-A8XE-R1 TCGA-TQ-A8XE-TP