Merge branch 'master' into public_release_fixes
Showing 33 changed files with 227 additions and 2 deletions.
@@ -0,0 +1 @@
The data are available under the ODC Open Database License (ODbL) (http://opendatacommons.org/licenses/odbl/1.0/); a summary is available at http://www.opendatacommons.org/licenses/odbl/1-0/summary/. You are free to share and modify the data so long as you attribute any public use of the database, or works produced from the database; keep the resulting data sets open; and offer your shared or adapted version of the data set under the same ODbL license.
@@ -0,0 +1,89 @@
# Curation and transformation of GLASS Dataset

## General Information
- Data Source: [GLASS data on Synapse](https://www.synapse.org/#!Synapse:syn17038081/wiki/585622)
- Reference publication: [PubMed Reference](https://pubmed.ncbi.nlm.nih.gov/35649412/)
- The data here represents the GLASS `2022-05-31` release version.
- Reference genome used: GRCh37

## Sample size selection
- Silver set samples were used for the study.
- Files used: `analysis_silver_set` and `analysis_rna_silver_set`. If a sample had multiple aliquots, the silver set represents a single aliquot per sample.
- `analysis_silver_set`: Contains DNA pairs that pass fingerprinting and coverage QC, representing a single aliquot per patient (maximal timepoint for patients with multiple recurrences).
- `rna_silver_set`: Lists RNA pairs that pass fingerprinting and clinical QC, with the maximal timepoint taken for patients with multiple recurrences.
- If a patient underwent more than two surgical procedures, additional longitudinal aliquots were chosen. However, for patients with only one identical sample, the aliquots were excluded. In cases where samples had multi-sector information, one aliquot per silver set was selected (rather than merging aliquots) to better align with the Synapse instance.
- Overall cohort size: 694 samples (629 DNA and 355 RNA samples) from 329 patients.

## Clinical data
- Patient-Level Data: `clinical_cases`
- Sample-Level Data: `clinical_surgeries`, `biospecimen_sample_types`, `biospecimen_samples`, `biospecimen_aliquots`, `analysis_estimate_purity`
- Patient and sample files were subset to silver set samples.
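
The subsetting step can be pictured roughly as follows. This is a minimal sketch, not the curation script itself: it assumes the silver set has already been reduced to a set of sample barcodes, and the helper name `subset_to_silver_set` and the toy barcodes are hypothetical.

```python
import csv
import io


def subset_to_silver_set(rows, id_column, silver_ids):
    """Keep only the rows whose identifier appears in the silver set."""
    return [row for row in rows if row[id_column] in silver_ids]


# Toy data: the barcodes below are made up for illustration.
silver_ids = {"CASE-A-TP", "CASE-A-R1"}
samples_csv = "sample_barcode,grade\nCASE-A-TP,II\nCASE-A-R1,III\nCASE-B-TP,IV\n"

sample_rows = list(csv.DictReader(io.StringIO(samples_csv)))
kept = subset_to_silver_set(sample_rows, "sample_barcode", silver_ids)
print([row["sample_barcode"] for row in kept])
```

The same helper would apply to the patient-level file by passing a patient identifier column and the set of patients present in the silver set.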

## Timeline Data
- The cumulative time elapsed between surgeries is provided in months. The initial surgery is always set to t0 = 0, and subsequent events are placed on the timeline relative to it.
- File used: `clinical_surgeries`
- Surgery and treatment (chemotherapy, radiation, and targeted therapy) events are added to the timeline relative to the first surgery.
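
One way to picture the timeline construction is the sketch below. It assumes, purely for illustration, that the input holds the gap in months between consecutive surgeries; if the source table already stores values relative to the first surgery, the running sum is unnecessary. The function names and the `EVENT_TYPE` and `START_MONTHS` fields are hypothetical.

```python
from itertools import accumulate


def cumulative_months(intervals_mo):
    """Turn gaps between consecutive surgeries into offsets from the first surgery.

    The first surgery is always placed at t0 = 0.
    """
    return [0.0] + list(accumulate(float(x) for x in intervals_mo))


def surgery_timeline(patient_id, intervals_mo):
    """Build one timeline row per surgery (field names are assumptions)."""
    return [
        {"PATIENT_ID": patient_id, "EVENT_TYPE": "SURGERY", "START_MONTHS": t}
        for t in cumulative_months(intervals_mo)
    ]


# Toy patient with three surgeries: 14 months, then a further 9.5 months apart.
for row in surgery_timeline("PATIENT-X", [14, 9.5]):
    print(row)
```

Treatment events could be appended in the same way, using the offset of the surgery they are associated with.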

## Mutation Data
- Files used: `variants_passgeno_20220531.csv.gz` and `variants_anno_20220531.csv.gz`
- Mutation data were processed and filtered for variants that passed filters in single-sample Mutect2 mode.
- Each row in the `variants_passgeno` table represents a single variant detected using multi-sample Mutect2 and is reported for a given sample. Thus, for each patient, the variant information is listed for all samples (including normal blood). Single-sample mutation calls were overlaid on the multi-sample calls to infer whether variants were called in individual samples.
- The `ssm2_pass_call` column is a flag indicating whether the variant was called and passed filters in single-sample Mutect2 mode. Only variants that were called and passed filters in single-sample Mutect2 mode were selected.
- The file is processed as: https://gist.github.com/rmadupuri/5e78309792181dbb1cdec88475f5afb5
- Each row in the `variants_anno` file provides further annotations for the variants in the passgeno file.
- The anno file is merged with the passgeno file as: https://gist.github.com/rmadupuri/757197f0d0cef4871254e5b2ffd51d4c
- IDH1, IDH2, and TERTp variants were force-filtered based on clinical tests and alt_counts.
- All IDH1 and IDH2 variants where the sample has IDH_status = 'IDHmut' and the variant has alt_count > 0, and all TERT variants with alt_count > 0, were selected.
- https://gist.github.com/rmadupuri/92bbb19859b74d057088dbb4943e7a67
- Check for duplicate variants and remove them: https://gist.github.com/rmadupuri/570782777e3e4e44ac31aae3f94cdf4a
- The variants were annotated using Genome Nexus.
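
The selection rules above can be sketched in plain Python. This is not the code from the linked gists. It assumes that the IDH1/IDH2/TERT rules are applied in addition to the `ssm2_pass_call` filter, that the flag is stored as `"t"`/`"f"`, and that the columns `gene_symbol`, `idh_status`, `sample`, `chrom`, `pos`, `ref`, and `alt` exist under those names; all of these are assumptions made for the example.

```python
def keep_variant(row):
    """Decide whether one variant row is kept under the rules described above."""
    alt_count = int(row["alt_count"] or 0)
    gene = row["gene_symbol"]  # hypothetical column name
    # Rule 1: called and passed filters in single-sample Mutect2 mode.
    if row["ssm2_pass_call"] == "t":  # assumed flag encoding
        return True
    # Rule 2: IDH1/IDH2 variants in clinically IDH-mutant samples with read support.
    if gene in {"IDH1", "IDH2"}:
        return row["idh_status"] == "IDHmut" and alt_count > 0
    # Rule 3: TERT variants with any alternate-allele read support.
    if gene == "TERT":
        return alt_count > 0
    return False


def drop_duplicates(rows, key_fields):
    """Keep the first occurrence of each variant, identified by key_fields."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def make_row(sample, gene, pos, passed, alt_count, idh_status):
    """Build a toy variant row (all values are made up)."""
    return {"sample": sample, "gene_symbol": gene, "chrom": "1", "pos": pos,
            "ref": "C", "alt": "T", "ssm2_pass_call": passed,
            "alt_count": alt_count, "idh_status": idh_status}


rows = [
    make_row("S1", "TP53", "100", "t", "12", "IDHwt"),
    make_row("S1", "TP53", "100", "t", "12", "IDHwt"),  # exact duplicate
    make_row("S1", "IDH1", "200", "f", "3", "IDHmut"),  # kept by rule 2
    make_row("S2", "IDH1", "200", "f", "0", "IDHmut"),  # no supporting reads
    make_row("S2", "TERT", "300", "f", "2", "IDHwt"),   # kept by rule 3
    make_row("S2", "EGFR", "400", "f", "5", "IDHwt"),   # fails every rule
]
KEY = ("sample", "chrom", "pos", "ref", "alt")
kept = drop_duplicates([r for r in rows if keep_variant(r)], KEY)
print([(r["sample"], r["gene_symbol"]) for r in kept])
```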

## RNA-seq Expression
- File used: `gene_tpm_matrix_all_samples.tsv`
- Samples were filtered based on the `rna_silver_set`.
- Expression data were log-transformed, and z-scores were calculated using https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/zscores/zscores_relative_allsamples (with the `-l` and `-e` flags).
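
For readers unfamiliar with the transformation, the idea of a log transform followed by per-gene z-scores across all samples can be sketched as below. This is an illustration of the general technique, not the linked tool's implementation: the pseudocount of 1, the use of log2, and the population standard deviation are assumptions, and the tool's exact choices may differ.

```python
import math
import statistics


def log_zscores(values):
    """log2(x + 1) transform, then z-score across all samples of one gene."""
    logged = [math.log2(v + 1.0) for v in values]
    mean = statistics.mean(logged)
    sd = statistics.pstdev(logged)  # population SD; an assumption
    if sd == 0:
        # A gene with constant expression has no defined z-score.
        return [float("nan")] * len(logged)
    return [(x - mean) / sd for x in logged]


# Toy TPM values: one row per gene, one value per sample.
tpm = {"GENE_A": [0.0, 1.0, 3.0, 7.0], "GENE_B": [5.0, 5.0, 5.0, 5.0]}
z = {gene: log_zscores(vals) for gene, vals in tpm.items()}
print(z["GENE_A"])
```

Scoring each gene against all samples in the cohort means a z-score describes how a sample compares with the rest of the cohort for that gene, not with a normal reference.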

## Methylation Data
- Files used: `betas.450.tsv`, `betas.epic.tsv`, `betas.merged.tsv` (each gives methylation beta-values per CpG probe ID (rows) by aliquot (columns)).
- Added methylation profiles for the Illumina 450K array, the EPIC array, and the merged 450K and EPIC arrays.

## Clinical data remapping
- Original clinical data columns were remapped to new column names for patients and samples, as listed in the table below.

Original column | Renamed column | Patient/sample
-- | -- | --
case_barcode | PATIENT_ID | Patient
case_project | CASE_PROJECT | Patient
Case_source | TISSUE_SOURCE | Patient
case_sex | SEX | Patient
case_age_diagnosis_years | AGE | Patient
case_vital_status | OS_STATUS | Patient
case_overall_survival_mo | OS_MONTHS | Patient
case_barcode | PATIENT_ID | Sample
sample_barcode | SAMPLE_ID | Sample
grade | TUMOR_GRADE | Sample
who_classification | TUMOR_CLASSIFICATION | Sample
histology | HISTOLOGY | Sample
idh_status | IDH_STATUS | Sample
codel_status | CODEL_STATUS | Sample
surgery_type | SURGERY_TYPE | Sample
surgery_indication | SURGERY_INDICATION | Sample
surgery_extent_of_resection | SURGERY_EXTENT_OF_RESECTION | Sample
surgery_laterality | SURGERY_LATERALITY | Sample
surgery_location | SURGERY_LOCATION | Sample
treatment_tmz | TREATMENT_TMZ | Sample
treatment_tmz_cycles | TREATMENT_TMZ_CYCLES | Sample
treatment_tmz_cycles_6 | TREATMENT_TMZ_CYCLES_6 | Sample
treatment_concurrent_tmz | TREATMENT_CONCURRENT_TMZ | Sample
treatment_radiotherapy | TREATMENT_RADIOTHERAPY | Sample
treatment_radiation_dose_gy | TREATMENT_RADIATION_DOSE_GY | Sample
idh_codel_subtype | IDH_CODEL_STATUS | Sample
treatment_alkylating_agent | TREATMENT_ALKYLATING_AGENT | Sample
mgmt_methylation_method | MGMT_METHYLATION_METHOD | Sample
aliquot_barcode | ALIQUOT_BARCODE | Sample
aliquot_analysis_type | ALIQUOT_ANALYSIS_TYPE | Sample
aliquot_portion | ALIQUOT_PORTION_ID | Sample
aliquot_batch | ALIQUOT_BATCH | Sample
sample_type_description | SAMPLE_TYPE | Sample
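
A remapping like the one in the table could be applied with a simple lookup. The sketch below is illustrative only: it copies four sample-level entries from the table, and the helper name `rename_columns`, the toy input, and the choice to pass unmapped columns through unchanged are assumptions.

```python
import csv
import io

# A few sample-level entries copied from the table; extend with the rest as needed.
SAMPLE_COLUMN_MAP = {
    "case_barcode": "PATIENT_ID",
    "sample_barcode": "SAMPLE_ID",
    "grade": "TUMOR_GRADE",
    "idh_status": "IDH_STATUS",
}


def rename_columns(rows, column_map):
    """Rename keys according to column_map; unmapped columns pass through unchanged."""
    return [{column_map.get(k, k): v for k, v in row.items()} for row in rows]


# Toy input with one column ("extra") that has no mapping.
raw = "case_barcode,sample_barcode,grade,idh_status,extra\nCASE-A,CASE-A-TP,II,IDHmut,x\n"
renamed = rename_columns(csv.DictReader(io.StringIO(raw)), SAMPLE_COLUMN_MAP)
print(list(renamed[0].keys()))
```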
@@ -0,0 +1,6 @@
cancer_study_identifier: difg_glass
stable_id: difg_glass_rna_seq_mrna
case_list_name: Samples with mRNA data (RNA Seq)
case_list_description: Samples with mRNA expression data (355 samples)
case_list_category: all_cases_with_mrna_rnaseq_data
case_list_ids: GLSS-19-0267-R1 GLSS-19-0267-TP GLSS-19-0268-R1 GLSS-19-0268-TP GLSS-19-0269-R1 GLSS-19-0269-TP GLSS-19-0271-R1 GLSS-19-0271-TP GLSS-19-0272-R1 GLSS-19-0272-TP GLSS-19-0273-R1 GLSS-19-0273-TP GLSS-19-0274-R1 GLSS-19-0274-TP GLSS-19-0275-R1 GLSS-19-0275-TP GLSS-19-0278-R1 GLSS-19-0278-TP GLSS-19-0279-R1 GLSS-19-0279-R2 GLSS-19-0279-TP GLSS-19-0280-R1 GLSS-19-0280-TP GLSS-CU-P003-R1 GLSS-CU-P003-TP GLSS-CU-P013-R1 GLSS-CU-P013-R2 GLSS-CU-P013-R3 GLSS-CU-P020-R1 GLSS-CU-P020-R2 GLSS-CU-P020-R3 GLSS-CU-P021-R2 GLSS-CU-P021-TP GLSS-CU-P046-R1 GLSS-CU-P046-TP GLSS-CU-P053-R2 GLSS-CU-P053-TP GLSS-CU-P055-R1 GLSS-CU-P055-TP GLSS-CU-P056-R1 GLSS-CU-P056-R2 GLSS-CU-P069-R1 GLSS-CU-P069-TP GLSS-CU-P100-R1 GLSS-CU-P100-R2 GLSS-CU-P101-R1 GLSS-CU-P101-TP GLSS-CU-P102-R1 GLSS-CU-P102-TP GLSS-CU-P104-R1 GLSS-CU-P104-R2 GLSS-CU-R001-R1 GLSS-CU-R001-TP GLSS-CU-R002-R1 GLSS-CU-R002-TP GLSS-CU-R003-R1 GLSS-CU-R003-TP GLSS-CU-R004-R1 GLSS-CU-R004-TP GLSS-CU-R005-R1 GLSS-CU-R005-TP GLSS-CU-R006-R1 GLSS-CU-R006-TP GLSS-CU-R007-R1 GLSS-CU-R007-TP GLSS-CU-R008-R1 GLSS-CU-R008-TP GLSS-CU-R009-R1 GLSS-CU-R009-TP GLSS-CU-R010-R1 GLSS-CU-R010-TP GLSS-CU-R012-R1 GLSS-CU-R012-TP GLSS-CU-R019-R1 GLSS-CU-R019-TP GLSS-HF-1DDC-R1 GLSS-HF-1DDC-TP GLSS-HF-2548-R1 GLSS-HF-2548-TP GLSS-HF-2829-R1 GLSS-HF-2829-TP GLSS-HF-2869-R3 GLSS-HF-2869-TP GLSS-HF-2919-R1 GLSS-HF-2919-TP GLSS-HF-2934-R1 GLSS-HF-2934-TP GLSS-HF-3050-R1 GLSS-HF-3050-TP GLSS-HF-3081-R2 GLSS-HF-3081-TP GLSS-HF-3162-R1 GLSS-HF-3162-TP GLSS-HF-4B46-R1 GLSS-HF-4B46-TP GLSS-HF-4D9A-R1 GLSS-HF-4D9A-TP GLSS-HF-4F0A-R1 GLSS-HF-4F0A-R2 GLSS-HF-4F0A-TP GLSS-HF-4FBF-R1 GLSS-HF-4FBF-R2 GLSS-HF-50F3-R2 GLSS-HF-50F3-TP GLSS-HF-57AE-R1 GLSS-HF-57AE-R2 GLSS-HF-57AE-TP GLSS-HF-6504-R1 GLSS-HF-6504-TP GLSS-HF-6658-R1 GLSS-HF-6658-TP GLSS-HF-8FCD-R1 GLSS-HF-8FCD-TP GLSS-HF-9A7A-R1 GLSS-HF-9A7A-R2 GLSS-HF-9A7A-TP GLSS-HF-B92C-R2 GLSS-HF-B92C-TP GLSS-HF-B972-R1 GLSS-HF-B972-TP GLSS-HF-BCDE-R3 GLSS-HF-BCDE-TP GLSS-HF-CB66-R1 GLSS-HF-CB66-R2 
GLSS-HF-CB66-TP GLSS-HF-DE05-R1 GLSS-HF-DE05-TP GLSS-HF-DF35-R1 GLSS-HF-DF35-TP GLSS-HF-EE74-R1 GLSS-HF-EE74-TP GLSS-HF-EE77-R1 GLSS-HF-EE77-R2 GLSS-HF-EE77-R3 GLSS-HK-0001-R1 GLSS-HK-0001-TP GLSS-HK-0002-R1 GLSS-HK-0002-TP GLSS-HK-0003-R1 GLSS-HK-0003-TP GLSS-HK-0004-R1 GLSS-HK-0004-TP GLSS-HK-0005-R1 GLSS-HK-0005-TP GLSS-LU-00B9-R1 GLSS-LU-00B9-TP GLSS-LU-00C1-R1 GLSS-LU-00C1-TP GLSS-LU-00C2-R1 GLSS-LU-00C2-TP GLSS-LU-0B10-R1 GLSS-LU-0B10-TP GLSS-LU-0B12-R1 GLSS-LU-0B12-TP GLSS-LU-0B13-R1 GLSS-LU-0B13-TP GLSS-LX-0083-R1 GLSS-LX-0083-R2 GLSS-LX-0267-R1 GLSS-LX-0267-TP GLSS-LX-0304-R1 GLSS-LX-0304-TP GLSS-LX-0357-R1 GLSS-LX-0357-R2 GLSS-LX-0357-R3 GLSS-LX-0561-R1 GLSS-LX-0561-TP GLSS-MD-0003-R2 GLSS-MD-0003-R3 GLSS-MD-0006-R1 GLSS-MD-0006-TP GLSS-MD-0011-R1 GLSS-MD-0011-TP GLSS-MD-0012-R1 GLSS-MD-0012-TP GLSS-MD-0017-R1 GLSS-MD-0017-R2 GLSS-MD-0020-R1 GLSS-MD-0020-TP GLSS-MD-0022-R1 GLSS-MD-0022-TP GLSS-MD-0026-R1 GLSS-MD-0026-TP GLSS-MD-0027-R1 GLSS-MD-0027-R2 GLSS-MD-0027-TP GLSS-MD-0032-R1 GLSS-MD-0032-TP GLSS-MD-0035-R1 GLSS-MD-0035-TP GLSS-MD-0042-R1 GLSS-MD-0042-TP GLSS-MD-0049-R1 GLSS-MD-0049-TP GLSS-MD-LP04-R1 GLSS-MD-LP04-TP GLSS-SF-0003-R1 GLSS-SF-0003-TP GLSS-SF-0014-R1 GLSS-SF-0014-TP GLSS-SF-0022-R1 GLSS-SF-0022-TP GLSS-SF-0037-R1 GLSS-SF-0037-TP GLSS-SF-0038-R1 GLSS-SF-0038-TP GLSS-SM-R056-R2 GLSS-SM-R056-R3 GLSS-SM-R056-TP GLSS-SM-R060-R1 GLSS-SM-R060-R3 GLSS-SM-R060-TP GLSS-SM-R061-R1 GLSS-SM-R061-TP GLSS-SM-R063-R1 GLSS-SM-R063-TP GLSS-SM-R064-R1 GLSS-SM-R064-R2 GLSS-SM-R064-TP GLSS-SM-R065-R1 GLSS-SM-R065-TP GLSS-SM-R066-R1 GLSS-SM-R066-TP GLSS-SM-R067-R1 GLSS-SM-R067-TP GLSS-SM-R068-R1 GLSS-SM-R068-TP GLSS-SM-R070-R1 GLSS-SM-R070-TP GLSS-SM-R071-R1 GLSS-SM-R071-TP GLSS-SM-R072-R1 GLSS-SM-R072-TP GLSS-SM-R076-R1 GLSS-SM-R076-R3 GLSS-SM-R078-R1 GLSS-SM-R078-R2 GLSS-SM-R080-R1 GLSS-SM-R080-TP GLSS-SM-R081-R1 GLSS-SM-R081-TP GLSS-SM-R082-R1 GLSS-SM-R082-TP GLSS-SM-R083-R1 GLSS-SM-R083-TP GLSS-SM-R085-R1 GLSS-SM-R085-TP GLSS-SM-R087-R1 GLSS-SM-R087-TP 
GLSS-SM-R088-R1 GLSS-SM-R088-TP GLSS-SM-R091-R1 GLSS-SM-R091-TP GLSS-SM-R093-R1 GLSS-SM-R093-TP GLSS-SM-R095-R1 GLSS-SM-R095-TP GLSS-SM-R099-R1 GLSS-SM-R099-TP GLSS-SM-R100-R1 GLSS-SM-R100-TP GLSS-SM-R101-R1 GLSS-SM-R101-TP GLSS-SM-R102-R1 GLSS-SM-R102-TP GLSS-SM-R103-R1 GLSS-SM-R103-TP GLSS-SM-R104-R1 GLSS-SM-R104-TP GLSS-SM-R106-R1 GLSS-SM-R106-TP GLSS-SM-R107-R1 GLSS-SM-R107-TP GLSS-SM-R108-R1 GLSS-SM-R108-TP GLSS-SM-R109-R1 GLSS-SM-R109-TP GLSS-SM-R110-R1 GLSS-SM-R110-TP GLSS-SM-R111-R1 GLSS-SM-R111-TP GLSS-SM-R112-R1 GLSS-SM-R112-TP GLSS-SN-0001-R1 GLSS-SN-0001-TP GLSS-SN-0002-R1 GLSS-SN-0002-TP GLSS-SN-0003-R1 GLSS-SN-0003-TP GLSS-SN-0004-R1 GLSS-SN-0004-TP GLSS-SN-0006-R1 GLSS-SN-0006-TP GLSS-SN-0007-R1 GLSS-SN-0007-R2 GLSS-SN-0008-R1 GLSS-SN-0008-TP GLSS-SN-0009-R1 GLSS-SN-0009-TP GLSS-SN-0010-R1 GLSS-SN-0010-TP GLSS-SN-0013-R2 GLSS-SN-0013-TP GLSS-SN-0015-R2 GLSS-SN-0015-TP GLSS-SN-0016-R1 GLSS-SN-0016-TP GLSS-SN-0017-R1 GLSS-SN-0017-TP TCGA-06-0125-R1 TCGA-06-0125-TP TCGA-06-0190-R1 TCGA-06-0190-TP TCGA-06-0210-R1 TCGA-06-0210-TP TCGA-06-0211-R1 TCGA-06-0211-TP TCGA-14-1034-R1 TCGA-14-1034-TP TCGA-19-0957-R1 TCGA-19-0957-TP TCGA-19-4065-R1 TCGA-19-4065-TP TCGA-DH-A669-R1 TCGA-DH-A669-TP TCGA-DU-5870-R1 TCGA-DU-5870-TP TCGA-DU-5872-R1 TCGA-DU-5872-TP TCGA-DU-6397-R1 TCGA-DU-6397-TP TCGA-DU-6404-R1 TCGA-DU-6404-R2 TCGA-DU-6404-TP TCGA-DU-6407-R1 TCGA-DU-6407-R2 TCGA-DU-6407-TP TCGA-DU-7304-R1 TCGA-DU-7304-TP TCGA-FG-5963-R1 TCGA-FG-5963-TP TCGA-FG-5965-R1 TCGA-FG-5965-R2 TCGA-FG-5965-TP TCGA-FG-A4MT-R1 TCGA-FG-A4MT-TP TCGA-TM-A7CF-R1 TCGA-TM-A7CF-TP TCGA-TQ-A7RK-R1 TCGA-TQ-A7RK-R2 TCGA-TQ-A7RK-TP TCGA-TQ-A7RV-R1 TCGA-TQ-A7RV-TP TCGA-TQ-A8XE-R1 TCGA-TQ-A8XE-TP |