need to update samples fillout workflow output #84

Open
2 of 3 tasks
stevekm opened this issue Jan 4, 2022 · 4 comments
Labels
enhancement (New feature or request), high priority, revisit this later (non-critical, non-breaking change to consider)

Comments

@stevekm
Member

stevekm commented Jan 4, 2022

@timosong @svural

@timosong
Collaborator

timosong commented Apr 7, 2022

Need to generate the data mutations uncalled output (#63).
Current output should be 4 maf files:

  1. "unfiltered" maf. No filters are applied that distinguish between clinical and research samples. This should end up in the analysis folder.
  2. "filtered" maf. Germline filters are applied, with a distinction between clinical and research samples. This should end up in the analysis folder.
  3. data_mutations_extended.txt. This file has fewer columns than file 2 (filtered). This should end up in the portal folder.
  4. data_mutations_uncalled.txt. This file has fewer columns than file 2 (filtered). This should end up in the portal folder.

The number of rows in file 3 and file 4 equals the number of rows in file 2.
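A row-count check like the one described could be sketched roughly as below. This is a minimal sketch, not the workflow's actual code: `count_data_rows` is a hypothetical helper, and it assumes the files are tab-delimited with optional `#` comment lines followed by a single header row.

```python
import csv

def count_data_rows(path):
    """Count data rows in a tab-delimited MAF-style file.

    Assumption: lines starting with '#' are comments and the first
    remaining line is the column header.
    """
    with open(path, newline="") as handle:
        # Drop comment lines before handing the rest to the csv reader.
        rows = (line for line in handle if not line.startswith("#"))
        reader = csv.reader(rows, delimiter="\t")
        next(reader, None)  # skip the header row
        # Blank lines come back as empty lists and are not counted.
        return sum(1 for row in reader if row)
```

The counts for files 2, 3 and 4 can then be compared under whichever reading of the row-count statement is intended.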

Need to update all portal files to include data for all new DMP samples included in the output.

The following files need to be updated:

  1. data_mutations_extended.txt. Need to find out which columns can be filled from DMP data and which can be left blank. The only necessary columns are GENE_PANEL, PATIENT_ID, SAMPLE_ID.
  2. case_lists files need to have the new DMP ids appended (tab-delimited, at the end of case_list_ids):
    case_lists/cases_all.txt
    case_lists/cases_cnaseq.txt
    case_lists/cases_cna.txt
    case_lists/cases_sequenced.txt
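Appending ids to a case list could look something like the sketch below. It assumes each case list file is a set of `key: value` lines with one `case_list_ids:` line holding tab-delimited ids; `append_case_ids` is a hypothetical helper, not something from the existing workflow. It keeps the original order and skips ids that are already present.

```python
def append_case_ids(case_list_text, new_ids):
    """Return case list text with new_ids appended to the case_list_ids line.

    Existing order is preserved and ids already present are not repeated.
    """
    out_lines = []
    for line in case_list_text.splitlines():
        if line.startswith("case_list_ids:"):
            key, _, value = line.partition(":")
            existing = [i for i in value.strip().split("\t") if i]
            seen = set()
            merged = []
            for sample_id in existing + list(new_ids):
                if sample_id not in seen:
                    seen.add(sample_id)
                    merged.append(sample_id)
            line = key + ": " + "\t".join(merged)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

The same function could be applied to each of the four case list files in turn.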

new files need to be created:

  1. meta_mutations_uncalled.txt
    cancer_study_identifier: pilot_msk_melpcm
    data_filename: data_mutations_extended.txt
    datatype: MAF
    genetic_alteration_type: MUTATION_EXTENDED
    profile_description: Mutation data
    profile_name: Mutations
    show_profile_in_analysis_tab: true
    stable_id: mutations
    namespaces: ASCN

Caveats:
A nice-to-have is DMP copy number data merged in if fillout was ever performed.
This also means that there may be DMP ids present in copy number but not in fillout, and vice versa. So we need to create a union of all DMP ids and then add that to the above files (case_lists, data_clinical_sample).
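The union described here is a plain set operation. A minimal sketch, assuming the ids from each source have already been read into lists (the function name and the returned keys are illustrative only):

```python
def compare_dmp_ids(copy_number_ids, fillout_ids):
    """Return the union of DMP ids plus the ids unique to each source."""
    cna = set(copy_number_ids)
    fillout = set(fillout_ids)
    return {
        "all": sorted(cna | fillout),            # union, for case_lists etc.
        "cna_only": sorted(cna - fillout),       # in copy number, not in fillout
        "fillout_only": sorted(fillout - cna),   # in fillout, not in copy number
    }
```

The `all` entry is what would be added to the files; the other two entries are only there to make mismatches between the sources visible.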

@stevekm stevekm added the enhancement New feature or request label Apr 7, 2022
@stevekm
Member Author

stevekm commented Apr 15, 2022

@timosong I think the "uncalled" mutations files should be a separate issue. Are they required for fillout data import?

@stevekm
Member Author

stevekm commented Apr 18, 2022

  • case list files should not contain duplicate sample IDs
  • the data mutations uncalled file is required for fillout delivery to cBioPortal
  • the fillout output .maf mutation data should be merged into the data_mutations_extended.txt which is currently generated by the portal-workflow.cwl for cBioPortal import
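One possible reading of "merged" is to append the fillout rows to the portal table while keeping the portal file's columns. The sketch below shows that reading only; it is an assumption, not the behavior of portal-workflow.cwl. `merge_maf_rows` is a hypothetical helper, and the inputs are assumed to be tab-delimited text with a header row and no `#` comment lines.

```python
import csv
import io

def merge_maf_rows(portal_text, fillout_text):
    """Append fillout rows to the portal table, keeping the portal columns.

    Columns missing from the fillout table are left blank; fillout
    columns that are not in the portal header are dropped.
    """
    portal = csv.DictReader(io.StringIO(portal_text), delimiter="\t")
    fillout = csv.DictReader(io.StringIO(fillout_text), delimiter="\t")
    columns = portal.fieldnames  # header of the portal file defines the output
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            restval="", extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in portal:
        writer.writerow(row)
    for row in fillout:
        writer.writerow(row)
    return out.getvalue()
```

Whether rows should simply be appended or matched against existing variants is not settled in this thread, so the merge rule itself would need to be confirmed.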

@stevekm
Member Author

stevekm commented May 5, 2022

case list fixes are implemented

@svural svural added the revisit this later non-critical, non-breaking change to consider label May 31, 2024