Filter wham-only DELs and scramble-only SVAs in CleanVcf & docs updates (#740)
epiercehoffman authored Oct 31, 2024
1 parent 6ea99cf commit aef6ac9
Showing 24 changed files with 357 additions and 390 deletions.
9 changes: 9 additions & 0 deletions .github/.dockstore.yml
Original file line number Diff line number Diff line change
@@ -198,6 +198,15 @@ workflows:
      tags:
        - /.*/

  - subclass: WDL
    name: VisualizeCnvs
    primaryDescriptorPath: /wdl/VisualizeCnvs.wdl
    filters:
      branches:
        - main
      tags:
        - /.*/

  - subclass: WDL
    name: SingleSamplePipeline
    primaryDescriptorPath: /wdl/GATKSVPipelineSingleSample.wdl
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@

A structural variation discovery pipeline for Illumina short-read whole-genome sequencing (WGS) data.

-For technical documentation on GATK-SV, including how to run the pipeline, please refer to our website.
+For technical documentation on GATK-SV, including how to run the pipeline, please refer to our [website](https://broadinstitute.github.io/gatk-sv/).

## Repository structure
* `/carrot`: [Carrot](https://github.com/broadinstitute/carrot) tests

Large diffs are not rendered by default.

Large diffs are not rendered by default.

@@ -0,0 +1,10 @@
{
  "VisualizeCnvs.vcf_or_bed": "${this.filtered_vcf}",
  "VisualizeCnvs.prefix": "${this.sample_set_set_id}",
  "VisualizeCnvs.median_files": "${this.sample_sets.median_cov}",
  "VisualizeCnvs.rd_files": "${this.sample_sets.merged_bincov}",
  "VisualizeCnvs.ped_file": "${workspace.cohort_ped_file}",
  "VisualizeCnvs.min_size": 50000,
  "VisualizeCnvs.flags": "-s 999999999",
  "VisualizeCnvs.sv_pipeline_docker": "${workspace.sv_pipeline_docker}"
}
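Terra resolves the `${this...}` and `${workspace...}` expressions in this input JSON from data-table and workspace attributes at submission time. One property that must hold regardless of substitution is that every key is namespaced by the workflow name. A minimal standalone check is sketched below; `check_input_keys` is a hypothetical helper, not part of the repo's tooling (the repo's `scripts/test/terra_validation.py` performs fuller validation with womtool):

```python
import json

def check_input_keys(json_text: str, workflow_name: str) -> list:
    """Return any keys not namespaced by the workflow name,
    e.g. 'VisualizeCnvs.prefix' for workflow 'VisualizeCnvs'."""
    inputs = json.loads(json_text)
    return [k for k in inputs if not k.startswith(workflow_name + ".")]

example = '{"VisualizeCnvs.min_size": 50000, "VisualizeCnvs.flags": "-s 999999999"}'
print(check_input_keys(example, "VisualizeCnvs"))            # []
print(check_input_keys('{"min_size": 1}', "VisualizeCnvs"))  # ['min_size']
```

A non-empty return value flags keys that Terra would reject or silently ignore when matching inputs to the workflow.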
@@ -5,6 +5,6 @@
  "VisualizeCnvs.rd_files": [{{ test_batch.merged_coverage_file | tojson }}],
  "VisualizeCnvs.ped_file": {{ test_batch.ped_file | tojson }},
  "VisualizeCnvs.min_size": 50000,
- "VisualizeCnvs.flags": "",
+ "VisualizeCnvs.flags": "-s 999999999",
  "VisualizeCnvs.sv_pipeline_docker": {{ dockers.sv_pipeline_docker | tojson }}
}
2 changes: 1 addition & 1 deletion scripts/test/terra_validation.py
@@ -113,7 +113,7 @@ def main():
    parser.add_argument("-j", "--womtool-jar", help="Path to womtool jar", required=True)
    parser.add_argument("-n", "--num-input-jsons",
                        help="Number of Terra input JSONs expected",
-                       required=False, default=25, type=int)
+                       required=False, default=26, type=int)
    parser.add_argument("--log-level",
                        help="Specify level of logging information, ie. info, warning, error (not case-sensitive)",
                        required=False, default="INFO")
63 changes: 62 additions & 1 deletion wdl/CleanVcfChromosome.wdl
@@ -53,6 +53,7 @@ workflow CleanVcfChromosome {
        RuntimeAttr? runtime_override_stitch_fragmented_cnvs
        RuntimeAttr? runtime_override_final_cleanup
        RuntimeAttr? runtime_override_rescue_me_dels
+       RuntimeAttr? runtime_attr_add_high_fp_rate_filters

        # Clean vcf 1b
        RuntimeAttr? runtime_attr_override_subset_large_cnvs_1b
@@ -299,9 +300,17 @@
            runtime_attr_override = runtime_override_rescue_me_dels
    }

-   call FinalCleanup {
+   call AddHighFDRFilters {
+       input:
+           vcf=RescueMobileElementDeletions.out,
+           prefix="~{prefix}.high_fdr_filtered",
+           sv_pipeline_docker=sv_pipeline_docker,
+           runtime_attr_override=runtime_attr_add_high_fp_rate_filters
+   }
+
+   call FinalCleanup {
        input:
+           vcf=AddHighFDRFilters.out,
            contig=contig,
            prefix="~{prefix}.final_cleanup",
            sv_pipeline_docker=sv_pipeline_docker,
@@ -799,6 +808,58 @@ task StitchFragmentedCnvs {
    }
}

# Add FILTER status for pockets of variants with high FP rate: wham-only DELs and Scramble-only SVAs with HIGH_SR_BACKGROUND
task AddHighFDRFilters {
    input {
        File vcf
        String prefix
        String sv_pipeline_docker
        RuntimeAttr? runtime_attr_override
    }

    Float input_size = size(vcf, "GiB")
    RuntimeAttr runtime_default = object {
        mem_gb: 3.75,
        disk_gb: ceil(10.0 + input_size * 3.0),
        cpu_cores: 1,
        preemptible_tries: 3,
        max_retries: 1,
        boot_disk_gb: 10
    }
    RuntimeAttr runtime_override = select_first([runtime_attr_override, runtime_default])
    runtime {
        memory: "~{select_first([runtime_override.mem_gb, runtime_default.mem_gb])} GB"
        disks: "local-disk ~{select_first([runtime_override.disk_gb, runtime_default.disk_gb])} HDD"
        cpu: select_first([runtime_override.cpu_cores, runtime_default.cpu_cores])
        preemptible: select_first([runtime_override.preemptible_tries, runtime_default.preemptible_tries])
        maxRetries: select_first([runtime_override.max_retries, runtime_default.max_retries])
        docker: sv_pipeline_docker
        bootDiskSizeGb: select_first([runtime_override.boot_disk_gb, runtime_default.boot_disk_gb])
    }

    command <<<
        set -euo pipefail
        python <<CODE
import pysam
with pysam.VariantFile("~{vcf}", 'r') as fin:
    header = fin.header
    header.add_line("##FILTER=<ID=HIGH_ALGORITHM_FDR,Description=\"Categories of variants with low precision including Wham-only deletions and certain Scramble SVAs\">")
    with pysam.VariantFile("~{prefix}.vcf.gz", 'w', header=header) as fo:
        for record in fin:
            if (record.info['ALGORITHMS'] == ('wham',) and record.info['SVTYPE'] == 'DEL') or \
                    (record.info['ALGORITHMS'] == ('scramble',) and record.info['HIGH_SR_BACKGROUND'] and record.alts == ('<INS:ME:SVA>',)):
                record.filter.add('HIGH_ALGORITHM_FDR')
            fo.write(record)
CODE
    >>>

    output {
        File out = "~{prefix}.vcf.gz"
    }
}



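The record predicate inside the `AddHighFDRFilters` task can be isolated for unit testing without pysam or a VCF on disk. The sketch below is a hypothetical `is_high_fdr` helper (not part of the pipeline), assuming the same tuple-valued `ALGORITHMS` and `alts` representations that pysam exposes:

```python
def is_high_fdr(algorithms, svtype, alts, high_sr_background):
    """Flag the two high-false-discovery-rate categories targeted by the task:
    Wham-only deletions, and Scramble-only SVA insertions with HIGH_SR_BACKGROUND."""
    wham_only_del = algorithms == ("wham",) and svtype == "DEL"
    scramble_only_sva = (
        algorithms == ("scramble",)
        and high_sr_background
        and alts == ("<INS:ME:SVA>",)
    )
    return wham_only_del or scramble_only_sva

# A Wham-only DEL is flagged, but a DEL supported by a second caller is not.
print(is_high_fdr(("wham",), "DEL", ("<DEL>",), False))          # True
print(is_high_fdr(("wham", "manta"), "DEL", ("<DEL>",), False))  # False
# Scramble-only SVA insertions are flagged only under high SR background.
print(is_high_fdr(("scramble",), "INS", ("<INS:ME:SVA>",), True))   # True
print(is_high_fdr(("scramble",), "INS", ("<INS:ME:SVA>",), False))  # False
```

Note that in the task itself, records matching the predicate are still written to the output VCF; the task annotates FILTER rather than dropping calls.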
# Final VCF cleanup
task FinalCleanup {
1 change: 1 addition & 0 deletions website/.gitignore
@@ -7,6 +7,7 @@
# Generated files
.docusaurus
.cache-loader
package-lock.json

# Misc
.DS_Store
2 changes: 1 addition & 1 deletion website/docs/advanced/cromwell/overview.md
@@ -29,7 +29,7 @@ Google Cloud Platform (GCP).

# Cromwell Server

-There are two option to communicate with a running Cromwell server:
+There are two options to communicate with a running Cromwell server:
[REST API](https://cromwell.readthedocs.io/en/stable/tutorials/ServerMode/), and
[Cromshell](https://github.com/broadinstitute/cromshell) which is a command line tool
to interface with a Cromwell server. We recommend using Cromshell due to its simplicity
4 changes: 2 additions & 2 deletions website/docs/best_practices.md
@@ -4,8 +4,8 @@ description: Guide for using GATK-SV
sidebar_position: 4
---

-A comprehensive guide for the single-sample calling mode is available in [GATK Best Practices for Structural Variation
-Discovery on Single Samples](https://gatk.broadinstitute.org/hc/en-us/articles/9022653744283-GATK-Best-Practices-for-Structural-Variation-Discovery-on-Single-Samples).
+A comprehensive guide for the single-sample [calling mode](/docs/gs/calling_modes) is available in
+[GATK Best Practices for Structural Variation Discovery on Single Samples](https://gatk.broadinstitute.org/hc/en-us/articles/9022653744283-GATK-Best-Practices-for-Structural-Variation-Discovery-on-Single-Samples).
This material covers basic concepts of structural variant calling, specifics of SV VCF formatting, and
advanced troubleshooting that also apply to the joint calling mode as well. This guide is intended to supplement
documentation found here.
