Filter wham-only DELs and scramble-only SVAs in CleanVcf & docs updates #740

Merged: 8 commits, Oct 31, 2024
9 changes: 9 additions & 0 deletions .github/.dockstore.yml
@@ -198,6 +198,15 @@ workflows:
tags:
- /.*/

- subclass: WDL
name: VisualizeCnvs
primaryDescriptorPath: /wdl/VisualizeCnvs.wdl
filters:
branches:
- main
tags:
- /.*/

- subclass: WDL
name: SingleSamplePipeline
primaryDescriptorPath: /wdl/GATKSVPipelineSingleSample.wdl
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@

A structural variation discovery pipeline for Illumina short-read whole-genome sequencing (WGS) data.

-For technical documentation on GATK-SV, including how to run the pipeline, please refer to our website.
+For technical documentation on GATK-SV, including how to run the pipeline, please refer to our [website](https://broadinstitute.github.io/gatk-sv/).

## Repository structure
* `/carrot`: [Carrot](https://github.com/broadinstitute/carrot) tests

Large diffs are not rendered by default.

@@ -0,0 +1,10 @@
{
"VisualizeCnvs.vcf_or_bed": "${this.filtered_vcf}",
"VisualizeCnvs.prefix": "${this.sample_set_set_id}",
"VisualizeCnvs.median_files": "${this.sample_sets.median_cov}",
"VisualizeCnvs.rd_files": "${this.sample_sets.merged_bincov}",
"VisualizeCnvs.ped_file": "${workspace.cohort_ped_file}",
"VisualizeCnvs.min_size": 50000,
"VisualizeCnvs.flags": "-s 999999999",
"VisualizeCnvs.sv_pipeline_docker": "${workspace.sv_pipeline_docker}"
}
@@ -5,6 +5,6 @@
"VisualizeCnvs.rd_files": [{{ test_batch.merged_coverage_file | tojson }}],
"VisualizeCnvs.ped_file": {{ test_batch.ped_file | tojson }},
"VisualizeCnvs.min_size": 50000,
-"VisualizeCnvs.flags": "",
+"VisualizeCnvs.flags": "-s 999999999",
"VisualizeCnvs.sv_pipeline_docker": {{ dockers.sv_pipeline_docker | tojson }}
}
2 changes: 1 addition & 1 deletion scripts/test/terra_validation.py
@@ -113,7 +113,7 @@ def main():
parser.add_argument("-j", "--womtool-jar", help="Path to womtool jar", required=True)
parser.add_argument("-n", "--num-input-jsons",
help="Number of Terra input JSONs expected",
-required=False, default=25, type=int)
+required=False, default=26, type=int)
parser.add_argument("--log-level",
help="Specify level of logging information, ie. info, warning, error (not case-sensitive)",
required=False, default="INFO")
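
The default moves from 25 to 26, presumably to account for the new VisualizeCnvs Terra input JSON added in this PR. How `terra_validation.py` consumes `--num-input-jsons` is not shown in this hunk. The following is a minimal sketch, assuming the script simply compares the option against the number of JSON files it finds; `count_input_jsons` and `check_expected_count` are hypothetical helpers, not functions from the script.

```python
import argparse
import tempfile
from pathlib import Path


def build_parser():
    # Mirrors only the option shown in the diff; the real script defines more.
    parser = argparse.ArgumentParser(description="Sketch of the JSON-count option")
    parser.add_argument("-n", "--num-input-jsons",
                        help="Number of Terra input JSONs expected",
                        required=False, default=26, type=int)
    return parser


def count_input_jsons(directory):
    # Hypothetical helper: count *.json files directly inside `directory`.
    return sum(1 for _ in Path(directory).glob("*.json"))


def check_expected_count(found, expected):
    # Hypothetical helper: fail loudly when the counts disagree.
    if found != expected:
        raise ValueError(f"Expected {expected} input JSONs but found {found}")


# Demonstration on a throwaway directory holding three empty JSON files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"workflow_{i}.json").write_text("{}")
    args = build_parser().parse_args(["-n", "3"])
    check_expected_count(count_input_jsons(tmp), args.num_input_jsons)
    print("count matches")
```

Under this assumption, adding a new input JSON without raising the default would make the check fail, which would explain why the two changes travel together.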
63 changes: 62 additions & 1 deletion wdl/CleanVcfChromosome.wdl
@@ -53,6 +53,7 @@ workflow CleanVcfChromosome {
RuntimeAttr? runtime_override_stitch_fragmented_cnvs
RuntimeAttr? runtime_override_final_cleanup
RuntimeAttr? runtime_override_rescue_me_dels
RuntimeAttr? runtime_attr_add_high_fp_rate_filters

# Clean vcf 1b
RuntimeAttr? runtime_attr_override_subset_large_cnvs_1b
@@ -299,9 +300,17 @@ workflow CleanVcfChromosome {
runtime_attr_override = runtime_override_rescue_me_dels
}

-call FinalCleanup {
+call AddHighFDRFilters {
input:
vcf=RescueMobileElementDeletions.out,
prefix="~{prefix}.high_fdr_filtered",
sv_pipeline_docker=sv_pipeline_docker,
runtime_attr_override=runtime_attr_add_high_fp_rate_filters
}

call FinalCleanup {
input:
vcf=AddHighFDRFilters.out,
contig=contig,
prefix="~{prefix}.final_cleanup",
sv_pipeline_docker=sv_pipeline_docker,
@@ -799,6 +808,58 @@ task StitchFragmentedCnvs {
}
}

# Add FILTER status for pockets of variants with high FP rate: wham-only DELs and Scramble-only SVAs with HIGH_SR_BACKGROUND
task AddHighFDRFilters {
input {
File vcf
String prefix
String sv_pipeline_docker
RuntimeAttr? runtime_attr_override
}

Float input_size = size(vcf, "GiB")
RuntimeAttr runtime_default = object {
mem_gb: 3.75,
disk_gb: ceil(10.0 + input_size * 3.0),
cpu_cores: 1,
preemptible_tries: 3,
max_retries: 1,
boot_disk_gb: 10
}
RuntimeAttr runtime_override = select_first([runtime_attr_override, runtime_default])
runtime {
memory: "~{select_first([runtime_override.mem_gb, runtime_default.mem_gb])} GB"
disks: "local-disk ~{select_first([runtime_override.disk_gb, runtime_default.disk_gb])} HDD"
cpu: select_first([runtime_override.cpu_cores, runtime_default.cpu_cores])
preemptible: select_first([runtime_override.preemptible_tries, runtime_default.preemptible_tries])
maxRetries: select_first([runtime_override.max_retries, runtime_default.max_retries])
docker: sv_pipeline_docker
bootDiskSizeGb: select_first([runtime_override.boot_disk_gb, runtime_default.boot_disk_gb])
}

command <<<
set -euo pipefail

python <<CODE
import pysam
with pysam.VariantFile("~{vcf}", 'r') as fin:
    header = fin.header
    header.add_line("##FILTER=<ID=HIGH_ALGORITHM_FDR,Description=\"Categories of variants with low precision including Wham-only deletions and certain Scramble SVAs\">")
    with pysam.VariantFile("~{prefix}.vcf.gz", 'w', header=header) as fo:
        for record in fin:
            if (record.info['ALGORITHMS'] == ('wham',) and record.info['SVTYPE'] == 'DEL') or \
                    (record.info['ALGORITHMS'] == ('scramble',) and record.info['HIGH_SR_BACKGROUND'] and record.alts == ('<INS:ME:SVA>',)):
                record.filter.add('HIGH_ALGORITHM_FDR')
            fo.write(record)
CODE
>>>

output {
File out = "~{prefix}.vcf.gz"
}
}
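
The filter condition in `AddHighFDRFilters` can be exercised without a VCF. The following is a minimal sketch that restates the condition as a plain function over already-extracted values. The name `is_high_algorithm_fdr` and its parameters are hypothetical, introduced here for illustration only; they are not part of the WDL task or of pysam.

```python
def is_high_algorithm_fdr(algorithms, svtype, alts, high_sr_background):
    """Return True when a record matches either low-precision category.

    Hypothetical stand-in for the inline condition in the task:
    - a deletion supported only by wham, or
    - an SVA insertion supported only by scramble with HIGH_SR_BACKGROUND set.
    """
    algorithms = tuple(algorithms)
    wham_only_del = algorithms == ("wham",) and svtype == "DEL"
    scramble_only_sva = (
        algorithms == ("scramble",)
        and bool(high_sr_background)
        and tuple(alts) == ("<INS:ME:SVA>",)
    )
    return wham_only_del or scramble_only_sva


# Illustrative records (made-up values, not taken from any real callset).
examples = [
    (("wham",), "DEL", ("<DEL>",), False),
    (("manta", "wham"), "DEL", ("<DEL>",), False),
    (("scramble",), "INS", ("<INS:ME:SVA>",), True),
    (("scramble",), "INS", ("<INS:ME:SVA>",), False),
    (("scramble",), "INS", ("<INS:ME:ALU>",), True),
]
for algorithms, svtype, alts, high_sr in examples:
    flagged = is_high_algorithm_fdr(algorithms, svtype, alts, high_sr)
    print(algorithms, svtype, alts, high_sr, "->", flagged)
```

Note that both branches require the algorithm tuple to match exactly, so a deletion called by wham together with any other caller is left unflagged.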



# Final VCF cleanup
task FinalCleanup {
2 changes: 1 addition & 1 deletion website/docs/advanced/cromwell/overview.md
@@ -29,7 +29,7 @@ Google Cloud Platform (GCP).

# Cromwell Server

-There are two option to communicate with a running Cromwell server:
+There are two options to communicate with a running Cromwell server:
[REST API](https://cromwell.readthedocs.io/en/stable/tutorials/ServerMode/), and
[Cromshell](https://github.com/broadinstitute/cromshell) which is a command line tool
to interface with a Cromwell server. We recommend using Cromshell due to its simplicity
4 changes: 2 additions & 2 deletions website/docs/best_practices.md
@@ -4,8 +4,8 @@ description: Guide for using GATK-SV
sidebar_position: 4
---

-A comprehensive guide for the single-sample calling mode is available in [GATK Best Practices for Structural Variation
-Discovery on Single Samples](https://gatk.broadinstitute.org/hc/en-us/articles/9022653744283-GATK-Best-Practices-for-Structural-Variation-Discovery-on-Single-Samples).
+A comprehensive guide for the single-sample [calling mode](/docs/gs/calling_modes) is available in
+[GATK Best Practices for Structural Variation Discovery on Single Samples](https://gatk.broadinstitute.org/hc/en-us/articles/9022653744283-GATK-Best-Practices-for-Structural-Variation-Discovery-on-Single-Samples).
This material covers basic concepts of structural variant calling, specifics of SV VCF formatting, and
advanced troubleshooting that also apply to the joint calling mode. This guide is intended to supplement
documentation found here.