This repository has been archived by the owner on Oct 2, 2020. It is now read-only.

[WIP] Hiveplots workflow demo & tool descriptions #44

Open
wants to merge 14 commits into
base: master
33 changes: 33 additions & 0 deletions tools/hall-lab-svtools-vcftobedpe.cwl
@@ -0,0 +1,33 @@
#!/usr/bin/env cwl-runner

cwlVersion: "cwl:draft-3.dev2"

class: CommandLineTool

description: |
  Usage: vcftobedpe -i <in.vcf> -o [out.bedpe]

requirements:
  - "@import": envvar-global.cwl

inputs:
  - id: "#input"
    type: File
    description: |
      Input VCF file.
    streamable: true
    inputBinding:
      prefix: "-i"

stdout:
  "output.bedpe"

outputs:
  - id: "#bedpe"
    type: File
    description: "The bedpe file"
    streamable: true
    outputBinding:
      glob: "output.bedpe"

baseCommand: ["vcftobedpe"]
7 changes: 7 additions & 0 deletions tools/jobs/vawk-job.json
@@ -0,0 +1,7 @@
{
  "input": {
    "class": "File",
    "path": "../test-files/APGI2049_Tumor-manta.vcf"
  },
  "cmd": "{ print $1 }"
}
68 changes: 68 additions & 0 deletions tools/vawk.cwl
@@ -0,0 +1,68 @@
#!/usr/bin/env cwl-runner

cwlVersion: "cwl:draft-3.dev2"

class: CommandLineTool

requirements:
  - "@import": envvar-global.cwl

description: |
  usage: vawk [-h] [-v VAR] [-c INFO_COL] [--header] [--debug] cmd [vcf]
  positional arguments:
    cmd                   vawk command syntax is exactly the same as awk syntax with
                          a few additional features. The INFO field can be split using
                          the I$ prefix and the SAMPLE field can be split using
                          the S$ prefix. For example, I$AF prints the allele frequency of
                          each variant and S$NA12878 prints the entire SAMPLE field for the
                          NA12878 individual for each variant. S$* returns all samples.
                          The SAMPLE field can be further split based on the keys in the
                          FORMAT field of the VCF (column 9). For example, S$NA12877$GT
                          returns the genotype of the NA12877 individual.
                          ex: '{ if (I$AF>0.5) print $1,$2,$3,I$AN,S$NA12878,S$NA12877$GT }'
    vcf                   VCF file (default: stdin)
  optional arguments:
    -h, --help            show this help message and exit
    -v VAR, --var VAR     declare an external variable (e.g.: SIZE=10000)
    -c INFO_COL, --col INFO_COL
                          column of the INFO field [8]
    --header              print VCF header
    --debug               debugging level verbosity

inputs:
  - id: "#cmd"
    type: string
    description: |
      vawk command syntax is exactly the same as awk syntax with a few
      additional features. The INFO field can be split using the I$ prefix
      and the SAMPLE field can be split using the S$ prefix. For example,
      I$AF prints the allele frequency of each variant and S$NA12878 prints
      the entire SAMPLE field for the NA12878 individual for each variant.
      S$* returns all samples. The SAMPLE field can be further split based on
      the keys in the FORMAT field of the VCF (column 9). For example,
      S$NA12877$GT returns the genotype of the NA12877 individual.
      ex: '{ if (I$AF>0.5) print $1,$2,$3,I$AN,S$NA12878,S$NA12877$GT }'
    inputBinding:
      position: 1

  - id: "#input"
    type: File
    streamable: true
    description: |
      VCF file
    inputBinding:
      position: 2

stdout:
Collaborator

It is good practice to let the caller provide the output file name for stdout instead of hard-coding it.
A draft-2 example:

inputs:
  - id: '#stdoutfile'
    type: string
outputs:
  - id: '#stdoutfile'
    type: File
    outputBinding:
      glob:
        engine: 'cwl:JsonPointer'
        script: /job/stdoutfile
stdout:
  engine: 'cwl:JsonPointer'
  script: /job/stdoutfile

"output.vcf"

outputs:
  - id: "#processed"
    type: File
    description: "The resulting VCF file"
    streamable: true
    outputBinding:
      glob: "output.vcf"


baseCommand: ["vawk"]
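
The reviewer's draft-2 suggestion for `stdout` could be carried over to the draft-3 syntax used in this PR roughly as follows. This is a sketch under assumptions, not a tested tool description: it assumes the runner supports draft-3 parameter references of the form `$(inputs.<id>)`, which may not be available in the `draft-3.dev2` pre-release, and the input id `#stdoutfile` is borrowed from the reviewer's example rather than from the PR itself.

```yaml
# Sketch only: fragment of tools/vawk.cwl with a caller-supplied stdout name.
# Assumes draft-3 parameter references ($(inputs.<id>)) are supported.
inputs:
  - id: "#stdoutfile"        # id taken from the reviewer's example, not in the PR
    type: string
    description: "File name used to capture vawk's standard output"

# Capture stdout under the name given in the job file.
stdout: $(inputs.stdoutfile)

outputs:
  - id: "#processed"
    type: File
    description: "The resulting VCF file"
    outputBinding:
      # Collect the same file that stdout was redirected to.
      glob: $(inputs.stdoutfile)
```

With a fragment like this, the job file would need one more key next to `input` and `cmd`, for example `"stdoutfile": "output.vcf"`, and the `#cmd` and `#input` entries would stay in the `inputs` list unchanged.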