Commit 76fce6c

committed
update docs to include GLIMPSE Hail Batch example
1 parent c16f68e commit 76fce6c

11 files changed

+127
-20
lines changed
8 Bytes
Binary file not shown.
726 Bytes
Binary file not shown.

docs/_build/doctrees/tutorial.doctree

5.25 KB
Binary file not shown.

docs/_build/html/_sources/imputation.rst.txt

+3-1
@@ -38,9 +38,11 @@ Arguments and options
3838
* - Argument
3939
- Description
4040
* - :code:`--input-file`
41-
- Path to where the VCF for target genotypes paths is
41+
- Path to the target VCF, or to a TSV listing the target VCF/BAM files
4242
* - :code:`--vcf-ref`
4343
- Reference panel file to use for imputation
44+
* - :code:`--chromosomes`
45+
- Chromosome(s) to run imputation for. Default is :code:`all`
4446
* - :code:`--local`
4547
- Type of service. Default is Service backend where jobs are executed on a multi-tenant compute cluster in Google Cloud
4648
* - :code:`--billing-project`

docs/_build/html/_sources/tutorial.rst.txt

+35-2
@@ -9,13 +9,21 @@ This is a short tutorial on how to use the different modules of GWASpy.
99
1. Datasets
1010
###########
1111

12-
We will be using simulated test data (on GRCh37) from RICOPILI. Below is how it can be downloaded and copied to a Google bucket
12+
We will be using simulated test data (on GRCh37) from RICOPILI for most of the examples. Below is how it can be downloaded and copied to a Google bucket
1313

1414
.. code-block:: sh
1515
1616
wget https://personal.broadinstitute.org/sawasthi/share_links/UzoZK7Yfd7nTzIxHamCh1rSOiIOSdj_gwas-qcerrors.py/sim_sim1a_eur_sa_merge.miss.{bed,bim,fam} .
1717
gsutil cp sim_sim1a_eur_sa_merge.miss.{bed,bim,fam} gs://my-gcs/bucket/test_data
1818
19+
For low-coverage genotype imputation using GLIMPSE, we will be using the 1X downsampled NA12878 file from the GLIMPSE
20+
tutorial. Below is how it can be downloaded and copied to a Google bucket
21+
22+
.. code-block:: sh
23+
24+
wget https://github.com/odelaneau/GLIMPSE/raw/refs/heads/master/tutorial/NA12878_1x_bam/NA12878.{bam,bam.bai} .
25+
gsutil cp NA12878.{bam,bam.bai} gs://my-gcs/bucket/test_data
26+
1927
2. Start a dataproc cluster with GWASpy installed
2028
#################################################
2129

@@ -164,7 +172,32 @@ Now you can easily run both phasing and imputation using the following command
164172
165173
./nextflow run main.nf -c nextflow.config -profile gbatch -params-file params.json
166174
167-
5. Low-coverage WGS imputation using GLIMPSE
175+
6. Low-coverage WGS imputation using GLIMPSE
168176
############################################
169177
178+
**6.1 Hail Batch** (should cost about $0.50 and take under 10 minutes)
179+
180+
Unlike phasing using IMPUTE5, GLIMPSE takes BAM files as input, and since we usually have one BAM file per sample, the
181+
input to the imputation module when using GLIMPSE is a headerless TSV file with two columns: the first column holds the
182+
sample ID and the second holds the path to the BAM file. Only one sample/BAM per row is allowed in the TSV.
183+
Below is an example of a file saved as :code:`gs://my-gcs/bucket/test_data/na12878_test.tsv`
184+
185+
.. list-table::
186+
:widths: 15 50
187+
:header-rows: 0
188+
189+
* - NA12878
190+
- gs://my-gcs/bucket/test_data/NA12878.bam
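The TSV layout described above can be produced with a few lines of Python. This is only a minimal sketch: the helper name :code:`write_bam_manifest` and the local filename are assumptions made for illustration and are not part of GWASpy.

```python
import csv
from pathlib import Path


def write_bam_manifest(samples, out_path):
    """Write a headerless, two-column TSV: sample ID, then BAM path.

    Hypothetical helper for illustration; not a GWASpy function.
    `samples` is an iterable of (sample_id, bam_path) pairs, one per row.
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for sample_id, bam_path in samples:
            # One sample/BAM per row, no header line.
            writer.writerow([sample_id, bam_path])
    return Path(out_path)


manifest = write_bam_manifest(
    [("NA12878", "gs://my-gcs/bucket/test_data/NA12878.bam")],
    "na12878_test.tsv",
)
print(manifest.read_text(), end="")
```

The resulting local file can then be copied to the bucket with :code:`gsutil cp`, in the same way as the test data in section 1.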
191+
192+
193+
Once you have saved the TSV to a bucket, you can run GLIMPSE phasing and imputation using the following command
194+
195+
.. code-block:: sh
196+
197+
imputation --input-file gs://my-gcs/bucket/test_data/na12878_test.tsv --vcf-ref hgdp1kgp \
198+
--output-filename sim_sim1a_eur_sa_merge.miss_qced.phased.imputed \
199+
--out-dir gs://my-gcs/bucket/test_data/GWASpy/lowcov_imputation --n-samples 1 --n-ref-samples 4091 \
200+
--billing-project my-billing-project --chromosomes 22 --software glimpse2
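When the same run has to be repeated for several chromosomes or buckets, it may be convenient to assemble the command programmatically. The sketch below only rebuilds the command shown above from the flags used in this tutorial; the function :code:`build_glimpse_command` and its default values are assumptions for illustration, not a GWASpy API.

```python
import shlex


def build_glimpse_command(input_tsv, out_dir, output_filename, billing_project,
                          chromosomes="22", n_samples=1, n_ref_samples=4091,
                          vcf_ref="hgdp1kgp"):
    """Return the imputation command as an argument list.

    Hypothetical helper; it only uses flags that appear in this tutorial.
    """
    return [
        "imputation",
        "--input-file", input_tsv,
        "--vcf-ref", vcf_ref,
        "--output-filename", output_filename,
        "--out-dir", out_dir,
        "--n-samples", str(n_samples),
        "--n-ref-samples", str(n_ref_samples),
        "--billing-project", billing_project,
        "--chromosomes", str(chromosomes),
        "--software", "glimpse2",
    ]


cmd = build_glimpse_command(
    input_tsv="gs://my-gcs/bucket/test_data/na12878_test.tsv",
    out_dir="gs://my-gcs/bucket/test_data/GWASpy/lowcov_imputation",
    output_filename="sim_sim1a_eur_sa_merge.miss_qced.phased.imputed",
    billing_project="my-billing-project",
)
# Print a shell-safe version of the command for inspection before running it.
print(shlex.join(cmd))
```

Keeping the arguments in a list means the command can be passed to :code:`subprocess.run` without shell quoting issues, or printed and pasted into a terminal.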
201+
202+
**6.2 Nextflow**
170203
**COMING VERY SOON**

docs/_build/html/imputation.html

+11-8
@@ -122,30 +122,33 @@ <h2>Arguments and options<a class="headerlink" href="#arguments-and-options" tit
122122
</thead>
123123
<tbody>
124124
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--input-file</span></code></p></td>
125-
<td><p>Path to where the VCF for target genotypes paths is</p></td>
125+
<td><p>Path to the target VCF, or to a TSV listing the target VCF/BAM files</p></td>
126126
</tr>
127127
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--vcf-ref</span></code></p></td>
128128
<td><p>Reference panel file to use for imputation</p></td>
129129
</tr>
130-
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--local</span></code></p></td>
130+
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--chromosomes</span></code></p></td>
131+
<td><p>Chromosome(s) to run imputation for. Default is <code class="code docutils literal notranslate"><span class="pre">all</span></code></p></td>
132+
</tr>
133+
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--local</span></code></p></td>
131134
<td><p>Type of service. Default is Service backend where jobs are executed on a multi-tenant compute cluster in Google Cloud</p></td>
132135
</tr>
133-
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--billing-project</span></code></p></td>
136+
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--billing-project</span></code></p></td>
134137
<td><p>Billing project to be used for the jobs</p></td>
135138
</tr>
136-
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--n-samples</span></code></p></td>
139+
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--n-samples</span></code></p></td>
137140
<td><p>Number of target samples to be imputed. We use this to estimate resources for some of the jobs</p></td>
138141
</tr>
139-
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--n-ref-samples</span></code></p></td>
142+
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--n-ref-samples</span></code></p></td>
140143
<td><p>Number of reference samples. We use this to estimate resources for some of the jobs</p></td>
141144
</tr>
142-
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--software</span></code></p></td>
145+
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--software</span></code></p></td>
143146
<td><p>Software to use for phasing. Options: [<code class="code docutils literal notranslate"><span class="pre">beagle5</span></code>, <code class="code docutils literal notranslate"><span class="pre">impute5</span></code>]. Default is <code class="code docutils literal notranslate"><span class="pre">impute5</span></code></p></td>
144147
</tr>
145-
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--output-filename</span></code></p></td>
148+
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--output-filename</span></code></p></td>
146149
<td><p>Output filename without file extension</p></td>
147150
</tr>
148-
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--out-dir</span></code></p></td>
151+
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--out-dir</span></code></p></td>
149152
<td><p>Path to where output files will be saved</p></td>
150153
</tr>
151154
</tbody>

docs/_build/html/index.html

+1-1
@@ -124,7 +124,7 @@ <h1>Contents<a class="headerlink" href="#contents" title="Link to this heading">
124124
<li class="toctree-l2"><a class="reference internal" href="tutorial.html#pre-imputation-qc">3. Pre-imputation QC</a></li>
125125
<li class="toctree-l2"><a class="reference internal" href="tutorial.html#pca">4. PCA</a></li>
126126
<li class="toctree-l2"><a class="reference internal" href="tutorial.html#phasing-and-imputation">5. Phasing and Imputation</a></li>
127-
<li class="toctree-l2"><a class="reference internal" href="tutorial.html#low-coverage-wgs-imputation-using-glimpse">5. Low-coverage WGS imputation using GLIMPSE</a></li>
127+
<li class="toctree-l2"><a class="reference internal" href="tutorial.html#low-coverage-wgs-imputation-using-glimpse">6. Low-coverage WGS imputation using GLIMPSE</a></li>
128128
</ul>
129129
</li>
130130
</ul>
