|
4 | 4 | "cell_type": "markdown",
|
5 | 5 | "metadata": {},
|
6 | 6 | "source": [
|
7 |
| - "# Tutorial of MendelAimSelection\n", |
8 |
| - "### last update: 2/3/2019" |
| 7 | + "# Tutorial of MendelAimSelection" |
9 | 8 | ]
|
10 | 9 | },
|
11 | 10 | {
|
|
24 | 23 | "\n",
|
25 | 24 | "## When to use MendelAimSelection\n",
|
26 | 25 | "\n",
|
27 |
| - "This [Julia](http://julialang.org/) package selects the SNPs that are most informative at predicting ancestry for your data — the best Ancestry Informative Markers (AIMs). MendelAimSelection is one component of the umbrella [OpenMendel](https://openmendel.github.io) project. " |
| 26 | + "This [Julia](http://julialang.org/) package selects the SNPs that are most informative at predicting ancestry for your data — the best Ancestry Informative Markers (AIMs). \n", |
| 27 | + "\n", |
| 28 | + "MendelAimSelection is one component of the umbrella [OpenMendel](https://openmendel.github.io) project." |
| 29 | + ] |
| 30 | + }, |
| 31 | + { |
| 32 | + "cell_type": "markdown", |
| 33 | + "metadata": {}, |
| 34 | + "source": [ |
| 35 | + "## Background\n", |
| 36 | + "Modern genetic studies often include people of many ethnicities or of mixed ethnicity. The potential for confounding ethnicity with disease risk is well known. MendelAimSelection uses an extension of an algorithm described by [Rosenberg et al.](https://www.ncbi.nlm.nih.gov/pubmed/14631557) to quickly find the N most informative AIMs within the data set.\n", |
| 37 | + "\n", |
| 38 | + "*Rosenberg NA, Li LM, Ward R, Pritchard JK (2003) Informativeness of genetic markers\n", |
| 39 | + "for inference of ancestry. Amer J Hum Genet 73:1402–1422.*" |
28 | 40 | ]
|
29 | 41 | },
|
30 | 42 | {
|
|
40 | 52 | },
|
41 | 53 | {
|
42 | 54 | "cell_type": "code",
|
43 |
| - "execution_count": 1, |
| 55 | + "execution_count": null, |
44 | 56 | "metadata": {},
|
45 |
| - "outputs": [ |
46 |
| - { |
47 |
| - "name": "stdout", |
48 |
| - "output_type": "stream", |
49 |
| - "text": [ |
50 |
| - "\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m registry at `~/.julia/registries/General`\n", |
51 |
| - "\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m git-repo `https://github.com/JuliaRegistries/General.git`\n", |
52 |
| - "\u001b[2K\u001b[?25h[1mFetching:\u001b[22m\u001b[39m [========================================>] 100.0 %.0 %37.2 %> ] 74.6 %\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m git-repo `https://github.com/OpenMendel/MendelAimSelection.jl.git`\n", |
53 |
| - "\u001b[2K\u001b[?25h[1mFetching:\u001b[22m\u001b[39m [========================================>] 100.0 %.0 %\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n", |
54 |
| - "\u001b[32m\u001b[1m Installed\u001b[22m\u001b[39m WeakRefStrings ─ v0.5.6\n", |
55 |
| - "\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m `~/.julia/environments/v1.1/Project.toml`\n", |
56 |
| - " \u001b[90m [8a9f0eb4]\u001b[39m\u001b[93m ~ MendelAimSelection v0.5.0 #master (https://github.com/OpenMendel/MendelAimSelection.jl.git) ⇒ v0.5.0 #master (https://github.com/OpenMendel/MendelAimSelection.jl.git)\u001b[39m\n", |
57 |
| - "\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m `~/.julia/environments/v1.1/Manifest.toml`\n", |
58 |
| - " \u001b[90m [8a9f0eb4]\u001b[39m\u001b[93m ~ MendelAimSelection v0.5.0 #master (https://github.com/OpenMendel/MendelAimSelection.jl.git) ⇒ v0.5.0 #master (https://github.com/OpenMendel/MendelAimSelection.jl.git)\u001b[39m\n", |
59 |
| - " \u001b[90m [ea10d353]\u001b[39m\u001b[93m ↑ WeakRefStrings v0.5.5 ⇒ v0.5.6\u001b[39m\n" |
60 |
| - ] |
61 |
| - } |
62 |
| - ], |
| 57 | + "outputs": [], |
63 | 58 | "source": [
|
64 | 59 | "] add https://github.com/OpenMendel/MendelAimSelection.jl.git"
|
65 | 60 | ]
|
|
80 | 75 | "metadata": {},
|
81 | 76 | "source": [
|
82 | 77 | "## Input Files\n",
|
83 |
| - "The MendelAimSelection analysis package uses the following input files. Example input files can be found in the [data](https://github.com/OpenMendel/MendelAimSelection.jl/tree/master/data) subfolder of the MendelAimSelection project. An analysis won't always need every file type below. The input for all examples in this tutorial were obtained from the 1000 genome project.\n", |
| 78 | + "The MendelAimSelection analysis package accepts the following input files. Example input files can be found in the [data](https://github.com/OpenMendel/MendelAimSelection.jl/tree/master/data) subfolder of the MendelAimSelection project. An analysis won't always need every file type below. The input for all examples in this tutorial was obtained from the 1000 Genomes Project.\n", |
84 | 79 | "\n",
|
85 |
| - "* [Control File](https://openmendel.github.io/MendelAimSelection.jl/#control-file): Specifies the names of your data input and output files and any optional parameters (*keywords*) for the analysis. (For a list of common keywords, see [Keywords Table](https://openmendel.github.io/MendelBase.jl/#keywords-table)).\n", |
| 80 | + "* [Control File](https://openmendel.github.io/MendelAimSelection.jl/#control-file): Specifies the names of your data input and output files and any optional parameters (*keywords*) for the analysis. (For a list of common keywords, see [Keywords Table](https://openmendel.github.io/MendelBase.jl/#keywords-table)). The Control file is optional. If you don't use a Control file, you enter your keywords directly on the command line.\n", |
86 | 81 | "* [Locus File](https://openmendel.github.io/MendelBase.jl/#locus-file): Names and describes the genetic loci in your data.\n",
|
87 | 82 | "* [Pedigree File](https://openmendel.github.io/MendelBase.jl/#pedigree-file): Gives information about your individuals, such as name, sex, family structure, and ancestry.\n",
|
88 | 83 | "* [Phenotype File](https://openmendel.github.io/MendelBase.jl/#phenotype-file): Lists the available phenotypes.\n",
|
|
161 | 156 | "*Note: The package is called* MendelAimSelection *but the analysis function is called simply* AimSelection."
|
162 | 157 | ]
|
163 | 158 | },
|
| 159 | + { |
| 160 | + "cell_type": "markdown", |
| 161 | + "metadata": {}, |
| 162 | + "source": [ |
| 163 | + "## Output Files\n", |
| 164 | + "Each option will create output files specific to that option, and will save them to the same directory that holds the input data files." |
| 165 | + ] |
| 166 | + }, |
164 | 167 | {
|
165 | 168 | "cell_type": "markdown",
|
166 | 169 | "metadata": {},
|
|
173 | 176 | "metadata": {},
|
174 | 177 | "source": [
|
175 | 178 | "### Step 0: Load the OpenMendel package and then go to the directory containing the data files:\n",
|
176 |
| - "In this example, we go to the directory containing the example data files that come with this package." |
| 179 | + "First we load the MendelAimSelection package." |
| 180 | + ] |
| 181 | + }, |
| 182 | + { |
| 183 | + "cell_type": "code", |
| 184 | + "execution_count": null, |
| 185 | + "metadata": {}, |
| 186 | + "outputs": [], |
| 187 | + "source": [ |
| 188 | + "using MendelAimSelection" |
| 189 | + ] |
| 190 | + }, |
| 191 | + { |
| 192 | + "cell_type": "markdown", |
| 193 | + "metadata": {}, |
| 194 | + "source": [ |
| 195 | + "In this example we go to the directory containing the example data files that come with this package." |
177 | 196 | ]
|
178 | 197 | },
|
179 | 198 | {
|
|
182 | 201 | "metadata": {},
|
183 | 202 | "outputs": [],
|
184 | 203 | "source": [
|
185 |
| - "using MendelAimSelection\n", |
186 | 204 | "cd(MendelAimSelection.datadir())\n",
|
187 | 205 | "pwd()"
|
188 | 206 | ]
|
|
197 | 215 | },
|
198 | 216 | {
|
199 | 217 | "cell_type": "code",
|
200 |
| - "execution_count": 2, |
| 218 | + "execution_count": null, |
201 | 219 | "metadata": {
|
202 | 220 | "scrolled": true
|
203 | 221 | },
|
204 |
| - "outputs": [ |
205 |
| - { |
206 |
| - "name": "stdout", |
207 |
| - "output_type": "stream", |
208 |
| - "text": [ |
209 |
| - "Person Sex Ethnic\n", |
210 |
| - "HG00403 1 CHS\n", |
211 |
| - "HG00404 2 CHS\n", |
212 |
| - "HG00406 1 CHS\n", |
213 |
| - "HG00407 2 CHS\n", |
214 |
| - "HG00409 1 CHS\n", |
215 |
| - "HG00410 2 CHS\n", |
216 |
| - "HG00419 2 CHS\n", |
217 |
| - "HG00421 1 CHS\n", |
218 |
| - "HG00422 2 CHS\n" |
219 |
| - ] |
220 |
| - } |
221 |
| - ], |
| 222 | + "outputs": [], |
222 | 223 | "source": [
|
223 | 224 | ";head -10 \"1000genomes_chr1_eas.ped\""
|
224 | 225 | ]
|
225 | 226 | },
|
| 227 | + { |
| 228 | + "cell_type": "markdown", |
| 229 | + "metadata": {}, |
| 230 | + "source": [ |
| 231 | + "In this example we have unrelated individuals who were genotyped as part of the [1000 Genomes Project](http://www.internationalgenome.org/about). They come from five different ethnic groups: Southern Han Chinese (CHS); Chinese Dai in Xishuangbanna, China (CDX); Kinh in Ho Chi Minh City, Vietnam (KHV); Han Chinese in Beijing, China (CHB); and Japanese in Tokyo, Japan (JPT). We want to find SNPs that differ in their allele frequencies among two or more of these groups." |
| 232 | + ] |
| 233 | + }, |
226 | 234 | {
|
227 | 235 | "cell_type": "markdown",
|
228 | 236 | "metadata": {},
|
229 | 237 | "source": [
|
230 | 238 | "### Step 2: Preparing the control file\n",
|
231 |
| - "A control file gives specific instructions to `MendelAimSelection`. To select the SNPs that are most informative at predicting ancestry for your data — the best Ancestry Informative Markers, a minimal control file looks like the following:" |
| 239 | + "A control file gives specific instructions to `MendelAimSelection`. To select the SNPs that are most informative at predicting ancestry for your data (the best Ancestry Informative Markers), a minimal control file looks like the following. In this example the data come from chromosome 1:" |
232 | 240 | ]
|
233 | 241 | },
|
234 | 242 | {
|
235 | 243 | "cell_type": "code",
|
236 |
| - "execution_count": 3, |
| 244 | + "execution_count": null, |
237 | 245 | "metadata": {},
|
238 |
| - "outputs": [ |
239 |
| - { |
240 |
| - "name": "stdout", |
241 |
| - "output_type": "stream", |
242 |
| - "text": [ |
243 |
| - "#\n", |
244 |
| - "# Input and Output files.\n", |
245 |
| - "#\n", |
246 |
| - "field_separator = ' '\n", |
247 |
| - "pedigree_file = 1000genomes_chr1_eas.ped\n", |
248 |
| - "\n", |
249 |
| - "plink_field_separator = '\t'\n", |
250 |
| - "plink_input_basename = 1000genomes_chr1_eas\n", |
251 |
| - "\n", |
252 |
| - "output_field_separator = ','\n", |
253 |
| - "output_file = 1000genomes_chr1_eas Output.txt\n", |
254 |
| - "#\n", |
255 |
| - "# Analysis parameters for AIM Selection option.\n", |
256 |
| - "#\n" |
257 |
| - ] |
258 |
| - } |
259 |
| - ], |
| 246 | + "outputs": [], |
260 | 247 | "source": [
|
261 | 248 | ";cat \"AIM 1000genomes_chr1 Control.txt\""
|
262 | 249 | ]
|
|
270 | 257 | },
|
271 | 258 | {
|
272 | 259 | "cell_type": "code",
|
273 |
| - "execution_count": 4, |
| 260 | + "execution_count": null, |
| 261 | + "metadata": {}, |
| 262 | + "outputs": [], |
| 263 | + "source": [ |
| 264 | + "AimSelection(\"AIM 1000genomes_chr1 Control.txt\")" |
| 265 | + ] |
| 266 | + }, |
| 267 | + { |
| 268 | + "cell_type": "markdown", |
| 269 | + "metadata": {}, |
| 270 | + "source": [ |
| 271 | + "### Step 4: Output File\n", |
| 272 | + "\n", |
| 273 | + "`MendelAimSelection` should have generated the file `1000genomes_chr1_eas Output.txt` in your local directory. This file lists your markers, and gives an AIMRank for each marker (see below)." |
| 274 | + ] |
| 275 | + }, |
| 276 | + { |
| 277 | + "cell_type": "code", |
| 278 | + "execution_count": null, |
274 | 279 | "metadata": {},
|
275 |
| - "outputs": [ |
276 |
| - { |
277 |
| - "name": "stderr", |
278 |
| - "output_type": "stream", |
279 |
| - "text": [ |
280 |
| - "┌ Info: Recompiling stale cache file /Users/jcpapp/.julia/compiled/v1.1/MendelAimSelection/ku58B.ji for MendelAimSelection [8a9f0eb4-fec1-5450-8bd2-8ca811e15b0d]\n", |
281 |
| - "└ @ Base loading.jl:1184\n" |
282 |
| - ] |
283 |
| - }, |
284 |
| - { |
285 |
| - "name": "stdout", |
286 |
| - "output_type": "stream", |
287 |
| - "text": [ |
288 |
| - " \n", |
289 |
| - " \n", |
290 |
| - " Welcome to OpenMendel's\n", |
291 |
| - " AIM Selection analysis option\n", |
292 |
| - " version 0.5.0\n", |
293 |
| - " \n", |
294 |
| - " \n", |
295 |
| - "Reading the data.\n", |
296 |
| - "\n", |
297 |
| - "The current working directory is \"/Users/jcpapp/Documents/_Files/Projects/Julia-Mendel/Documentation/Tutorials/Jupyter Notebooks/OpenMendel/New\".\n", |
298 |
| - "\n", |
299 |
| - "Keywords modified by the user:\n", |
300 |
| - "\n", |
301 |
| - " control_file = AIM 1000genomes_chr1 Control.txt\n", |
302 |
| - " field_separator = \n", |
303 |
| - " output_field_separator = ,\n", |
304 |
| - " output_file = 1000genomes_chr1_eas Output.txt\n", |
305 |
| - " pedigree_file = 1000genomes_chr1_eas.ped\n", |
306 |
| - " plink_field_separator = \t\n", |
307 |
| - " plink_input_basename = 1000genomes_chr1_eas\n", |
308 |
| - " snpdata_file = 1000genomes_chr1_eas.bed\n", |
309 |
| - " snpdefinition_file = 1000genomes_chr1_eas.bim\n", |
310 |
| - " \n", |
311 |
| - " \n", |
312 |
| - "Analyzing the data.\n", |
313 |
| - "\n", |
314 |
| - "100508×7 DataFrames.DataFrame. Omitted printing of 2 columns\n", |
315 |
| - "│ Row │ Chromosome │ SNP │ CentiMorgans │ Basepairs │ Allele1 │\n", |
316 |
| - "│ │ \u001b[90mString⍰\u001b[39m │ \u001b[90mString⍰\u001b[39m │ \u001b[90mFloat64⍰\u001b[39m │ \u001b[90mInt64⍰\u001b[39m │ \u001b[90mString⍰\u001b[39m │\n", |
317 |
| - "├────────┼────────────┼─────────────┼──────────────┼───────────┼─────────┤\n", |
318 |
| - "│ 1 │ 1 │ rs575272151 │ 0.0 │ 11008 │ G │\n", |
319 |
| - "│ 2 │ 1 │ rs546169444 │ 0.0 │ 14464 │ T │\n", |
320 |
| - "│ 3 │ 1 │ rs541940975 │ 0.0 │ 14604 │ G │\n", |
321 |
| - "│ 4 │ 1 │ rs374029747 │ 0.0 │ 15774 │ A │\n", |
322 |
| - "│ 5 │ 1 │ rs806731 │ 0.0 │ 30923 │ G │\n", |
323 |
| - "│ 6 │ 1 │ rs569128616 │ 0.0 │ 54716 │ T │\n", |
324 |
| - "│ 7 │ 1 │ rs28396308 │ 0.0 │ 55545 │ T │\n", |
325 |
| - "⋮\n", |
326 |
| - "│ 100501 │ 1 │ rs562509172 │ 0.0 │ 249205469 │ C │\n", |
327 |
| - "│ 100502 │ 1 │ rs182792056 │ 0.0 │ 249205601 │ T │\n", |
328 |
| - "│ 100503 │ 1 │ rs35062498 │ 0.0 │ 249206119 │ A │\n", |
329 |
| - "│ 100504 │ 1 │ rs41308182 │ 0.0 │ 249212878 │ G │\n", |
330 |
| - "│ 100505 │ 1 │ rs28826095 │ 0.0 │ 249214913 │ A │\n", |
331 |
| - "│ 100506 │ 1 │ rs3890680 │ 0.0 │ 249222325 │ A │\n", |
332 |
| - "│ 100507 │ 1 │ rs536206700 │ 0.0 │ 249233062 │ A │\n", |
333 |
| - "│ 100508 │ 1 │ rs547624467 │ 0.0 │ 249236765 │ G │ \n", |
334 |
| - " \n", |
335 |
| - "Mendel's analysis is finished.\n", |
336 |
| - "\n" |
337 |
| - ] |
338 |
| - } |
339 |
| - ], |
| 280 | + "outputs": [], |
340 | 281 | "source": [
|
341 |
| - "using MendelAimSelection\n", |
342 |
| - " AimSelection(\"AIM 1000genomes_chr1 Control.txt\")" |
| 282 | + ";cat \"1000genomes_chr1_eas Output.txt\"" |
343 | 283 | ]
|
344 | 284 | },
|
345 | 285 | {
|
346 | 286 | "cell_type": "markdown",
|
347 | 287 | "metadata": {},
|
348 | 288 | "source": [
|
349 |
| - "### Step 4: Interpreting the result\n", |
| 289 | + "### Step 5: Interpreting the result\n", |
350 | 290 | "\n",
|
351 |
| - "`MendelAimSelection` should have generated the file `1000genomes_chr1_eas Output.txt` in your local directory. One can directly open the file, or import into the Julia environment for ease of manipulation using the DataFrames package. " |
| 291 | + "`MendelAimSelection` uses the genotypes and ethnicities in your pedigree to assign a score, the *AIMRank*, to each marker. The higher the AIMRank, the better the marker is at differentiating two or more of the ethnic groups. Note that the rankings may change if the ethnic groups in your pedigree change." |
352 | 292 | ]
|
353 | 293 | },
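Since the tutorial says that a higher AIMRank marks a more informative marker, one natural follow-up is to list the top-scoring markers from the output file. The following is a minimal sketch, not part of the package: it uses only Julia's Base library, assumes the output is comma-separated (as set by `output_field_separator` in the control file), and assumes the columns are named `SNP` and `AIMRank`. The helper `top_markers` and the demo marker names are hypothetical; check the header of your own output file before relying on them.

```julia
# Minimal sketch: rank markers by a score column in a comma-separated file.
# ASSUMPTION: the column names "SNP" and "AIMRank" are illustrative defaults;
# verify them against the header line of your own output file.

"""Return (name, score) pairs for the `n` highest-scoring rows of `path`."""
function top_markers(path::AbstractString, n::Integer;
                     name_col::AbstractString = "SNP",
                     score_col::AbstractString = "AIMRank")
    # Read all non-empty lines; the first one is the header.
    lines = filter(!isempty, strip.(readlines(path)))
    header = String.(strip.(split(lines[1], ',')))

    # Locate the two columns we need by name.
    name_idx = findfirst(==(name_col), header)
    score_idx = findfirst(==(score_col), header)
    (name_idx === nothing || score_idx === nothing) &&
        error("expected columns $name_col and $score_col in $path")

    # Parse each data row into a (marker name, score) pair.
    rows = [split(line, ',') for line in lines[2:end]]
    scored = [(String(strip(r[name_idx])), parse(Float64, strip(r[score_idx])))
              for r in rows]

    # Highest score first, following the tutorial's "higher is better" reading.
    sort!(scored; by = last, rev = true)
    return scored[1:min(n, length(scored))]
end

# Demo on a tiny made-up file (rsA, rsB, rsC are placeholders, not real results).
demo_path = tempname()
write(demo_path, "SNP,AIMRank\nrsA,2\nrsB,5\nrsC,1\n")
top = top_markers(demo_path, 2)
println(top)
```

To try it on real results, replace `demo_path` with `"1000genomes_chr1_eas Output.txt"` after running the analysis, and adjust `name_col` and `score_col` if the header differs.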
|
354 | 294 | {
|
|