---
title: "R for Data Science Bootcamp"
---
<style><!--Page specific style-->
h1 {color: #000}
h2 {color: #111}
h3 {color: #222; border-bottom: 1px solid #ccc; width: 75%;}
h4 {color: #666}
.top {color: #ccc; display: inline-block; padding: 25px 0px 5px 0px;}
</style>
<img src="media/img/Rlogo.png" style="width: 100px; float: left; margin: 0px 20px 0px 0px;">
NV INBRE and the Nevada Bioinformatics Center are pleased to announce the R for Data Science Bootcamp. R is a complete, flexible, and open source system for statistical analysis and graphics, and has become a tool of choice for biologists and biomedical scientists.
These five sessions will introduce R and RStudio through hands-on learning activities. **No prior data analysis experience is necessary!** Participants will receive example data sets to practice data manipulation, organization, and graphical exploration. Participants are also encouraged to bring their own data and data challenges. In addition, major concepts of data science such as reproducibility will be addressed. Participants will gain the skills necessary to manage and organize data, run basic analyses, and generate professional documents and figures in R.
### Audience and Prerequisites
No prior knowledge of data analytics or coding is necessary! This bootcamp is aimed at beginners who want to become familiar with R syntax, the R environment, and the most common commands, so they can start using R to explore, interpret, and present their data.
Although there are no prerequisites for this bootcamp, you are encouraged to go through some of the R documentation available here.
Please also note that this course **is NOT a statistics course** but rather training in how to use R to perform different tasks.
### Application to Participate
An application is required to be considered for participation. Students and faculty from TMCC, WNC, SNU, and UNR interested in R are encouraged to apply. Applicants may sign up for individual workshops, but priority will be given to those who sign up for the whole series.
**For students:**
<span style="font-size: 14pt;"><a href="https://nvideaoffice.formstack.com/forms/bootcamp_application">Click here to be taken to the application form</a></span>
Application requires:
- Copy of unofficial transcript
- Academic references
- Personal statements
    - Tell us about yourself and why you are interested (1000-character limit)
    - How will this workshop affect your academic and professional career (1000-character limit)
    - Any research you may currently be involved in (1000-character limit)
    - Anything else you want to share (1000-character limit)
Application is open until **May 14th at 3pm**, but will close as soon as spots are filled with qualifying participants.
You will receive an email confirming your participation no later than Monday, May 17th. Upon receipt of the confirmation email, participants will be asked to confirm their attendance within 2 days.
**For faculty:**
Please contact Dr. Juli Petereit directly at <[email protected]> (priority will be given to students, but we are planning a faculty-specific workshop, and your interest/feedback would be appreciated).
### Additional Information
Coordinator: Juli Petereit
Non-UNR affiliates will receive a parking pass sponsored by NV INBRE.
Participants attending all sessions will receive a certificate of completion.
Important:
- Applications from TMCC, WNC, and SNU will receive priority
- In-person attendance is required
---
## Schedule {#schedule}
<table class="tg">
<thead>
<tr>
<th>Workshop</th>
<th>Date and Time</th>
<th>Speaker(s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>[Introduction to R Part 1](#intro1)</td>
<td>May 20, 2021, 1-4pm</td>
<td>Alex, Lucas, and Juli</td>
</tr>
<tr>
<td>[Introduction to R Part 2](#intro2)</td>
<td>May 27, 2021, 1-4pm</td>
<td>Alex, Lucas, and Juli</td>
</tr>
<tr>
<td>[Reproducible Data Science](#reproduce)</td>
<td>June 3, 2021, 1-4pm</td>
<td>Alex, Lucas, and Juli</td>
</tr>
<tr>
<td>[Tidy R](#tidy)</td>
<td>June 10, 2021, 1-4pm</td>
<td>Alex, Lucas, and Juli</td>
</tr>
<tr>
<td>[Producing Clean Documents, Graphs, and Tables](#docs)</td>
<td>June 17, 2021, 1-4pm</td>
<td>Alex, Lucas, and Juli</td>
</tr>
<tr>
<td>Open R Lab</td>
<td>June 24, 2021, 1-4pm</td>
<td>Lucas and Juli</td>
</tr>
</tbody>
</table>
---
## Details
### Introduction to R Part 1 {#intro1}
This workshop serves as a gentle introduction to programming with R, with a focus on application to data science and research. R basics will be covered, such as using the command line as a calculator, storing values in a variable, basic data structures (vector, matrix, list, data frame, etc.), and data types. We will show how to load basic data sets and access rows/columns, and summarize data with statistics and plots. We will also cover some considerations when choosing R over another language.
#### Objectives
Participants will be able to:
- open the R command line interface (CLI)
- access the built-in help docs
- perform basic arithmetic using R
- store data in variables
- list the environment variables
- understand the basic data types in R
- convert between the different data types
- understand and use the basic data structures in R
- access rows, columns, and elements of matrices and data frames
- read data from a `csv` into a data frame
- fit simple linear models
- summarize data and linear models
- create simple plots of 1-D and 2-D data
- understand the similarities and differences between R, Python, Julia, and other programming languages
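
To give a flavor of these objectives, here is a minimal sketch in base R. The numbers and variable names are made up for illustration and are not taken from the session notes.

```r
# Use R as a calculator; the usual order of operations applies
1 + 2 * 3                            # 7

# Store values in variables (made-up measurements)
heights <- c(150, 160, 170, 180)     # numeric vector
weights <- c(50, 58, 69, 80)

# Inspect and convert data types
class(heights)                       # "numeric"
as.character(heights[1])             # "150"
as.integer("42")                     # 42

# Build a data frame and access columns, rows, and elements
df <- data.frame(height = heights, weight = weights)
df$height                            # one column, by name
df[2, ]                              # second row
df[2, "weight"]                      # single element: 58

# Write the data to a temporary csv file and read it back into a data frame
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)
df2 <- read.csv(tmp)

# Summarize the data and fit a simple linear model
summary(df2)
fit <- lm(weight ~ height, data = df2)
summary(fit)

# Simple plots of 1-D and 2-D data
hist(df2$height)
plot(df2$height, df2$weight)
abline(fit)                          # add the fitted line to the scatter plot

# List the variables in the current environment
ls()

# Open the built-in help page for a function (run interactively)
# ?lm
```

Running the lines one at a time in the R console is a good way to see what each step returns.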
#### Background Knowledge
Participants should know:
- how to operate a computer
- arithmetic and order of operations
- basic statistics (summary statistics, simple linear models)
- basic ideas of plots and charts
#### Session Notes
[Click here for the session notes](bootcamp-r-series-010-intro-to-r-pt1.html)
<a href='#schedule' class='top'>back to top</a>
### Introduction to R Part 2 {#intro2}
This workshop introduces scripting in R, including control flow, writing functions, and installing/loading libraries (and the differences between CRAN, GitHub, and Bioconductor packages). We will go over the `-apply` family of functions, simple graphs, and good practices in R (style guide, organization, etc.).
#### Objectives
Participants will be able to:
- open up RStudio and identify key parts of the IDE
- create, open, and save R scripts
- run R scripts line by line or all at once
- understand and use the `-apply` family of functions
- use control flow to execute complex tasks
- write functions and understand when to use them
- install and load libraries
- understand the differences between CRAN, Bioconductor, and GitHub
- understand what a style guide is and why it is important
- use good practices for writing and organizing code
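
As a rough illustration of these objectives, the sketch below combines a small function, control flow, and the apply family. The function `classify_value()` and its threshold are hypothetical and exist only for this example. The installation lines are commented out, and the package names in them are placeholders.

```r
# A small function that uses control flow (hypothetical helper)
classify_value <- function(x, threshold = 10) {
  if (is.na(x)) {
    "missing"
  } else if (x > threshold) {
    "high"
  } else {
    "low"
  }
}

values <- c(3, 12, NA, 25)

# Version 1: an explicit for loop
labels_loop <- character(length(values))
for (i in seq_along(values)) {
  labels_loop[i] <- classify_value(values[i])
}

# Version 2: sapply() calls the function on each element
labels_apply <- sapply(values, classify_value)

# Version 3: vapply() also states the expected type of each result
labels_vapply <- vapply(values, classify_value, character(1))

# apply() works over the rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)                     # row sums: 9 12
apply(m, 2, sum)                     # column sums: 3 7 11

# Installing packages from different sources (run once, interactively)
# install.packages("ggplot2")                  # CRAN
# BiocManager::install("limma")                # Bioconductor (needs BiocManager)
# remotes::install_github("user/repository")   # GitHub (placeholder name)

# Loading an installed package
# library(ggplot2)
```

All three versions produce the same labels. The apply versions are shorter, while the loop makes each step explicit.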
#### Background Knowledge
Participants should know:
- how to open an application on a computer
- the basic functions in R
- the basic ideas of conditional statements
#### Session Notes
[Click here for the session notes](bootcamp-r-series-020-intro-to-r-pt2.html)
<a href='#schedule' class='top'>back to top</a>
### Reproducible Data Science {#reproduce}
We will discuss the problems we commonly see and introduce what it means to conduct reproducible data science. During the workshop, we will go through an example of setting up a new project folder, loading "messy" data, organizing scripts, and creating basic Rmarkdown reports. In reference to the messy data, we will go over what "tidy" data is, and how to use `tidyr` and other `tidyverse` libraries.
#### Objectives
Participants will be able to:
- understand the principles of reproducible [data] science
- identify bad project management habits
- organize a new project folder
- write code that is human-readable
- identify helpful libraries for data wrangling
- understand and implement the principles of immutable data
- create reports and other media using Rmarkdown
- use GitHub to track project changes
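
One possible way to set up a project folder and treat raw data as immutable is sketched below. The folder layout, file names, and values are assumptions made for this example, not a layout prescribed by the workshop. Everything is created under a temporary directory so the sketch can be run safely.

```r
# Hypothetical project layout, created under a temporary directory
project <- file.path(tempdir(), "my-project")
for (d in c("data/raw", "data/processed", "scripts", "reports")) {
  dir.create(file.path(project, d), recursive = TRUE, showWarnings = FALSE)
}

# Stand-in for a raw file received from an instrument or collaborator
raw_file <- file.path(project, "data", "raw", "measurements.csv")
write.csv(
  data.frame(id = 1:3, value = c(2.5, NA, 4.1)),
  raw_file,
  row.names = FALSE
)

# Immutable data: read the raw file, but never overwrite it
raw <- read.csv(raw_file)

# Do the cleaning in code, so every step is recorded and repeatable
clean <- raw[!is.na(raw$value), ]

# Save the cleaned copy in a separate folder
clean_file <- file.path(project, "data", "processed", "measurements-clean.csv")
write.csv(clean, clean_file, row.names = FALSE)

# Record the R version and loaded packages alongside the results
sessionInfo()
```

Because the raw file is only ever read, the cleaned file can always be regenerated by rerunning the script.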
#### Background Knowledge
Participants should know:
- scripting with R or a similar programming language
- how to create directories and folders
- basic git or a git GUI
- basic [markdown syntax](https://www.markdownguide.org/cheat-sheet/)
#### Session Notes
[Click here for the session notes](bootcamp-r-series-030-reproducible.html)
<a href='#schedule' class='top'>back to top</a>
### Tidy R {#tidy}
This will be a workshop fully devoted to data wrangling, including loading and cleaning "dirty" data, formatting "messy" data, and transforming "tidy" data to suit the task. There will be an emphasis on `tidyverse` libraries, but we will mention `data.table` as a faster alternative with fewer dependencies but with the trade-off of being less "readable".
#### Objectives
Participants will be able to:
- understand what "tidy" data is
- understand and use the various tidyverse packages
- read in data from various formats
- clean data using the tidyverse
- reshape wide data into long format, and long data into wide format
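
The reshaping objective can be sketched with `tidyr` and `dplyr`, assuming both packages are installed. The data below is made up for illustration.

```r
library(tidyr)
library(dplyr)

# Made-up "wide" data: one row per sample, one column per time point
wide <- data.frame(
  sample = c("A", "B"),
  t0 = c(1.0, 2.0),
  t1 = c(1.5, 2.5)
)

# Wide to long: one row per sample and time point
long <- pivot_longer(
  wide,
  cols = c(t0, t1),
  names_to = "time",
  values_to = "value"
)

# Clean and summarize the long data with dplyr
summary_tbl <- long %>%
  filter(!is.na(value)) %>%
  group_by(sample) %>%
  summarize(mean_value = mean(value))

# Long back to wide: one column per time point again
wide_again <- pivot_wider(long, names_from = time, values_from = value)
```

The long format has one observation per row, which is the shape that most `tidyverse` functions expect.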
#### Background Knowledge
Participants should know:
- how to program with R or some other scripting language
- how to navigate RStudio
- how to access the help documentation in R
#### Session Notes
[Click here for the session notes](bootcamp-r-series-040-tidy.html)
<a href='#schedule' class='top'>back to top</a>
### Producing Clean Documents, Graphs, and Tables {#docs}
In this workshop, we will go over some principles of the visual display of quantitative data, and go in-depth with plotting in R with `base` and `ggplot2`. Additionally, since not all data is well represented in a graph, we will go over creating pretty tables using `kable` (a function from the `knitr` package) and the `kableExtra` package.
#### Objectives
Participants will be able to:
- understand the basic principles of the visual display of quantitative information
- produce clean and professional graphs using base `R`
- produce clean and professional graphs using `ggplot2`
- produce clean tables using `kable` and `kableExtra`
- create clean documents using `Rmarkdown`
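
As a small illustration of these objectives, the sketch below draws the same scatter plot with base R and with `ggplot2`, then builds a table with `kable()` from the `knitr` package. It assumes `ggplot2` and `knitr` are installed and uses the built-in `mtcars` data set. The choice of columns and labels is ours, for illustration only.

```r
library(ggplot2)
library(knitr)

# Base R scatter plot with labeled axes and a title
plot(
  mtcars$wt, mtcars$mpg,
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per gallon",
  main = "Fuel economy vs. weight"
)

# The same plot with ggplot2
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    title = "Fuel economy vs. weight"
  ) +
  theme_minimal()
print(p)

# A formatted table of the first few rows
tbl <- kable(
  head(mtcars[, c("mpg", "wt", "cyl")]),
  digits = 1,
  caption = "First rows of mtcars"
)
tbl
```

Inside an Rmarkdown document, the table is rendered in the output format of the report, and functions from `kableExtra` can be applied to it for further styling.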
#### Background Knowledge
Participants should know:
- how to program with R or some other scripting language
- how to navigate RStudio
- basic markdown syntax
- some LaTeX or MathJax
- how to search for additional information
#### Session Notes
[Click here for the session notes](bootcamp-r-series-050-documents.html)
<a href='#schedule' class='top'>back to top</a>
## Acknowledgements
This bootcamp was made possible by a grant from the National Institute of General Medical Sciences (GM103440) of the National Institutes of Health.