hankflury edited this page Aug 29, 2019 · 7 revisions
  1. How do I read in a table from a web page?
    • In R, you can use the rvest package: read_html() downloads and parses the page, and html_table() extracts its tables as data frames.
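As a minimal sketch of reading a web table with rvest, the snippet below parses a small inline HTML string so it runs without a network connection. The table contents and the variable names are made up for illustration; for a real page you would pass its URL to read_html() instead.

```r
library(rvest)

# A tiny inline page stands in for a real URL (illustrative data only).
html <- "<html><body><table>
  <tr><th>name</th><th>score</th></tr>
  <tr><td>alpha</td><td>10</td></tr>
  <tr><td>beta</td><td>20</td></tr>
</table></body></html>"

page <- read_html(html)      # for a live page: read_html("https://...")
tables <- html_table(page)   # a list with one data frame per <table>
scores <- tables[[1]]        # pick the table you need by position
```

If a page has several tables, inspect `length(tables)` and each element to find the one you want.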
  2. How do I automate reading in tables from multiple web pages (or multiple files from disk) and combine them?
    • You can use a loop (or an *apply function in R) to iterate over a set of values, such as filenames, and process each one with a function that takes the value as a parameter. You can then stack the results into one dataset, for example with rbind() or dplyr::bind_rows().
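One way this could look for files on disk, using only base R. The folder, file names, and the helper read_one() are hypothetical; the sketch writes two small CSV files to a temporary folder first so that it is self-contained.

```r
# Create two small CSV files so the example has something to read.
demo_dir <- file.path(tempdir(), "faq_csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# Read one file and record where each row came from.
read_one <- function(path) {
  df <- read.csv(path)
  df$source <- basename(path)
  df
}

# Apply the function to every file, then stack the pieces.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read_one))
```

The same pattern works for web pages: replace the vector of file paths with a vector of URLs and have the function read a table from each.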
  3. How do I reshape and merge datasets? (What is the difference between the several types of joins and when do I use them?)
    • There are many ways to reshape and merge datasets in R; which one to use comes down to what you need to do and what you are comfortable with. The dplyr package is a good starting place. Each of its "join" functions returns some combination of the rows and columns of the two datasets it is given, and the functions differ in which rows and columns they return. For example, inner_join() keeps only rows with a match in both datasets, left_join() keeps every row of the first dataset, and full_join() keeps every row of both. The different functions are documented here.
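To make the difference between joins concrete, here is a small sketch with two made-up data frames that share an id column. The data and names are illustrative only.

```r
library(dplyr)

people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bo", "Cy"))
scores <- data.frame(id = c(2, 3, 4), score = c(80, 90, 70))

both  <- inner_join(people, scores, by = "id")  # ids 2 and 3 only
left  <- left_join(people, scores, by = "id")   # ids 1, 2, 3; score is NA for id 1
every <- full_join(people, scores, by = "id")   # ids 1, 2, 3, 4; NA where no match
```

Use an inner join when you only want complete matches, a left join when the first dataset is your "master" list, and a full join when you cannot afford to lose rows from either side.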
  4. What is a regular expression and how do I compose one?
    • A regular expression, or regex, is a sequence of characters that describes a pattern to search for in text. For example, the regex "\d" matches any single digit in a character string. You can learn more about regexes, and test your own, here.
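A short sketch of the "\d" example in base R. Note that the backslash must be doubled inside an R string, so the pattern is written "\\d". The sample strings are made up.

```r
x <- c("room 101", "no digits", "call 555-0199")

# Which strings contain at least one digit?
has_digit <- grepl("\\d", x)

# Pull out every run of consecutive digits from each string.
numbers <- regmatches(x, gregexpr("\\d+", x))

# Replace each digit with a placeholder character.
masked <- gsub("\\d", "#", x)
```

Here `has_digit` is a logical vector, `numbers` is a list with one character vector per input string, and `masked` has the same length as `x`.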
  5. How do I automate the collection of data from the web on a schedule to compile data over time?
    • You can write a script and set up a scheduled task (e.g., with "cron") that runs it at regular intervals; each run can append its output to a CSV file.
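A minimal sketch of the "append to a CSV" half of this, in base R. The function name append_snapshot(), the column names, and the script path in the cron comment are all hypothetical; a real script would collect its value from the web instead of taking it as an argument.

```r
# Append one timestamped row to a CSV, writing the header only the first time.
append_snapshot <- function(path, value) {
  row <- data.frame(
    timestamp = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    value = value
  )
  is_new <- !file.exists(path)
  write.table(row, path, sep = ",", row.names = FALSE, qmethod = "double",
              col.names = is_new, append = !is_new)
}

# Two calls simulate two scheduled runs.
log_path <- tempfile(fileext = ".csv")
append_snapshot(log_path, 1.5)
append_snapshot(log_path, 2.5)
history <- read.csv(log_path)

# An illustrative crontab entry that would run a script at the top of every hour:
# 0 * * * * Rscript /path/to/collect.R
```

Writing the header only when the file does not yet exist keeps the CSV readable no matter how many times the job has run.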
  6. What are the most important Tidyverse functions and packages and how/when do I use them?
  7. How do I write portable code that won't break if someone else tries to run it?
  8. How do I create a Markdown file from a regular script? (Or the other way around.)
  9. How do I avoid copying and pasting code if I want to repeat something with slight variations?
    • You can write a function and pass in the items you want to vary as arguments. A function is a named, reusable block of code whose inputs you specify each time you call it.
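For example, instead of pasting the same calculation once per variable, you might wrap it in a function like the one sketched here. The function rescale() and its arguments are made up for illustration.

```r
# Rescale a numeric vector to run from `lower` to `upper`.
# The defaults cover the common case; override them when you need a variation.
rescale <- function(x, lower = 0, upper = 1) {
  (x - min(x)) / (max(x) - min(x)) * (upper - lower) + lower
}

unit    <- rescale(c(2, 4, 6))               # 0.0 0.5 1.0
percent <- rescale(c(2, 4, 6), upper = 100)  # 0 50 100
```

If the calculation ever needs to change, you now fix it in one place rather than in every pasted copy.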
  10. How do I combine the results of outputs from functions like summary (of lm, glm, etc.) run multiple times into a single data set?
  11. Coding is hard/scary/time-consuming/tedious. How do I become more confident/proficient?
  12. I am about to graduate. What language/skill/technology should I learn to help me get a good job?
  13. What do all of those "apply" functions do and how do I use them?
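    • The "apply" functions call a function on each element of a vector or list, so you don't have to write an explicit loop. They differ in what they iterate over and in the form of the result: lapply() always returns a list; sapply() works like lapply() but simplifies the result to a vector or matrix when it can; mapply() iterates over several vectors in parallel, passing one element from each to the function; and apply() works over the rows or columns of a matrix.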
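A quick side-by-side sketch of these functions on toy inputs (the inputs are arbitrary):

```r
square <- function(i) i^2

as_list   <- lapply(1:3, square)   # a list of three numbers
as_vector <- sapply(1:3, square)   # simplified to a numeric vector

# mapply() walks two vectors in parallel: 1+4, 2+5, 3+6.
pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)

# apply() with MARGIN = 1 works row by row; MARGIN = 2 would go column by column.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
```

A reasonable rule of thumb is to start with lapply(), since its return type never changes, and switch to sapply() only when you know the results simplify cleanly.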
  14. What is a "tidy" dataset and how do I create one?
    • A "tidy" dataset follows the conventions used throughout the Tidyverse: each variable is a column, each observation is a row, and each value is a single cell. The Tidyverse is a set of R packages and shared standards created to help make data analysis clear and concise. You can learn more on the Tidyverse website, tidyverse.org.
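One common way to create a tidy dataset is to reshape a "wide" table, where one variable is spread across several columns, into a "long" one. The sketch below assumes the tidyr package (version 1.0.0 or later, for pivot_longer()); the data and column names are made up.

```r
library(tidyr)

# Wide: the year is hidden in the column names, so it is not a variable yet.
wide <- data.frame(
  site  = c("north", "south"),
  y2018 = c(5, 7),
  y2019 = c(6, 9)
)

# Long (tidy): one row per site-year, with year and count as their own columns.
long <- pivot_longer(
  wide,
  cols = c(y2018, y2019),
  names_to = "year",
  names_prefix = "y",
  values_to = "count"
)
```

After reshaping, each row is one observation (a site in a year), which is the form most Tidyverse functions expect.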
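  15. Why won't this code I got from someone else run on my computer?
  16. What is a "list" and how do I convert it to a data frame?
    • A list is an object that can store elements of different types. For instance, a list can hold numbers, data frames, character strings, plots, and even other lists. If its elements are vectors of equal length, as.data.frame() turns each one into a column; if its elements are data frames with the same columns, do.call(rbind, ...) stacks them into one.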
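Two common conversions, sketched in base R with made-up data. Which one applies depends on what the list holds.

```r
# Case 1: a list of equal-length vectors becomes the columns of a data frame.
columns <- list(id = 1:3, label = c("a", "b", "c"))
from_columns <- as.data.frame(columns)

# Case 2: a list of data frames with the same columns is stacked row-wise.
pieces <- list(
  data.frame(id = 1, x = 10),
  data.frame(id = 2, x = 20)
)
from_pieces <- do.call(rbind, pieces)
```

A list with elements of different lengths or mixed structure has no single obvious data frame form, so you would first need to decide what a row should represent.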
  17. I've taken several courses that use R. Why am I not getting any better at it?
  18. Why doesn't this work? (This is the #1 question, really. It leads to most of the others.)