Change filenames.
Brian Fannin authored and Brian Fannin committed Aug 1, 2016
1 parent 580fa04 commit d060ca8
Showing 15 changed files with 310 additions and 187 deletions.
5 changes: 2 additions & 3 deletions 90_AdvancedVisualization.Rmd → AdvancedVisualization.Rmd
Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@
* ggplot2
* Maps

### ggplot2
## ggplot2

ggplot2 was developed by Hadley Wickham and is based on the "grammar of graphics".

@@ -127,7 +127,7 @@ Non-data elements are things like labels. Here's a sample of a few:

We won't cover this here.

## Questions
## Exercises

1. Create a scatter plot for policy year and number of claims
2. Color each point based on region
@@ -230,7 +230,6 @@ plt
```


## Summary

* `ggplot2` is difficult at first, but will repay your investment.
18 changes: 14 additions & 4 deletions 45_BasicVisualization.Rmd → BasicVisualization.Rmd
@@ -1,13 +1,23 @@
# Basic Visualization

It's impossible to overstate the importance of visualization in data analysis.
In this chapter, we're going to talk about data visualization. By the end of this chapter, you will be able to:

* Create a scatter plot
* Display categorical information in a bar plot
* Visualize univariate sample data with histograms and density plots
* Emphasize outliers
* Alter visual characteristics based on your data

## Overview

It's impossible to overstate the importance of visualization in data analysis. Rendering quantitative information visually is a critical aid in understanding our data, modeling it, and communicating our results. For a non-technical audience, it's vital.

* Helps us explore data
* Suggest a model
* Assess the validity of a model and its parameters
* Vital for a non-technical audience

### Visualization in R
Although R has very powerful capabilities, its basic visualization is, well, basic. However, R's flexibility has allowed users to develop additional plotting engines that can produce some dazzling displays.

4 plotting engines (at least)

@@ -16,8 +26,6 @@ It's impossible to overstate the importance of visualization in data analysis.
* ggplot2
* rCharts

We'll look at the base plotting system now and ggplot2 after lunch.

### Common geometric objects

* scatter
@@ -28,6 +36,8 @@ We'll look at the base plotting system now and ggplot2 after lunch.
* barplot
* dotplot

## The `plot` function

plot is the most basic graphics command. There are several dozen options that you can set. Spend a lot of time reading the documentation and experimenting.
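
As a minimal sketch of a few of those options, the chunk below draws a scatter plot with a title, axis labels, and a point style. The `policyYear` and `numClaims` vectors are made-up values for illustration only; they do not come from the course data.

```{r}
# Made-up data for illustration only
policyYear <- 2001:2010
numClaims <- c(12, 15, 11, 18, 20, 17, 22, 25, 21, 27)

# main, xlab and ylab set the text; pch picks the point symbol; col the color
plot(policyYear, numClaims,
     main = "Claims by policy year",
     xlab = "Policy year",
     ylab = "Number of claims",
     pch = 19,
     col = "blue")
```

The full list of graphical parameters is documented under `?par` and `?plot.default`.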

Open your first script.
2 changes: 1 addition & 1 deletion 40_Data.Rmd → Data.Rmd
@@ -232,7 +232,7 @@ df = read.csv("../data-raw/StateData.csv")
View(df)
```

### Questions
### Exercises

* Load the data from "StateData.csv" into a data frame.
* Which state has the most premium?
131 changes: 131 additions & 0 deletions GettingStarted.Rmd
@@ -0,0 +1,131 @@
# Getting started

> "R isn't software. It's a community."
> --- John Chambers

This chapter will give you a short tour through R.

* Enter a few basic commands

## The Operating Environment

Right. So, you've got R installed. Now what? Among the first differences you'll encounter relative to Excel is that you now have several different options when it comes to using R. R is an engine designed to process R commands. Where you store those commands and how you deal with that output is something over which you have a great deal of control. Terrible, frightening control. Here are those options in a nutshell:

* Command-line interface (CLI)
* RGui
* RStudio
* Others

### Command-line interface

R, like S before it, presumed that users would interact with the program from the command line. And, if you invoke the R command from a terminal, that's exactly what you'll get. The image below is from my

![R at the command-line](images/R_CommandLine.png)

Throughout this book, I will assume that you're using RStudio. You don't have to, but I will strongly recommend it. Why?

* Things are easier with RStudio

RStudio keeps track of all the variables in memory.

* Everyone else is using it.

OK, not much of an argument. This is the exact opposite of the logic our parents used to try and discourage us from smoking. However, in this case, it makes sense. When you're talking with other people and trying to reproduce your problem or share your awesome code, they're probably using RStudio. Using the same tool reduces the amount of effort needed to communicate.

## Entering Commands

Now that you've got an environment, you're ready to go. That cursor is blinking and waiting for you to tell it what to do! So what's the first thing you'll accomplish?

Well, not much. We'll get into more fun stuff in the next chapter, but for now let's play it safe. You can use R as a basic calculator, so take a few minutes to enter some basic mathematical expressions.

```{r eval=TRUE, echo=TRUE}
1 + 1
pi
2*pi*4^2
```

* I can't find the console

In RStudio, the console may be reached by pressing CTRL-2 (also CTRL-2 on Mac).

## Getting help

```{r eval=FALSE, echo=TRUE, size='tiny'}
?plot
??cluster
```

Within RStudio, the TAB key will autocomplete
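
The two shortcuts above have function equivalents, and a couple of related base R functions are handy when you only need a quick reminder. This is a small sketch of calls you might try; `sd` and `mean` are used here only as convenient examples.

```{r eval=FALSE}
help("plot")            # the same as ?plot
help.search("cluster")  # the same as ??cluster

args(sd)       # list the arguments a function accepts
example(mean)  # run the examples from a function's help page
```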

## The working directory

The source of much frustration when starting out.

Where am I?

```{r eval=TRUE, echo=TRUE, size='tiny'}
getwd()
```

How do I get somewhere else?

```{r eval=FALSE, results='hide', size='tiny'}
setwd("~/SomeNewDirectory/SomeSubfolder")
```

Try to stick with relative pathnames. This makes work portable.
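
Here is one way a relative path might look in practice. The project folder, file name, and data below are hypothetical and are created in a temporary directory so that the sketch is self-contained.

```{r}
# Build a hypothetical project folder with a small data file in it
projectDir <- file.path(tempdir(), "MyProject")
dir.create(file.path(projectDir, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(State = "TX", Premium = 100),
          file.path(projectDir, "data", "Sample.csv"),
          row.names = FALSE)

# setwd() returns the previous working directory, so we can go back later
oldDir <- setwd(projectDir)

# A relative path is resolved against the working directory
df <- read.csv("data/Sample.csv")

setwd(oldDir)
df
```

Because the call to `read.csv` never mentions where the project lives, the same line keeps working if the whole folder is copied to another machine.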

### Directory paths

R prefers *nix style directories, i.e. "/", NOT "\\". Windows prefers "\\".

"\\" is an "escape" character, used for things like tabs and newline characters. To get a single backslash, just type it twice.

More on file operations in the handout.
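
The points about slashes and escape characters can be sketched as follows. The paths are hypothetical and nothing is read from disk.

```{r}
# Two ways to write the same (hypothetical) Windows path
forwardPath <- "C:/Users/me/Documents/claims.csv"
backslashPath <- "C:\\Users\\me\\Documents\\claims.csv"

# Each doubled backslash in the code is stored as a single character
nchar("\\")
cat(backslashPath, "\n")

# A backslash followed by a letter is an escape sequence, e.g. tab and newline
cat("a\tb\n")

# file.path() builds a path from its pieces using "/"
file.path("data-raw", "StateData.csv")
```

Using `file.path` means you rarely need to type a separator at all.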

### Source files

Typing, editing and debugging at the command line will get tedious quickly.

A source file (file extension .R) contains a sequence of commands.

Analogous to the formulae entered in a spreadsheet (but so much more powerful!)

## Your first script

```{r}
N <- 100
B0 <- 5
B1 <- 1.5
set.seed(1234)
e <- rnorm(N, mean = 0, sd = 1)
X1 <- rep(seq(1,10),10)
Y <- B0 + B1 * X1 + e
myFit <- lm(Y ~ X1)
```

Save this file.

CTRL-S on Windows/Linux, CMD-S on Mac.
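
To see what the script produced, you might inspect the fitted model. The sketch below repeats the simulation in compact form so that it runs on its own, then compares the estimates with the values used to generate the data.

```{r}
# Re-create the simulated data from the script above
set.seed(1234)
X1 <- rep(seq(1, 10), 10)
Y <- 5 + 1.5 * X1 + rnorm(100, mean = 0, sd = 1)
myFit <- lm(Y ~ X1)

# Estimated intercept and slope; these should land near B0 = 5 and B1 = 1.5
coef(myFit)

# Standard errors, t-statistics and R-squared
summary(myFit)
```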

### Executing a script

Either:

1. Open the file and execute the lines one at a time, or

2. Use the "source" function.

```{r eval=FALSE}
source("SomefileName.R")
```

Within RStudio, you may also click the "Source" button in the upper right hand corner.
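
As a self-contained sketch of the second option, the chunk below writes a two-line script to a temporary file and then runs it with `source`. The script contents and variable names are hypothetical.

```{r}
# Write a tiny script to a temporary file
scriptFile <- tempfile(fileext = ".R")
writeLines(c("radius <- 4",
             "circleArea <- pi * radius^2"),
           scriptFile)

# source() runs every command in the file, as if typed at the console
source(scriptFile)

# Variables created by the script are now available
circleArea
```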

129 changes: 35 additions & 94 deletions 25_GettingStarted.Rmd → LanguageElements.Rmd
@@ -2,7 +2,37 @@
library(pander)
```

# Getting started
# Elements of the Language

There are certain concepts common to virtually all programming languages. Those elements are: variables, functions and operators. This chapter will discuss what those are and how they're implemented in R. By the end of this chapter, you will be able to answer the following:

* What is a variable, and how do I create and modify one?
* How do functions work?
*

If you're familiar with other languages like Visual Basic, Python or JavaScript, you may be tempted to skip this section. If you do, you'll survive, but I'd suggest giving it a quick read. You may learn something about how R differs from those other languages.

## Variables

Programming languages work by assigning values to space in your computer's memory. Those values are then available for computation. Because the value of what's stored in memory may change, we call these things "variables". Think of a cell in a spreadsheet. Before we put something in it, it's just an empty box. We can fill it with whatever we like, be it a person's name, their birthdate, their age, whatever.

### Assignment

Assignment will create a variable which contains a value. This value may be used later.

```{r}
r <- 4
r + 2
```

Both "<-" and "=" will work for assignment.
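
A short sketch of the two assignment forms side by side. It also shows one place where they differ: inside a function call, "=" names an argument rather than creating a variable. The variable names are arbitrary.

```{r}
x <- 10
y = 10
identical(x, y)

# Here "x =" names the argument of mean(); it does not change our variable x
mean(x = c(1, 2, 3))
x
```

Many R style guides recommend "<-" for assignment and reserve "=" for function arguments, which avoids this ambiguity.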

### Data types

To a human, the difference between something numeric, like a person's age, and something textual, like their name, isn't a big deal. To a computer, however, this matters a lot. In order to ensure that there is sufficient memory to store the information and to ensure that it may be used in an operation, the computer needs to know what type of data it's working with. In other words: 5 + "Steve" = Huh?
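
You can ask R what type it has assigned to a value with `class`. The sketch below also tries the "5 + Steve" sum and captures the resulting error message with `tryCatch` instead of halting. The variable names are hypothetical.

```{r}
age <- 42
name <- "Steve"

class(age)   # "numeric"
class(name)  # "character"

# Adding a number to text is an error; capture the message rather than stop
result <- tryCatch(5 + "Steve", error = function(e) conditionMessage(e))
result
```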

## Operators

### Mathematical Operators

@@ -25,19 +55,7 @@ df = data.frame(Operator = c("&", "|", "!", "==", "!=", "<", "<=", ">", ">="
myTable = pandoc.table(df)
```

### Assignment

Assignment will create a variable which contains a value. This value may be used later.

```{r}
r <- 4
r + 2
```

Both "<-" and "=" will work for assignment.

### Functions
## Functions

Functions in R are very similar to functions in a spreadsheet. The function takes in arguments and returns a result.
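
A few calls to illustrate the spreadsheet analogy; each has a close counterpart in Excel (`SQRT`, `ROUND`, `SUM`, `MAX`). The `claims` vector is made up for illustration.

```{r}
sqrt(16)

# Arguments may be passed by name
round(3.14159, digits = 2)

# Many functions accept a whole vector, much like a range of cells
claims <- c(100, 250, 75)
sum(claims)
max(claims)
```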

@@ -65,85 +83,6 @@ sqrt(exp(sin(pi)))
* cos, sin, tan (and many others)
* lgamma, gamma, digamma, trigamma

## Getting help

```{r eval=FALSE, echo=TRUE, size='tiny'}
?plot
??cluster
```

Within RStudio, the TAB key will autocomplete

## The working directory

The source of much frustration when starting out.

Where am I?

```{r eval=TRUE, echo=TRUE, size='tiny'}
getwd()
```

How do I get somewhere else?

```{r eval=FALSE, results='hide', size='tiny'}
setwd("~/SomeNewDirectory/SomeSubfolder")
```

Try to stick with relative pathnames. This makes work portable.

### Directory paths

R prefers *nix style directories, i.e. "/", NOT "\\". Windows prefers "\\".

"\\" is an "escape" character, used for things like tabs and newline characters. To get a single backslash, just type it twice.

More on file operations in the handout.

### Source files

Typing, editing and debugging at the command line will get tedious quickly.

A source file (file extension .R) contains a sequence of commands.

Analogous to the formulae entered in a spreadsheet (but so much more powerful!)

## Your first script

```{r}
N <- 100
B0 <- 5
B1 <- 1.5
set.seed(1234)
e <- rnorm(N, mean = 0, sd = 1)
X1 <- rep(seq(1,10),10)
Y <- B0 + B1 * X1 + e
myFit <- lm(Y ~ X1)
```

Save this file.

CTRL-S on Windows/Linux, CMD-S on Mac.

### Executing a script

Either:

1. Open the file and execute the lines one at a time, or

2. Use the "source" function.

```{r eval=FALSE}
source("SomefileName.R")
```

Within RStudio, you may also click the "Source" button in the upper right hand corner.

### Comments

R uses the hash/pound character "#" to indicate comments.
@@ -155,13 +94,15 @@ Comment early and often!
Comments should describe "why", not "what".

#### Bad comment

```{r eval=FALSE}
# Take the ratio of loss to premium to determine the loss ratio
lossRatio <- Losses / Premium
```

#### Good comment

```{r eval=FALSE}
# Because this is a retrospective view of
# profitability, these losses have been
@@ -170,7 +111,7 @@ lossRatio <- Losses / Premium
lossRatio <- Losses / Premium
```

## Quiz
## Exercises

* What is the area of a cylinder with radius = e and height = pi?
* What arguments are listed for the "plot" function?
4 changes: 2 additions & 2 deletions 35_Lists.Rmd → Lists.Rmd
@@ -23,7 +23,7 @@ summary(x)
str(x)
```

### Lists
## Lists Overview

```{r echo=FALSE}
make_block(x)
@@ -93,7 +93,7 @@ Two reasons:

Because lists are arbitrary, we can't expect functions like `sum` or `mean` to work. Use `lapply` to summarize particular list elements.
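
A minimal sketch of that idea, using a made-up list with two numeric elements. `lapply` applies a function to each element and returns a list; `sapply` does the same but simplifies the result to a vector when it can.

```{r}
# Made-up list for illustration only
policy <- list(Claim = c(100, 200, 300),
               Paid = c(50, 100, 150))

# Apply mean() to every element; the result is itself a list
claimMeans <- lapply(policy, mean)
claimMeans$Claim

# sapply() returns a named numeric vector instead
sapply(policy, mean)
```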

## Questions
## Exercises

* Create a list with two elements. Have the first element be a vector with 100 numbers. Have the second element be a vector with 100 dates. Give your list the names: "Claim" and "AccidentDate".
* What is the average value of a claim?