Skip to content

Commit

Permalink
Initial commit
Browse files Browse the repository at this point in the history
  • Loading branch information
Brian Fannin authored and Brian Fannin committed Apr 12, 2016
1 parent 9815db8 commit 74f4a8a
Show file tree
Hide file tree
Showing 9 changed files with 1,939 additions and 0 deletions.
110 changes: 110 additions & 0 deletions 20_Setup.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@

```{r echo=FALSE, results='hide'}
library(pander)
```

# Setup

By the end of this chapter, you should have mastered the following:

* Install R installed
* Choose and possibly install an operating environment
* Enter a few basic commands

## Installation

R may be used on any of the popular operating systems available today. I've used R on a Windows system, a Mac and in a Linux environment. The experience is pretty much the same everywhere, which is one of the fantastic features of the software. In each case, what you'll do is download a file from the internet and then follow the standard process you go through to install software on whichever system you're using. For the most part, installation is quick and painless, but there may be limitations placed on you by your IT department. I have a few suggestions which I hope can help overcome any difficulties you might experience.

* Installing R

The first place to look for installation is cran.r-project.org. From there, you will see links to downloads for Windows, Mac and Linux. Clicking on the appropriate link will take you to the page that's relevant for your operating system. You may see lots of bizarre, arcane language around binaries, source and tarballs. If those words (in this context) mean nothing to you, don't panic. Some folk like to build their own version of R directly from the source code. If you're reading these instructions, you're probably not one of those people.

I reccommend getting familiar with the CRAN website and reading the documentation there. If you get totally lost, try the links below which should take you directly to the download site for Windows and Mac. (If you're running Linux, I can't imagine you need my help.)

* [Windows install](http://cran.revolutionanalytics.com/bin/windows/base/)
* [Mac install](http://cran.revolutionanalytics.com/bin/macosx/)

It's possible that you'll be asked to identify a "mirror". R is hosted on a number of servers throughout the world. It's all the same R, but distributing it in this way helps to minimized load on servers which host the files.

* Installing R Studio

Installing R is most of the battle. Depending on the sort of person you are, it may even be all of the battle (see the following section on environments). R comes with a fairly spartan user interface, which is sufficient to get work done. However, most folk find that they enjoy using an Integrated Development Environment (IDE). This allows one to work on several source files at the same time, read help, observe console output, see what variables are loaded in memory, etc. There are a few options, but I've not yet found anything better than RStudio.

RStudio's main website may be found at [www.rstudio.com](www.rstudio.com). At the time of writing, the download page may be found at [http://www.rstudio.com/products/rstudio/download/](http://www.rstudio.com/products/rstudio/download/). Here you will find links to specific systems. The browser will even attempt to detect what operating system you're using and suggest a link for you. Cool, huh?

### IT

I don't think I've ever met anyone who's made it through a white-collar existence without at least one or two frustrating exchanges with a corporate IT department. If you work for a large- or even small- company, you likely have a staff of folks who keep the network running and handle software requests from every user in the company. To ensure that your company's network is free from malicious attack or well-intentioned, but careless or imperfect users, most computers have sensitive areas restricted. This means that if you want to install software, you need an administrator to do it for you.

What this also means is that your IT department might not be as delighted as you are to install open-source software on the company laptop. This might be a problem that they're not inclined to solve for you and you may find your interaction with IT folks to a bit frustrating and they may seem as though they're not at all helpful.

The first thing to bear in mind is that, despite any appearance to the contrary, your IT staff is there to help you. Moreover, they're people. They have families who love them, possibly small children who think their moms and dads are awesome, pets who miss them and lives outside of work. They have to deal with ridiculous hours to accommodate you and they get far more complaints than they do praise. Be nice to them and you may be surprised how supportive they can be.

With that understood, there are several situations you may find yourself in.:

* You have- or can talk your way into- admin rights to your computer

Lucky you. Also lucky me as this is the happy situation that I enjoy. How do you handle this situation? Don't blow it! Be careful what you download, don't greedily consume bandwidth, server space or any of your company's other scarce resources and be VERY NICE to your IT staff. Acknowledge that you're grateful to have been given such trust and pledge not to do anything to have it removed.

* You don't have admin rights to your computer

What to do? Request to be given admin rights. Explain why, in detail and don't be evasive or vague. Trust and mutual respect help. Talk to other folks in your department and get them on board. Present a strong business case for why use of this software will permit you and your department to work more efficiently. Show them this book and underline the parts where I tell folk to be nice and respectful towards IT staff.

* IT won't give you admin rights to your computer

In this case, you may ask them to install it for you. Pool your resources. Talk to other actuaries and analysts in your company. Talk to your boss.

* IT won't install the software.

Solution? Install the software to a memory stick. Yes, it is often (but not always!) possible to do this. This is is obviously not a preferred option, but it will get you up and running and enable you to attend the workshop.

* Memory sticks are locked down

In this case, your IT department really wants you all to be running terminals. OK. Suggest an install of RStudio Server. This enables R to run on a server with controlled user access. This is quite a lot more work for your IT staff and you'll need to make a strong use case for it. If you're a large organization which has a predictive modelling or analytics area, they'll likely want this software. This won't allow you to use R remotely, so getting the most out of the workshop will be tough. However, there's still one more option.

* The nuclear option

Your IT staff won't run R on a server, won't give you a laptop with R installed. They're really against this software. I'd like to advise you to get another job, but that's defeatist. This is where we reach the nuclear option, which is to use your own computer. This will drive folks at your company nuts. Now you're transferring data from a secure machine to one which you use for personal e-mail, Facebook, sports, personal finance and other activities that we needn't dwell on here. This is an absolute last resort and the overheard of moving stuff from one device to another will obviate most of the efficiency gains that open source software will provide. Here's how to make it work: produce work that is ONLY POSSIBLE using R, or Python or any of the tools which we will discuss. Show a killer visual and then patiently explain to your boss why it can't be done in Excel and why you can't share it with other departments and why it can't be done every quarter. This is a tall order, but it just might get someone's attention.

## The Operating Environment

Right. So, you've got R installed. Now what? Among the first differences you'll encounter relative to Excel is that you now have several different options when it comes to using R. R is an engine designed to process R commands. Where you store those commands and how you deal with that output is something over which you have a great deal of control. Terrible, frighening control. Here are those options in a nutshell:

* Command-line interface (CLI)
* RGui
* RStudio
* Others

### Command-line interface

R, like S before it, presumed that users would interact with the program from the command line. And, if you invoke the R command from a terminal, that's exactly what you'll get. The image below is from my

![R at the command-line](images/R_CommandLine.png)

Throughout this book, I will assume that you're using RStudio. You don't have to, but I will strongly recommend it. Why?

* Things are easier with RStudio

RStudio, keeps track of all the variables in memory

* Everyone else is using it.

OK, not much of an argument. This is the exact opposite of the logic our parents used to try and discourage us from smoking. However, in this case, it makes sense. When you're talking with other people and trying to reproduce your problem or share your awesome code, they're probably using RStudio. Using the same tool reduces the amount of effort needed to communicate.

## Entering Commands

Now that you've got an is environment, you're ready to go. That cursor is blinking and waiting for you to tell it what to do! So what's the first thing you'll accomplish?

Well, not much. We'll get into more fun stuff in the next chapter, but for now let's play it safe. You can use R a basic calculator, so take a few minutes to enter some basic mathematical expressions.

```{r eval=TRUE, echo=TRUE}
1 + 1
pi
2*pi*4^2
```

* I can't find the console

In RStudio, the console may be reached by pressing CTRL-2 (Command-2 on Mac).
174 changes: 174 additions & 0 deletions 25_GettingStarted.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
# Getting started

### Mathematical Operators

```{r echo=FALSE, results='asis'}
df = data.frame(Operator = c("+", "-", "*", "/", "^")
, Operation = c("Addition", "Subtraction", "Multiplication"
, "Division", "Exponentiation"))
myTable = pandoc.table(df)
```

### Logical Operators

```{r echo=FALSE, results='asis'}
df = data.frame(Operator = c("&", "|", "!", "==", "!=", "<", "<=", ">", ">="
, "xor()", "&&", "||")
, Operation = c("and", "or", "not", "equal", "not equal"
, "less than", "less than or equal"
, "greater than", "greater than or equal"
, "exclusive or", "non-vector and", "non-vector or"))
myTable = pandoc.table(df)
```

### Assignment

Assignment will create a variable which contains a value. This value may be used later.

```{r}
r <- 4
r + 2
```

Both "<-" and "=" will work for assignment.

### Functions

Functions in R are very similar to functions in a spreadsheet. The function takes in arguments and returns a result.

```{r }
sqrt(4)
```

Functions may be composed by having the output of one serve as the input to another.

$\sqrt{e^{sin{\pi}}}$

```{r }
sqrt(exp(sin(pi)))
```

### A few mathematical functions

```{r eval=FALSE}
?S3groupGeneric
```

* abs, sign
* floor, ceiling, trunc, round, signif
* sqrt, exp, log
* cos, sin, tan (and many others)
* lgamma, gamma, digamma, trigamma

## Getting help

```{r eval=FALSE, echo=TRUE, size='tiny'}
?plot
??cluster
```

Within RStudio, the TAB key will autocomplete

## The working directory

The source of much frustration when starting out.

Where am I?

```{r eval=TRUE, echo=TRUE, size='tiny'}
getwd()
```

How do I get somewhere else?

```{r eval=FALSE, results='hide', size='tiny'}
setwd("~/SomeNewDirectory/SomeSubfolder")
```

Try to stick with relative pathnames. This makes work portable.

### Directory paths

R prefers *nix style directories, i.e. "/", NOT "\\". Windows prefers "\\".

"\\" is an "escape" character, used for things like tabs and newline characters. To get a single slash, just type it twice.

More on file operations in the handout.

### Source files

Typing, editing and debugging at the command line will get tedious quickly.

A source file (file extension .R) contains a sequence of commands.

Analogous to the formulae entered in a spreadsheet (but so much more powerful!)

## Your first script

```{r}
N <- 100
B0 <- 5
B1 <- 1.5
set.seed(1234)
e <- rnorm(N, mean = 0, sd = 1)
X1 <- rep(seq(1,10),10)
Y <- B0 + B1 * X1 + e
myFit <- lm(Y ~ X1)
```

Save this file.

CTRL-S on Windows/Linux, CMD-S on Mac.

### Executing a script

Either:

1. Open the file and execute the lines one at a time, or

2. Use the "source" function.

```{r eval=FALSE}
source("SomefileName.R")
```

Within RStudio, you may also click the "Source" button in the upper right hand corner.

### Comments

R uses the hash/pound character "#" to indicate comments.

SQL or C++ style multiline comments are not supported.

Comment early and often!

Comments should describe "why", not "what".

#### Bad comment
```{r eval=FALSE}
# Take the ratio of loss to premium to determine the loss ratio
lossRatio <- Losses / Premium
```

#### Good comment
```{r eval=FALSE}
# Because this is a retrospective view of
# profitability, these losses have been
# developed, but not trended to a future
# point in time
lossRatio <- Losses / Premium
```

## Quiz

* What is the area of a cylinder with radius = e and height = pi?
* What arguments are listed for the "plot" function?
* Find the help file for a generalized linear model
* Create a script which calculates the area of a cylinder. From a new script, assign the value 4 to a variable and source the other file. Assign the value 8 to your variable and source again. What happened?
Loading

0 comments on commit 74f4a8a

Please sign in to comment.