From d060ca8f2ca33a65c46df39680e92064870b393c Mon Sep 17 00:00:00 2001 From: Brian Fannin Date: Sun, 31 Jul 2016 21:51:46 -0400 Subject: [PATCH] Change filenames. --- ...alization.Rmd => AdvancedVisualization.Rmd | 5 +- ...isualization.Rmd => BasicVisualization.Rmd | 18 ++- 40_Data.Rmd => Data.Rmd | 2 +- GettingStarted.Rmd | 131 ++++++++++++++++++ 25_GettingStarted.Rmd => LanguageElements.Rmd | 129 +++++------------ 35_Lists.Rmd => Lists.Rmd | 4 +- ...Distributions.Rmd => LossDistributions.Rmd | 8 +- Packages.Rmd | 29 ++++ References.Rmd | 6 + Resources.Rmd | 3 + 20_Setup.Rmd => Setup.Rmd | 84 ++++------- 60_Simulation.Rmd => Simulation.Rmd | 7 +- 30_Vectors.Rmd => Vectors.Rmd | 17 +-- _bookdown.yml | 20 ++- index.Rmd | 34 +++-- 15 files changed, 310 insertions(+), 187 deletions(-) rename 90_AdvancedVisualization.Rmd => AdvancedVisualization.Rmd (99%) rename 45_BasicVisualization.Rmd => BasicVisualization.Rmd (72%) rename 40_Data.Rmd => Data.Rmd (99%) create mode 100644 GettingStarted.Rmd rename 25_GettingStarted.Rmd => LanguageElements.Rmd (61%) rename 35_Lists.Rmd => Lists.Rmd (99%) rename 50_LossDistributions.Rmd => LossDistributions.Rmd (98%) create mode 100644 Packages.Rmd create mode 100644 References.Rmd create mode 100644 Resources.Rmd rename 20_Setup.Rmd => Setup.Rmd (61%) rename 60_Simulation.Rmd => Simulation.Rmd (98%) rename 30_Vectors.Rmd => Vectors.Rmd (89%) diff --git a/90_AdvancedVisualization.Rmd b/AdvancedVisualization.Rmd similarity index 99% rename from 90_AdvancedVisualization.Rmd rename to AdvancedVisualization.Rmd index fda2a5b..ddf4ada 100644 --- a/90_AdvancedVisualization.Rmd +++ b/AdvancedVisualization.Rmd @@ -3,7 +3,7 @@ * ggplot2 * Maps -### ggplot2 +## ggplot2 ggplot2 developed by Hadley Wickham, based on the "grammar of graphics" @@ -127,7 +127,7 @@ Non-data elements are things like labels. Here's a sample of a few: We won't cover this here. -## Questions +## Exercises 1. Create a scatter plot for policy year and number of claims 2. 
Color each point based on region @@ -230,7 +230,6 @@ plt ``` - ## Summary * `ggplot2` is difficult at first, but will repay your investment. diff --git a/45_BasicVisualization.Rmd b/BasicVisualization.Rmd similarity index 72% rename from 45_BasicVisualization.Rmd rename to BasicVisualization.Rmd index d9dca7f..ac3f426 100644 --- a/45_BasicVisualization.Rmd +++ b/BasicVisualization.Rmd @@ -1,13 +1,23 @@ # Basic Visualization -It's impossible to overstate the importance of visualization in data analysis. +In this chapter, we're going to talk about data visualization. By the end of this chapter, you will be able to: + +* Create a scatter plot +* Display categorical information in a bar plot +* Visualize univariate sample data with histograms and density plots +* Emphasize outliers +* Alter visual characteristics based on your data + +## Overview + +It's impossible to overstate the importance of visualization in data analysis. Rendering quantitative information visually is a critical aid in understanding our data, modeling it and communicating our results. For a non-technical audience, it's * Helps us explore data * Suggest a model * Assess the validity of a model and its parameters * Vital for a non-technical audience -### Visualization in R +Although R has very powerful capabilities, its basic visualization is, well, basic. However, R's flexibility has allowed users to develop additional plotting engines that can produce some dazzling displays. 4 plotting engines (at least) @@ -16,8 +26,6 @@ It's impossible to overstate the importance of visualization in data analysis. * ggplot2 * rCharts -We'll look at the base plotting system now and ggplot2 after lunch. - ### Common geometric objects * scatter @@ -28,6 +36,8 @@ We'll look at the base plotting system now and ggplot2 after lunch. * barplot * dotplot +## The `plot` function + plot is the most basic graphics command. There are several dozen options that you can set. 
Spend a lot of time reading the documentation and experimenting. Open your first script. diff --git a/40_Data.Rmd b/Data.Rmd similarity index 99% rename from 40_Data.Rmd rename to Data.Rmd index 8100e1b..a74361f 100644 --- a/40_Data.Rmd +++ b/Data.Rmd @@ -232,7 +232,7 @@ df = read.csv("../data-raw/StateData.csv") View(df) ``` -### Questions +### Exercises * Load the data from "StateData.csv" into a data frame. * Which state has the most premium? diff --git a/GettingStarted.Rmd b/GettingStarted.Rmd new file mode 100644 index 0000000..26b2287 --- /dev/null +++ b/GettingStarted.Rmd @@ -0,0 +1,131 @@ +# Getting started + +> "R isn't software. It's a community." +> --- John Chambers + +This chapter will give you a short tour through R. + +* Enter a few basic commands + +## The Operating Environment + +Right. So, you've got R installed. Now what? Among the first differences you'll encounter relative to Excel is that you now have several different options when it comes to using R. R is an engine designed to process R commands. Where you store those commands and how you deal with that output is something over which you have a great deal of control. Terrible, frightening control. Here are those options in a nutshell: + +* Command-line interface (CLI) +* RGui +* RStudio +* Others + +### Command-line interface + +R, like S before it, presumed that users would interact with the program from the command line. And, if you invoke the R command from a terminal, that's exactly what you'll get. The image below is from my + +![R at the command-line](images/R_CommandLine.png) + +Throughout this book, I will assume that you're using RStudio. You don't have to, but I will strongly recommend it. Why? + +* Things are easier with RStudio + +RStudio keeps track of all the variables in memory + +* Everyone else is using it. + +OK, not much of an argument. This is the exact opposite of the logic our parents used to try and discourage us from smoking. However, in this case, it makes sense. 
When you're talking with other people and trying to reproduce your problem or share your awesome code, they're probably using RStudio. Using the same tool reduces the amount of effort needed to communicate. + +## Entering Commands + +Now that you've got an environment, you're ready to go. That cursor is blinking and waiting for you to tell it what to do! So what's the first thing you'll accomplish? + +Well, not much. We'll get into more fun stuff in the next chapter, but for now let's play it safe. You can use R as a basic calculator, so take a few minutes to enter some basic mathematical expressions. + +```{r eval=TRUE, echo=TRUE} +1 + 1 + +pi + +2*pi*4^2 +``` + +* I can't find the console + +In RStudio, the console may be reached by pressing CTRL-2 (Command-2 on Mac). + +## Getting help + +```{r eval=FALSE, echo=TRUE, size='tiny'} +?plot + +??cluster +``` + +Within RStudio, the TAB key will autocomplete + +## The working directory + +The source of much frustration when starting out. + +Where am I? + +```{r eval=TRUE, echo=TRUE, size='tiny'} +getwd() +``` + +How do I get somewhere else? + +```{r eval=FALSE, results='hide', size='tiny'} +setwd("~/SomeNewDirectory/SomeSubfolder") +``` + +Try to stick with relative pathnames. This makes work portable. + +### Directory paths + +R prefers *nix style directories, i.e. "/", NOT "\\". Windows prefers "\\". + +"\\" is an "escape" character, used for things like tabs and newline characters. To get a single backslash, just type it twice. + +More on file operations in the handout. + +### Source files + +Typing, editing and debugging at the command line will get tedious quickly. + +A source file (file extension .R) contains a sequence of commands. + +Analogous to the formulae entered in a spreadsheet (but so much more powerful!) 
+ +## Your first script + +```{r} +N <- 100 +B0 <- 5 +B1 <- 1.5 + +set.seed(1234) + +e <- rnorm(N, mean = 0, sd = 1) +X1 <- rep(seq(1,10),10) + +Y <- B0 + B1 * X1 + e + +myFit <- lm(Y ~ X1) +``` + +Save this file. + +CTRL-S on Windows/Linux, CMD-S on Mac. + +### Executing a script + +Either: + +1. Open the file and execute the lines one at a time, or + +2. Use the "source" function. + +```{r eval=FALSE} +source("SomefileName.R") +``` + +Within RStudio, you may also click the "Source" button in the upper right hand corner. + diff --git a/25_GettingStarted.Rmd b/LanguageElements.Rmd similarity index 61% rename from 25_GettingStarted.Rmd rename to LanguageElements.Rmd index 8f2d501..db6ce6c 100644 --- a/25_GettingStarted.Rmd +++ b/LanguageElements.Rmd @@ -2,7 +2,37 @@ library(pander) ``` -# Getting started +# Elements of the Language + +There are certain concepts common to virtually all programming languages. Those elements are: variables, functions and operators. This chapter will discuss what those are and how they're implemented in R. By the end of this chapter, you will be able to answer the following: + +* What is a variable and how do I create and modify them? +* How do functions work? +* + +If you're familiar with other languages like Visual Basic, Python or JavaScript, you may be tempted to skip this section. If you do, you'll survive, but I'd suggest giving it a quick read. You may learn something about how R differs from those other languages. + +## Variables + +Programming languages work by assigning values to space in your computer's memory. Those values are then available for computation. Because the value of what's stored in memory may change, we call these things "variables". Think of a cell in a spreadsheet. Before we put something in it, it's just an empty box. We can fill it with whatever we like, be it a person's name, their birthdate, their age, whatever. + +### Assignment + +Assignment will create a variable which contains a value. 
This value may be used later. + +```{r} +r <- 4 + +r + 2 +``` + +Both "<-" and "=" will work for assignment. + +### Data types + +To a human, the difference between something numeric- like a person's age- and something textual - like their name - isn't a big deal. To a computer, however, this matters a lot. In order to ensure that there is sufficient memory to store the information and to ensure that it may be used in an operation, the computer needs to know what type of data it's working with. In other words: 5 + "Steve" = Huh? + +## Operators ### Mathematical Operators @@ -25,19 +55,7 @@ df = data.frame(Operator = c("&", "|", "!", "==", "!=", "<", "<=", ">", ">=" myTable = pandoc.table(df) ``` -### Assignment - -Assignment will create a variable which contains a value. This value may be used later. - -```{r} -r <- 4 - -r + 2 -``` - -Both "<-" and "=" will work for assignment. - -### Functions +## Functions Functions in R are very similar to functions in a spreadsheet. The function takes in arguments and returns a result. @@ -65,85 +83,6 @@ sqrt(exp(sin(pi))) * cos, sin, tan (and many others) * lgamma, gamma, digamma, trigamma -## Getting help - -```{r eval=FALSE, echo=TRUE, size='tiny'} -?plot - -??cluster -``` - -Within RStudio, the TAB key will autocomplete - -## The working directory - -The source of much frustration when starting out. - -Where am I? - -```{r eval=TRUE, echo=TRUE, size='tiny'} -getwd() -``` - -How do I get somewhere else? - -```{r eval=FALSE, results='hide', size='tiny'} -setwd("~/SomeNewDirectory/SomeSubfolder") -``` - -Try to stick with relative pathnames. This makes work portable. - -### Directory paths - -R prefers *nix style directories, i.e. "/", NOT "\\". Windows prefers "\\". - -"\\" is an "escape" character, used for things like tabs and newline characters. To get a single slash, just type it twice. - -More on file operations in the handout. 
- -### Source files - -Typing, editing and debugging at the command line will get tedious quickly. - -A source file (file extension .R) contains a sequence of commands. - -Analogous to the formulae entered in a spreadsheet (but so much more powerful!) - -## Your first script - -```{r} -N <- 100 -B0 <- 5 -B1 <- 1.5 - -set.seed(1234) - -e <- rnorm(N, mean = 0, sd = 1) -X1 <- rep(seq(1,10),10) - -Y <- B0 + B1 * X1 + e - -myFit <- lm(Y ~ X1) -``` - -Save this file. - -CTRL-S on Windows/Linux, CMD-S on Mac. - -### Executing a script - -Either: - -1. Open the file and execute the lines one at a time, or - -2. Use the "source" function. - -```{r eval=FALSE} -source("SomefileName.R") -``` - -Within RStudio, you may also click the "Source" button in the upper right hand corner. - ### Comments R uses the hash/pound character "#" to indicate comments. @@ -155,6 +94,7 @@ Comment early and often! Comments should describe "why", not "what". #### Bad comment + ```{r eval=FALSE} # Take the ratio of loss to premium to determine the loss ratio @@ -162,6 +102,7 @@ lossRatio <- Losses / Premium ``` #### Good comment + ```{r eval=FALSE} # Because this is a retrospective view of # profitability, these losses have been @@ -170,7 +111,7 @@ lossRatio <- Losses / Premium lossRatio <- Losses / Premium ``` -## Quiz +## Exercises * What is the area of a cylinder with radius = e and height = pi? * What arguments are listed for the "plot" function? diff --git a/35_Lists.Rmd b/Lists.Rmd similarity index 99% rename from 35_Lists.Rmd rename to Lists.Rmd index 78a0b4e..37591c5 100644 --- a/35_Lists.Rmd +++ b/Lists.Rmd @@ -23,7 +23,7 @@ summary(x) str(x) ``` -### Lists +## Lists Overview ```{r echo=FALSE} make_block(x) @@ -93,7 +93,7 @@ Two reasons: Because lists are arbitrary, we can't expect functions like `sum` or `mean` to work. Use `lapply` to summarize particular list elements. -## Questions +## Exercises * Create a list with two elements. Have the first element be a vector with 100 numbers. 
Have the second element be a vector with 100 dates. Give your list the names: "Claim" and "AccidentDate". * What is the average value of a claim? diff --git a/50_LossDistributions.Rmd b/LossDistributions.Rmd similarity index 98% rename from 50_LossDistributions.Rmd rename to LossDistributions.Rmd index a65d886..d69c4c2 100644 --- a/50_LossDistributions.Rmd +++ b/LossDistributions.Rmd @@ -7,7 +7,7 @@ By the end of this chapter, you will know the following: * How to fit a loss distributions * Goodness of fit -### Packages we'll use +## Packages we'll use * `MASS` (MASS = Modern Applied Statistics in S) * `fitdistr` will fit a distribution to a loss distribution function @@ -218,7 +218,7 @@ Direct optimization allows us to create another objective function to maximize, Note that optimization is a general, solved problem. Things like the simplex method already have package solutions in R. You don't need to reinvent the wheel! -###Questions +## Exercises * Plot a lognormal distribution with a mean of $10,000 and a CV of 30%. * For that distribution, what is the probability of seeing a claim greater than $100,000? @@ -229,7 +229,7 @@ Note that optimization is a general, solved problem. Things like the simplex met * Assuming that losses are Poisson distributed, with expected value of 200, estimate the aggregate loss distribution. * What is the cost of a $50,000 xs $50,000 layer of reinsurance? -###Answers +### Answers ```{r } severity <- 10000 @@ -243,7 +243,7 @@ plot(function(x) dlnorm(x), mu, sigma, ylab="LN f(x)") What is the cost of a 1 million xs 1 million layer? -A few other questions: +A few more exercises: * How many times has the layer been hit? * What is the average cost when it has been? 
diff --git a/Packages.Rmd b/Packages.Rmd new file mode 100644 index 0000000..876f3ef --- /dev/null +++ b/Packages.Rmd @@ -0,0 +1,29 @@ +# Packages + +Mumble, mumble + +By the end of this chapter you will: + +* Understand what packages are and where they come from +* Install a package +* Understand the difference between loading and installing a package + +## What are packages? + +Packages are one of the killer features of R and one of the key elements of the language's success. Rather than having a software company hire coders to develop new capabilities, R encourages users to develop their own. There was no support for the chain ladder reserving method in R when it was originally built, but there is now, thanks to the `ChainLadder` package. + +### Where are packages stored? + +Packages are hosted on a + +## How to install a package + +```{r eval=FALSE, echo=TRUE} +install.packages("raw") +``` + +```{r eval=FALSE, echo=TRUE} +install.packages("raw") +``` + +### Where does the package get installed? \ No newline at end of file diff --git a/References.Rmd b/References.Rmd new file mode 100644 index 0000000..5e4f2b5 --- /dev/null +++ b/References.Rmd @@ -0,0 +1,6 @@ +# References {-} + +```{r include=FALSE} +knitr::write_bib(c(.packages(), 'bookdown', 'knitr', 'rmarkdown', 'webshot', 'servr') + , 'packages.bib') +``` diff --git a/Resources.Rmd b/Resources.Rmd new file mode 100644 index 0000000..50f7f86 --- /dev/null +++ b/Resources.Rmd @@ -0,0 +1,3 @@ +> "R isn't software. It's a community." +> --- John Chambers + diff --git a/20_Setup.Rmd b/Setup.Rmd similarity index 61% rename from 20_Setup.Rmd rename to Setup.Rmd index 0f59213..46d5c63 100644 --- a/20_Setup.Rmd +++ b/Setup.Rmd @@ -1,3 +1,6 @@ +# First Steps {-} + +The first part of this book will get you started using R. Unless you've been using R comfortably for several months, resist the temptation to skip through this quickly. 
We've found that one of the biggest hurdles to getting users comfortable with R is getting the software installed and running with a minimum of hassle. The next step is getting folks reconciled to the idea that R is very different from Excel. You can like that, or you can hate that, but you can't change that. Hopefully, by the end of this section, you'll be open to the idea that R can accomplish some tasks and it's something you can add to your tool kit. ```{r echo=FALSE, results='hide'} library(pander) @@ -5,28 +8,44 @@ library(pander) # Setup -By the end of this chapter, you should have mastered the following: +By the end of this chapter, you will have done the following: + +* Install R +* Install RStudio +* Install the `raw` package + +## Operating systems + +Although R was developed primarily on Unix-based operating systems, it may be used on many different platforms. As I write these words, there are three major systems in use: Windows, Mac OS and Linux. I've used R and RStudio on all three and the experience is pretty much the same. This is one of the fantastic features of the software. It's meant to be as widely used and portable as possible to maximize its use. -* Install R installed -* Choose and possibly install an operating environment -* Enter a few basic commands +There are one or two operating system quirks, but in general I won't need to mention OS differences again, apart from one preliminary note. When referring to keystroke combinations, I will only refer to the CTRL key. Mac OS users will understand that this key is CMD on their keyboards. -## Installation +If you're curious, the version of R and system architecture being used to write this book are noted below: -R may be used on any of the popular operating systems available today. I've used R on a Windows system, a Mac and in a Linux environment. The experience is pretty much the same everywhere, which is one of the fantastic features of the software. 
In each case, what you'll do is download a file from the internet and then follow the standard process you go through to install software on whichever system you're using. For the most part, installation is quick and painless, but there may be limitations placed on you by your IT department. I have a few suggestions which I hope can help overcome any difficulties you might experience. +```{r echo = FALSE} +sessionInfo()$platform +``` + +### Installing R -* Installing R +In each case, what you'll do is download a file from the internet and then follow the standard process you go through to install software on whichever system you're using. For the most part, installation is quick and painless, but there may be limitations placed on you by your IT department. I have a few suggestions which I hope can help overcome any difficulties you might experience. -The first place to look for installation is cran.r-project.org. From there, you will see links to downloads for Windows, Mac and Linux. Clicking on the appropriate link will take you to the page that's relevant for your operating system. You may see lots of bizarre, arcane language around binaries, source and tarballs. If those words (in this context) mean nothing to you, don't panic. Some folk like to build their own version of R directly from the source code. If you're reading these instructions, you're probably not one of those people. +The first place to look for installation is [cran.r-project.org]. From there, you will see links to downloads for Windows, Mac and Linux. Clicking on the appropriate link will take you to the page that's relevant for your operating system. You may see lots of bizarre, arcane language around binaries, source and tarballs. If those words (in this context) mean nothing to you, don't panic. Some folk like to build their own version of R directly from the source code. If you're reading these instructions, you're probably not one of those people. 
-I reccommend getting familiar with the CRAN website and reading the documentation there. If you get totally lost, try the links below which should take you directly to the download site for Windows and Mac. (If you're running Linux, I can't imagine you need my help.) +I recommend getting familiar with the CRAN website and reading the documentation there. If you get totally lost, try the links below which should take you directly to the download site for Windows and Mac. (If you're running Linux, I can't imagine you need my help.) * [Windows install](http://cran.revolutionanalytics.com/bin/windows/base/) * [Mac install](http://cran.revolutionanalytics.com/bin/macosx/) +https://cran.r-project.org/bin/windows/base/ + It's possible that you'll be asked to identify a "mirror". R is hosted on a number of servers throughout the world. It's all the same R, but distributing it in this way helps to minimized load on servers which host the files. -* Installing R Studio +### Bitness + +32 vs 64 + +### Installing R Studio Installing R is most of the battle. Depending on the sort of person you are, it may even be all of the battle (see the following section on environments). R comes with a fairly spartan user interface, which is sufficient to get work done. However, most folk find that they enjoy using an Integrated Development Environment (IDE). This allows one to work on several source files at the same time, read help, observe console output, see what variables are loaded in memory, etc. There are a few options, but I've not yet found anything better than RStudio. @@ -64,47 +83,4 @@ In this case, your IT department really wants you all to be running terminals. O * The nuclear option -Your IT staff won't run R on a server, won't give you a laptop with R installed. They're really against this software. I'd like to advise you to get another job, but that's defeatist. This is where we reach the nuclear option, which is to use your own computer. 
This will drive folks at your company nuts. Now you're transferring data from a secure machine to one which you use for personal e-mail, Facebook, sports, personal finance and other activities that we needn't dwell on here. This is an absolute last resort and the overheard of moving stuff from one device to another will obviate most of the efficiency gains that open source software will provide. Here's how to make it work: produce work that is ONLY POSSIBLE using R, or Python or any of the tools which we will discuss. Show a killer visual and then patiently explain to your boss why it can't be done in Excel and why you can't share it with other departments and why it can't be done every quarter. This is a tall order, but it just might get someone's attention. - -## The Operating Environment - -Right. So, you've got R installed. Now what? Among the first differences you'll encounter relative to Excel is that you now have several different options when it comes to using R. R is an engine designed to process R commands. Where you store those commands and how you deal with that output is something over which you have a great deal of control. Terrible, frighening control. Here are those options in a nutshell: - -* Command-line interface (CLI) -* RGui -* RStudio -* Others - -### Command-line interface - -R, like S before it, presumed that users would interact with the program from the command line. And, if you invoke the R command from a terminal, that's exactly what you'll get. The image below is from my - -![R at the command-line](images/R_CommandLine.png) - -Throughout this book, I will assume that you're using RStudio. You don't have to, but I will strongly recommend it. Why? - -* Things are easier with RStudio - -RStudio, keeps track of all the variables in memory - -* Everyone else is using it. - -OK, not much of an argument. This is the exact opposite of the logic our parents used to try and discourage us from smoking. However, in this case, it makes sense. 
When you're talking with other people and trying to reproduce your problem or share your awesome code, they're probably using RStudio. Using the same tool reduces the amount of effort needed to communicate. - -## Entering Commands - -Now that you've got an is environment, you're ready to go. That cursor is blinking and waiting for you to tell it what to do! So what's the first thing you'll accomplish? - -Well, not much. We'll get into more fun stuff in the next chapter, but for now let's play it safe. You can use R a basic calculator, so take a few minutes to enter some basic mathematical expressions. - -```{r eval=TRUE, echo=TRUE} -1 + 1 - -pi - -2*pi*4^2 -``` - -* I can't find the console - -In RStudio, the console may be reached by pressing CTRL-2 (Command-2 on Mac). +Your IT staff won't run R on a server, won't give you a laptop with R installed. They're really against this software. I'd like to advise you to get another job, but that's defeatist. This is where we reach the nuclear option, which is to use your own computer. This will drive folks at your company nuts. Now you're transferring data from a secure machine to one which you use for personal e-mail, Facebook, sports, personal finance and other activities that we needn't dwell on here. This is an absolute last resort and the overhead of moving stuff from one device to another will obviate most of the efficiency gains that open source software will provide. Here's how to make it work: produce work that is ONLY POSSIBLE using R, or Python or any of the tools which we will discuss. Show a killer visual and then patiently explain to your boss why it can't be done in Excel and why you can't share it with other departments and why it can't be done every quarter. This is a tall order, but it just might get someone's attention. 
\ No newline at end of file diff --git a/60_Simulation.Rmd b/Simulation.Rmd similarity index 98% rename from 60_Simulation.Rmd rename to Simulation.Rmd index 8e9627f..c9e94f1 100644 --- a/60_Simulation.Rmd +++ b/Simulation.Rmd @@ -3,7 +3,8 @@ * Probability distributions * Random samples -### Probability distributions +## Probability distributions + All probability distributions have four basic functions: * d dist - Density function @@ -100,7 +101,7 @@ class <- rep(class, numClaims) dfClaims <- data.frame(class, severity = unlist(severity)) ``` -## Questions +## Exercises * Draw a lognormal distribution with a mean of $10,000 and a CV of 30%. * For that distribution, what is the probability of seeing a claim greater than $100,000? @@ -118,8 +119,6 @@ mu <- log(severity) - sigma^2/2 plot(function(x) dlnorm(x), mu, sigma, ylab="LN f(x)") ``` -## - ```{r } set.seed(1234) claims = rlnorm(100, meanlog=log(30000), sdlog=1) diff --git a/30_Vectors.Rmd b/Vectors.Rmd similarity index 89% rename from 30_Vectors.Rmd rename to Vectors.Rmd index 48fbae4..d27a19e 100644 --- a/30_Vectors.Rmd +++ b/Vectors.Rmd @@ -1,6 +1,8 @@ +# Data {-} + # Vectors -In this chapter, we're going to learn about vectors, one of the key building blocks of R programming. By the end of this chapter, you will know: +In this chapter, we're going to learn about vectors, which are the key building blocks of R programming. By the end of this chapter, you will know: * What is a vector? * How are vectors created? @@ -18,17 +20,17 @@ This makes a bit of sense. We entered 2 and we got back 2. But what's that 1 in ![Console returning more than one value](images/ConsoleVector2.png) -Now there's not only a 1 in brackets, there's also a 16 on the second line. (Note that your console may appear a bit different than mine.) You're probably clever enough to have figured out that the numbers in brackets have something to do with the number of outputs generated. 
In the second case, "p" is the 16th letter of the alphabet and the bracketed 16 helps us know where we are in the sequence when it spills onto multiple lines. +Now there's not only a 1 in brackets, there's also a 16 on the second line. (Note that your console may appear a bit different than mine.) You're probably clever enough to have figured out that the numbers in brackets have something to do with the number of outputs generated. In the second case, "p" is the 16th letter of the alphabet and the bracketed 16 helps us know where we are in the sequence when it spills onto multiple lines. So, the bracketed figures are there to indicate how many numbers have been returned. OK, cool. So what? -So everything! In R, every variable is a vector. When we entered the number 2 at the console, we were creating (briefly) a vector which had a length of 1."letters" is a special vector with one element for each letter of the English alphabet. Vectors allow us to reason about a lot of data at once. The variable "letters" for instance enables us to store 26 values in one place. Further, it allows us to make changes to all of the elements of the vector at the same time. For example: +So everything! In R, every variable is a vector. Think back to [] When we entered the number 2 at the console, we were creating (briefly) a vector which had a length of 1. "letters" is a special vector with one element for each letter of the English alphabet. Vectors allow us to reason about a lot of data at once. The variable "letters" for instance enables us to store 26 values in one place. Further, it allows us to make changes to all of the elements of the vector at the same time. For example: ```{r } paste("Letter", letters) ``` -Using the `paste0` command, we took each element of "letters" and prefixed it with the text "Letter". This is similar to applyinig the same function to a set of contiguous cells in a spreadsheet. But in this case, I didn't need to copy and paste something 26 times. 
I didn't even need to worry about how many times the command needed to be repeated. Vectors can grow and shrink automatically. No need to move cells around on a sheet. No need to copy formulas or change named ranges. R just did it. (Note that by default the `paste` function will automatically add a blank space between elements. The function `paste0` will concatenate elements without a space. Try it.) +Using the `paste` command, we took each element of "letters" and prefixed it with the text "Letter". This is similar to applying the same function to a set of contiguous cells in a spreadsheet. But in this case, I didn't need to copy and paste something 26 times. I didn't even need to worry about how many times the command needed to be repeated. Vectors can grow and shrink automatically. No need to move cells around on a sheet. No need to copy formulas or change named ranges. R just did it. (Note that by default the `paste` function will automatically add a blank space between elements. The function `paste0` will concatenate elements without a space. Try it.) ### Vector properties @@ -268,7 +270,7 @@ is.element(1941, y) ### Summarization -Loads of functions take vector input and return scalar output. Translation of a large sest of numbers into a few, informative values is one of the cornerstones of statistics. +Loads of functions take vector input and return scalar output. Translation of a large set of numbers into a few, informative values is one of the cornerstones of statistics. ```{r eval=FALSE} x = 1:50 @@ -439,7 +441,7 @@ Now that you know what they are, you can spend the next few months avoiding fact If characters aren't behaving the way you expect them to, check the variables with `is.factor`. Convert them with `as.character` and you'll be back on the road to happiness. -### Questions +## Exercises * Create a logical, integer, double and character variable. * Can you create a vector with both logical and character values? 
@@ -545,8 +547,7 @@ colMeans(myMatrix)
 
 Like more than two dimensions? Shine on you crazy diamond.
 
-
-## Exercise
+## Exercises
 
 Create a vector of length 10, with years starting from 1980.
 
diff --git a/_bookdown.yml b/_bookdown.yml
index b926691..b00468f 100644
--- a/_bookdown.yml
+++ b/_bookdown.yml
@@ -1,2 +1,20 @@
 book_filename: "raw"
-chapter_name: "Chapter "
+# chapter_name: "Chapter "
+
+rmd_files: [
+ "index.Rmd",
+
+ "Setup.Rmd",
+ "GettingStarted.Rmd",
+ "LanguageElements.Rmd",
+
+ "Vectors.Rmd",
+ "Lists.Rmd",
+ "Data.Rmd",
+
+ "BasicVisualization.Rmd",
+ "LossDistributions.Rmd",
+
+ "References.Rmd",
+
+]
diff --git a/index.Rmd b/index.Rmd
index 9679b32..f134468 100644
--- a/index.Rmd
+++ b/index.Rmd
@@ -4,25 +4,35 @@ author: "Brian A. Fannin, ACAS"
 date: "`r Sys.Date()`"
 site: bookdown::bookdown_site
 documentclass: book
-bibliography: [packages.bib]
-biblio-style: apalike
-link-citations: yes
+bibliography: [packages.bib, references.bib]
+biblio-style: alpha
+link-citations: true
 github-repo: PirateGrunt/raw_book
 description: "This is a companion to actuarial workshops."
 knit: "bookdown::render_book"
+cover-image: raw_cover.png
 ---
 
-```{r include=FALSE}
-# automatically create a bib database for R packages
-knitr::write_bib(c(.packages(), 'bookdown', 'knitr', 'rmarkdown'), 'packages.bib')
-```
+# Introduction {-}
 
-# Introduction
+Cover image
 
-## Why does this book exist?
+Hello! Very happy to have you here and I hope you find this useful. This book is meant to serve as a companion to any of the R training sessions that I'm involved in. A few quick notes before we proceed.
 
-## Who should use this?
+#### Why does this book exist? {-}
 
-## Other stuff
+For over three years now, I've joined other actuaries in teaching R at events sponsored by the Casualty Actuarial Society[^CAS_CYA]. I've learned a lot about what questions get asked, where folks get stuck and what content matters most. We've reached the place where, to be honest, attendees can get more out of the live sessions if they come in having done some preliminary work. That should give us more opportunity for hands on instruction. At a minimum, this book should serve as a handy reference before, during and after a live training to reinforce what we're trying to teach.
 
-So, that's all the preliminaries. Away we go!
\ No newline at end of file
+That understood, this book is hardly the only game in town. There are loads of good books about R and I can easily recommend many of them. [@Matloff] is a great one. Go check them out. I have.
+
+#### This is an organic book {-}
+
+You'll not find this in Barnes & Nobel or Amazon and I'll strongly suggest that you resist the temptation to print this. I fully expect that there will corrections, additions and updates as the technology changes. The book will live on the internet as long as I can support it and it's probably best to check it out there. By all means, download the PDF if you'd like a local copy, but do check back for updates.
+
+#### You don't need to read this from start to finish {-}
+
+Though I've done my best to give this book a clear flow, I've had to make a few sacrifices in order to get in all the material that I needed. This means that there are some, sorry, boring bits like a page about data types or how to write loops and such. Some people will find this stuff fascinating, some people will find this ... necessary. Feel free to treat this like a reference text and not like a Michael Chabon novel and you're likely to get more out of it.
+
+So, that's all the preliminaries. Away we go!
+
+[^CAS_CYA]: The CAS has not sponsored this publication and no one should construe my involvement with the CAS as constituting their endorsement of the material presented here.
\ No newline at end of file