diff --git a/02_RProgramming/ControlStructures/index.Rmd b/02_RProgramming/ControlStructures/index.Rmd index 02571ce40..6214d166d 100644 --- a/02_RProgramming/ControlStructures/index.Rmd +++ b/02_RProgramming/ControlStructures/index.Rmd @@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: - lib: ../../libraries + lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft} @@ -243,4 +243,4 @@ Summary - Infinite loops should generally be avoided, even if they are theoretically correct. -- Control structures mentiond here are primarily useful for writing programs; for command-line interactive work, the *apply functions are more useful. \ No newline at end of file +- Control structures mentiond here are primarily useful for writing programs; for command-line interactive work, the *apply functions are more useful. diff --git a/02_RProgramming/ControlStructures/index.html b/02_RProgramming/ControlStructures/index.html index 88194caca..3879ea60b 100644 --- a/02_RProgramming/ControlStructures/index.html +++ b/02_RProgramming/ControlStructures/index.html @@ -8,46 +8,46 @@ - - + - - - - + + - - + - + + +
+

Introduction to the R Language

+

Control Structures

+

Roger Peng, Associate Professor
Johns Hopkins Bloomberg School of Public Health

+
+
+
- - - - -
-

Introduction to the R Language

-

Control Structures

-

Roger Peng, Associate Professor
Johns Hopkins Bloomberg School of Public Health

-
-
- - +

Control Structures

-
+

Control structures in R allow you to control the flow of execution of the program, depending on runtime conditions. Common structures are

    @@ -66,11 +66,11 @@

    Control Structures

    - +

    Control Structures: if

    -
    +
    if(<condition>) {
             ## do something
     } else {
    @@ -89,11 +89,11 @@ 

    Control Structures: if

    - +

    if

    -
    +

    This is a valid if/else structure.

    if(x > 3) {
    @@ -116,11 +116,11 @@ 

    if

    - +

    if

    -
    +

    Of course, the else clause is not necessary.

    if(<condition1>) {
    @@ -136,11 +136,11 @@ 

    if

    - +

    for

    -
    +

    for loops take an iterator variable and assign it successive values from a sequence or vector. For loops are most commonly used for iterating over the elements of an object (list, vector, etc.)

    for(i in 1:10) {
    @@ -154,11 +154,11 @@ 

    for

    - +

    for

    -
    +

    These three loops have the same behavior.

    x <- c("a", "b", "c", "d")
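The hunk cuts off before the loops themselves. As a rough sketch of what "the same behavior" might look like (the three loop forms below are an assumption, not necessarily the ones on the slide), each version prints every element of `x` once:

```r
x <- c("a", "b", "c", "d")

# 1. Loop over explicit index positions
for (i in 1:4) {
    print(x[i])
}

# 2. Loop over positions generated from the vector itself;
#    seq_along(x) stays correct if the length of x changes
for (i in seq_along(x)) {
    print(x[i])
}

# 3. Loop over the elements directly, with no index at all
for (letter in x) {
    print(letter)
}
```

`seq_along(x)` is usually the safer choice over a hard-coded `1:4`, since it yields an empty sequence for a zero-length vector.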
    @@ -182,11 +182,11 @@ 

    for

    - +

    Nested for loops

    -
    +

    for loops can be nested.

    x <- matrix(1:6, 2, 3)
    @@ -204,11 +204,11 @@ 

    Nested for loops

    - +

    while

    -
    +

    While loops begin by testing a condition. If it is true, then they execute the loop body. Once the loop body is executed, the condition is tested again, and so forth.

    count <- 0
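The rest of the slide's code is outside this hunk. A minimal sketch of the test-then-execute cycle described above, assuming a simple counter with a limit of 10 (the limit is illustrative):

```r
count <- 0

# The condition is checked before every pass through the body
while (count < 10) {
    print(count)
    count <- count + 1   # without this update the loop would never end
}
```

When the loop exits, `count` holds the first value for which the condition was false.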
    @@ -224,11 +224,11 @@ 

    while

    - +

    while

    -
    +

    Sometimes there will be more than one condition in the test.

    z <- 5
    @@ -251,11 +251,11 @@ 

    while

    - +

    repeat

    -
    +

    Repeat initiates an infinite loop; these are not commonly used in statistical applications but they do have their uses. The only way to exit a repeat loop is to call break.

    x0 <- 1
    @@ -276,22 +276,22 @@ 

    repeat

    - +

    repeat

    -
    +

    The loop in the previous slide is a bit dangerous because there’s no guarantee it will stop. Better to set a hard limit on the number of iterations (e.g. using a for loop) and then report whether convergence was achieved or not.
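One way to apply that advice is sketched below. The update function `f`, the tolerance `tol`, and the cap `max_iter` are hypothetical names chosen for illustration; the slide's own estimate-computing function is not shown in this diff.

```r
# Hypothetical update step: fixed-point iteration x = cos(x)
f <- function(x) cos(x)

tol <- 1e-8        # assumed convergence tolerance
max_iter <- 100    # hard limit on the number of iterations
x0 <- 1
converged <- FALSE

for (i in seq_len(max_iter)) {
    x1 <- f(x0)
    if (abs(x1 - x0) < tol) {
        converged <- TRUE
        break                 # stop early once the estimates agree
    }
    x0 <- x1
}

# Report the outcome instead of silently returning the last value
if (converged) {
    message("Converged after ", i, " iterations")
} else {
    warning("No convergence within ", max_iter, " iterations")
}
```

Unlike a bare `repeat`, this loop always terminates, and the `converged` flag lets the caller tell a real answer from a loop that simply ran out of iterations.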

    - +

    next, return

    -
    +

    next is used to skip an iteration of a loop

    for(i in 1:100) {
    @@ -309,11 +309,11 @@ 

    next, return

    - +

    Control Structures

    -
    +

    Summary

      @@ -328,34 +328,113 @@

      Control Structures

      - - - - - - - - - - - + + + - - - - \ No newline at end of file + + + + + \ No newline at end of file diff --git a/02_RProgramming/ControlStructures/index.md b/02_RProgramming/ControlStructures/index.md index 991e50c6e..6214d166d 100644 --- a/02_RProgramming/ControlStructures/index.md +++ b/02_RProgramming/ControlStructures/index.md @@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: - lib: ../../libraries + lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft} diff --git a/02_RProgramming/DataTypes/Introduction to the R Language.pdf b/02_RProgramming/DataTypes/Introduction to the R Language.pdf index b9e916bff..b8f1bc492 100644 Binary files a/02_RProgramming/DataTypes/Introduction to the R Language.pdf and b/02_RProgramming/DataTypes/Introduction to the R Language.pdf differ diff --git a/02_RProgramming/DataTypes/index.Rmd b/02_RProgramming/DataTypes/index.Rmd index 65eb1ce54..19f8f1af4 100644 --- a/02_RProgramming/DataTypes/index.Rmd +++ b/02_RProgramming/DataTypes/index.Rmd @@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: - lib: ../../libraries + lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft} @@ -200,7 +200,9 @@ NAs introduced by coercion > as.logical(x) [1] NA NA NA > as.complex(x) -[1] 0+0i 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i +[1] NA NA NA +Warning message: +NAs introduced by coercion ``` --- @@ -472,4 +474,4 @@ Data Types - data frames -- names \ No newline at end of file +- names diff --git a/02_RProgramming/DataTypes/index.html b/02_RProgramming/DataTypes/index.html index 9b50617cb..00c65c081 100644 --- a/02_RProgramming/DataTypes/index.html +++ 
b/02_RProgramming/DataTypes/index.html @@ -8,46 +8,46 @@ - - + - - - - + + - - + - + + +
      +

      Introduction to the R Language

      +

      Data Types and Basic Operations

      +

      Roger Peng, Associate Professor
      Johns Hopkins Bloomberg School of Public Health

      +
      +
      +
      - - - - -
      -

      Introduction to the R Language

      -

      Data Types and Basic Operations

      -

      Roger Peng, Associate Professor
      Johns Hopkins Bloomberg School of Public Health

      -
      -
      - - +

      Objects

      -
      +

      R has five basic or “atomic” classes of objects:

        @@ -73,11 +73,11 @@

        Objects

        - +

        Numbers

        -
        +
        • Numbers in R are generally treated as numeric objects (i.e. double precision real numbers)

        • @@ -97,11 +97,11 @@

          Numbers

          - +

          Attributes

          -
          +

          R objects can have attributes

            @@ -119,11 +119,11 @@

            Attributes

            - +

            Entering Input

            -
            +

            At the R prompt we type expressions. The <- symbol is the assignment operator.

            > x <- 1
            @@ -145,11 +145,11 @@ 

            Entering Input

            - +

            Evaluation

            -
            +

            When a complete expression is entered at the prompt, it is evaluated and the result of the evaluated expression is returned. The result may be auto-printed.

            > x <- 5  ## nothing printed
            @@ -165,11 +165,11 @@ 

            Evaluation

            - +

            Printing

            -
            +
            > x <- 1:20 
             > x
              [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
            @@ -182,11 +182,11 @@ 

            Printing

            - +

            Creating Vectors

            -
            +

            The c() function can be used to create vectors of objects.

            > x <- c(0.5, 0.6)       ## numeric
            @@ -208,11 +208,11 @@ 

            Creating Vectors

            - +

            Mixing Objects

            -
            +

            What about the following?

            > y <- c(1.7, "a")   ## character
            @@ -226,11 +226,11 @@ 

            Mixing Objects

            - +

            Explicit Coercion

            -
            +

            Objects can be explicitly coerced from one class to another using the as.* functions, if available.

            > x <- 0:6
            @@ -248,11 +248,11 @@ 

            Explicit Coercion

            - +

            Explicit Coercion

            -
            +

            Nonsensical coercion results in NAs.

            > x <- c("a", "b", "c")
            @@ -263,18 +263,20 @@ 

            Explicit Coercion

            > as.logical(x) [1] NA NA NA > as.complex(x) -[1] 0+0i 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i +[1] NA NA NA +Warning message: +NAs introduced by coercion
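The corrected output can be checked with a short sketch like the one below. `suppressWarnings()` is used only to keep the demonstration quiet; the variable names `num`, `cpx`, and `lgl` are illustrative.

```r
x <- c("a", "b", "c")

# Letters cannot be parsed as numbers, so every element becomes NA
num <- suppressWarnings(as.numeric(x))
cpx <- suppressWarnings(as.complex(x))

# as.logical() only recognises strings such as "TRUE", "T", "false";
# anything else becomes NA
lgl <- as.logical(x)

print(num)
print(cpx)
print(lgl)

# Coercion is element-wise: values that can be parsed are kept
mixed <- suppressWarnings(as.numeric(c("1", "2", "x")))
print(mixed)
```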
            - +

            Matrices

            -
            +

            Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol)

            > m <- matrix(nrow = 2, ncol = 3) 
            @@ -293,11 +295,11 @@ 

            Matrices

            - +

            Matrices (cont’d)

            -
            +

            Matrices are constructed column-wise, so entries can be thought of starting in the “upper left” corner and running down the columns.

            > m <- matrix(1:6, nrow = 2, ncol = 3) 
            @@ -311,11 +313,11 @@ 

            Matrices (cont’d)

            - +

            Matrices (cont’d)

            -
            +

            Matrices can also be created directly from vectors by adding a dimension attribute.

            > m <- 1:10 
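The remaining lines of this example fall outside the hunk. A small sketch of the idea, assuming a 2 x 5 shape for the ten elements:

```r
m <- 1:10

# Assigning a dim attribute turns the vector into a 2 x 5 matrix
dim(m) <- c(2, 5)
print(m)

# Elements fill column by column, so row 2 of column 3 holds 6
print(m[2, 3])
```

The result is the same object that `matrix(1:10, nrow = 2, ncol = 5)` would produce.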
            @@ -332,11 +334,11 @@ 

            Matrices (cont’d)

            - +

            cbind-ing and rbind-ing

            -
            +

            Matrices can be created by column-binding or row-binding with cbind() and rbind().

            > x <- 1:3
            @@ -356,11 +358,11 @@ 

            cbind-ing and rbind-ing

            - +

            Lists

            -
            +

            Lists are a special type of vector that can contain elements of different classes. Lists are a very important data type in R and you should get to know them well.

            > x <- list(1, "a", TRUE, 1 + 4i) 
            @@ -382,11 +384,11 @@ 

            Lists

            - +

            Factors

            -
            +

            Factors are used to represent categorical data. Factors can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label.

              @@ -398,11 +400,11 @@

              Factors

              - +

              Factors

              -
              +
              > x <- factor(c("yes", "yes", "no", "yes", "no")) 
               > x
               [1] yes yes no yes no
              @@ -421,11 +423,11 @@ 

              Factors

              - +

              Factors

              -
              +

              The order of the levels can be set using the levels argument to factor(). This can be important in linear modelling because the first level is used as the baseline level.

              > x <- factor(c("yes", "yes", "no", "yes", "no"),
              @@ -439,11 +441,11 @@ 

              Factors

              - +

              Missing Values

              -
              +

              Missing values are denoted by NA or NaN for undefined mathematical operations.

                @@ -457,11 +459,11 @@

                Missing Values

                - +

                Missing Values

                -
                +
                > x <- c(1, 2, NA, 10, 3)
                 > is.na(x)
                 [1] FALSE FALSE  TRUE FALSE FALSE
                @@ -478,11 +480,11 @@ 

                Missing Values

                - +

                Data Frames

                -
                +

                Data frames are used to store tabular data

                  @@ -498,11 +500,11 @@

                  Data Frames

                  - +

                  Data Frames

                  -
                  +
                  > x <- data.frame(foo = 1:4, bar = c(T, T, F, F)) 
                   > x
                     foo   bar
                  @@ -520,11 +522,11 @@ 

                  Data Frames

                  - +

                  Names

                  -
                  +

                  R objects can also have names, which is very useful for writing readable code and self-describing objects.

                  > x <- 1:3
                  @@ -542,11 +544,11 @@ 

                  Names

                  - +

                  Names

                  -
                  +

                  Lists can also have names.

                  > x <- list(a = 1, b = 2, c = 3) 
                  @@ -565,11 +567,11 @@ 

                  Names

                  - +

                  Names

                  -
                  +

                  And matrices.

                  > m <- matrix(1:4, nrow = 2, ncol = 2)
                  @@ -584,11 +586,11 @@ 

                  Names

                  - +

                  Summary

                  -
                  +

                  Data Types

                    @@ -606,34 +608,191 @@

                    Summary

                    - - - - - - - - - - - + + + - - - - \ No newline at end of file + + + + + \ No newline at end of file diff --git a/02_RProgramming/DataTypes/index.md b/02_RProgramming/DataTypes/index.md index ccd9ff364..19f8f1af4 100644 --- a/02_RProgramming/DataTypes/index.md +++ b/02_RProgramming/DataTypes/index.md @@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: - lib: ../../libraries + lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft} @@ -200,7 +200,9 @@ NAs introduced by coercion > as.logical(x) [1] NA NA NA > as.complex(x) -[1] 0+0i 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i +[1] NA NA NA +Warning message: +NAs introduced by coercion ``` --- diff --git a/02_RProgramming/Dates/DatesTimes.pdf b/02_RProgramming/Dates/Dates.pdf similarity index 62% rename from 02_RProgramming/Dates/DatesTimes.pdf rename to 02_RProgramming/Dates/Dates.pdf index 8ab0982ca..e1c135458 100644 Binary files a/02_RProgramming/Dates/DatesTimes.pdf and b/02_RProgramming/Dates/Dates.pdf differ diff --git a/02_RProgramming/Dates/index.Rmd b/02_RProgramming/Dates/index.Rmd index fff705469..d435714c4 100644 --- a/02_RProgramming/Dates/index.Rmd +++ b/02_RProgramming/Dates/index.Rmd @@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: - lib: ../../libraries + lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft} @@ -92,15 +92,14 @@ p$sec ## Times in R -Finally, there is the `strptime` function in case your dates are written in a different format +Finally, there is the `strptime` function in case your dates are +written in a different format -```r -datestring <- c("January 10, 2012 10:40", 
"December 9, 2011 +```{r} +datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10") x <- strptime(datestring, "%B %d, %Y %H:%M") x -## [1] "2012-01-10 10:40:00" "2011-12-09 09:10:00" class(x) -## [1] "POSIXlt" "POSIXt" ``` I can _never_ remember the formatting strings. Check `?strptime` for details. diff --git a/02_RProgramming/Dates/index.html b/02_RProgramming/Dates/index.html index b0fe8124f..c3d93923d 100644 --- a/02_RProgramming/Dates/index.html +++ b/02_RProgramming/Dates/index.html @@ -8,51 +8,46 @@ - - + - - - - + + - - - - - - - + - + + +
                    +

                    Dates and Times in R

                    +

                    +

                    Roger D. Peng, Associate Professor of Biostatistics
                    Johns Hopkins Bloomberg School of Public Health

                    +
                    +
                    +
                    - - - - -
                    -

                    Dates and Times in R

                    -

                    -

                    Roger D. Peng, Associate Professor of Biostatistics
                    Johns Hopkins Bloomberg School of Public Health

                    -
                    -
                    - - +

                    Dates and Times in R

                    -
                    +

                    R has developed a special representation of dates and times

                      @@ -66,11 +61,11 @@

                      Dates and Times in R

                      - +

                      Dates in R

                      -
                      +

                      Dates are represented by the Date class and can be coerced from a character string using the as.Date() function.

                      x <- as.Date("1970-01-01")
                      @@ -86,11 +81,11 @@ 

                      Dates in R

                      - +

                      Times in R

                      -
                      +

                      Times are represented using the POSIXct or the POSIXlt class

                        @@ -110,11 +105,11 @@

                        Times in R

                        - +

                        Times in R

                        -
                        +

                        Times can be coerced from a character string using the as.POSIXlt or as.POSIXct function.

                        x <- Sys.time()
                        @@ -132,11 +127,11 @@ 

                        Times in R

                        - +

                        Times in R

                        -
                        +

                        You can also use the POSIXct format.

                        x <- Sys.time()
                        @@ -155,19 +150,26 @@ 

                        Times in R

                        - +

                        Times in R

                        -
                        -

                        Finally, there is the strptime function in case your dates are written in a different format

                        +
                        +

                        Finally, there is the strptime function in case your dates are +written in a different format

                        -
                        datestring <- c("January 10, 2012 10:40", "December 9, 2011
                        +
                        datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")
                         x <- strptime(datestring, "%B %d, %Y %H:%M")
                         x
                        -## [1] "2012-01-10 10:40:00" "2011-12-09 09:10:00"
                        -class(x)
                        -## [1] "POSIXlt" "POSIXt"
                        +
                        + +
                        ## [1] "2012-01-10 10:40:00 EST" "2011-12-09 09:10:00 EST"
                        +
                        + +
                        class(x)
                        +
                        + +
                        ## [1] "POSIXlt" "POSIXt"
                         

                        I can never remember the formatting strings. Check ?strptime for details.
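As a reminder of what each conversion specification in that format string does, here is an annotated sketch. Two things are assumptions added for reproducibility and are not part of the slide: the explicit `tz = "UTC"` argument, and switching `LC_TIME` to the `C` locale so that `%B` matches English month names.

```r
# Assumption: force English month names so "%B" can parse "January"
invisible(Sys.setlocale("LC_TIME", "C"))

datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")

# %B full month name   %d day of month   %Y four-digit year
# %H hour (00-23)      %M minute
x <- strptime(datestring, format = "%B %d, %Y %H:%M", tz = "UTC")

# The same specifications work in the other direction with format()
formatted <- format(x, "%Y-%m-%d %H:%M")
print(formatted)
print(class(x))
```

Fixing the time zone explicitly also avoids the machine-dependent suffix (such as `EST`) that appears in the regenerated output above.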

                        @@ -176,11 +178,11 @@

                        Times in R

                        - +

                        Operations on Dates and Times

                        -
                        +

                        You can use mathematical operations on dates and times. Well, really just + and -. You can do comparisons too (e.g. ==, <=).

                        x <- as.Date("2012-01-01")
                        @@ -198,11 +200,11 @@ 

                        Operations on Dates and Times

                        - +

                        Operations on Dates and Times

                        -
                        +

                        Even keeps track of leap years, leap seconds, daylight savings, and time zones.

                        x <- as.Date("2012-03-01")
                        y <- as.Date("2012-02-28")
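The leap-year handling can be seen by repeating the subtraction for a non-leap year. The 2011 dates and the variable names below are an illustrative addition:

```r
# 2012 is a leap year, so 29 February sits between these two dates
leap_gap <- as.Date("2012-03-01") - as.Date("2012-02-28")

# 2011 is not a leap year, so the same calendar dates are adjacent
plain_gap <- as.Date("2011-03-01") - as.Date("2011-02-28")

print(leap_gap)
print(plain_gap)

# Differences are "difftime" objects; as.numeric() gives the day count
print(as.numeric(leap_gap))
```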
                        @@ -218,11 +220,11 @@ 

                        Operations on Dates and Times

                        - +

                        Summary

                        -
                        +
                        • Dates and times have special classes in R that allow for numerical and statistical calculations
                        • Dates use the Date class
                        • @@ -236,34 +238,89 @@

                          Summary

                          - - - - - - - - - - - + + + - - - - \ No newline at end of file + + + + + \ No newline at end of file diff --git a/02_RProgramming/Dates/index.md b/02_RProgramming/Dates/index.md index fff705469..5e6c6789c 100644 --- a/02_RProgramming/Dates/index.md +++ b/02_RProgramming/Dates/index.md @@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: - lib: ../../libraries + lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft} @@ -92,14 +92,25 @@ p$sec ## Times in R -Finally, there is the `strptime` function in case your dates are written in a different format +Finally, there is the `strptime` function in case your dates are +written in a different format + ```r -datestring <- c("January 10, 2012 10:40", "December 9, 2011 +datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10") x <- strptime(datestring, "%B %d, %Y %H:%M") x -## [1] "2012-01-10 10:40:00" "2011-12-09 09:10:00" +``` + +``` +## [1] "2012-01-10 10:40:00 EST" "2011-12-09 09:10:00 EST" +``` + +```r class(x) +``` + +``` ## [1] "POSIXlt" "POSIXt" ``` I can _never_ remember the formatting strings. Check `?strptime` for details. 
diff --git a/02_RProgramming/Dates/slides/datetime_slide01.png b/02_RProgramming/Dates/slides/datetime_slide01.png index 5de9e7587..a9ab292db 100644 Binary files a/02_RProgramming/Dates/slides/datetime_slide01.png and b/02_RProgramming/Dates/slides/datetime_slide01.png differ diff --git a/02_RProgramming/Dates/slides/datetime_slide02.png b/02_RProgramming/Dates/slides/datetime_slide02.png index da030098f..a008dffec 100644 Binary files a/02_RProgramming/Dates/slides/datetime_slide02.png and b/02_RProgramming/Dates/slides/datetime_slide02.png differ diff --git a/02_RProgramming/Dates/slides/datetime_slide03.png b/02_RProgramming/Dates/slides/datetime_slide03.png index e5820f3e4..a542ecd3c 100644 Binary files a/02_RProgramming/Dates/slides/datetime_slide03.png and b/02_RProgramming/Dates/slides/datetime_slide03.png differ diff --git a/02_RProgramming/Dates/slides/datetime_slide04.png b/02_RProgramming/Dates/slides/datetime_slide04.png index 6822f157b..9ad7c1484 100644 Binary files a/02_RProgramming/Dates/slides/datetime_slide04.png and b/02_RProgramming/Dates/slides/datetime_slide04.png differ diff --git a/02_RProgramming/Dates/slides/datetime_slide05.png b/02_RProgramming/Dates/slides/datetime_slide05.png index ce312f427..f34296cf8 100644 Binary files a/02_RProgramming/Dates/slides/datetime_slide05.png and b/02_RProgramming/Dates/slides/datetime_slide05.png differ diff --git a/02_RProgramming/Dates/slides/datetime_slide06.png b/02_RProgramming/Dates/slides/datetime_slide06.png index b34bb0510..f1912344d 100644 Binary files a/02_RProgramming/Dates/slides/datetime_slide06.png and b/02_RProgramming/Dates/slides/datetime_slide06.png differ diff --git a/02_RProgramming/Dates/slides/datetime_slide07.png b/02_RProgramming/Dates/slides/datetime_slide07.png index d08186b25..4780feb3e 100644 Binary files a/02_RProgramming/Dates/slides/datetime_slide07.png and b/02_RProgramming/Dates/slides/datetime_slide07.png differ diff --git 
a/02_RProgramming/Dates/slides/datetime_slide08.png b/02_RProgramming/Dates/slides/datetime_slide08.png index d0c0de647..8b6774ca4 100644 Binary files a/02_RProgramming/Dates/slides/datetime_slide08.png and b/02_RProgramming/Dates/slides/datetime_slide08.png differ diff --git a/02_RProgramming/Dates/slides/datetime_slide09.png b/02_RProgramming/Dates/slides/datetime_slide09.png index f25e62cb8..c79060797 100644 Binary files a/02_RProgramming/Dates/slides/datetime_slide09.png and b/02_RProgramming/Dates/slides/datetime_slide09.png differ diff --git a/02_RProgramming/Dates/slides/datetime_slide10.png b/02_RProgramming/Dates/slides/datetime_slide10.png index e7bd86141..262d4681f 100644 Binary files a/02_RProgramming/Dates/slides/datetime_slide10.png and b/02_RProgramming/Dates/slides/datetime_slide10.png differ diff --git a/02_RProgramming/Subsetting/index.Rmd b/02_RProgramming/Subsetting/index.Rmd index 88816007c..ac32e0ad9 100644 --- a/02_RProgramming/Subsetting/index.Rmd +++ b/02_RProgramming/Subsetting/index.Rmd @@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: - lib: ../../libraries + lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft} @@ -102,8 +102,6 @@ Similarly, subsetting a single column or a single row will give you a vector, no ## Subsetting Lists ```r -6/14 -Subsetting Lists > x <- list(foo = 1:4, bar = 0.6) > x[1] $foo diff --git a/02_RProgramming/Subsetting/index.html b/02_RProgramming/Subsetting/index.html index e907844ac..4ffcdf611 100644 --- a/02_RProgramming/Subsetting/index.html +++ b/02_RProgramming/Subsetting/index.html @@ -8,51 +8,46 @@ - - + - - - - + + - - - - - - - + - + + +
                          +

                          Introduction to the R Language

                          +

                          Data Types and Basic Operations

                          +

                          Roger Peng, Associate Professor
                          Johns Hopkins Bloomberg School of Public Health

                          +
                          +
                          +
                          - - - - -
                          -

                          Introduction to the R Language

                          -

                          Data Types and Basic Operations

                          -

                          Roger Peng, Associate Professor
                          Johns Hopkins Bloomberg School of Public Health

                          -
                          -
                          - - +

                          Subsetting

                          -
                          +

                          There are a number of operators that can be used to extract subsets of R objects.

                            @@ -65,11 +60,11 @@

                            Subsetting

                            - +

                            Subsetting

                            -
                            +
                            > x <- c("a", "b", "c", "c", "d", "a")
                             > x[1]
                             [1] "a"
                            @@ -90,11 +85,11 @@ 

                            Subsetting

                            - +

                            Subsetting a Matrix

                            -
                            +

                            Matrices can be subsetted in the usual way with (i,j) type indices.

                            > x <- matrix(1:6, 2, 3)
                            @@ -116,11 +111,11 @@ 

                            Subsetting a Matrix

                            - +

                            Subsetting a Matrix

                            -
                            +

                            By default, when a single element of a matrix is retrieved, it is returned as a vector of length 1 rather than a 1 × 1 matrix. This behavior can be turned off by setting drop = FALSE.

                            > x <- matrix(1:6, 2, 3)
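A short sketch of the difference `drop = FALSE` makes, using the same matrix (the variable names are illustrative):

```r
x <- matrix(1:6, 2, 3)

# Default: the dimension attribute is dropped
elem <- x[1, 2]
print(elem)
print(dim(elem))            # NULL, a plain vector of length 1

# drop = FALSE keeps the result as a matrix
elem_m <- x[1, 2, drop = FALSE]
print(dim(elem_m))          # 1 x 1

# The same applies when extracting a whole row
row_m <- x[1, , drop = FALSE]
print(dim(row_m))           # 1 x 3
```

Keeping the dimensions matters in code that later calls matrix functions such as `nrow()` or `%*%` on the result.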
                            @@ -135,11 +130,11 @@ 

                            Subsetting a Matrix

                            - +

                            Subsetting a Matrix

                            -
                            +

                            Similarly, subsetting a single column or a single row will give you a vector, not a matrix (by default).

                            > x <- matrix(1:6, 2, 3)
                            @@ -154,14 +149,12 @@ 

                            Subsetting a Matrix

                            - +

                            Subsetting Lists

                            -
                            -
                            6/14
                            -Subsetting Lists
                            -> x <- list(foo = 1:4, bar = 0.6)
                            +  
                            +
                            > x <- list(foo = 1:4, bar = 0.6)
                             > x[1]
                             $foo
                             [1] 1 2 3 4
                            @@ -182,11 +175,11 @@ 

                            Subsetting Lists

                            - +

                            Subsetting Lists

                            -
                            +
                            > x <- list(foo = 1:4, bar = 0.6, baz = "hello")
                             > x[c(1, 3)]
                             $foo
                            @@ -200,11 +193,11 @@ 

                            Subsetting Lists

                            - +

                            Subsetting Lists

                            -
                            +

                            The [[ operator can be used with computed indices; $ can only be used with literal names.

                            > x <- list(foo = 1:4, bar = 0.6, baz = "hello")
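The distinction is easiest to see when the element name is stored in a variable. In this sketch the variable `name` is an illustrative choice:

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")
name <- "foo"

# [[ evaluates its argument, so it looks up the element called "foo"
by_computed <- x[[name]]
print(by_computed)

# $ takes the name literally and looks for an element called "name"
by_literal <- x$name
print(by_literal)           # NULL, there is no such element

# With the literal name, $ works as expected
print(x$foo)
```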
                            @@ -221,11 +214,11 @@ 

                            Subsetting Lists

                            - +

                            Subsetting Nested Elements of a List

                            -
                            +

                            The [[ can take an integer sequence.

                            > x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
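A sketch of how an integer sequence inside `[[` walks down the nesting, shown next to the equivalent chained form (the variable names are illustrative):

```r
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))

# c(1, 3): first element of x, then the third element inside it
nested <- x[[c(1, 3)]]
print(nested)

# Equivalent chained extraction
chained <- x[[1]][[3]]
print(chained)

# c(2, 1): second element of x, then its first entry
second <- x[[c(2, 1)]]
print(second)
```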
                            @@ -242,11 +235,11 @@ 

                            Subsetting Nested Elements of a List

                            - +

                            Partial Matching

                            -
                            +

                            Partial matching of names is allowed with [[ and $.

                            > x <- list(aardvark = 1:5)
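The two operators differ in their defaults, which the following sketch illustrates (the variable names are illustrative):

```r
x <- list(aardvark = 1:5)

# $ partially matches names by default
via_dollar <- x$a
print(via_dollar)

# [[ requires an exact match unless told otherwise
via_exact <- x[["a"]]
print(via_exact)            # NULL

# exact = FALSE switches partial matching on for [[
via_partial <- x[["a", exact = FALSE]]
print(via_partial)
```

Partial matching is convenient at the prompt but can be fragile in scripts, since adding a second element whose name starts with "a" would make the abbreviation ambiguous.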
                            @@ -262,11 +255,11 @@ 

                            Partial Matching

                            - +

                            Removing NA Values

                            -
                            +

                            A common task is to remove missing values (NAs).

                            > x <- c(1, 2, NA, 4, NA, 5)
                            @@ -279,11 +272,11 @@ 

                            Removing NA Values

                            - +

                            Removing NA Values

                            -
                            +

                            What if there are multiple things and you want to take the subset with no missing values?

                            > x <- c(1, 2, NA, 4, NA, 5)
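One way to do this is with `complete.cases()`, sketched below. The second vector `y` is an assumed companion to `x`, since the rest of the example is outside this hunk.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")   # assumed second vector

# For a single vector, a logical index built from is.na() is enough
print(x[!is.na(x)])

# complete.cases() is TRUE only at positions where every input is non-missing
good <- complete.cases(x, y)
print(good)

x_clean <- x[good]
y_clean <- y[good]
print(x_clean)
print(y_clean)
```

The same function accepts a data frame, in which case it flags the rows that contain no missing values.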
                            @@ -301,11 +294,11 @@ 

                            Removing NA Values

                            - +

                            Removing NA Values

                            -
                            +
                            > airquality[1:6, ]
                               Ozone Solar.R Wind Temp Month Day
                             1    41     190  7.4   67     5   1
                            @@ -330,34 +323,113 @@ 

                            Removing NA Values

                            - - - - - - - - - - - + + + - - - - \ No newline at end of file + + + + + \ No newline at end of file diff --git a/02_RProgramming/Subsetting/index.md b/02_RProgramming/Subsetting/index.md index 88816007c..ac32e0ad9 100644 --- a/02_RProgramming/Subsetting/index.md +++ b/02_RProgramming/Subsetting/index.md @@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: - lib: ../../libraries + lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft} @@ -102,8 +102,6 @@ Similarly, subsetting a single column or a single row will give you a vector, no ## Subsetting Lists ```r -6/14 -Subsetting Lists > x <- list(foo = 1:4, bar = 0.6) > x[1] $foo diff --git a/02_RProgramming/profiler/Profiling R Code.pdf b/02_RProgramming/profiler/Profiling R Code.pdf new file mode 100644 index 000000000..f71c720b0 Binary files /dev/null and b/02_RProgramming/profiler/Profiling R Code.pdf differ diff --git a/03_GettingData/02_02_readingHDF5/index.Rmd b/03_GettingData/02_02_readingHDF5/index.Rmd index b1740bf30..1816d3da0 100644 --- a/03_GettingData/02_02_readingHDF5/index.Rmd +++ b/03_GettingData/02_02_readingHDF5/index.Rmd @@ -36,7 +36,7 @@ knit_hooks$set(plot = knitr:::hook_plot_html) * Used for storing large data sets * Supports storing a range of data types -* Heirarchical data format +* Hierarchical data format * _groups_ containing zero or more data sets and metadata * Have a _group header_ with group name and list of attributes * Have a _group symbol table_ with a list of objects in group @@ -134,4 +134,4 @@ h5read("example.h5","foo/A") * hdf5 can be used to optimize reading/writing from disc in R * The rhdf5 tutorial: * 
[http://www.bioconductor.org/packages/release/bioc/vignettes/rhdf5/inst/doc/rhdf5.pdf](http://www.bioconductor.org/packages/release/bioc/vignettes/rhdf5/inst/doc/rhdf5.pdf) -* The HDF group has informaton on HDF5 in general [http://www.hdfgroup.org/HDF5/](http://www.hdfgroup.org/HDF5/) \ No newline at end of file +* The HDF group has informaton on HDF5 in general [http://www.hdfgroup.org/HDF5/](http://www.hdfgroup.org/HDF5/) diff --git a/03_GettingData/dplyr/chicago.rds b/03_GettingData/dplyr/chicago.rds new file mode 100644 index 000000000..bf10aafe8 Binary files /dev/null and b/03_GettingData/dplyr/chicago.rds differ diff --git a/03_GettingData/dplyr/chicago.zip b/03_GettingData/dplyr/chicago.zip new file mode 100644 index 000000000..d0eb3a7fd Binary files /dev/null and b/03_GettingData/dplyr/chicago.zip differ diff --git a/03_GettingData/dplyr/dplyr.Rmd b/03_GettingData/dplyr/dplyr.Rmd new file mode 100644 index 000000000..7f1a33026 --- /dev/null +++ b/03_GettingData/dplyr/dplyr.Rmd @@ -0,0 +1,233 @@ +% Managing Data Frames with `dplyr` +% + +```{r, echo=FALSE, results="hide"} +options(width = 50) +``` + +# dplyr + +The data frame is a key data structure in statistics and in R. 
+ +* There is one observation per row + +* Each column represents a variable or measure or characteristic + +* Primary implementation that you will use is the default R + implementation + +* Other implementations, particularly relational databases systems + + +# dplyr + +* Developed by Hadley Wickham of RStudio + +* An optimized and distilled version of `plyr` package (also by Hadley) + +* Does not provide any "new" functionality per se, but **greatly** + simplifies existing functionality in R + +* Provides a "grammar" (in particular, verbs) for data manipulation + +* Is **very** fast, as many key operations are coded in C++ + + +# dplyr Verbs + +* `select`: return a subset of the columns of a data frame + +* `filter`: extract a subset of rows from a data frame based on + logical conditions + +* `arrange`: reorder rows of a data frame + + +* `rename`: rename variables in a data frame + +* `mutate`: add new variables/columns or transform existing variables + +* `summarise` / `summarize`: generate summary statistics of different + variables in the data frame, possibly within strata + +There is also a handy `print` method that prevents you from printing a +lot of data to the console. + + + +# dplyr Properties + +* The first argument is a data frame. + +* The subsequent arguments describe what to do with it, and you can + refer to columns in the data frame directly without using the $ + operator (just use the names). + +* The result is a new data frame + +* Data frames must be properly formatted and annotated for this to all + be useful + + +# Load the `dplyr` package + + +This step is important! 
+ +```{r} +library(dplyr) +``` + + +# `select` + +```{r} +chicago <- readRDS("chicago.rds") +dim(chicago) +head(select(chicago, 1:5)) +``` + + +# `select` + +```{r} +names(chicago)[1:3] +head(select(chicago, city:dptp)) +``` + +# `select` + +In dplyr you can do + +```{r,eval=FALSE} +head(select(chicago, -(city:dptp))) +``` + +Equivalent base R + +```{r,eval=FALSE} +i <- match("city", names(chicago)) +j <- match("dptp", names(chicago)) +head(chicago[, -(i:j)]) +``` + + + +# `filter` + +```{r} +chic.f <- filter(chicago, pm25tmean2 > 30) +head(select(chic.f, 1:3, pm25tmean2), 10) +``` + +# `filter` + +```{r} +chic.f <- filter(chicago, pm25tmean2 > 30 & tmpd > 80) +head(select(chic.f, 1:3, pm25tmean2, tmpd), 10) +``` + + +# `arrange` + +Reordering rows of a data frame (while preserving corresponding order +of other columns) is normally a pain to do in R. + +```{r} +chicago <- arrange(chicago, date) +head(select(chicago, date, pm25tmean2), 3) +tail(select(chicago, date, pm25tmean2), 3) +``` + +# `arrange` + +Columns can be arranged in descending order too. + +```{r} +chicago <- arrange(chicago, desc(date)) +head(select(chicago, date, pm25tmean2), 3) +tail(select(chicago, date, pm25tmean2), 3) +``` + + +# `rename` + +Renaming a variable in a data frame in R is surprising hard to do! 
+ +```{r,tidy=FALSE} +head(chicago[, 1:5], 3) +chicago <- rename(chicago, dewpoint = dptp, + pm25 = pm25tmean2) +head(chicago[, 1:5], 3) +``` + + +# `mutate` + +```{r, tidy=FALSE} +chicago <- mutate(chicago, + pm25detrend=pm25-mean(pm25, na.rm=TRUE)) +head(select(chicago, pm25, pm25detrend)) +``` + +# `group_by` + +Generating summary statistics by stratum + +```{r, tidy=FALSE} +chicago <- mutate(chicago, + tempcat = factor(1 * (tmpd > 80), + labels = c("cold", "hot"))) +hotcold <- group_by(chicago, tempcat) +summarize(hotcold, pm25 = mean(pm25, na.rm = TRUE), + o3 = max(o3tmean2), + no2 = median(no2tmean2)) +``` + + +# `group_by` + +Generating summary statistics by stratum + +```{r, tidy=FALSE} +chicago <- mutate(chicago, + year = as.POSIXlt(date)$year + 1900) +years <- group_by(chicago, year) +summarize(years, pm25 = mean(pm25, na.rm = TRUE), + o3 = max(o3tmean2, na.rm = TRUE), + no2 = median(no2tmean2, na.rm = TRUE)) +``` + +```{r,echo=FALSE} +chicago$year <- NULL ## Can't use mutate to create an existing variable +``` + + +# `%>%` + +```{r,tidy=FALSE,eval=FALSE} +chicago %>% mutate(month = as.POSIXlt(date)$mon + 1) + %>% group_by(month) + %>% summarize(pm25 = mean(pm25, na.rm = TRUE), + o3 = max(o3tmean2, na.rm = TRUE), + no2 = median(no2tmean2, na.rm = TRUE)) +``` + +```{r,echo=FALSE} +chicago %>% mutate(month = as.POSIXlt(date)$mon + 1) %>% group_by(month) %>% +summarize(pm25 = mean(pm25, na.rm = TRUE), o3 = max(o3tmean2, na.rm = TRUE), no2 = median(no2tmean2, na.rm = TRUE)) + +``` + + +# dplyr + +Once you learn the dplyr "grammar" there are a few additional benefits + +* dplyr can work with other data frame "backends" + +* `data.table` for large fast tables + +* SQL interface for relational databases via the DBI package + + diff --git a/03_GettingData/dplyr/dplyr.md b/03_GettingData/dplyr/dplyr.md new file mode 100644 index 000000000..dff0db010 --- /dev/null +++ b/03_GettingData/dplyr/dplyr.md @@ -0,0 +1,426 @@ +% Managing Data Frames with `dplyr` +% + + + +# 
dplyr + +The data frame is a key data structure in statistics and in R. + +* There is one observation per row + +* Each column represents a variable or measure or characteristic + +* Primary implementation that you will use is the default R + implementation + +* Other implementations, particularly relational databases systems + + +# dplyr + +* Developed by Hadley Wickham of RStudio + +* An optimized and distilled version of `plyr` package (also by Hadley) + +* Does not provide any "new" functionality per se, but **greatly** + simplifies existing functionality in R + +* Provides a "grammar" (in particular, verbs) for data manipulation + +* Is **very** fast, as many key operations are coded in C++ + + +# dplyr Verbs + +* `select`: return a subset of the columns of a data frame + +* `filter`: extract a subset of rows from a data frame based on + logical conditions + +* `arrange`: reorder rows of a data frame + + +* `rename`: rename variables in a data frame + +* `mutate`: add new variables/columns or transform existing variables + +* `summarise` / `summarize`: generate summary statistics of different + variables in the data frame, possibly within strata + +There is also a handy `print` method that prevents you from printing a +lot of data to the console. + + + +# dplyr Properties + +* The first argument is a data frame. + +* The subsequent arguments describe what to do with it, and you can + refer to columns in the data frame directly without using the $ + operator (just use the names). + +* The result is a new data frame + +* Data frames must be properly formatted and annotated for this to all + be useful + + +# Load the `dplyr` package + + +This step is important! 
+ + +```r +library(dplyr) +``` + +``` +## +## Attaching package: 'dplyr' +## +## The following object is masked from 'package:stats': +## +## filter +## +## The following objects are masked from 'package:base': +## +## intersect, setdiff, setequal, union +``` + + +# `select` + + +```r +chicago <- readRDS("chicago.rds") +dim(chicago) +``` + +``` +## [1] 6940 8 +``` + +```r +head(select(chicago, 1:5)) +``` + +``` +## city tmpd dptp date pm25tmean2 +## 1 chic 31.5 31.500 1987-01-01 NA +## 2 chic 33.0 29.875 1987-01-02 NA +## 3 chic 33.0 27.375 1987-01-03 NA +## 4 chic 29.0 28.625 1987-01-04 NA +## 5 chic 32.0 28.875 1987-01-05 NA +## 6 chic 40.0 35.125 1987-01-06 NA +``` + + +# `select` + + +```r +names(chicago)[1:3] +``` + +``` +## [1] "city" "tmpd" "dptp" +``` + +```r +head(select(chicago, city:dptp)) +``` + +``` +## city tmpd dptp +## 1 chic 31.5 31.500 +## 2 chic 33.0 29.875 +## 3 chic 33.0 27.375 +## 4 chic 29.0 28.625 +## 5 chic 32.0 28.875 +## 6 chic 40.0 35.125 +``` + +# `select` + +In dplyr you can do + + +```r +head(select(chicago, -(city:dptp))) +``` + +Equivalent base R + + +```r +i <- match("city", names(chicago)) +j <- match("dptp", names(chicago)) +head(chicago[, -(i:j)]) +``` + + + +# `filter` + + +```r +chic.f <- filter(chicago, pm25tmean2 > 30) +head(select(chic.f, 1:3, pm25tmean2), 10) +``` + +``` +## city tmpd dptp pm25tmean2 +## 1 chic 23 21.9 38.10 +## 2 chic 28 25.8 33.95 +## 3 chic 55 51.3 39.40 +## 4 chic 59 53.7 35.40 +## 5 chic 57 52.0 33.30 +## 6 chic 57 56.0 32.10 +## 7 chic 75 65.8 56.50 +## 8 chic 61 59.0 33.80 +## 9 chic 73 60.3 30.30 +## 10 chic 78 67.1 41.40 +``` + +# `filter` + + +```r +chic.f <- filter(chicago, pm25tmean2 > 30 & tmpd > 80) +head(select(chic.f, 1:3, pm25tmean2, tmpd), 10) +``` + +``` +## city tmpd dptp pm25tmean2 +## 1 chic 81 71.2 39.6000 +## 2 chic 81 70.4 31.5000 +## 3 chic 82 72.2 32.3000 +## 4 chic 84 72.9 43.7000 +## 5 chic 85 72.6 38.8375 +## 6 chic 84 72.6 38.2000 +## 7 chic 82 67.4 33.0000 +## 8 chic 82 63.5 
42.5000 +## 9 chic 81 70.4 33.1000 +## 10 chic 82 66.2 38.8500 +``` + + +# `arrange` + +Reordering rows of a data frame (while preserving corresponding order +of other columns) is normally a pain to do in R. + + +```r +chicago <- arrange(chicago, date) +head(select(chicago, date, pm25tmean2), 3) +``` + +``` +## date pm25tmean2 +## 1 1987-01-01 NA +## 2 1987-01-02 NA +## 3 1987-01-03 NA +``` + +```r +tail(select(chicago, date, pm25tmean2), 3) +``` + +``` +## date pm25tmean2 +## 6938 2005-12-29 7.45000 +## 6939 2005-12-30 15.05714 +## 6940 2005-12-31 15.00000 +``` + +# `arrange` + +Columns can be arranged in descending order too. + + +```r +chicago <- arrange(chicago, desc(date)) +head(select(chicago, date, pm25tmean2), 3) +``` + +``` +## date pm25tmean2 +## 1 2005-12-31 15.00000 +## 2 2005-12-30 15.05714 +## 3 2005-12-29 7.45000 +``` + +```r +tail(select(chicago, date, pm25tmean2), 3) +``` + +``` +## date pm25tmean2 +## 6938 1987-01-03 NA +## 6939 1987-01-02 NA +## 6940 1987-01-01 NA +``` + + +# `rename` + +Renaming a variable in a data frame in R is surprising hard to do! 
+ + +```r +head(chicago[, 1:5], 3) +``` + +``` +## city tmpd dptp date pm25tmean2 +## 1 chic 35 30.1 2005-12-31 15.00000 +## 2 chic 36 31.0 2005-12-30 15.05714 +## 3 chic 35 29.4 2005-12-29 7.45000 +``` + +```r +chicago <- rename(chicago, dewpoint = dptp, + pm25 = pm25tmean2) +head(chicago[, 1:5], 3) +``` + +``` +## city tmpd dewpoint date pm25 +## 1 chic 35 30.1 2005-12-31 15.00000 +## 2 chic 36 31.0 2005-12-30 15.05714 +## 3 chic 35 29.4 2005-12-29 7.45000 +``` + + +# `mutate` + + +```r +chicago <- mutate(chicago, + pm25detrend=pm25-mean(pm25, na.rm=TRUE)) +head(select(chicago, pm25, pm25detrend)) +``` + +``` +## pm25 pm25detrend +## 1 15.00000 -1.230958 +## 2 15.05714 -1.173815 +## 3 7.45000 -8.780958 +## 4 17.75000 1.519042 +## 5 23.56000 7.329042 +## 6 8.40000 -7.830958 +``` + +# `group_by` + +Generating summary statistics by stratum + + +```r +chicago <- mutate(chicago, + tempcat = factor(1 * (tmpd > 80), + labels = c("cold", "hot"))) +hotcold <- group_by(chicago, tempcat) +summarize(hotcold, pm25 = mean(pm25, na.rm = TRUE), + o3 = max(o3tmean2), + no2 = median(no2tmean2)) +``` + +``` +## Source: local data frame [3 x 4] +## +## tempcat pm25 o3 no2 +## 1 cold 15.97807 66.587500 24.54924 +## 2 hot 26.48118 62.969656 24.93870 +## 3 NA 47.73750 9.416667 37.44444 +``` + + +# `group_by` + +Generating summary statistics by stratum + + +```r +chicago <- mutate(chicago, + year = as.POSIXlt(date)$year + 1900) +years <- group_by(chicago, year) +summarize(years, pm25 = mean(pm25, na.rm = TRUE), + o3 = max(o3tmean2, na.rm = TRUE), + no2 = median(no2tmean2, na.rm = TRUE)) +``` + +``` +## Source: local data frame [19 x 4] +## +## year pm25 o3 no2 +## 1 1987 NaN 62.96966 23.49369 +## 2 1988 NaN 61.67708 24.52296 +## 3 1989 NaN 59.72727 26.14062 +## 4 1990 NaN 52.22917 22.59583 +## 5 1991 NaN 63.10417 21.38194 +## 6 1992 NaN 50.82870 24.78921 +## 7 1993 NaN 44.30093 25.76993 +## 8 1994 NaN 52.17844 28.47500 +## 9 1995 NaN 66.58750 27.26042 +## 10 1996 NaN 58.39583 26.38715 
+## 11 1997 NaN 56.54167 25.48143 +## 12 1998 18.26467 50.66250 24.58649 +## 13 1999 18.49646 57.48864 24.66667 +## 14 2000 16.93806 55.76103 23.46082 +## 15 2001 16.92632 51.81984 25.06522 +## 16 2002 15.27335 54.88043 22.73750 +## 17 2003 15.23183 56.16608 24.62500 +## 18 2004 14.62864 44.48240 23.39130 +## 19 2005 16.18556 58.84126 22.62387 +``` + + + + +# `%>%` + + +```r +chicago %>% mutate(month = as.POSIXlt(date)$mon + 1) + %>% group_by(month) + %>% summarize(pm25 = mean(pm25, na.rm = TRUE), + o3 = max(o3tmean2, na.rm = TRUE), + no2 = median(no2tmean2, na.rm = TRUE)) +``` + + +``` +## Source: local data frame [12 x 4] +## +## month pm25 o3 no2 +## 1 1 17.76996 28.22222 25.35417 +## 2 2 20.37513 37.37500 26.78034 +## 3 3 17.40818 39.05000 26.76984 +## 4 4 13.85879 47.94907 25.03125 +## 5 5 14.07420 52.75000 24.22222 +## 6 6 15.86461 66.58750 25.01140 +## 7 7 16.57087 59.54167 22.38442 +## 8 8 16.93380 53.96701 22.98333 +## 9 9 15.91279 57.48864 24.47917 +## 10 10 14.23557 47.09275 24.15217 +## 11 11 15.15794 29.45833 23.56537 +## 12 12 17.52221 27.70833 24.45773 +``` + + +# dplyr + +Once you learn the dplyr "grammar" there are a few additional benefits + +* dplyr can work with other data frame "backends" + +* `data.table` for large fast tables + +* SQL interface for relational databases via the DBI package + + diff --git a/03_GettingData/dplyr/dplyr.pdf b/03_GettingData/dplyr/dplyr.pdf new file mode 100644 index 000000000..e96071c7f Binary files /dev/null and b/03_GettingData/dplyr/dplyr.pdf differ diff --git a/03_GettingData/dplyr/slides/dplyr01.png b/03_GettingData/dplyr/slides/dplyr01.png new file mode 100644 index 000000000..5dc66db03 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr01.png differ diff --git a/03_GettingData/dplyr/slides/dplyr02.png b/03_GettingData/dplyr/slides/dplyr02.png new file mode 100644 index 000000000..70b9cb38b Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr02.png differ diff --git 
a/03_GettingData/dplyr/slides/dplyr03.png b/03_GettingData/dplyr/slides/dplyr03.png new file mode 100644 index 000000000..b927ff72e Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr03.png differ diff --git a/03_GettingData/dplyr/slides/dplyr04.png b/03_GettingData/dplyr/slides/dplyr04.png new file mode 100644 index 000000000..ccd89b4f5 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr04.png differ diff --git a/03_GettingData/dplyr/slides/dplyr05.png b/03_GettingData/dplyr/slides/dplyr05.png new file mode 100644 index 000000000..6c67067f9 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr05.png differ diff --git a/03_GettingData/dplyr/slides/dplyr06.png b/03_GettingData/dplyr/slides/dplyr06.png new file mode 100644 index 000000000..0a8bea928 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr06.png differ diff --git a/03_GettingData/dplyr/slides/dplyr07.png b/03_GettingData/dplyr/slides/dplyr07.png new file mode 100644 index 000000000..7ee8af9ae Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr07.png differ diff --git a/03_GettingData/dplyr/slides/dplyr08.png b/03_GettingData/dplyr/slides/dplyr08.png new file mode 100644 index 000000000..bf184fc11 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr08.png differ diff --git a/03_GettingData/dplyr/slides/dplyr09.png b/03_GettingData/dplyr/slides/dplyr09.png new file mode 100644 index 000000000..7d4db5d01 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr09.png differ diff --git a/03_GettingData/dplyr/slides/dplyr10.png b/03_GettingData/dplyr/slides/dplyr10.png new file mode 100644 index 000000000..8b4681145 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr10.png differ diff --git a/03_GettingData/dplyr/slides/dplyr11.png b/03_GettingData/dplyr/slides/dplyr11.png new file mode 100644 index 000000000..22ed7b4c1 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr11.png differ diff --git 
a/03_GettingData/dplyr/slides/dplyr12.png b/03_GettingData/dplyr/slides/dplyr12.png new file mode 100644 index 000000000..b5f2e330d Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr12.png differ diff --git a/03_GettingData/dplyr/slides/dplyr13.png b/03_GettingData/dplyr/slides/dplyr13.png new file mode 100644 index 000000000..fb78b6895 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr13.png differ diff --git a/03_GettingData/dplyr/slides/dplyr14.png b/03_GettingData/dplyr/slides/dplyr14.png new file mode 100644 index 000000000..538d58e2d Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr14.png differ diff --git a/03_GettingData/dplyr/slides/dplyr15.png b/03_GettingData/dplyr/slides/dplyr15.png new file mode 100644 index 000000000..742c845b0 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr15.png differ diff --git a/03_GettingData/dplyr/slides/dplyr16.png b/03_GettingData/dplyr/slides/dplyr16.png new file mode 100644 index 000000000..940bab3c7 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr16.png differ diff --git a/03_GettingData/dplyr/slides/dplyr17.png b/03_GettingData/dplyr/slides/dplyr17.png new file mode 100644 index 000000000..bb0d1dca5 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr17.png differ diff --git a/03_GettingData/dplyr/slides/dplyr18.png b/03_GettingData/dplyr/slides/dplyr18.png new file mode 100644 index 000000000..7da92a4f6 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr18.png differ diff --git a/03_GettingData/dplyr/slides/dplyr19.png b/03_GettingData/dplyr/slides/dplyr19.png new file mode 100644 index 000000000..f9dce91e3 Binary files /dev/null and b/03_GettingData/dplyr/slides/dplyr19.png differ diff --git a/04_ExploratoryAnalysis/PlottingSystems/Plotting Systems in R.pdf b/04_ExploratoryAnalysis/PlottingSystems/Plotting Systems in R.pdf new file mode 100644 index 000000000..2e961f051 Binary files /dev/null and 
b/04_ExploratoryAnalysis/PlottingSystems/Plotting Systems in R.pdf differ diff --git a/04_ExploratoryAnalysis/exploratoryGraphs/PM25data.zip b/04_ExploratoryAnalysis/exploratoryGraphs/PM25data.zip new file mode 100644 index 000000000..bba3a4045 Binary files /dev/null and b/04_ExploratoryAnalysis/exploratoryGraphs/PM25data.zip differ diff --git a/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pdf b/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pdf index 092a7e4c4..a04ed124a 100644 Binary files a/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pdf and b/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pdf differ diff --git a/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pptx b/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pptx index 44e98c5d4..7c3ec5979 100644 Binary files a/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pptx and b/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pptx differ diff --git a/04_ExploratoryAnalysis/kmeansClustering/index.Rmd b/04_ExploratoryAnalysis/kmeansClustering/index.Rmd index 89c6a79f9..11a89d1e2 100644 --- a/04_ExploratoryAnalysis/kmeansClustering/index.Rmd +++ b/04_ExploratoryAnalysis/kmeansClustering/index.Rmd @@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: - lib: ../../libraries + lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft} @@ -250,7 +250,7 @@ points(kmeansObj$centers,col=1:3,pch=3,cex=3,lwd=3) ```{r, dependson="kmeans",fig.height=4,fig.width=8} set.seed(1234) dataMatrix <- as.matrix(dataFrame)[sample(1:12),] -kmeansObj2 <- kmeans(dataMatrix,centers=3) +kmeansObj <- kmeans(dataMatrix,centers=3) par(mfrow=c(1,2), mar = c(2, 4, 0.1, 0.1)) image(t(dataMatrix)[,nrow(dataMatrix):1],yaxt="n") image(t(dataMatrix)[,order(kmeansObj$cluster)],yaxt="n") diff --git a/04_ExploratoryAnalysis/kmeansClustering/index.html 
b/04_ExploratoryAnalysis/kmeansClustering/index.html index 2d012c6e6..d15a87c36 100644 --- a/04_ExploratoryAnalysis/kmeansClustering/index.html +++ b/04_ExploratoryAnalysis/kmeansClustering/index.html @@ -8,51 +8,46 @@ - - + - - - - + + - - - - - - - + - + + +
                            +

                            K-means Clustering

                            +

                            +

                            Roger D. Peng, Associate Professor of Biostatistics
                            Johns Hopkins Bloomberg School of Public Health

                            +
                            +
                            +
                            - - - - -
                            -

                            K-means Clustering

                            -

                            -

                            Roger D. Peng, Associate Professor of Biostatistics
                            Johns Hopkins Bloomberg School of Public Health

                            -
                            -
                            - - +

                            Can we find things that are close together?

                            -
                            +
                            • How do we define close?
                            • How do we group things?
                            • @@ -64,11 +59,11 @@

                              Can we find things that are close together?

                              - +

                              How do we define close?

                              -
                              +
                              • Most important step @@ -89,11 +84,11 @@

                                How do we define close?

                                - +

                                K-means clustering

                                -
                                +
                                • A partitioning approach @@ -122,91 +117,90 @@

                                  K-means clustering

                                  - +

                                  K-means clustering - example

                                  -
                                  -
                                  set.seed(1234)
                                  -par(mar = c(0, 0, 0, 0))
                                  -x <- rnorm(12, mean = rep(1:3, each = 4), sd = 0.2)
                                  -y <- rnorm(12, mean = rep(c(1, 2, 1), each = 4), sd = 0.2)
                                  -plot(x, y, col = "blue", pch = 19, cex = 2)
                                  -text(x + 0.05, y + 0.05, labels = as.character(1:12))
                                  +  
                                  +
                                  set.seed(1234); par(mar=c(0,0,0,0))
                                  +x <- rnorm(12,mean=rep(1:3,each=4),sd=0.2)
                                  +y <- rnorm(12,mean=rep(c(1,2,1),each=4),sd=0.2)
                                  +plot(x,y,col="blue",pch=19,cex=2)
                                  +text(x+0.05,y+0.05,labels=as.character(1:12))
                                   
                                  -

                                  plot of chunk createData

                                  +

                                  plot of chunk createData

                                  - +

                                  K-means clustering - starting centroids

                                  -
                                  -

                                  plot of chunk unnamed-chunk-1

                                  +
                                  +

                                  plot of chunk unnamed-chunk-1

                                  - +

                                  K-means clustering - assign to closest centroid

                                  -
                                  -

                                  plot of chunk unnamed-chunk-2

                                  +
                                  +

                                  plot of chunk unnamed-chunk-2

                                  - +

                                  K-means clustering - recalculate centroids

                                  -
                                  -

                                  plot of chunk unnamed-chunk-3

                                  +
                                  +

                                  plot of chunk unnamed-chunk-3

                                  - +

                                  K-means clustering - reassign values

                                  -
                                  -

                                  plot of chunk unnamed-chunk-4

                                  +
                                  +

                                  plot of chunk unnamed-chunk-4

                                  - +

                                  K-means clustering - update centroids

                                  -
                                  -

                                  plot of chunk unnamed-chunk-5

                                  +
                                  +

                                  plot of chunk unnamed-chunk-5

                                  - +

                                  kmeans()

                                  -
                                  +
                                  • Important parameters: x, centers, iter.max, nstart
                                  -
                                  dataFrame <- data.frame(x, y)
                                  -kmeansObj <- kmeans(dataFrame, centers = 3)
                                  +
                                  dataFrame <- data.frame(x,y)
                                  +kmeansObj <- kmeans(dataFrame,centers=3)
                                   names(kmeansObj)
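
The slide lists `iter.max` and `nstart` among the important parameters but the call only sets `centers`. The sketch below, which regenerates the example data so it runs on its own, shows one way those arguments might be set; the values 10 and 20 are illustrative assumptions rather than recommendations from the deck.

```r
set.seed(1234)
x <- rnorm(12, mean = rep(1:3, each = 4), sd = 0.2)
y <- rnorm(12, mean = rep(c(1, 2, 1), each = 4), sd = 0.2)
dataFrame <- data.frame(x, y)

# centers:  number of clusters to find
# iter.max: upper bound on the number of update iterations
# nstart:   number of random starting configurations; the best fit is kept
kmeansObj <- kmeans(dataFrame, centers = 3, iter.max = 10, nstart = 20)

print(kmeansObj$cluster)   # cluster label for each of the 12 points
print(kmeansObj$centers)   # one row of coordinates per centroid
```

Trying several starts makes the result less sensitive to an unlucky initial choice of centroids, at the cost of extra computation.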
                                   
                                  @@ -225,46 +219,46 @@

                                  kmeans()

                                  - +

                                  kmeans()

                                  -
                                  -
                                  par(mar = rep(0.2, 4))
                                  -plot(x, y, col = kmeansObj$cluster, pch = 19, cex = 2)
                                  -points(kmeansObj$centers, col = 1:3, pch = 3, cex = 3, lwd = 3)
                                  +  
                                  +
                                  par(mar=rep(0.2,4))
                                  +plot(x,y,col=kmeansObj$cluster,pch=19,cex=2)
                                  +points(kmeansObj$centers,col=1:3,pch=3,cex=3,lwd=3)
                                   
                                  -

                                  plot of chunk unnamed-chunk-6

                                  +

                                  plot of chunk unnamed-chunk-6

                                  - +

                                  Heatmaps

                                  -
                                  +
                                  set.seed(1234)
                                  -dataMatrix <- as.matrix(dataFrame)[sample(1:12), ]
                                  -kmeansObj2 <- kmeans(dataMatrix, centers = 3)
                                  -par(mfrow = c(1, 2), mar = c(2, 4, 0.1, 0.1))
                                  -image(t(dataMatrix)[, nrow(dataMatrix):1], yaxt = "n")
                                  -image(t(dataMatrix)[, order(kmeansObj$cluster)], yaxt = "n")
                                  +dataMatrix <- as.matrix(dataFrame)[sample(1:12),]
                                  +kmeansObj <- kmeans(dataMatrix,centers=3)
                                  +par(mfrow=c(1,2), mar = c(2, 4, 0.1, 0.1))
                                  +image(t(dataMatrix)[,nrow(dataMatrix):1],yaxt="n")
                                  +image(t(dataMatrix)[,order(kmeansObj$cluster)],yaxt="n")
                                   
                                  -

                                  plot of chunk unnamed-chunk-7

                                  +

                                  plot of chunk unnamed-chunk-7
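
The heatmap chunk reorders the shuffled matrix by `kmeansObj$cluster`, so the labels have to come from a fit on that same shuffled matrix. The sketch below illustrates the idea without plotting; the names `ord`, `grouped`, and `reordered` are assumptions introduced for the example.

```r
set.seed(1234)
x <- rnorm(12, mean = rep(1:3, each = 4), sd = 0.2)
y <- rnorm(12, mean = rep(c(1, 2, 1), each = 4), sd = 0.2)
dataFrame <- data.frame(x, y)

# Shuffle the rows, then cluster the shuffled matrix itself so that
# the cluster labels line up with its row order.
dataMatrix <- as.matrix(dataFrame)[sample(1:12), ]
kmeansObj <- kmeans(dataMatrix, centers = 3)

ord <- order(kmeansObj$cluster)    # row order that groups rows by cluster
grouped <- kmeansObj$cluster[ord]  # labels after reordering
reordered <- dataMatrix[ord, ]     # rows grouped by cluster

print(grouped)
```

Labels taken from a fit on the unshuffled data would refer to a different row order, and the reordered heatmap would no longer group cluster members together.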

                                  - +

                                  Notes and further resources

                                  -
                                  +
                                  • K-means requires a number of clusters @@ -289,34 +283,113 @@

                                    Notes and further resources

                                    - - - - - - - - - - - + + + - - - - \ No newline at end of file + + + + + \ No newline at end of file diff --git a/04_ExploratoryAnalysis/kmeansClustering/index.md b/04_ExploratoryAnalysis/kmeansClustering/index.md index 940dc5d43..582da7bb4 100644 --- a/04_ExploratoryAnalysis/kmeansClustering/index.md +++ b/04_ExploratoryAnalysis/kmeansClustering/index.md @@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...} highlighter : highlight.js # {highlight.js, prettify, highlight} hitheme : tomorrow # url: - lib: ../../libraries + lib: ../../librariesNew assets: ../../assets widgets : [mathjax] # {mathjax, quiz, bootstrap} mode : selfcontained # {standalone, draft} @@ -60,16 +60,14 @@ mode : selfcontained # {standalone, draft} ```r -set.seed(1234) -par(mar = c(0, 0, 0, 0)) -x <- rnorm(12, mean = rep(1:3, each = 4), sd = 0.2) -y <- rnorm(12, mean = rep(c(1, 2, 1), each = 4), sd = 0.2) -plot(x, y, col = "blue", pch = 19, cex = 2) -text(x + 0.05, y + 0.05, labels = as.character(1:12)) +set.seed(1234); par(mar=c(0,0,0,0)) +x <- rnorm(12,mean=rep(1:3,each=4),sd=0.2) +y <- rnorm(12,mean=rep(c(1,2,1),each=4),sd=0.2) +plot(x,y,col="blue",pch=19,cex=2) +text(x+0.05,y+0.05,labels=as.character(1:12)) ``` -![plot of chunk createData](figure/createData.png) - +![plot of chunk createData](assets/fig/createData-1.png) --- @@ -77,30 +75,26 @@ text(x + 0.05, y + 0.05, labels = as.character(1:12)) ## K-means clustering - starting centroids -![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1.png) - +![plot of chunk unnamed-chunk-1](assets/fig/unnamed-chunk-1-1.png) --- ## K-means clustering - assign to closest centroid -![plot of chunk unnamed-chunk-2](figure/unnamed-chunk-2.png) - +![plot of chunk unnamed-chunk-2](assets/fig/unnamed-chunk-2-1.png) --- ## K-means clustering - recalculate centroids -![plot of chunk unnamed-chunk-3](figure/unnamed-chunk-3.png) - +![plot of chunk 
unnamed-chunk-3](assets/fig/unnamed-chunk-3-1.png) --- ## K-means clustering - reassign values -![plot of chunk unnamed-chunk-4](figure/unnamed-chunk-4.png) - +![plot of chunk unnamed-chunk-4](assets/fig/unnamed-chunk-4-1.png) @@ -108,8 +102,7 @@ text(x + 0.05, y + 0.05, labels = as.character(1:12)) ## K-means clustering - update centroids -![plot of chunk unnamed-chunk-5](figure/unnamed-chunk-5.png) - +![plot of chunk unnamed-chunk-5](assets/fig/unnamed-chunk-5-1.png) --- @@ -120,8 +113,8 @@ text(x + 0.05, y + 0.05, labels = as.character(1:12)) ```r -dataFrame <- data.frame(x, y) -kmeansObj <- kmeans(dataFrame, centers = 3) +dataFrame <- data.frame(x,y) +kmeansObj <- kmeans(dataFrame,centers=3) names(kmeansObj) ``` @@ -139,20 +132,18 @@ kmeansObj$cluster ## [1] 3 3 3 3 1 1 1 1 2 2 2 2 ``` - --- ## `kmeans()` ```r -par(mar = rep(0.2, 4)) -plot(x, y, col = kmeansObj$cluster, pch = 19, cex = 2) -points(kmeansObj$centers, col = 1:3, pch = 3, cex = 3, lwd = 3) +par(mar=rep(0.2,4)) +plot(x,y,col=kmeansObj$cluster,pch=19,cex=2) +points(kmeansObj$centers,col=1:3,pch=3,cex=3,lwd=3) ``` -![plot of chunk unnamed-chunk-6](figure/unnamed-chunk-6.png) - +![plot of chunk unnamed-chunk-6](assets/fig/unnamed-chunk-6-1.png) --- @@ -161,15 +152,14 @@ points(kmeansObj$centers, col = 1:3, pch = 3, cex = 3, lwd = 3) ```r set.seed(1234) -dataMatrix <- as.matrix(dataFrame)[sample(1:12), ] -kmeansObj2 <- kmeans(dataMatrix, centers = 3) -par(mfrow = c(1, 2), mar = c(2, 4, 0.1, 0.1)) -image(t(dataMatrix)[, nrow(dataMatrix):1], yaxt = "n") -image(t(dataMatrix)[, order(kmeansObj$cluster)], yaxt = "n") +dataMatrix <- as.matrix(dataFrame)[sample(1:12),] +kmeansObj <- kmeans(dataMatrix,centers=3) +par(mfrow=c(1,2), mar = c(2, 4, 0.1, 0.1)) +image(t(dataMatrix)[,nrow(dataMatrix):1],yaxt="n") +image(t(dataMatrix)[,order(kmeansObj$cluster)],yaxt="n") ``` -![plot of chunk unnamed-chunk-7](figure/unnamed-chunk-7.png) - +![plot of chunk unnamed-chunk-7](assets/fig/unnamed-chunk-7-1.png) diff --git 
a/05_ReproducibleResearch/Checklist/Reproducible Research Checklist.pdf b/05_ReproducibleResearch/Checklist/Reproducible Research Checklist.pdf new file mode 100644 index 000000000..7fb627ba1 Binary files /dev/null and b/05_ReproducibleResearch/Checklist/Reproducible Research Checklist.pdf differ diff --git a/05_ReproducibleResearch/organizingADataAnalysis/index.Rmd b/05_ReproducibleResearch/organizingADataAnalysis/index.Rmd index eae99fc98..e2db82a63 100644 --- a/05_ReproducibleResearch/organizingADataAnalysis/index.Rmd +++ b/05_ReproducibleResearch/organizingADataAnalysis/index.Rmd @@ -110,7 +110,7 @@ mode : selfcontained # {standalone, draft} * Not necessary if you use R markdown * Should contain step-by-step instructions for analysis -* Here is an example [https://github.com/jtleek/swfdr/blob/master/README](https://github.com/jtleek/swfdr/blob/master/README) +* Here is an example [https://github.com/jtleek/swfdr/blob/master/README.md](https://github.com/jtleek/swfdr/blob/master/README.md) --- diff --git a/05_ReproducibleResearch/organizingADataAnalysis/index.html b/05_ReproducibleResearch/organizingADataAnalysis/index.html index 027e1d484..ab751d4e7 100644 --- a/05_ReproducibleResearch/organizingADataAnalysis/index.html +++ b/05_ReproducibleResearch/organizingADataAnalysis/index.html @@ -1,59 +1,171 @@ - Organizing a Data Analysis - - - - - - - - - - - - - - - - - - - - + + +Data analysis files + + + + + + + + + + - - - - - - - - - - - -
                                    -

                                    Organizing a Data Analysis

                                    -

                                    -

                                    Roger D. Peng, Associate Professor of Biostatistics
                                    Johns Hopkins Bloomberg School of Public Health

                                    -
                                    -
                                    - - - -
                                    -

                                    Data analysis files

                                    -
                                    -
                                    -
                                      + + +

                                      Data analysis files

                                      + +
                                      • Data
                                          @@ -81,32 +193,22 @@

                                          Data analysis files

                                      -
                                    - -
                                    +
                                    - -
                                    -

                                    Raw Data

                                    -
                                    -
                                    -

                                    +

                                    Raw Data

                                    + +

                                    • Should be stored in your analysis folder
                                    • If accessed from the web, include url, description, and date accessed in README
                                    -
                                    - -
                                    +
                                    + +

                                    Processed data

                                    - -
                                    -

                                    Processed data

                                    -
                                    -
                                    -

                                    +

                                    • Processed data should be named so it is easy to see which script generated the data.
                                    • @@ -114,32 +216,22 @@

                                      Processed data

                                    • Processed data should be tidy
                                    -
                                    - -
                                    +
                                    + +

                                    Exploratory figures

                                    - -
                                    -

                                    Exploratory figures

                                    -
                                    -
                                    -

                                    +

                                    • Figures made during the course of your analysis, not necessarily part of your final report.
                                    • -
                                    • They do not need to be "pretty"
                                    • +
                                    • They do not need to be “pretty”
                                    -
                                    - -
                                    +
                                    - -
                                    -

                                    Final Figures

                                    -
                                    -
                                    -

                                    +

                                    Final Figures

                                    + +

                                    • Usually a small subset of the original figures
                                    • @@ -147,16 +239,11 @@

                                      Final Figures

                                    • Possibly multiple panels
                                    -
                                    - -
                                    +
                                    + +

                                    Raw scripts

                                    - -
                                    -

                                    Raw scripts

                                    -
                                    -
                                    -

                                    +

                                    • May be less commented (but comments help you!)
                                    • @@ -164,16 +251,11 @@

                                      Raw scripts

                                    • May include analyses that are later discarded
                                    -
                                    - -
                                    +
                                    - -
                                    -

                                    Final scripts

                                    -
                                    -
                                    -

                                    +

                                    Final scripts

                                    + +

                                    • Clearly commented @@ -186,16 +268,11 @@

                                      Final scripts

                                    • Only analyses that appear in the final write-up
                                    -
                                    - -
                                    +
                                    + +

                                    R markdown files

                                    - -
                                    -

                                    R markdown files

                                    -
                                    -
                                    -

                                    +

                                    • R markdown files can be used to generate reproducible reports
                                    • @@ -203,33 +280,23 @@

                                      R markdown files

                                    • Very easy to create in Rstudio
                                    -
                                    - -
                                    +
                                    - -
                                    -

                                    Readme files

                                    -
                                    - - -
                                    +
                                    + +

                                    Text of the document

                                    - -
                                    -

                                    Text of the document

                                    -
                                    -
                                    -

                                    +

                                    • It should include a title, introduction (motivation), methods (statistics you used), results (including measures of uncertainty), and conclusions (including potential problems)
                                    • @@ -238,56 +305,17 @@

                                      Text of the document

                                    • References should be included for statistical methods
                                    -
                                    - -
                                    +
                                    + +

                                    Further resources

                                    - -
                                    -

                                    Further resources

                                    -
                                    - - -
                                    - - -
                                    - - - - - - - - - - - - - - - - \ No newline at end of file + + diff --git a/05_ReproducibleResearch/organizingADataAnalysis/index.md b/05_ReproducibleResearch/organizingADataAnalysis/index.md index eae99fc98..e2db82a63 100644 --- a/05_ReproducibleResearch/organizingADataAnalysis/index.md +++ b/05_ReproducibleResearch/organizingADataAnalysis/index.md @@ -110,7 +110,7 @@ mode : selfcontained # {standalone, draft} * Not necessary if you use R markdown * Should contain step-by-step instructions for analysis -* Here is an example [https://github.com/jtleek/swfdr/blob/master/README](https://github.com/jtleek/swfdr/blob/master/README) +* Here is an example [https://github.com/jtleek/swfdr/blob/master/README.md](https://github.com/jtleek/swfdr/blob/master/README.md) --- diff --git a/06_StatisticalInference/03_05_MultipleTesting/index.pdf b/06_StatisticalInference/03_05_MultipleTesting/index.pdf deleted file mode 100644 index 190c24c34..000000000 Binary files a/06_StatisticalInference/03_05_MultipleTesting/index.pdf and /dev/null differ diff --git a/06_StatisticalInference/01_01_Introduction/index.pdf b/06_StatisticalInference/lectures/01_01_Introduction.pdf similarity index 100% rename from 06_StatisticalInference/01_01_Introduction/index.pdf rename to 06_StatisticalInference/lectures/01_01_Introduction.pdf diff --git a/06_StatisticalInference/01_02_Probability/index.pdf b/06_StatisticalInference/lectures/01_02_Probability.pdf similarity index 100% rename from 06_StatisticalInference/01_02_Probability/index.pdf rename to 06_StatisticalInference/lectures/01_02_Probability.pdf diff --git a/06_StatisticalInference/01_03_Expectations/index.pdf b/06_StatisticalInference/lectures/01_03_Expectations.pdf similarity index 100% rename from 06_StatisticalInference/01_03_Expectations/index.pdf rename to 06_StatisticalInference/lectures/01_03_Expectations.pdf diff --git a/06_StatisticalInference/01_04_Independence/index.pdf 
b/06_StatisticalInference/lectures/01_04_Independence.pdf similarity index 100% rename from 06_StatisticalInference/01_04_Independence/index.pdf rename to 06_StatisticalInference/lectures/01_04_Independence.pdf diff --git a/06_StatisticalInference/01_05_ConditionalProbability/index.pdf b/06_StatisticalInference/lectures/01_05_ConditionalProbability.pdf similarity index 100% rename from 06_StatisticalInference/01_05_ConditionalProbability/index.pdf rename to 06_StatisticalInference/lectures/01_05_ConditionalProbability.pdf diff --git a/06_StatisticalInference/02_01_CommonDistributions/index.pdf b/06_StatisticalInference/lectures/02_01_CommonDistributions.pdf similarity index 100% rename from 06_StatisticalInference/02_01_CommonDistributions/index.pdf rename to 06_StatisticalInference/lectures/02_01_CommonDistributions.pdf diff --git a/06_StatisticalInference/02_02_Asymptopia/index.pdf b/06_StatisticalInference/lectures/02_02_Asymptopia.pdf similarity index 100% rename from 06_StatisticalInference/02_02_Asymptopia/index.pdf rename to 06_StatisticalInference/lectures/02_02_Asymptopia.pdf diff --git a/06_StatisticalInference/02_03_tCIs/index.pdf b/06_StatisticalInference/lectures/02_03_tCIs.pdf similarity index 100% rename from 06_StatisticalInference/02_03_tCIs/index.pdf rename to 06_StatisticalInference/lectures/02_03_tCIs.pdf diff --git a/06_StatisticalInference/02_04_Likeklihood/index.pdf b/06_StatisticalInference/lectures/02_04_Likeklihood.pdf similarity index 100% rename from 06_StatisticalInference/02_04_Likeklihood/index.pdf rename to 06_StatisticalInference/lectures/02_04_Likeklihood.pdf diff --git a/06_StatisticalInference/02_05_Bayes/index.pdf b/06_StatisticalInference/lectures/02_05_Bayes.pdf similarity index 100% rename from 06_StatisticalInference/02_05_Bayes/index.pdf rename to 06_StatisticalInference/lectures/02_05_Bayes.pdf diff --git a/06_StatisticalInference/03_01_TwoGroupIntervals/index.pdf 
b/06_StatisticalInference/lectures/03_01_TwoGroupIntervals.pdf similarity index 100% rename from 06_StatisticalInference/03_01_TwoGroupIntervals/index.pdf rename to 06_StatisticalInference/lectures/03_01_TwoGroupIntervals.pdf diff --git a/06_StatisticalInference/03_02_HypothesisTesting/index.pdf b/06_StatisticalInference/lectures/03_02_HypothesisTesting.pdf similarity index 100% rename from 06_StatisticalInference/03_02_HypothesisTesting/index.pdf rename to 06_StatisticalInference/lectures/03_02_HypothesisTesting.pdf diff --git a/06_StatisticalInference/03_03_pValues/index.pdf b/06_StatisticalInference/lectures/03_03_pValues.pdf similarity index 100% rename from 06_StatisticalInference/03_03_pValues/index.pdf rename to 06_StatisticalInference/lectures/03_03_pValues.pdf diff --git a/06_StatisticalInference/03_04_Power/index.pdf b/06_StatisticalInference/lectures/03_04_Power.pdf similarity index 100% rename from 06_StatisticalInference/03_04_Power/index.pdf rename to 06_StatisticalInference/lectures/03_04_Power.pdf diff --git a/06_StatisticalInference/lectures/03_05_MultipleTesting.pdf b/06_StatisticalInference/lectures/03_05_MultipleTesting.pdf new file mode 100644 index 000000000..a82e4a5c4 Binary files /dev/null and b/06_StatisticalInference/lectures/03_05_MultipleTesting.pdf differ diff --git a/06_StatisticalInference/03_06_resampledInference/index.pdf b/06_StatisticalInference/lectures/03_06_resampledInference.pdf similarity index 100% rename from 06_StatisticalInference/03_06_resampledInference/index.pdf rename to 06_StatisticalInference/lectures/03_06_resampledInference.pdf diff --git a/06_StatisticalInference/makefile b/06_StatisticalInference/makefile new file mode 100755 index 000000000..12afe4516 --- /dev/null +++ b/06_StatisticalInference/makefile @@ -0,0 +1,28 @@ +DELAY = 1000 +RMD_FILES = $(wildcard */index.Rmd) +HTML_FILES = $(patsubst %.Rmd, %.html, $(RMD_FILES)) +PDF_FILES = $(patsubst %.html, %.pdf, $(HTML_FILES)) +PDF_FILES2 = $(patsubst 
%/index.pdf, lectures/%.pdf, $(PDF_FILES)) + +lectures: $(PDF_FILES2) +lectures/%.pdf: %/index.pdf + cp $< $@ + +files: + @echo $(RMD_FILES) + @echo $(HTML_FILES) + @echo $(PDF_FILES) + +html: $(HTML_FILES) +pdf: $(PDF_FILES) +all: html pdf + +zip: $(PDF_FILES) + zip all_pdf_files.zip $^ + +%/index.pdf: %/index.html + casperjs makepdf.js $< $@ $(DELAY) + +%/index.html: %/index.Rmd + cd $(dir $<) && Rscript -e "slidify::slidify('index.Rmd')" && cd .. + diff --git a/06_StatisticalInference/makepdf.js b/06_StatisticalInference/makepdf.js new file mode 100644 index 000000000..c01526f94 --- /dev/null +++ b/06_StatisticalInference/makepdf.js @@ -0,0 +1,10 @@ +var casper = require('casper').create({viewportSize:{width:1500,height:1000}}); +var args = casper.cli.args; +var imgfile = (args[1] || Math.random().toString(36).slice(2)) +casper.start(args[0], function() { + this.wait(args[2], function(){ + this.captureSelector(imgfile, "slides"); + }); +}); + +casper.run(); \ No newline at end of file diff --git a/07_RegressionModels/01_04_rttm/index.Rmd b/07_RegressionModels/01_04_rttm/index.Rmd index 59404ffa5..2759fbb8f 100644 --- a/07_RegressionModels/01_04_rttm/index.Rmd +++ b/07_RegressionModels/01_04_rttm/index.Rmd @@ -40,7 +40,7 @@ knit_hooks$set(plot = knitr:::hook_plot_html) --- ## Regression to the mean * These phenomena are all examples of so-called regression to the mean -* Invented by Francis Galton in the paper "Regression towvards mediocrity in hereditary stature" The Journal of the Anthropological Institute of Great Britain and Ireland , Vol. 15, (1886). +* Invented by Francis Galton in the paper "Regression towards mediocrity in hereditary stature" The Journal of the Anthropological Institute of Great Britain and Ireland , Vol. 15, (1886). * Think of it this way, imagine if you simulated pairs of random normals * The largest first ones would be the largest by chance, and the probability that there are smaller for the second simulation is high. 
* In other words $P(Y < x | X = x)$ gets bigger as $x$ heads into the very large values. diff --git a/07_RegressionModels/02_01_multivariate/index.Rmd b/07_RegressionModels/02_01_multivariate/index.Rmd index d381d804e..2a8ccbce8 100644 --- a/07_RegressionModels/02_01_multivariate/index.Rmd +++ b/07_RegressionModels/02_01_multivariate/index.Rmd @@ -185,7 +185,7 @@ $$ ## We can tidy it up a bit more, though Note that $$ -X_k = e_{i,X_k|X_p} + \frac{\sum_{i=1}^n X_{ik} X_{ip}}{\sum_{i=1}^n X_{ip^2}} X_p +X_k = e_{i,X_k|X_p} + \frac{\sum_{i=1}^n X_{ik} X_{ip}}{\sum_{i=1}^n X_{ip}^2} X_p $$ and $\sum_{i=1}^n e_{i,X_j | X_p} X_{ip} = 0$. Thus diff --git a/07_RegressionModels/02_04_residuals_variation_diagnostics/index.Rmd b/07_RegressionModels/02_04_residuals_variation_diagnostics/index.Rmd index 35840162e..b1fead2bb 100644 --- a/07_RegressionModels/02_04_residuals_variation_diagnostics/index.Rmd +++ b/07_RegressionModels/02_04_residuals_variation_diagnostics/index.Rmd @@ -74,7 +74,7 @@ Calling a point an outlier is vague. * `hatvalues` - measures of leverage * `dffits` - change in the predicted response when the $i^{th}$ point is deleted in fitting the model. * `dfbetas` - change in individual coefficients when the $i^{th}$ point is deleted in fitting the model. - * `cooks.distance` - overall change in teh coefficients when the $i^{th}$ point is deleted. + * `cooks.distance` - overall change in the coefficients when the $i^{th}$ point is deleted. * `resid` - returns the ordinary residuals * `resid(fit) / (1 - hatvalues(fit))` where `fit` is the linear model fit returns the PRESS residuals, i.e. the leave one out cross validation residuals - the difference in the response and the predicted response at data point $i$, where it was not included in the model fitting. 
diff --git a/08_PracticalMachineLearning/025combiningPredictors/index.Rmd b/08_PracticalMachineLearning/025combiningPredictors/index.Rmd index 74310dddf..958b2c4d8 100644 --- a/08_PracticalMachineLearning/025combiningPredictors/index.Rmd +++ b/08_PracticalMachineLearning/025combiningPredictors/index.Rmd @@ -196,7 +196,7 @@ sqrt(sum((combPredV-validation$wage)^2)) * Predict the class by majority vote * This can get dramatically more complicated * Simple blending in caret: [caretEnsemble](https://github.com/zachmayer/caretEnsemble) (use at your own risk!) - * Wikipedia [ensemlbe learning](http://en.wikipedia.org/wiki/Ensemble_learning) + * Wikipedia [ensemble learning](http://en.wikipedia.org/wiki/Ensemble_learning) --- @@ -207,4 +207,4 @@ sqrt(sum((combPredV-validation$wage)^2)) [http://www.techdirt.com/blog/innovation/articles/20120409/03412518422/](http://www.techdirt.com/blog/innovation/articles/20120409/03412518422/) -[http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html](http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html) \ No newline at end of file +[http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html](http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html) diff --git a/08_PracticalMachineLearning/lectures.zip b/08_PracticalMachineLearning/lectures.zip new file mode 100644 index 000000000..ce591f382 Binary files /dev/null and b/08_PracticalMachineLearning/lectures.zip differ diff --git a/09_DevelopingDataProducts/RPackages/index.Rmd b/09_DevelopingDataProducts/RPackages/index.Rmd index c24af630b..aadee58a6 100644 --- a/09_DevelopingDataProducts/RPackages/index.Rmd +++ b/09_DevelopingDataProducts/RPackages/index.Rmd @@ -182,7 +182,7 @@ importFrom(graphics, plot) exportClasses("gpc.poly", "gpc.poly.nohole") -exportMethods("show", "get.bbox", "plot", "intersect”, "union”, "setdiff", +exportMethods("show", "get.bbox", "plot", "intersect", "union", 
"setdiff", "[", "append.poly", "scale.poly", "area.poly", "get.pts", "coerce", "tristrip", "triangulate") ``` diff --git a/09_DevelopingDataProducts/rCharts/index.Rmd b/09_DevelopingDataProducts/rCharts/index.Rmd index e4b5d27e3..3546dccc5 100644 --- a/09_DevelopingDataProducts/rCharts/index.Rmd +++ b/09_DevelopingDataProducts/rCharts/index.Rmd @@ -37,7 +37,7 @@ runif(1) - rCharts is a way to create interactive javascript visualizations using R - So - You don't have to learn complex tools, like D3 - - You simply work in R learning a minimal amount of new syntaxt + - You simply work in R learning a minimal amount of new syntax - rCharts was written by Ramnath Vaidyanathan (friend of the Data Science Series), who also wrote slidify, the framework we use for all of the lectures in the class - This lecture is basically going through (http://ramnathv.github.io/rCharts/) diff --git a/09_DevelopingDataProducts/rStudioPresent/index.Rpres b/09_DevelopingDataProducts/rStudioPresent/index.Rpres index a237721f7..00c9487c7 100644 --- a/09_DevelopingDataProducts/rStudioPresent/index.Rpres +++ b/09_DevelopingDataProducts/rStudioPresent/index.Rpres @@ -1,132 +1,132 @@ -RStudio Presenter -=== -author: Brian Caffo, Jeff Leek Roger Peng -date: `r format(Sys.Date(), format="%B %d %Y")` -transition: rotate - - -Department of Biostatistics -Bloomberg School of Public Health -Johns Hopkins University -Coursera Data Science Specialization - - - -RStudio Presentation -=== -- RStudio created a presentation authoring tool within their -development environment. 
-- If you are familiar with slidify, you will also be familiar with this tool - - Code is authored in a generalized markdown format that allows for code chunks - - The output is an html5 presentation - - The file index for the presenter file is .Rpres, which gets converted to an .md file and then to an html file if desired - - There's a preview tool in RStudio and GUIs for publishing to Rpubs or viewing/creating an html file - -Authoring content -=== -- This is a fairly complete guide - - http://www.rstudio.com/ide/docs/presentations/overview -- Quick start is - - `file` then `New File` then `R Presentation` - - (`alt-f` then `f` then `p` if you want key strokes) - - Use basically the same R markdown format for authoring as slidify/knitr - - Single quotes for inline code - - Tripple qutoes for block code - - Same options for code evaluation, caching, hiding etcetera - -Compiling and tools -=== -- R Studio auto formats and runs the code when you save the document -- Mathjax JS library is loaded by default so that `$x^2$` yields $x^2$ -- Slide navigation button on the preview; clicking on the notepad icon takes you to that slide in the deck -- Clicking on `more` yields options for - - Clearning the knitr cache - - Viewing in a browser (creates a temporay html file in `AppData/local/temp` for me) - - Create a html file to save where you want) -- A refresh button -- A zoom button that brings up a full window - -Visuals -=== -transition: linear - -- R Studio has made it easy to get some cool html5 effects, like cube transitions -with simple options in YAML-like code after the first slide such as -`transition: rotate` -- You can specify it in a slide-by-slide basis - -Here's the option "linear" -=== -transition: linear - -- Just put `transition: linear` right after the slide creation (three equal signs or more in a row) -- Tansition options - - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation - -Hierarchical organization -=== -type: section 
-- If you want a hierarchical organization structure, just add a `type: typename` option after the slide -- This changes the default appearance - - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation -- This is of type `section` - -Here's a subsection -=== -type: subsection - -Two columns -=== -- Do whatever for column one -- Then put `***` on a line by itself with blank lines before and after - -*** - -- Then do whatever for column two - - -Changing the slide font -========================================================== -font-import: http://fonts.googleapis.com/css?family=Risque -font-family: 'Risque' - -- Add a `font-family: fontname` option after the slide - - http://www.rstudio.com/ide/docs/presentations/customizing_fonts_and_appearance -- Specified in the same way as css font families - - http://www.w3schools.com/cssref/css_websafe_fonts.asp -- Use `font-import: url` to import fonts -- Important caveats - - Fonts must be present on the system that you're presenting on, or it will go to a fallback font - - You have to be connected to the internet to use an imported font (so don't rely on this for offline presentations) -- This is the `Risque` - - http://fonts.googleapis.com/css?family=Risque - -Really changing things -=== -- If you know html5 and CSS well, then you can basically change whatever you want -- A css file with the same names as your presentation will be autoimported -- You can use `css: file.css` to import a css file -- You have to create named classes and then use `class: classname` to get slide-specific style control from your css - - (Or you can apply then within a ``) -- Ultimately, you have an html file, that you can edit as you wish - - This should be viewed as a last resort, as the whole point is to have reproducible presentations, but may be the easiest way to get the exact style control you want for a final product - -Slidify versus R Studio Presenter -=== -**Slidify** -- Flexible control from the R MD file -- 
Under rapid ongoing development -- Large user base -- Lots and lots of styles and options -- Steeper learning curve -- More command-line oriented - -*** -**R Studio Presenter** -- Embedded in R Studio -- More GUI oriented -- Very easy to get started -- Smaller set of easy styles and options -- Default styles look very nice -- Ultimately as flexible as slidify with a little CSS and HTML knowledge - +RStudio Presenter +=== +author: Brian Caffo, Jeff Leek Roger Peng +date: `r format(Sys.Date(), format="%B %d %Y")` +transition: rotate + + +Department of Biostatistics +Bloomberg School of Public Health +Johns Hopkins University +Coursera Data Science Specialization + + + +RStudio Presentation +=== +- RStudio created a presentation authoring tool within their +development environment. +- If you are familiar with slidify, you will also be familiar with this tool + - Code is authored in a generalized markdown format that allows for code chunks + - The output is an html5 presentation + - The file index for the presenter file is .Rpres, which gets converted to an .md file and then to an html file if desired + - There's a preview tool in RStudio and GUIs for publishing to Rpubs or viewing/creating an html file + +Authoring content +=== +- This is a fairly complete guide + - http://www.rstudio.com/ide/docs/presentations/overview +- Quick start is + - `file` then `New File` then `R Presentation` + - (`alt-f` then `f` then `p` if you want key strokes) + - Use basically the same R markdown format for authoring as slidify/knitr + - Single quotes for inline code + - Tripple qutoes for block code + - Same options for code evaluation, caching, hiding etcetera + +Compiling and tools +=== +- R Studio auto formats and runs the code when you save the document +- Mathjax JS library is loaded by default so that `$x^2$` yields $x^2$ +- Slide navigation button on the preview; clicking on the notepad icon takes you to that slide in the deck +- Clicking on `more` yields options for + - 
Clearning the knitr cache + - Viewing in a browser (creates a temporay html file in `AppData/local/temp` for me) + - Create a html file to save where you want) +- A refresh button +- A zoom button that brings up a full window + +Visuals +=== +transition: linear + +- R Studio has made it easy to get some cool html5 effects, like cube transitions +with simple options in YAML-like code after the first slide such as +`transition: rotate` +- You can specify it in a slide-by-slide basis + +Here's the option "linear" +=== +transition: linear + +- Just put `transition: linear` right after the slide creation (three equal signs or more in a row) +- Tansition options + - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation + +Hierarchical organization +=== +type: section +- If you want a hierarchical organization structure, just add a `type: typename` option after the slide +- This changes the default appearance + - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation +- This is of type `section` + +Here's a subsection +=== +type: subsection + +Two columns +=== +- Do whatever for column one +- Then put `***` on a line by itself with blank lines before and after + +*** + +- Then do whatever for column two + + +Changing the slide font +========================================================== +font-import: http://fonts.googleapis.com/css?family=Risque +font-family: 'Risque' + +- Add a `font-family: fontname` option after the slide + - http://www.rstudio.com/ide/docs/presentations/customizing_fonts_and_appearance +- Specified in the same way as css font families + - http://www.w3schools.com/cssref/css_websafe_fonts.asp +- Use `font-import: url` to import fonts +- Important caveats + - Fonts must be present on the system that you're presenting on, or it will go to a fallback font + - You have to be connected to the internet to use an imported font (so don't rely on this for offline presentations) +- This is the `Risque` + - 
http://fonts.googleapis.com/css?family=Risque + +Really changing things +=== +- If you know html5 and CSS well, then you can basically change whatever you want +- A css file with the same names as your presentation will be autoimported +- You can use `css: file.css` to import a css file +- You have to create named classes and then use `class: classname` to get slide-specific style control from your css + - (Or you can apply then within a ``) +- Ultimately, you have an html file, that you can edit as you wish + - This should be viewed as a last resort, as the whole point is to have reproducible presentations, but may be the easiest way to get the exact style control you want for a final product + +Slidify versus R Studio Presenter +=== +**Slidify** +- Flexible control from the R MD file +- Under rapid ongoing development +- Large user base +- Lots and lots of styles and options +- Steeper learning curve +- More command-line oriented + +*** +**R Studio Presenter** +- Embedded in R Studio +- More GUI oriented +- Very easy to get started +- Smaller set of easy styles and options +- Default styles look very nice +- Ultimately as flexible as slidify with a little CSS and HTML knowledge + diff --git a/09_DevelopingDataProducts/rStudioPresent/index.md b/09_DevelopingDataProducts/rStudioPresent/index.md index 399fb071a..b998542ae 100644 --- a/09_DevelopingDataProducts/rStudioPresent/index.md +++ b/09_DevelopingDataProducts/rStudioPresent/index.md @@ -1,132 +1,132 @@ -RStudio Presenter -=== -author: Brian Caffo, Jeff Leek Roger Peng -date: April 24 2014 -transition: rotate - - -Department of Biostatistics -Bloomberg School of Public Health -Johns Hopkins University -Coursera Data Science Specialization - - - -RStudio Presentation -=== -- RStudio created a presentation authoring tool within their -development environment. 
-- If you are familiar with slidify, you will also be familiar with this tool - - Code is authored in a generalized markdown format that allows for code chunks - - The output is an html5 presentation - - The file index for the presenter file is .Rpres, which gets converted to an .md file and then to an html file if desired - - There's a preview tool in RStudio and GUIs for publishing to Rpubs or viewing/creating an html file - -Authoring content -=== -- This is a fairly complete guide - - http://www.rstudio.com/ide/docs/presentations/overview -- Quick start is - - `file` then `New File` then `R Presentation` - - (`alt-f` then `f` then `p` if you want key strokes) - - Use basically the same R markdown format for authoring as slidify/knitr - - Single quotes for inline code - - Tripple qutoes for block code - - Same options for code evaluation, caching, hiding etcetera - -Compiling and tools -=== -- R Studio auto formats and runs the code when you save the document -- Mathjax JS library is loaded by default so that `$x^2$` yields $x^2$ -- Slide navigation button on the preview; clicking on the notepad icon takes you to that slide in the deck -- Clicking on `more` yields options for - - Clearning the knitr cache - - Viewing in a browser (creates a temporay html file in `AppData/local/temp` for me) - - Create a html file to save where you want) -- A refresh button -- A zoom button that brings up a full window - -Visuals -=== -transition: linear - -- R Studio has made it easy to get some cool html5 effects, like cube transitions -with simple options in YAML-like code after the first slide such as -`transition: rotate` -- You can specify it in a slide-by-slide basis - -Here's the option "linear" -=== -transition: linear - -- Just put `transition: linear` right after the slide creation (three equal signs or more in a row) -- Tansition options - - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation - -Hierarchical organization -=== -type: section 
-- If you want a hierarchical organization structure, just add a `type: typename` option after the slide -- This changes the default appearance - - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation -- This is of type `section` - -Here's a subsection -=== -type: subsection - -Two columns -=== -- Do whatever for column one -- Then put `***` on a line by itself with blank lines before and after - -*** - -- Then do whatever for column two - - -Changing the slide font -========================================================== -font-import: http://fonts.googleapis.com/css?family=Risque -font-family: 'Risque' - -- Add a `font-family: fontname` option after the slide - - http://www.rstudio.com/ide/docs/presentations/customizing_fonts_and_appearance -- Specified in the same way as css font families - - http://www.w3schools.com/cssref/css_websafe_fonts.asp -- Use `font-import: url` to import fonts -- Important caveats - - Fonts must be present on the system that you're presenting on, or it will go to a fallback font - - You have to be connected to the internet to use an imported font (so don't rely on this for offline presentations) -- This is the `Risque` - - http://fonts.googleapis.com/css?family=Risque - -Really changing things -=== -- If you know html5 and CSS well, then you can basically change whatever you want -- A css file with the same names as your presentation will be autoimported -- You can use `css: file.css` to import a css file -- You have to create named classes and then use `class: classname` to get slide-specific style control from your css - - (Or you can apply then within a ``) -- Ultimately, you have an html file, that you can edit as you wish - - This should be viewed as a last resort, as the whole point is to have reproducible presentations, but may be the easiest way to get the exact style control you want for a final product - -Slidify versus R Studio Presenter -=== -**Slidify** -- Flexible control from the R MD file -- 
Under rapid ongoing development -- Large user base -- Lots and lots of styles and options -- Steeper learning curve -- More command-line oriented - -*** -**R Studio Presenter** -- Embedded in R Studio -- More GUI oriented -- Very easy to get started -- Smaller set of easy styles and options -- Default styles look very nice -- Ultimately as flexible as slidify with a little CSS and HTML knowledge - +RStudio Presenter +=== +author: Brian Caffo, Jeff Leek Roger Peng +date: May 21 2014 +transition: rotate + + +Department of Biostatistics +Bloomberg School of Public Health +Johns Hopkins University +Coursera Data Science Specialization + + + +RStudio Presentation +=== +- RStudio created a presentation authoring tool within their +development environment. +- If you are familiar with slidify, you will also be familiar with this tool + - Code is authored in a generalized markdown format that allows for code chunks + - The output is an html5 presentation + - The file index for the presenter file is .Rpres, which gets converted to an .md file and then to an html file if desired + - There's a preview tool in RStudio and GUIs for publishing to Rpubs or viewing/creating an html file + +Authoring content +=== +- This is a fairly complete guide + - http://www.rstudio.com/ide/docs/presentations/overview +- Quick start is + - `file` then `New File` then `R Presentation` + - (`alt-f` then `f` then `p` if you want key strokes) + - Use basically the same R markdown format for authoring as slidify/knitr + - Single quotes for inline code + - Tripple qutoes for block code + - Same options for code evaluation, caching, hiding etcetera + +Compiling and tools +=== +- R Studio auto formats and runs the code when you save the document +- Mathjax JS library is loaded by default so that `$x^2$` yields $x^2$ +- Slide navigation button on the preview; clicking on the notepad icon takes you to that slide in the deck +- Clicking on `more` yields options for + - Clearning the knitr cache + - Viewing 
in a browser (creates a temporay html file in `AppData/local/temp` for me) + - Create a html file to save where you want) +- A refresh button +- A zoom button that brings up a full window + +Visuals +=== +transition: linear + +- R Studio has made it easy to get some cool html5 effects, like cube transitions +with simple options in YAML-like code after the first slide such as +`transition: rotate` +- You can specify it in a slide-by-slide basis + +Here's the option "linear" +=== +transition: linear + +- Just put `transition: linear` right after the slide creation (three equal signs or more in a row) +- Tansition options + - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation + +Hierarchical organization +=== +type: section +- If you want a hierarchical organization structure, just add a `type: typename` option after the slide +- This changes the default appearance + - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation +- This is of type `section` + +Here's a subsection +=== +type: subsection + +Two columns +=== +- Do whatever for column one +- Then put `***` on a line by itself with blank lines before and after + +*** + +- Then do whatever for column two + + +Changing the slide font +========================================================== +font-import: http://fonts.googleapis.com/css?family=Risque +font-family: 'Risque' + +- Add a `font-family: fontname` option after the slide + - http://www.rstudio.com/ide/docs/presentations/customizing_fonts_and_appearance +- Specified in the same way as css font families + - http://www.w3schools.com/cssref/css_websafe_fonts.asp +- Use `font-import: url` to import fonts +- Important caveats + - Fonts must be present on the system that you're presenting on, or it will go to a fallback font + - You have to be connected to the internet to use an imported font (so don't rely on this for offline presentations) +- This is the `Risque` + - 
http://fonts.googleapis.com/css?family=Risque + +Really changing things +=== +- If you know html5 and CSS well, then you can basically change whatever you want +- A css file with the same names as your presentation will be autoimported +- You can use `css: file.css` to import a css file +- You have to create named classes and then use `class: classname` to get slide-specific style control from your css + - (Or you can apply then within a ``) +- Ultimately, you have an html file, that you can edit as you wish + - This should be viewed as a last resort, as the whole point is to have reproducible presentations, but may be the easiest way to get the exact style control you want for a final product + +Slidify versus R Studio Presenter +=== +**Slidify** +- Flexible control from the R MD file +- Under rapid ongoing development +- Large user base +- Lots and lots of styles and options +- Steeper learning curve +- More command-line oriented + +*** +**R Studio Presenter** +- Embedded in R Studio +- More GUI oriented +- Very easy to get started +- Smaller set of easy styles and options +- Default styles look very nice +- Ultimately as flexible as slidify with a little CSS and HTML knowledge + diff --git a/09_DevelopingDataProducts/shiny/index.Rmd b/09_DevelopingDataProducts/shiny/index.Rmd index b9c21160c..6d0c27ec6 100644 --- a/09_DevelopingDataProducts/shiny/index.Rmd +++ b/09_DevelopingDataProducts/shiny/index.Rmd @@ -55,7 +55,7 @@ diabetesRisk <- function(glucose) glucose / 200 - Make sure you have the latest release of R installed - If on windows, make sure that you have Rtools installed - `install.packages("shiny")` -- `libray(shiny)` +- `library(shiny)` - Great tutorial at [http://rstudio.github.io/shiny/tutorial/](http://rstudio.github.io/shiny/tutorial/) - Basically, this lecture is walking through that tutorial offering some of our insights diff --git a/09_DevelopingDataProducts/shiny2/index.Rmd b/09_DevelopingDataProducts/shiny2/index.Rmd index a0894e59d..85c9cfdec 
100644
--- a/09_DevelopingDataProducts/shiny2/index.Rmd
+++ b/09_DevelopingDataProducts/shiny2/index.Rmd
@@ -84,7 +84,7 @@ shinyServer(
 ---
 ## Try it
 * type `runApp()`
-* Notice hitting refresh incriments `y` but enterting values in the textbox does not
+* Notice hitting refresh increments `y` but enterting values in the textbox does not
 * Notice `x` is always 1
 * Watch how it updated `text1` and `text2` as needed.
 * Doesn't add 1 to text1 every time a new `text2` is input.
@@ -92,7 +92,7 @@ shinyServer(
 ---
 ## Reactive expressions
-* Sometimes to speed up your app, you want reactive operations (those operations that depend on widget input values) to be performed outside of a `render*`1 statement
+* Sometimes to speed up your app, you want reactive operations (those operations that depend on widget input values) to be performed outside of a `render*` statement
 * For example, you want to do some code that gets reused in several `render*` statements and don't want to recalculate it for each
 * The `reactive` function is made for this purpose
diff --git a/09_DevelopingDataProducts/slidify/index.Rmd b/09_DevelopingDataProducts/slidify/index.Rmd
index 9435213c9..df45e66cc 100644
--- a/09_DevelopingDataProducts/slidify/index.Rmd
+++ b/09_DevelopingDataProducts/slidify/index.Rmd
@@ -85,7 +85,7 @@ author("first_deck")
 ## Getting to know `index.Rmd` : `YAML`
-- `index.Rmd` is the R Markdown document which you will use to compose the conent of your presentation.
+- `index.Rmd` is the R Markdown document which you will use to compose the content of your presentation.
 - The first part of an `index.Rmd` file is a bit of `YAML` code which will look like this:
 ```YAML
diff --git a/09_DevelopingDataProducts/yhat/.gitignore b/09_DevelopingDataProducts/yhat/.gitignore
new file mode 100644
index 000000000..a99655351
--- /dev/null
+++ b/09_DevelopingDataProducts/yhat/.gitignore
@@ -0,0 +1 @@
+AP_example.R
diff --git a/09_DevelopingDataProducts/yhat/AP_example.R b/09_DevelopingDataProducts/yhat/AP_example.R
deleted file mode 100644
index 2b3667f3b..000000000
--- a/09_DevelopingDataProducts/yhat/AP_example.R
+++ /dev/null
@@ -1,79 +0,0 @@
-## Create dataset of PM and O3 for all US taking year 2013 (annual
-## data from EPA)
-
-## This uses data from
-## http://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/download_files.html
-
-## Read in the 2013 Annual Data
-
-d <- read.csv("annual_all_2013.csv", nrow = 68210)
-sub <- subset(d, Parameter.Name %in% c("PM2.5 - Local Conditions", "Ozone")
-              & Pullutant.Standard %in% c("Ozone 8-Hour 2008", "PM25 Annual 2006"),
-              c(Longitude, Latitude, Parameter.Name, Arithmetic.Mean))
-
-pollavg <- aggregate(sub[, "Arithmetic.Mean"],
-                     sub[, c("Longitude", "Latitude", "Parameter.Name")],
-                     mean, na.rm = TRUE)
-pollavg$Parameter.Name <- factor(pollavg$Parameter.Name, labels = c("ozone", "pm25"))
-names(pollavg)[4] <- "level"
-
-## Remove unneeded objects
-rm(d, sub)
-
-## Write function
-monitors <- data.matrix(pollavg[, c("Longitude", "Latitude")])
-
-library(fields)
-
-pollutant <- function(df) {
-        x <- data.matrix(df[, c("lon", "lat")])
-        r <- df$radius
-        d <- rdist.earth(monitors, x)
-        use <- lapply(seq_len(ncol(d)), function(i) {
-                which(d[, i] < r[i])
-        })
-        levels <- sapply(use, function(idx) {
-                with(pollavg[idx, ], tapply(level, Parameter.Name, mean))
-        })
-        dlevel <- as.data.frame(t(levels))
-        data.frame(df, dlevel)
-}
-
-## Send to yhat
-
-library(yhatr)
-
-model.require <- function() {
-        library(fields)
-}
-
-model.transform <- function(df) {
-        df
-}
-
-model.predict <- function(df) {
-        pollutant(df)
-}
-
-yhat.config <- c(
-        username="rdpeng@gmail.com",
-        apikey="90d2a80bb532cabb2387aa51ac4553cc",
-        env="http://sandbox.yhathq.com/"
-)
-
-yhat.deploy("pollutant")
-
-
-
-################################################################################
-## Client side
-
-library(yhatr)
-yhat.config <- c(
-        username="rdpeng@gmail.com",
-        apikey="90d2a80bb532cabb2387aa51ac4553cc",
-        env="http://sandbox.yhathq.com/"
-)
-df <- data.frame(lon = c(-76.6167, -118.25), lat = c(39.2833, 34.05),
-                 radius = 20)
-yhat.predict("pollutant", df)
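
Note on the removed `AP_example.R`: its `pollutant()` function averages monitor readings that fall within a given radius of each query point. The block below is a minimal base-R sketch of that idea, not the removed script itself. It assumes a hand-written haversine distance in place of `fields::rdist.earth`, an approximate Earth radius of 3958.8 miles, and a made-up `toy_monitors` table standing in for the EPA-derived `pollavg`. The names `haversine_miles`, `toy_monitors`, and `nearby_levels` are hypothetical.

```r
## Great-circle distance in miles between every row of `a` and every row of `b`.
## Both are two-column matrices (longitude, latitude) in degrees.
## Assumption: haversine formula with an approximate Earth radius, used here
## instead of fields::rdist.earth so the sketch needs no extra packages.
haversine_miles <- function(a, b, radius = 3958.8) {
        to_rad <- pi / 180
        lon1 <- a[, 1] * to_rad
        lat1 <- a[, 2] * to_rad
        lon2 <- b[, 1] * to_rad
        lat2 <- b[, 2] * to_rad
        dlon <- outer(lon1, lon2, function(x, y) y - x)
        dlat <- outer(lat1, lat2, function(x, y) y - x)
        h <- sin(dlat / 2)^2 + outer(cos(lat1), cos(lat2)) * sin(dlon / 2)^2
        2 * radius * asin(pmin(1, sqrt(h)))
}

## Hypothetical monitor table with the same columns as `pollavg` in the script.
toy_monitors <- data.frame(
        Longitude = c(-76.61, -76.70, -118.24),
        Latitude = c(39.29, 39.30, 34.05),
        Parameter.Name = factor(c("ozone", "pm25", "ozone"),
                                levels = c("ozone", "pm25")),
        level = c(0.040, 10.0, 0.050)
)

## For each row of `df` (columns lon, lat, radius), average the level of each
## pollutant over the monitors closer than `radius` miles.
nearby_levels <- function(df, monitors = toy_monitors) {
        d <- haversine_miles(
                data.matrix(monitors[, c("Longitude", "Latitude")]),
                data.matrix(df[, c("lon", "lat")])
        )
        levels <- sapply(seq_len(ncol(d)), function(i) {
                idx <- which(d[, i] < df$radius[i])
                tapply(monitors$level[idx], monitors$Parameter.Name[idx], mean)
        })
        data.frame(df, as.data.frame(t(levels)))
}

## Same query points as the client-side call in the script.
query <- data.frame(lon = c(-76.6167, -118.25), lat = c(39.2833, 34.05),
                    radius = 20)
result <- nearby_levels(query)
print(result)
```

A pollutant with no monitor inside the radius comes back as `NA`, because `tapply` returns `NA` for an empty factor level.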