# Evolution {#evolution}
---
```{r percent-stacked-evolution-intro-image, echo=FALSE, fig.align='center', fig.cap="Stacked Area Chart with ggplot2", out.width = '50%'}
knitr::include_graphics("https://www.r-graph-gallery.com/136-stacked-area-chart_files/figure-html/thecode3-1.png")
```
## Area Chart
---
An [area chart](https://www.data-to-viz.com/graph/area.html) represents the evolution of a numeric variable. It is very close to a [line chart](https://www.r-graph-gallery.com/line-plot.html). This section displays many examples built with R and [ggplot2](https://www.r-graph-gallery.com/ggplot2-package.html). Have a look at [data-to-viz.com](https://www.data-to-viz.com/graph/area.html) if you want to learn more about area chart theory.
#### Note on Area Chart
This section is tightly linked with other sections. A [line chart](https://www.r-graph-gallery.com/line-plot.html) is the same but doesn't fill the surface between the line and the X axis. A [connected scatterplot](https://www.r-graph-gallery.com/connected-scatterplot.html) is almost the same thing, but each observation is represented as a dot. More generally, the [time series](https://www.r-graph-gallery.com/time-series.html) section may interest you.
#### Step by Step with `ggplot2`
`ggplot2` allows to draw area charts thanks to the `geom_area()` function. It expects as input a data frame with 2 numeric variables, one displayed on each axis. Since an area chart is built exactly like a line chart, the [most basic line chart](https://www.r-graph-gallery.com/line-chart-ggplot2.html) is a good place to start.
### Area Chart with R and `ggplot2`
This section is a step by step introduction to [area chart](https://www.r-graph-gallery.com/area-chart.html) with `R` and [ggplot2](https://www.r-graph-gallery.com/ggplot2-package.html). It provides several reproducible examples with explanation and R code.
#### Basic Area Chart with `ggplot2` and `geom_area()`
An [area chart](https://www.r-graph-gallery.com/area-chart.html) displays the evolution of one or several numeric variables. Data points are usually connected by straight line segments, and the area between the X axis and the line is filled. See [data-to-viz](https://www.data-to-viz.com/graph/line.html) for a more in-depth definition.
As for a line chart, the input data frame requires at least 2 columns:
* An *ordered* numeric variable for the X axis
* Another numeric variable for the Y axis
Once the data is read by ggplot2 and those 2 variables are specified in the x and y arguments of the `aes()`, just call the `geom_area()` function.
```{r area-chart-evolution, echo=TRUE, message=FALSE, warning=FALSE}
# Libraries
library(ggplot2)
# create data
xValue <- 1:50
yValue <- cumsum(rnorm(50))
data <- data.frame(xValue,yValue)
# Plot
ggplot(data, aes(x=xValue, y=yValue)) +
geom_area()
```
### Customize the Area Chart
Several options are available to customize the area chart appearance:
* Add a title with `ggtitle()`.
* Change line style with arguments like `linetype`, `size`, `color` and more.
* Add transparency to the filled region with `alpha`.
* Customize the general theme with the `theme_ipsum()` function of the `hrbrthemes` package.
* Highlight the top line with `geom_line()`.
* If there are not too many observations, show individual data points with `geom_point()`.
More generally, visit the [ggplot2 section](https://www.r-graph-gallery.com/ggplot2-package.html) for more ggplot2 related stuff.
```{r area-chart-evolution-customize, echo=TRUE, message=FALSE, warning=FALSE}
# Libraries
library(ggplot2)
library(hrbrthemes)
# create data
xValue <- 1:10
yValue <- abs(cumsum(rnorm(10)))
data <- data.frame(xValue,yValue)
# Plot
ggplot(data, aes(x=xValue, y=yValue)) +
geom_area( fill="#69b3a2", alpha=0.4) +
geom_line(color="#69b3a2", size=2) +
geom_point(size=3, color="#69b3a2") +
theme_ipsum() +
ggtitle("Evolution of something")
```
### Basic Stacked Area Chart with R
This section provides the basics of [stacked area charts](https://www.r-graph-gallery.com/stacked-area-graph.html) with R and `ggplot2`. It considers several input formats and shows how to customize the output.
#### Most Basic Stacked Area with `ggplot2`
The data frame used as input to build a [stacked area chart](https://www.r-graph-gallery.com/stacked-area-graph.html) requires 3 columns:
* `x`: numeric variable used for the X axis, often a time.
* `y`: numeric variable used for the Y axis: the quantity you are looking at.
* `group`: categorical variable; one area is drawn per group.
The chart is built using the `geom_area()` function.
```{r stacked-area-evolution-basic, echo=TRUE, message=FALSE, warning=FALSE}
# Packages
library(ggplot2)
library(dplyr)
# create data
time <- as.numeric(rep(seq(1,7),each=7)) # x Axis
value <- runif(49, 10, 100) # y Axis
group <- rep(LETTERS[1:7],times=7) # group, one shape per group
data <- data.frame(time, value, group)
# stacked area chart
ggplot(data, aes(x=time, y=value, fill=group)) +
geom_area()
```
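The example above uses the *long* format: one row per combination of time and group. If your data comes in *wide* format instead (one column per group), it needs to be reshaped first. Here is a minimal base R sketch using `stack()`, assuming a hypothetical wide data frame `wide` with a `time` column and one numeric column per group (`A`, `B`):

```r
# Hypothetical wide-format input: one column per group
wide <- data.frame(
  time = 1:3,
  A    = c(10, 20, 30),
  B    = c(5, 5, 5)
)

# stack() piles the group columns into 'values' (numbers) and 'ind' (column names)
stacked <- stack(wide[, c("A", "B")])

# Rebuild the 3 columns expected by geom_area(): x, y and group
long <- data.frame(
  time  = rep(wide$time, times = 2),  # stack() keeps column order: all A rows, then all B rows
  value = stacked$values,
  group = stacked$ind
)

long
```

The resulting `long` data frame can then be passed to `ggplot(long, aes(x=time, y=value, fill=group)) + geom_area()`. With `tidyr`, `pivot_longer(wide, cols = -time, names_to = "group", values_to = "value")` does the same job.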
### Control Stacking Order with `ggplot2`
The gallery offers a section dedicated to [reordering with ggplot2](https://www.r-graph-gallery.com/267-reorder-a-variable-in-ggplot2.html). This step can be tricky but the code below shows how to:
* Give a specific order with the `factor()` function.
* Order alphabetically using `sort()`.
* Order groups following their values at a specific time step.
```{r stacked-area-evolution-stacking-order, echo=TRUE, message=FALSE, warning=FALSE}
# Give a specific order:
data$group <- factor(data$group , levels=c("B", "A", "D", "E", "G", "F", "C") )
# Plot again
ggplot(data, aes(x=time, y=value, fill=group)) +
geom_area()
# Note: you can also sort levels alphabetically:
myLevels <- levels(data$group)
data$group <- factor(data$group , levels=sort(myLevels) )
# Note: sort following values at time = 6
myLevels <- data %>%
filter(time==6) %>%
arrange(value)
data$group <- factor(data$group , levels=myLevels$group )
```
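The last option above can also be written in base R. A minimal sketch, assuming the same structure as the example data (re-created here as a hypothetical `demo` data frame with a fixed seed) and ordering groups by their value at time 6, from smallest to largest:

```r
# Same structure as the example above, seeded for reproducibility
set.seed(42)
demo <- data.frame(
  time  = rep(1:7, each = 7),
  value = runif(49, 10, 100),
  group = rep(LETTERS[1:7], times = 7)
)

# Keep only the rows of the time step used for ordering
at_time <- demo[demo$time == 6, ]

# Group names sorted by increasing value at that time step
new_levels <- at_time$group[order(at_time$value)]

# Apply this order to the whole column
demo$group <- factor(demo$group, levels = new_levels)
levels(demo$group)
```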
### Proportional Stacked Area Chart
In a proportional stacked area chart, the values of all groups at each time step always sum to 100%: each group is represented by its percentage of the total.
To make it, you have to calculate these percentages first. This can be done using `dplyr` or with base R.
```{r stacked-area-evolution-proportional, echo=TRUE, message=FALSE, warning=FALSE}
# Compute percentages with dplyr
# (stored in a new object so that `data` keeps its `value` column for the next charts)
library(dplyr)
data_pct <- data %>%
group_by(time, group) %>%
summarise(n = sum(value), .groups = "drop_last") %>%  # result stays grouped by time
mutate(percentage = n / sum(n))                      # share of the total at each time
# Plot
ggplot(data_pct, aes(x=time, y=percentage, fill=group)) +
geom_area(alpha=0.6 , size=1, colour="black")
# Note: compute percentages without dplyr (base R), as a share of the total at each time:
data$percentage <- data$value / ave(data$value, data$time, FUN = sum)
```
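If you only need the chart and not the percentages themselves, `ggplot2` can also do the normalization for you: `position = "fill"` rescales each stack so that it sums to 1 at every x value. A sketch of this alternative approach, on a hypothetical `demo` data frame with the same 3 columns:

```r
library(ggplot2)

# Hypothetical long-format data: time, value, group
set.seed(1)
demo <- data.frame(
  time  = rep(1:7, each = 7),
  value = runif(49, 10, 100),
  group = rep(LETTERS[1:7], times = 7)
)

# position = "fill" stacks the areas and rescales each stack to a total of 1
p <- ggplot(demo, aes(x = time, y = value, fill = group)) +
  geom_area(position = "fill") +
  scale_y_continuous(labels = scales::percent)  # show the 0-1 scale as 0%-100%
p
```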
### Color & Style
Let's improve the chart's general appearance:
* Usage of the `viridis` color scale.
* `theme_ipsum` of the hrbrthemes package.
* Add title with `ggtitle`.
```{r stacked-area-evolution-color-and-style, echo=TRUE, message=FALSE, warning=FALSE}
# Library
library(viridis)
library(hrbrthemes)
# Plot
ggplot(data, aes(x=time, y=value, fill=group)) +
geom_area(alpha=0.6 , size=.5, colour="white") +
scale_fill_viridis(discrete = T) +
theme_ipsum() +
ggtitle("The race between ...")
```
### Stacking Area Charts
#### What is Stacking
Stacking is a process where a chart is broken up across the groups of a categorical variable, which together make up the whole. Each group is represented by a shaded area, and these areas are stacked on top of one another.
Here is an example with a [stacked area chart](https://www.data-to-viz.com/graph/stackedarea.html). It shows the evolution of baby name occurrence in the US between 1880 and 2015. Six first names are represented on top of one another.
```{r stacked-area-evolution-stacking, echo=TRUE, message=FALSE, warning=FALSE, fig.height=5, fig.width=5}
# Libraries
library(tidyverse)
library(babynames)
library(viridis)
library(hrbrthemes)
library(plotly)
# Load the dataset provided by the babynames package
data <- babynames %>%
filter(name %in% c("Amanda", "Jessica", "Patricia", "Deborah", "Dorothy", "Helen")) %>%
filter(sex=="F")
# Plot
p <- data %>%
ggplot( aes(x=year, y=n, fill=name, text=name)) +
geom_area( ) +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Popularity of American names in the previous 30 years") +
theme_ipsum() +
theme(legend.position="none")
ggplotly(p, tooltip="text")
```
#### Heaven or Hell?
The efficiency of stacked area charts is debated, and they must be used with care. In a nutshell:
* Stacked graphs are `appropriate` to study the `evolution of the whole` and the `relative proportions` of each group. Indeed, the top of the areas shows how the whole behaves, as in a classic area chart. In the previous graphic, it is easy to see that in 1920, Helen and Dorothy were common names but the 4 other names barely existed.
* However, they are `not appropriate` to study the `evolution of each` individual group. This is due to 2 main reasons. First, all groups except the bottom one lack a flat `baseline`, so it is very hard to read their values at each time stamp. Second, stacking can create optical illusions.
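To see why the baseline matters, here is how the edges of each band are computed when areas are stacked: at each time step, the top of a band is the running total of the values, and its bottom is the top of the band below. A minimal base R sketch on hypothetical values, where the bottom-to-top stacking order is simply the row order (ggplot2 derives its own order from the factor levels):

```r
# Hypothetical values for 3 groups at 2 time steps
stack_demo <- data.frame(
  time  = c(1, 1, 1, 2, 2, 2),
  group = c("A", "B", "C", "A", "B", "C"),
  value = c(10, 20, 30, 15, 5, 40)
)

# Upper edge of each band: running total within each time step
stack_demo$ymax <- ave(stack_demo$value, stack_demo$time, FUN = cumsum)
# Lower edge: upper edge minus the band's own value
stack_demo$ymin <- stack_demo$ymax - stack_demo$value

stack_demo
```

Only group `A` starts from 0. Reading `B` at time 2 from the chart means computing 20 - 15 = 5, which is exactly the kind of mental arithmetic a reader should not have to do.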
#### Example: Mental Arithmetic
In the previous graphic, try to find out how many times the name Dorothy was given in 1920.
It is not trivial to find it out using the previous chart: you have to mentally compute 75000 - 37000 (roughly 38000), which is hard. If you want to convey a message efficiently, you don't want the audience to perform mental arithmetic.
#### Example: Optical Illusion
> Important note: this section is inspired by [this post](http://www.leancrew.com/all-this/2011/11/i-hate-stacked-area-charts/) by Dr. Drang.
Dr. Drang gives this nice example. Consider the graphic below, and try to visualize how the 3 categories evolved over the period:
```{r stacked-area-evolution-stacking-2, echo=TRUE, message=FALSE, warning=FALSE}
# create dummy data
don <- data.frame(
x = rep(seq(2000,2005), 3),
value = c( 75, 73, 68, 57, 36, 0, 15, 16, 17, 18, 19, 20, 10, 11, 15, 25, 45, 80),
group = rep(c("A", "B", "C"), each=6)
)
#plot
don %>%
ggplot( aes(x=x, y=value, fill=group)) +
geom_area( ) +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
theme_ipsum() +
theme(legend.position="none")
```
It looks obvious that the yellow category increased and the purple one decreased, while the green one is harder to read. At first glance, it seems to be slightly decreasing.
Now let's plot just the green group (B) to find out:
```{r stacked-area-evolution-stacking-3, echo=TRUE, message=FALSE, warning=FALSE}
#plot
don %>%
filter(group=="B") %>%
ggplot( aes(x=x, y=value, fill=group)) +
geom_area( fill="#22908C") +
theme(legend.position="none") +
theme_ipsum() +
theme(legend.position="none")
```
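The same conclusion can be checked numerically by computing the year-to-year change of each group with `diff()`. A base R sketch on the dummy data above (re-created so that the block is self-contained):

```r
# Same dummy data as above
don <- data.frame(
  x     = rep(seq(2000, 2005), 3),
  value = c(75, 73, 68, 57, 36, 0, 15, 16, 17, 18, 19, 20, 10, 11, 15, 25, 45, 80),
  group = rep(c("A", "B", "C"), each = 6)
)

# Year-to-year change within each group (rows are already sorted by year)
changes <- lapply(split(don$value, don$group), diff)
changes
```

Group B (green) gains exactly 1 unit every year, while A (purple) always decreases and C (yellow) always increases.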
#### Workaround
If you have just a `few categories`, I would suggest building a line chart. There, it is easy to follow a category and understand accurately how it evolved.
```{r stacked-area-evolution-stacking-4, echo=TRUE, message=FALSE, warning=FALSE}
data %>%
ggplot( aes(x=year, y=n, group=name, color=name)) +
geom_line() +
scale_color_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Popularity of American names in the previous 30 years") +
theme_ipsum()
```
However, this solution is not suitable if you have `many categories`. Indeed, it would result in a spaghetti chart that is very hard to read. You can read more about this [here](https://www.data-to-viz.com/caveat/spaghetti.html).
Instead, I would suggest using `small multiples`: here each category has its own panel in the graphic, which makes it easy to understand the pattern of each category.
```{r stacked-area-evolution-stacking-5, echo=TRUE, message=FALSE, warning=FALSE}
data %>%
ggplot( aes(x=year, y=n, group=name, fill=name)) +
geom_area() +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Popularity of American names in the previous 30 years") +
theme_ipsum() +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
) +
facet_wrap(~name, scales="free_y")
```
#### Going Further
* Stacked Area Graphs Are Not Your Friend by [Everyday analytics](http://www.everydayanalytics.ca/2014/08/stacked-area-graphs-are-not-your-friend.html).
* Quantitative Displays for Combining Time-Series and Part-to-Whole Relationships by [Stephen Few](http://www.perceptualedge.com/articles/visual_business_intelligence/displays_for_combining_time-series_and_part-to-whole.pdf).
* I hate stacked area charts by [Dr Drang](http://www.leancrew.com/all-this/2011/11/i-hate-stacked-area-charts/).
### Interactive Area Chart with R and plotly
The `plotly` package allows to build interactive charts directly from R. Here is an application to the [area chart](https://www.r-graph-gallery.com/area-chart.html), using both the `plot_ly()` and `ggplotly()` functions.
The `ggplotly()` function of the `plotly` library makes it a breeze to build an interactive version of a `ggplot2` chart. In the resulting chart, hover over the data to get a tooltip, or select an area of interest to zoom in. Double-click to reset the view.
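Here is a minimal sketch of both approaches, assuming the `plotly` and `ggplot2` packages are installed and using a hypothetical random-walk data frame `df`. `fill = "tozeroy"` is the plotly.js option that fills the area between the trace and y = 0:

```r
library(ggplot2)
library(plotly)

# Hypothetical data: a random walk over 50 time steps
set.seed(1)
df <- data.frame(x = 1:50, y = cumsum(rnorm(50)))

# Option 1: build a ggplot2 area chart, then convert it with ggplotly()
p <- ggplot(df, aes(x = x, y = y)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2")
p_gg <- ggplotly(p)

# Option 2: build it directly with plot_ly(), filling down to y = 0
p_ly <- plot_ly(df, x = ~x, y = ~y, type = "scatter", mode = "lines", fill = "tozeroy")
```

Print `p_gg` or `p_ly` in the RStudio console or in an R Markdown document to display the interactive widget.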
### Basic Area Chart with Base R and `polygon()`
This section describes how to build an area chart using base R and the `polygon()` function. See the area chart section above for a `ggplot2` implementation.
Base R allows to build area charts thanks to the `polygon()` function. This function requires 2 inputs: the x and y coordinates of the shape's vertices.
Note that extra points are added at the beginning and at the end of each vector (the minimum and maximum x values, both paired with the minimum y value) to make sure the polygon is closed along its bottom edge.
```{r line-chart-ggplot-evolution-basic, echo=TRUE, message=FALSE, warning=FALSE}
# Create data
data <- data.frame(
x=seq(1,10),
y=sample(seq(1,15),10)
)
# Draw the line and its points
plot( data$x , data$y , col=rgb(0.2,0.1,0.5,0.9) , type="o" , lwd=3 , xlab="" , ylab="size" , pch=20)
# Fill the area below the line, down to the minimum y value
polygon(
c(min(data$x), data$x , max(data$x)) ,
c( min(data$y) , data$y , min(data$y)) ,
col=rgb(0.2,0.1,0.5,0.2) , border=FALSE
)
```
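In the example above, the polygon is closed at `min(data$y)`, so the filled area stops at the lowest data point rather than reaching the X axis. Here is a sketch of a variant that closes the polygon at an explicit baseline of 0, using a hypothetical helper `area_polygon()` (not part of base R) and assuming `x` is sorted in increasing order:

```r
# Hypothetical helper: close an area polygon down to a chosen baseline
area_polygon <- function(x, y, baseline = 0) {
  list(
    x = c(min(x), x, max(x)),     # start and end at the x extremes
    y = c(baseline, y, baseline)  # drop down to the baseline at both ends
  )
}

set.seed(1)
x <- 1:10
y <- sample(1:15, 10)
poly <- area_polygon(x, y, baseline = 0)

# Include 0 in the Y range so that the filled area reaches the X axis
plot(x, y, type = "o", pch = 20, lwd = 3, ylim = c(0, max(y)),
     col = rgb(0.2, 0.1, 0.5, 0.9), xlab = "", ylab = "size")
polygon(poly$x, poly$y, col = rgb(0.2, 0.1, 0.5, 0.2), border = NA)
```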
## Line Chart
---
This is the [line chart](https://www.data-to-viz.com/graph/line.html) section of the gallery. If you want to know more about this kind of chart, visit [data-to-viz.com](https://www.data-to-viz.com/graph/line.html). If you're looking for a simple way to implement it in R, pick an example below.
#### Note on Line Chart
This section is tightly linked with other sections. A [connected scatterplot](https://www.r-graph-gallery.com/connected-scatterplot.html) is almost the same thing, but each observation is represented as a dot. An [area chart](https://www.r-graph-gallery.com/area-chart.html) fills the surface between the line and the X axis. More generally, the [time series](https://www.r-graph-gallery.com/time-series.html) section may interest you.
#### Step by Step with `ggplot2`
`ggplot2` allows to draw line charts thanks to the `geom_line()` function. It expects as input a data frame with 2 numeric variables, one displayed on each axis. Start your journey with the [most basic line chart](https://www.r-graph-gallery.com/line-chart-ggplot2.html).
### Line Chart with R and `ggplot2`
This section is a step by step introduction to [line chart](https://www.r-graph-gallery.com/line-plot.html) with R and ggplot2. It provides several reproducible examples with explanation and `R` code.
#### Basic Line Chart with `ggplot2` and `geom_line()`
A [line chart](https://www.r-graph-gallery.com/line-plot.html) or [line graph](https://www.r-graph-gallery.com/line-plot.html) displays the evolution of one or several numeric variables. Data points are usually connected by straight line segments. You can read an extensive definition [here](https://www.data-to-viz.com/graph/line.html).
The input data frame requires at least 2 columns:
* An *ordered* numeric variable for the X axis.
* Another numeric variable for the Y axis.
Once the data is read by ggplot2 and those 2 variables are specified in the `x` and `y` arguments of the `aes()`, just call the `geom_line()` function.
```r
# Libraries
library(ggplot2)
# create data
xValue <- 1:10
yValue <- cumsum(rnorm(10))
data <- data.frame(xValue,yValue)
# Plot
ggplot(data, aes(x=xValue, y=yValue)) +
geom_line()
```
![](https://www.r-graph-gallery.com/line-chart-ggplot2_files/figure-html/thecode-1.png)
### Customize the Line Chart
Several options are available to customize the line chart appearance:
* Add a title with `ggtitle()`.
* Change line style with arguments like `linetype`, `size`, `color` and more.
* Customize the general theme with the `theme_ipsum()` function of the `hrbrthemes` package.
More generally, visit the [ggplot2](https://www.r-graph-gallery.com/ggplot2-package.html) section for more ggplot2 related stuff.
```r
# Libraries
library(ggplot2)
library(hrbrthemes)
# create data
xValue <- 1:10
yValue <- cumsum(rnorm(10))
data <- data.frame(xValue,yValue)
# Plot
ggplot(data, aes(x=xValue, y=yValue)) +
geom_line( color="#69b3a2", size=2, alpha=0.9, linetype=2) +
theme_ipsum() +
ggtitle("Evolution of something")
```
![](https://www.r-graph-gallery.com/line-chart-ggplot2_files/figure-html/thecode2-1.png)
### Connected scatterplot with R and ggplot2
This section explains how to build a basic [connected scatterplot](https://www.r-graph-gallery.com/connected-scatterplot.html) with R and [ggplot2](https://www.r-graph-gallery.com/ggplot2-package.html). It provides several reproducible examples with explanation and R code.
### Most Basic Connected Scatterplot: `geom_point()` and `geom_line()`
A [connected scatterplot](https://www.r-graph-gallery.com/connected-scatterplot.html) is basically a hybrid between a [scatterplot](https://www.r-graph-gallery.com/scatterplot.html) and a [line plot](https://www.r-graph-gallery.com/line-plot.html). Thus, you just have to add a `geom_point()` on top of the `geom_line()` to build it.
```{r basic-conntected-scatterplot-geompoint, echo=TRUE, message=FALSE, warning=FALSE}
# Libraries
library(ggplot2)
library(dplyr)
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=T)
data$date <- as.Date(data$date)
# Plot
data %>%
tail(10) %>%
ggplot( aes(x=date, y=value)) +
geom_line() +
geom_point()
```
### Customize the Connected Scatterplot
Customize the general theme with the `theme_ipsum()` function of the `hrbrthemes` package. Add a title with `ggtitle()`. Customize circles and lines with arguments like `shape`, `size`, `color` and more.
```r
# Libraries
library(ggplot2)
library(dplyr)
library(hrbrthemes)
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=T)
data$date <- as.Date(data$date)
# Plot
data %>%
tail(10) %>%
ggplot( aes(x=date, y=value)) +
geom_line( color="grey") +
geom_point(shape=21, color="black", fill="#69b3a2", size=6) +
theme_ipsum() +
ggtitle("Evolution of bitcoin price")
```
![](https://www.r-graph-gallery.com/connected_scatterplot_ggplot2_files/figure-html/thecode2-1.png)
### Connected Scatterplot to Show an Evolution
The [connected scatterplot](https://www.r-graph-gallery.com/connected-scatter.html) can also be a powerful technique to tell a story about the evolution of 2 variables. Let's consider a dataset composed of 3 columns:
* Year
* Number of babies named Amanda born that year
* Number of babies named Ashley born that year
The chart below shows the evolution of these 2 names. Note that the code is pretty different in this case: `geom_segment()` is used instead of `geom_line()`. This is because `geom_line()` automatically sorts data points by their X position before linking them, which would break the chronological order here.
It also makes sense to add arrows and labels to guide the reader through the chart. Here is the basic version:
```r
# Libraries
library(ggplot2)
library(dplyr)
library(babynames)
library(ggrepel)
library(tidyr)
# data
data <- babynames %>%
filter(name %in% c("Ashley", "Amanda")) %>%
filter(sex=="F") %>%
filter(year>1970) %>%
select(year, name, n) %>%
spread(key = name, value = n)
# plot
data %>%
ggplot(aes(x=Amanda, y=Ashley, label=year)) +
geom_point() +
geom_segment(aes(
xend=c(tail(Amanda, n=-1), NA),
yend=c(tail(Ashley, n=-1), NA)
)
)
```
<center>
![](https://www.r-graph-gallery.com/connected_scatterplot_ggplot2_files/figure-html/thecode3-1.png){width=75%}
</center>
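The chart above does not yet include arrows or labels. One alternative to `geom_segment()` is `geom_path()`, which links points in the order they appear in the data rather than sorting them by x, and accepts an `arrow` argument. A sketch on a small hypothetical data frame `traj` (made-up counts, not the real `babynames` values), with the year printed next to each point using `geom_text()`:

```r
library(ggplot2)

# Hypothetical yearly counts for two names, rows in chronological order
traj <- data.frame(
  year   = 2000:2004,
  name_a = c(10, 12, 15, 14, 11),
  name_b = c(5, 9, 8, 12, 16)
)

p <- ggplot(traj, aes(x = name_a, y = name_b, label = year)) +
  # geom_path() links points in row order (here: chronological), unlike geom_line()
  geom_path(arrow = arrow(length = unit(0.3, "cm"), type = "closed"), color = "grey50") +
  geom_point(size = 2) +
  # Label each point with its year, slightly above the point
  geom_text(vjust = -0.8, size = 3)
p
```

For real data with many overlapping labels, `geom_text_repel()` from the `ggrepel` package (loaded in the example above) is a common replacement for `geom_text()`.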
### Line Plot with Log Scale
This section explains how to build a line chart with a log scale for its Y axis, using the `scale_y_log10()` function. It is sometimes useful to use a log scale for a numeric variable, since it magnifies the lower part of the curve.
Control the horizontal grid lines with `breaks`, and the axis limits with `limits`. (Note that 0 is not allowed, since `log(0)` is not defined.)
```r
# Library
library(ggplot2)
# Create dummy data
data <- data.frame(
x=seq(10,100),
y=seq(10,100)/2+rnorm(91)
)
# Make the plot
ggplot(data, aes(x=x, y=y)) +
geom_line() +
scale_y_log10( breaks=c(1,5,10,15,20,50,100), limits=c(1,100) )
```
<center>
![](https://www.r-graph-gallery.com/connected_scatterplot_ggplot2_files/figure-html/thecode4-1.png){width=75%}
</center>
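Why does a log scale magnify the lower part of the curve? On a log10 axis, positions are proportional to `log10(value)`, so each tenfold increase takes the same vertical distance. A quick base R check of this arithmetic:

```r
# On a log10 axis, positions are proportional to log10(value)
vals <- c(1, 10, 100)
pos  <- log10(vals)
diff(pos)  # each tenfold increase takes the same distance

# Share of a 1-100 axis taken by the 1-10 range
share_log    <- (log10(10) - log10(1)) / (log10(100) - log10(1))  # half of the axis
share_linear <- (10 - 1) / (100 - 1)                              # under a tenth of the axis
c(log = share_log, linear = share_linear)

# log10(0) is -Inf: 0 cannot be placed on a log axis
log10(0)
```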
### Without Log Transform
This is the same chart without the log transform:
```r
# Library
library(ggplot2)
# Create dummy data
data <- data.frame(
x=seq(10,100),
y=seq(10,100)/2+rnorm(91)
)
# Make the plot
ggplot(data, aes(x=x, y=y)) +
geom_line()
```
![](https://www.r-graph-gallery.com/line-plot-log-scale_files/figure-html/thecode2-1.png)
### Multi Groups Line Chart with ggplot2
This section explains how to build a [line chart](https://www.r-graph-gallery.com/line-plot.html) that represents several groups with [ggplot2](https://www.r-graph-gallery.com/ggplot2-package.html). It provides several examples with explanation and reproducible code.
#### Basic Version
If you're not familiar with the `geom_line()` function, you should probably have a look at the [most basic line chart](https://www.r-graph-gallery.com/line-chart-ggplot2.html) first.
Here, the input data frame is composed of 3 columns:
* An *ordered* numeric variable for the X axis
* Another numeric variable for the Y axis
* A categorical variable that specifies the group of each observation
The idea is to draw one line per group. This is done by mapping the group variable to the `group` and `color` aesthetics in `aes()`, which also gives each group its own color.
```r
# Libraries
library(ggplot2)
library(babynames) # provide the dataset: a dataframe called babynames
library(dplyr)
# Keep only 3 names
don <- babynames %>%
filter(name %in% c("Ashley", "Patricia", "Helen")) %>%
filter(sex=="F")
# Plot
don %>%
ggplot( aes(x=year, y=n, group=name, color=name)) +
geom_line()
```
<center>
![](https://www.r-graph-gallery.com/line-chart-several-groups-ggplot2_files/figure-html/thecode-1.png){width=75%}
</center>
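Under the hood, the `group` aesthetic tells `ggplot2` which rows belong to the same line. A sketch on a hypothetical two-group data frame, using `layer_data()` to inspect the group ids that `ggplot2` computed for the layer:

```r
library(ggplot2)

# Hypothetical long-format data: one row per (time, name) pair
lines_df <- data.frame(
  time = rep(1:5, times = 2),
  n    = c(1, 3, 2, 5, 4, 2, 2, 3, 3, 4),
  name = rep(c("first", "second"), each = 5)
)

p <- ggplot(lines_df, aes(x = time, y = n, group = name, color = name)) +
  geom_line()

# Computed data of the first layer: one group id per name
built <- layer_data(p, 1)
unique(built$group)
```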
### Customize the Grouped Line Chart
Several options are available to customize the line chart appearance:
* Add a title with `ggtitle()`.
* Change line style with arguments like `shape`, `size`, `color` and more.
* Use the `viridis` package to get a nice color palette.
* Custom the general theme with the `theme_ipsum()` function of the `hrbrthemes` package.
More generally, visit the [ggplot2 section](https://www.r-graph-gallery.com/ggplot2-package.html) for more ggplot2 related stuff.
```r
# Libraries
library(ggplot2)
library(babynames) # provide the dataset: a dataframe called babynames
library(dplyr)
library(hrbrthemes)
library(viridis)
# Keep only 3 names
don <- babynames %>%
filter(name %in% c("Ashley", "Patricia", "Helen")) %>%
filter(sex=="F")
# Plot
don %>%
ggplot( aes(x=year, y=n, group=name, color=name)) +
geom_line() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Popularity of American names in the previous 30 years") +
theme_ipsum() +
ylab("Number of babies born")
```
<center>
![](https://www.r-graph-gallery.com/line-chart-several-groups-ggplot2_files/figure-html/thecode2-1.png){width=75%}
</center>
#### Notes
* Read more about line charts theory [here](https://www.data-to-viz.com/graph/line.html).
* Be careful: a line chart with too many groups results in a [spaghetti chart](https://www.data-to-viz.com/caveat/spaghetti.html), which is a [bad practice](https://www.data-to-viz.com/caveats.html).
* Visit [data-to-viz](https://www.data-to-viz.com/) for line chart alternatives.
### Linear Model and Confidence Interval in ggplot2
Display the result of a linear model and its confidence interval on top of a [scatterplot](https://www.r-graph-gallery.com/scatterplot.html). A [ggplot2](https://www.r-graph-gallery.com/ggplot2-package.html) implementation with reproducible code.
#### Linear Trend
Adding a linear trend to a [scatterplot](https://www.r-graph-gallery.com/scatterplot.html) helps the reader in seeing patterns. `ggplot2` provides the `geom_smooth()` function that allows to add the linear trend and the confidence interval around it if needed (option `se=TRUE`).
<u>Note</u>: the `method` argument allows to apply different smoothing method like `glm`, `loess` and more. See the [doc](https://ggplot2.tidyverse.org/reference/geom_smooth.html) for more.
```r
# Library
library(ggplot2)
library(hrbrthemes)
# Create dummy data
data <- data.frame(
cond = rep(c("condition_1", "condition_2"), each=50),
my_x = 1:100 + rnorm(100,sd=9),
my_y = 1:100 + rnorm(100,sd=16)
)
# Basic scatter plot.
p1 <- ggplot(data, aes(x=my_x, y=my_y)) +
geom_point( color="#69b3a2") +
theme_ipsum()
# with linear trend
p2 <- ggplot(data, aes(x=my_x, y=my_y)) +
geom_point() +
geom_smooth(method=lm , color="red", se=FALSE) +
theme_ipsum()
# linear trend + confidence interval
p3 <- ggplot(data, aes(x=my_x, y=my_y)) +
geom_point() +
geom_smooth(method=lm , color="red", fill="#69b3a2", se=TRUE) +
theme_ipsum()
```
```r
p1
```
![](https://www.r-graph-gallery.com/50-51-52-scatter-plot-with-ggplot2_files/figure-html/unnamed-chunk-1-1.png)
```r
p2
```
![](https://www.r-graph-gallery.com/50-51-52-scatter-plot-with-ggplot2_files/figure-html/unnamed-chunk-2-1.png)
```r
p3
```
![](https://www.r-graph-gallery.com/50-51-52-scatter-plot-with-ggplot2_files/figure-html/unnamed-chunk-3-1.png)
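With `method=lm`, the line drawn by `geom_smooth()` is an ordinary least squares fit of y on x, and `se=TRUE` adds a pointwise confidence band (95% by default, controlled by the `level` argument). The same quantities can be computed directly in base R with `lm()` and `predict()`, as in this sketch on dummy data similar to the one above:

```r
set.seed(1)
# Dummy data similar to the example above
dat <- data.frame(
  my_x = 1:100 + rnorm(100, sd = 9),
  my_y = 1:100 + rnorm(100, sd = 16)
)

# Linear model: the same kind of fit geom_smooth(method = lm) draws
fit <- lm(my_y ~ my_x, data = dat)
coef(fit)

# 95% confidence interval of the fitted line on a few x values
grid <- data.frame(my_x = seq(min(dat$my_x), max(dat$my_x), length.out = 5))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
band  # columns: fit, lwr, upr
```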
### Line Chart Annotation with ggplot2
Annotation is a crucial part of a [time series](https://www.r-graph-gallery.com/time-series.html) visual. This section shows how to highlight main parts of a [line chart](https://www.r-graph-gallery.com/line-plot.html) with text, circles, lines and more.
The `ggplot2` package recognizes the `date` format and automatically uses a specific type of X axis. If the time variable isn't in the `date` format, this won't work. Always check with `str(data)` how variables are understood by R. If it is not read as a date, use [lubridate](https://www.r-graph-gallery.com/time-series.html) to convert it. Read more about this [here](https://www.r-graph-gallery.com/time-series.html).
On the chart below, dates are displayed using a neat format: month + year.
<u>Note</u>: the gallery offers a section dedicated to [line charts](https://www.r-graph-gallery.com/line-plot.html).
```r
# Libraries
library(ggplot2)
library(dplyr)
library(plotly)
library(hrbrthemes)
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=T)
data$date <- as.Date(data$date)
# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2") +
ylim(0,22000) +
annotate(geom="text", x=as.Date("2017-01-01"), y=20089,
label="Bitcoin price reached 20k $\nat the end of 2017") +
annotate(geom="point", x=as.Date("2017-12-17"), y=20089, size=10, shape=21, fill="transparent") +
geom_hline(yintercept=5000, color="orange", size=.5) +
theme_ipsum()
```
<center>
![](https://www.r-graph-gallery.com/line_chart_annotation_files/figure-html/thecode-1.png){width=75%}
</center>
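Here is a sketch of the conversion step and of a month + year axis format, on a hypothetical data frame `prices` whose dates are stored as character strings. `scale_x_date(date_labels = ...)` takes `strptime()`-style codes; note that the month abbreviation produced by `%b` depends on your locale:

```r
library(ggplot2)

# Hypothetical input: dates read from a file are often plain character strings
prices <- data.frame(
  date  = c("2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01"),
  value = c(10, 12, 18, 15)
)
str(prices)  # 'date' is chr: ggplot2 would treat it as a discrete variable

# Convert to the Date class so that ggplot2 uses a continuous date axis
prices$date <- as.Date(prices$date)
str(prices)  # 'date' is now Date

# %b = abbreviated month name, %Y = 4-digit year
p <- ggplot(prices, aes(x = date, y = value)) +
  geom_line(color = "#69b3a2") +
  scale_x_date(date_labels = "%b %Y")
p
```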
### Dual Y Axis with R and `ggplot2`
This section describes how to build a dual Y axis chart using `R` and `ggplot2`. It uses the `sec.axis` attribute to add the second Y axis. Note that this kind of chart has major [drawbacks](https://www.data-to-viz.com/caveats.html). Use it with care.
#### Visualizing 2 Series with `R` and `ggplot2`
Let's consider a dataset with 3 columns:
* `day`
* first series to display: fake `temperature`, ranging from about 0 to 10.
* second series: fake `price`, ranging from about 0 to 100.
One could easily build 2 line charts to study the evolution of those 2 series using the code below.
Even though it is strongly [discouraged](https://www.data-to-viz.com/caveats.html), one sometimes wants to display both series on the same chart, which requires a second Y axis.
```r
# Libraries
library(ggplot2)
library(dplyr)
library(patchwork) # To display 2 charts together
library(hrbrthemes)
# Build dummy data (fixed seed so the random part is reproducible)
set.seed(1)
data <- data.frame(
  day = as.Date("2019-01-01") + 0:99,
  temperature = runif(100) + seq(1,100)^2.5 / 10000,
  price = runif(100) + seq(100,1)^1.5 / 10
)
# Most basic line chart
p1 <- ggplot(data, aes(x=day, y=temperature)) +
  geom_line(color="#69b3a2", linewidth=2) +
  ggtitle("Temperature: range 0-10") +
  theme_ipsum()
p2 <- ggplot(data, aes(x=day, y=price)) +
  geom_line(color="grey", linewidth=2) +
  ggtitle("Price: range 0-100") +
  theme_ipsum()
# Display both charts side by side thanks to the patchwork package
p1 + p2
```
![](https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2_files/figure-html/thecode-1.png)
### Adding a Second Y Axis with `sec.axis()`: The Idea
`sec_axis()` cannot build an entirely independent Y axis. It derives the second Y axis from the first one by applying a mathematical transformation.
In the example below, the second Y axis simply represents the first one multiplied by 10. This comes from the `~.*10` formula passed to the `trans` argument, which was renamed `transform` in ggplot2 3.5.0.
As a consequence, you can't directly control the lower and upper limits of the second axis. We'll see a trick to work around this below.
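The transformation does not have to be a plain multiplication: any formula works, including a scale factor plus an offset. The sketch below assumes the link between the two axes is linear. It chooses the factor and offset so that a chosen range of the second axis lines up with the range of the first. The helpers `affine_coeffs()` and `to_first()` are hypothetical and not part of `ggplot2`:

```r
library(ggplot2)
# Hypothetical helper: scale `a` and offset `b` such that
# second = a * first + b maps [lo1, hi1] onto [lo2, hi2]
affine_coeffs <- function(lo1, hi1, lo2, hi2) {
  a <- (hi2 - lo2) / (hi1 - lo1)
  b <- lo2 - a * lo1
  list(a = a, b = b)
}
# First axis spans 0-10, second axis should span 50-150
k <- affine_coeffs(0, 10, 50, 150)
k$a  # 10
k$b  # 50
# Inverse transformation: brings second-series values onto the first axis
to_first <- function(y2) (y2 - k$b) / k$a
to_first(c(50, 100, 150))  # 0 5 10
# Hypothetical data with two series living on different ranges
set.seed(1)
df <- data.frame(x = 1:50, first = runif(50, 0, 10), second = runif(50, 50, 150))
p <- ggplot(df, aes(x = x)) +
  geom_line(aes(y = first)) +
  # Second series is mapped back onto the first axis before drawing
  geom_line(aes(y = to_first(second)), color = "grey") +
  scale_y_continuous(
    name = "First Axis",
    # The secondary axis applies the forward transformation
    sec.axis = sec_axis(~ . * k$a + k$b, name = "Second Axis")
  )
p
```

The formula is passed positionally, so it works whether your ggplot2 version calls the argument `trans` or `transform`.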
```r
# Start with a usual ggplot2 call:
ggplot(data, aes(x=day, y=temperature)) +
  # Customize the Y scales:
  scale_y_continuous(
    # Features of the first axis
    name = "First Axis",
    # Add a second axis and specify its features
    sec.axis = sec_axis( trans=~.*10, name="Second Axis")
  ) +
  theme_ipsum()
```
![](https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2_files/figure-html/thecode1-1.png)
### Show 2 Series on the Same Line Chart with `sec.axis()`
We can use this `sec.axis` mathematical transformation to display 2 series that have different ranges.
Since the maximum price is about 10 times bigger than the maximum temperature:
* The second Y axis is the first one multiplied by 10 (`trans=~.*10`).
* In the second `geom_line()` call, the values of the second variable must be divided by 10 so that they span the same range as the first variable.
```r
# Value used to transform the data
coeff <- 10
ggplot(data, aes(x=day)) +
  geom_line( aes(y=temperature)) +
  geom_line( aes(y=price / coeff)) + # Divide by 10 to get the same range as the temperature
  scale_y_continuous(
    # Features of the first axis
    name = "First Axis",
    # Add a second axis and specify its features
    sec.axis = sec_axis(~.*coeff, name="Second Axis")
  )
```
![](https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2_files/figure-html/thecode2-1.png)
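Hard-coding `coeff <- 10` works for this dummy dataset. As a hedged alternative, assuming both series start near zero, the factor can be derived from the data. The code then keeps working if the ranges change. The data frame is rebuilt here, under the hypothetical name `df`, so the block runs on its own:

```r
library(ggplot2)
set.seed(1)
# Same kind of dummy data as above
df <- data.frame(
  day         = as.Date("2019-01-01") + 0:99,
  temperature = runif(100) + seq(1, 100)^2.5 / 10000,
  price       = runif(100) + seq(100, 1)^1.5 / 10
)
# Ratio between the two maxima: used to scale the data and to label the second axis
coeff <- max(df$price) / max(df$temperature)
p <- ggplot(df, aes(x = day)) +
  geom_line(aes(y = temperature)) +
  geom_line(aes(y = price / coeff)) +
  scale_y_continuous(
    name = "First Axis",
    sec.axis = sec_axis(~ . * coeff, name = "Second Axis")
  )
p
```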
### Dual Y Axis Customization with `ggplot2`
A few usual tricks to make the chart look better:
* `theme_ipsum()` (from `hrbrthemes`) to remove the default grey background and improve the general style.
* Add a title.
* Color each Y axis title like its related line.
```r
# Value used to transform the data
coeff <- 10
# A few constants
temperatureColor <- "#69b3a2"
priceColor <- rgb(0.2, 0.6, 0.9, 1)
ggplot(data, aes(x=day)) +
  geom_line( aes(y=temperature), linewidth=2, color=temperatureColor) +
  geom_line( aes(y=price / coeff), linewidth=2, color=priceColor) +
  scale_y_continuous(
    # Features of the first axis
    name = "Temperature (Celsius)",
    # Add a second axis and specify its features
    sec.axis = sec_axis(~.*coeff, name="Price ($)")
  ) +
  theme_ipsum() +
  theme(
    # Color each axis title like its line
    axis.title.y = element_text(color = temperatureColor, size=13),
    axis.title.y.right = element_text(color = priceColor, size=13)
  ) +
  ggtitle("Temperature up, price down")
```
![](https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2_files/figure-html/thecode3-1.png)
### Barplot with Overlapping Line Chart
It is totally possible to use the same tricks with other `geoms`.
Here is an example displaying a [line chart](https://www.r-graph-gallery.com/line-plot.html) on top of a [barplot](https://www.r-graph-gallery.com/barplot.html).
```r
# Value used to transform the data
coeff <- 10
# A few constants
temperatureColor <- "#69b3a2"
priceColor <- rgb(0.2, 0.6, 0.9, 1)
ggplot(head(data, 80), aes(x=day)) +
  # geom_col() draws bars whose heights are the values themselves
  geom_col( aes(y=temperature), linewidth=.1, fill=temperatureColor, color="black", alpha=.4) +
  geom_line( aes(y=price / coeff), linewidth=2, color=priceColor) +
  scale_y_continuous(
    # Features of the first axis
    name = "Temperature (Celsius)",
    # Add a second axis and specify its features
    sec.axis = sec_axis(~.*coeff, name="Price ($)")
  ) +
  theme_ipsum() +
  theme(
    axis.title.y = element_text(color = temperatureColor, size=13),
    axis.title.y.right = element_text(color = priceColor, size=13)
  ) +
  ggtitle("Temperature up, price down")
```
![](https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2_files/figure-html/thecode4-1.png)
### Line Chart with Error Envelope: ggplot2 and geom_ribbon()
This section explains how to add an error envelope around a [line chart](https://www.r-graph-gallery.com/line-plot.html) using ggplot2 and the `geom_ribbon()` function.
#### Basic Line Chart with `ggplot2` and `geom_line()`
This graph has been made by [Alastair Sanderson](http://www.alastairsanderson.com/). You can have a look at his gallery [here](http://www.sr.bham.ac.uk/~ajrs/R/r-gallery.html).
It shows mean temperature profiles and their error envelopes, using the [ggplot2](https://www.r-graph-gallery.com/ggplot2-package.html) package and its `geom_ribbon()` function.
Note that `geom_ribbon()` is used here because the upper and lower values of the envelope are available in the input data. As an alternative, the [geom_smooth](https://www.r-graph-gallery.com/50-51-52-scatter-plot-with-ggplot2.html) function automatically draws an error envelope using different statistical models.
```r
library(ggplot2)
# Get the data from the web!
CC <- read.table("http://www.sr.bham.ac.uk/~ajrs/papers/sanderson06/mean_Tprofile-CC.txt", header=TRUE)
nCC <- read.table("http://www.sr.bham.ac.uk/~ajrs/papers/sanderson06/mean_Tprofile-nCC.txt", header=TRUE)
# Label each group, then stack both tables
CC$type <- "Cool core"
nCC$type <- "Non-cool core"
A <- rbind(CC, nCC)
# Make the plot: ymin and ymax define the envelope drawn by geom_ribbon()
ggplot(data=A, aes(x=r.r500, y=sckT, ymin=sckT.lo, ymax=sckT.up, fill=type, linetype=type)) +
  geom_line() +
  geom_ribbon(alpha=0.5) +
  scale_x_log10() +
  scale_y_log10() +
  xlab(expression(paste("Radius (", R[500], ")"))) +
  ylab("Scaled Temperature")
```
![](https://www.r-graph-gallery.com/104-plot-lines-with-error-envelopes-ggplot2_files/figure-html/thecode-1.png)
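The files above are downloaded from a personal web page that may not always be reachable. Here is a self-contained sketch of the same technique on simulated data. The column names `mean`, `lower` and `upper` are assumptions made for this example. The ribbon is drawn before the line so that the line stays on top:

```r
library(ggplot2)
# Simulated mean profile with an envelope of +/- 2 standard errors
df <- data.frame(x = seq(0, 10, length.out = 100))
df$mean  <- sin(df$x) + 0.1 * df$x
df$se    <- 0.1 + 0.02 * df$x
df$lower <- df$mean - 2 * df$se
df$upper <- df$mean + 2 * df$se
p <- ggplot(df, aes(x = x)) +
  # Envelope first, so that the line is drawn on top of it
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "#69b3a2", alpha = 0.4) +
  geom_line(aes(y = mean))
p
```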
### Spaghetti Plot
A spaghetti plot is a [line plot](https://www.data-to-viz.com/graph/line.html) with many lines displayed together. With more than a few groups (roughly 5), this kind of graphic gets really hard to read and provides little insight about the data. Let's build an example with the [evolution of baby names](http://www.data-to-viz.com/story/OneCatSevOrderedNum.html) in the US from 1880 to 2015.
#### Mind the Spaghetti (`ggplot2`)
When too many groups are displayed on the same line chart, it gets very hard to get insight from the figure. This is commonly called a [spaghetti chart](https://www.data-to-viz.com/caveat/spaghetti.html). Here are a few alternatives using [ggplot2](https://www.r-graph-gallery.com/ggplot2-package.html): annotation and small multiples.
```{r spaghetti-plot-evolution, echo=TRUE, message=FALSE, warning=FALSE}
# Libraries
library(tidyverse)
library(hrbrthemes)
library(kableExtra)
options(knitr.table.format = "html")
library(babynames)
library(streamgraph)
library(viridis)
library(DT)
library(plotly)
# Keep a dozen female names from the babynames dataset
data <- babynames %>%
  filter(name %in% c("Mary","Emma", "Ida", "Ashley", "Amanda", "Jessica", "Patricia", "Linda", "Deborah", "Dorothy", "Betty", "Helen")) %>%
  filter(sex=="F")
# Plot
data %>%
  ggplot( aes(x=year, y=n, group=name, color=name)) +
  geom_line() +
  scale_color_viridis(discrete = TRUE) +
  ggtitle("A spaghetti chart of baby names popularity") +
  theme_ipsum() +
  # theme() comes after theme_ipsum(), otherwise the complete theme overrides it;
  # the legend is kept since it is the only way to match a line to its name
  theme(
    plot.title = element_text(size=14)
  )
```
It is very hard to follow a line to understand the evolution of a specific name's popularity. Moreover, even if you manage to follow a line, you then need to match it with the legend, which is even harder. Let's try to find a few workarounds to improve this graphic.
#### Target a Specific Group
Let's say you plot many groups, but the real goal is to explain how one particular group behaves compared to the others. A good workaround is then to highlight this group: make it look different, and give it a proper annotation. Here, the evolution of Amanda's popularity is obvious. Keeping the other lines in the background is important, since it allows you to compare Amanda to all the other names.
```{r spaghetti-plot-evolution-2, echo=TRUE, message=FALSE, warning=FALSE}
data %>%
  # Flag the group to highlight
  mutate( highlight=ifelse(name=="Amanda", "Amanda", "Other")) %>%
  ggplot( aes(x=year, y=n, group=name, color=highlight, linewidth=highlight)) +
  geom_line() +
  scale_color_manual(values = c("#69b3a2", "lightgrey")) +
  scale_linewidth_manual(values=c(1.5,0.2)) +
  ggtitle("Popularity of American names over time") +
  theme_ipsum() +
  # annotate() draws a single label (geom_label() would draw one per data row)
  annotate(geom="label", x=1990, y=55000, label="Amanda reached 3550\nbabies in 1970", size=4, color="#69b3a2") +
  theme(
    legend.position="none",
    plot.title = element_text(size=14)
  )
```
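Lines are drawn in group order, so a highlighted line can end up partly hidden under grey ones. One way around this is to use two layers: every group in thin grey first, then the highlighted group alone on top. The sketch below uses hypothetical toy data rather than the baby names, so it runs on its own (`linewidth` requires ggplot2 3.4.0 or later):

```r
library(ggplot2)
library(dplyr)
set.seed(7)
# Hypothetical data: 5 groups observed over 30 time steps (random walks)
toy <- expand.grid(time = 1:30, group = c("A", "B", "C", "D", "E"))
toy$value <- ave(rnorm(nrow(toy)), toy$group, FUN = cumsum)
highlighted <- "C"
p <- ggplot(toy, aes(x = time, y = value, group = group)) +
  # Background layer: all groups, thin and grey
  geom_line(color = "lightgrey", linewidth = 0.3) +
  # Foreground layer: only the highlighted group, drawn last so it stays on top
  geom_line(data = filter(toy, group == highlighted),
            color = "#69b3a2", linewidth = 1.5)
p
```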
#### Use Small Multiples
[Area charts](https://www.data-to-viz.com/graph/area.html) can be used to give a more general overview of the dataset, especially when used in combination with small multiples. In the following chart, it is easy to get a glimpse of the evolution of any name:
```{r spaghetti-plot-evolution-small-multiples-area-chart, echo=TRUE, message=FALSE, warning=FALSE}
data %>%
  ggplot( aes(x=year, y=n, group=name, fill=name)) +
  geom_area() +
  scale_fill_viridis(discrete = TRUE) +
  ggtitle("Popularity of American names over time") +
  theme_ipsum() +
  theme(
    legend.position="none",
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8),
    plot.title = element_text(size=14)
  ) +
  # One small panel per name
  facet_wrap(~name)
```
For instance, `Linda` was a very popular name for a short period of time. On the other hand, `Ida` has never been very popular, but was given in small numbers over several decades.
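Because all panels share the same Y axis, rarely given names such as Ida look almost flat. If the shape of each curve matters more than comparing absolute counts, `facet_wrap()` can give each panel its own Y range with `scales = "free_y"`. The sketch below uses hypothetical toy counts that mimic a short-lived popular name and a rare but steady one:

```r
library(ggplot2)
years <- 1900:1999
# Hypothetical counts for two names
toy <- data.frame(
  year = rep(years, times = 2),
  name = rep(c("Popular", "Rare"), each = length(years)),
  n    = c(50000 * exp(-((years - 1950) / 5)^2),  # sharp, short-lived peak
           300 + 50 * sin(years / 10))             # low but steady
)
p <- ggplot(toy, aes(x = year, y = n, fill = name)) +
  geom_area() +
  # Each panel gets its own Y range, so the rare name is no longer squashed
  facet_wrap(~name, scales = "free_y") +
  theme(legend.position = "none")
p
```

The trade-off is that panels can no longer be compared by height, so free scales fit best when the reader cares about trends within each name.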
#### Combine Approaches
Note that if you want to compare the evolution of each name with the others, you can combine both approaches: