forked from jtr13/cc21fall2
-
Notifications
You must be signed in to change notification settings - Fork 0
/
r_vs_python_plots.Rmd
172 lines (107 loc) · 3.55 KB
/
r_vs_python_plots.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
# Visualization in R v.s. Python
Zining Chen
```{r ,include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
For data scientists, R studio and Python might be two tools that are most familiar to. However, for people that are previsouly really procifient in Python (like me), it can be a little tough and unfamiliar with the grammar and functioning in R. Since R and Python are holding completely different packages for plotting, I will introduce a collection of the comman usage of the different code and syntax used for data visualization in R and Python. It also works as a cheatsheet for those that come from Python get started in R faster.
## Basic setup
The dataset is retrived from Kaggle. https://www.kaggle.com/anandhuh/covid-in-african-countries-latest-data
Generally, in R studio, the packages used for visualzation is using **ggplot**. And packages in R using `library()`. For example:
```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(dplyr)
library(vcd)
df_r <- read_csv("resources/r_vs_python/covid_africa.csv")
```
In python, plotting can be done using **matplotlib**. And importing packages be like:
![](resources/r_vs_python/pic1.png)
## Histogram
R:
```{r}
#original histogram
hist(df_r$`Total Deaths`, xlab = "cases", main = "Total deaths histogram")
```
```{r}
#basic histogram
ggplot(df_r, aes(x = `Total Deaths`)) +
geom_histogram(color = "white", fill = "lightblue") +
ggtitle("Total deaths histogram") + labs(x = "deaths")
```
Python:
![](resources/r_vs_python/pic2_hist.png)
![](resources/r_vs_python/pic3_hist.png)
## Barplot
R:
```{r}
#basic barplot
barplot(`Total Cases` ~ Country, data = df_r, las=2, main = "Total cases barplot")
```
```{r}
#ggplot barplot
ggplot(df_r, aes(x = Country, y = `Total Cases`)) +
geom_bar(stat='identity', fill = "cornflowerblue") +
ggtitle("Total cases barchart") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
```
Python:
![](resources/r_vs_python/pic4_bar.png)
![](resources/r_vs_python/pic5_bar.png)
## Boxplot
R:
```{r}
#basic boxplot
boxplot(df_r$Population, main = "Population boxplot", xlab = "population")
```
```{r}
#ggplot boxplot
ggplot(df_r, aes(x = Population)) +
geom_boxplot() +
ggtitle("Population boxplot")
```
Python:
![](resources/r_vs_python/pic6_box.png)
## Scatterplot
R:
```{r}
# with a regression line
ggplot(na.omit(df_r), aes(x = `Total Tests`, y =`Population`)) +
geom_point() +
geom_smooth(method=lm, se=FALSE, color="blue")
```
Python:
![](resources/r_vs_python/pic7_scatter.png)
## Parallel Coordinates
R:
```{r}
#choose the first 10 countries for better
GGally::ggparcoord(df_r[1:10,], columns = c(2:5), scale = "globalminmax", groupColumn = "Country") +
xlab("country") + ylab("count")
```
Python:
![](resources/r_vs_python/pic8_parcoord.png)
## Heatmap
We use another dataset as an example.
The dataset is retrived from Kaggle. https://www.kaggle.com/sonukumari47/students-performance-in-exams
R:
```{r message = FALSE, warning=FALSE}
df_r2 <- read_csv("resources/r_vs_python/student_performance.csv")
df_r2 <- df_r2[,-1]
```
Python:
![](resources/r_vs_python/pic9.png)
R:
```{r}
ggplot(df_r2, aes(x = `parental level of education`, y = `race/ethnicity`,fill = `math percentage`)) +
geom_tile() +
scale_fill_viridis_c(direction = -1) + ggtitle("Square heatmap") +
theme(axis.text.x = element_text(angle = 10))
```
Python:
![](resources/r_vs_python/pic10_heat.png)
## Mosaic Plot
R:
```{r}
mosaic(`race/ethnicity`~ sex , df_r2, direction = c("v", "h"))
```
Python:
![](resources/r_vs_python/pic11_mo.png)