# Data Visualization in Python vs R
Xingyu Wei
```{r , include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Description:
We show how to plot graphs in Python and compare data visualization in Python (`matplotlib.pyplot` and `seaborn`) with R (`ggplot2`) through a worked example. Both parts use the same data set and the same types of graphs, so the differences in code and output can be compared directly.
## Python part:
Link: https://github.com/xxxxy9/5702CC
## R part:
```{r}
# install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite") # must be installed from source
library(ggplot2) # For visualization
```
### Data:<br />
Source: https://datahub.io/machine-learning/iris#data
```{r}
# Clean the environment
rm(list = ls())
# Read the data package descriptor and locate the derived CSV resource
json_file <- 'https://datahub.io/machine-learning/iris/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse = ""))
for (i in seq_along(json_data$resources$datahub$type)) {
  if (json_data$resources$datahub$type[i] == 'derived/csv') {
    path_to_file <- json_data$resources$path[i]
    df <- read.csv(url(path_to_file))
  }
}
# Add a row index, used as the x-axis in some of the plots below
df['index'] <- seq_len(nrow(df))
```
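If datahub.io is unreachable, the built-in `iris` data set can stand in for the download. The sketch below assumes the downloaded CSV uses the column names `sepallength`, `sepalwidth`, `petallength`, `petalwidth` and `class`, which are the names the plots in this document refer to. `df_local` is a hypothetical name, and the built-in data may differ slightly from the datahub copy, for example in how the class labels are spelled.

```{r}
# Offline fallback (assumption: column names match those used in the plots below)
df_local <- datasets::iris
names(df_local) <- c("sepallength", "sepalwidth", "petallength", "petalwidth", "class")
# Same row index as the one added to df above
df_local$index <- seq_len(nrow(df_local))
str(df_local)
```

Assigning `df <- df_local` would then let the remaining chunks run without network access.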
### The same five graphs as in the Python part, Section 1.4
1. Heatmap<br />
Source: http://www.sthda.com/english/wiki/ggplot2-quick-correlation-matrix-heatmap-r-software-and-data-visualization
```{r, fig.width=17, fig.height=12}
# Create the correlation matrix of the four numeric columns
library(reshape)
cormatrix <- round(cor(df[1:4]), 2)
# Reshape to long format: one row per (X1, X2) cell
melted_cormat <- melt(cormatrix)
# Plot the heatmap
ggplot(data = melted_cormat, aes(x = X1, y = X2, fill = value)) +
  geom_tile() +
  scale_fill_viridis_c(alpha = 1) +
  theme(axis.text.x = element_text(angle = 30, vjust = 0.6)) +
  ggtitle("Correlation Matrix")
```
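`geom_tile()` needs one row per cell, which is why the correlation matrix is melted before plotting. As a rough illustration of that reshaping, the sketch below uses base R's `as.data.frame(as.table())` instead of `reshape::melt()` (a swapped-in technique, not the call used above) and the built-in `iris` data as a stand-in for `df`. The names `cm` and `long` are hypothetical.

```{r}
# 4 x 4 correlation matrix of the numeric columns
cm <- round(cor(datasets::iris[1:4]), 2)
# Long format: one row per matrix cell (row name, column name, correlation)
long <- as.data.frame(as.table(cm))
names(long) <- c("X1", "X2", "value")
head(long)
```

The result has 16 rows, one for each pair of variables, and can be passed to `ggplot()` in the same way as `melted_cormat`.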
2. Scatterplot including 3 variables<br />
Use the color and size of each point to represent the other two variables.
```{r, fig.width=12, fig.height=4}
ggplot(df, aes(x = index, y = sepallength, color = sepalwidth)) +
  geom_point(aes(size = petallength), alpha = 0.7) +
  ggtitle("Sepal length for each iris plant") +
  scale_color_gradient(low = "blue", high = "yellow") +
  theme_bw()
```
3. Regression graph with multiple regression lines
```{r, fig.width=12, fig.height=8}
theme_set(theme_bw())
g <- ggplot(data = df, aes(x = sepallength, y = sepalwidth, color = class)) +
  geom_point(aes(shape = class), size = 1) +
  xlab("Sepal Length") + ylab("Sepal Width") +
  ggtitle("Iris Species classification")
# Fit one smooth curve per class with a generalised additive model
g + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))
```
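The chunk above fits a smooth GAM curve for each class. If straight regression lines are wanted instead, one possible variant is `method = "lm"`. The sketch below is an illustration rather than part of the original comparison: it uses the built-in `iris` data (with its own column names `Sepal.Length`, `Sepal.Width`, `Species`) as a stand-in for `df`, and `g_lm` is a hypothetical name.

```{r}
library(ggplot2)

# One linear fit per species, drawn without the confidence band
g_lm <- ggplot(datasets::iris,
               aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species), size = 1) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  xlab("Sepal Length") + ylab("Sepal Width")
g_lm
```

Because `color` is mapped to the grouping variable in `aes()`, `geom_smooth()` fits each group separately, so no explicit loop over classes is needed.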
4. Multiple plots sharing axes
```{r, fig.width=8, fig.height=4}
# Two line series drawn on the same axes
p <- ggplot(df, aes(x = index)) +
  geom_line(aes(y = sepallength), color = "blue") +
  geom_line(aes(y = petallength), color = "orange") +
  xlab("ID") + ylab("Length") +
  ggtitle('Sepal Length and Petal Length for each iris plant')
# Style the title and axis labels
p + theme(
  plot.title = element_text(color = "navy", size = 10, face = "bold"),
  axis.title.x = element_text(color = "navy", size = 10, face = "bold"),
  axis.title.y = element_text(color = "navy", size = 10, face = "bold"),
  legend.title = element_blank())
```
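In the chunk above the line colors are set as constants outside `aes()`, so ggplot2 draws no legend and `legend.title = element_blank()` has nothing to act on. One way to get a legend is to map `color` to a label inside `aes()` and assign the actual colors with `scale_color_manual()`. The sketch below assumes the built-in `iris` data as a stand-in for `df`; the labels and the names `iris_idx` and `p_legend` are hypothetical.

```{r}
library(ggplot2)

iris_idx <- datasets::iris
iris_idx$index <- seq_len(nrow(iris_idx))

# Mapping color to a constant label inside aes() creates a legend entry per line
p_legend <- ggplot(iris_idx, aes(x = index)) +
  geom_line(aes(y = Sepal.Length, color = "Sepal length")) +
  geom_line(aes(y = Petal.Length, color = "Petal length")) +
  scale_color_manual(values = c("Sepal length" = "blue",
                                "Petal length" = "orange")) +
  xlab("ID") + ylab("Length") +
  theme(legend.title = element_blank())
p_legend
```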
5. Boxplot <br />
Source: https://www.r-graph-gallery.com/265-grouped-boxplot-with-ggplot2.html
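```{r, fig.width=24, fig.height=4}
library(ggpubr)
# Sepal length by class, one facet per class
p1 <- ggplot(df, aes(y = sepallength, fill = class)) +
  geom_boxplot() +
  facet_wrap(~class) +
  ylim(c(0, 8))
# Petal length by class, on the same y-axis range for comparison
p2 <- ggplot(df, aes(y = petallength, fill = class)) +
  geom_boxplot() +
  facet_wrap(~class) +
  ylim(c(0, 8))
# Place the two plots side by side
ggarrange(p1, p2, widths = c(4, 4), labels = c('a', 'b'))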
```