-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path3-sim.qmd
320 lines (249 loc) · 8.85 KB
/
3-sim.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
---
title: "Simulation of diffusion networks: rdiffnet"
author: "Thomas W. Valente and George G. Vega Yon"
---
```{r setup, echo=FALSE, message=FALSE, warning=FALSE}
library(netdiffuseR)
knitr::opts_chunk$set(comment = "#")
```
# Introduction
Before we start, a review of the concepts we will be using here
1. Exposure: Proportion/number of neighbors that has adopted an innovation at each point in time.
2. Threshold: The proportion/number of your neighbors who had adopted at or one time period before ego (the focal individual) adopted.
3. Infectiousness: How much $i$'s adoption affects her alters.
4. Susceptibility: How much $i$'s alters' adoption affects her.
5. Structural equivalence: How similar are $i$ and $j$ in terms of position in the network.
# Simulating diffusion networks
We will simulate a diffusion network with the following parameters:
1. Will have 1,000 vertices,
2. Will span 20 time periods,
3. The initial adopters (seeds) will be selected random,
4. Seeds will be a 10\% of the network,
5. The graph (network) will be small-world,
6. Will use the WS algorithmwith $p=.2$ (probability of rewire).
7. Threshold levels will be uniformly distributed between [0.3, 0.7\]
To generate this diffusion network, we can use the `rdiffnet` function included in the package:
```{r Generating the random graph}
# Setting the seed for the RNG
set.seed(1213)
# Generating a random diffusion network
net <- rdiffnet(
n = 1e3, # 1.
t = 20, # 2.
seed.nodes = "random", # 3.
seed.p.adopt = .1, # 4.
seed.graph = "small-world", # 5.
rgraph.args = list(p=.2), # 6.
threshold.dist = function(x) runif(1, .3, .7) # 7.
)
```
* The function `rdiffnet` generates random diffusion networks. Main features:
1. Simulating random graph or using your own,
2. Setting threshold levels per node,
3. Network rewiring throughout the simulation, and
4. Setting the seed nodes.
* The simulation algorithm is as follows:
1. If required, a baseline graph is created,
2. Set of initial adopters and threshold distribution are established,
3. The set of t networks is created (if required), and
4. Simulation starts at t=2, assigning adopters based on exposures and thresholds:
a. For each $i \in N$, if its exposure at $t-1$ is greater than its threshold, then
adopts, otherwise continue without change.
b. next $i$
# Rumor spreading
```{r sim-rumor}
library(netdiffuseR)
set.seed(09)
diffnet_rumor <- rdiffnet(
n = 5e2,
t = 5,
seed.graph = "small-world",
rgraph.args = list(k = 4, p = .3),
seed.nodes = "random",
seed.p.adopt = .05,
rewire = TRUE,
threshold.dist = function(i) 1L,
exposure.args = list(normalized = FALSE)
)
```
```{r summary-rumor}
summary(diffnet_rumor)
```
```{r plot-rumor, fig.align='center', cache=TRUE}
plot_diffnet(diffnet_rumor, slices = c(1, 3, 5))
# We want to use igraph to compute layout
igdf <- diffnet_to_igraph(diffnet_rumor, slices=c(1,2))[[1]]
pos <- igraph::layout_with_drl(igdf)
plot_diffnet2(diffnet_rumor, vertex.size = dgr(diffnet_rumor)[,1], layout=pos)
```
# Difussion
```{r sim-disease}
set.seed(09)
diffnet_disease <- rdiffnet(
seed.graph = diffnet_rumor$graph,
seed.nodes = which(diffnet_rumor$toa == 1),
rewire = FALSE,
threshold.dist = function(i) rbeta(1, 3, 10),
name = "Diffusion",
behavior = "Some social behavior"
)
```
```{r plot-disease-and-disease}
plot_adopters(diffnet_rumor, what = "cumadopt", include.legend = FALSE)
plot_adopters(diffnet_disease, bg="lightblue", add=TRUE, what = "cumadopt")
legend(
"topleft",
legend = c("Disease", "Rumor"),
col = c("lightblue", "tomato"),
bty = "n", pch=19
)
```
# Multi-diffusion models (TBD)
# Mentor Matching
```{r mentor-match, cache = TRUE}
# Finding mentors
mentors <- mentor_matching(diffnet_rumor, 25, lead.ties.method = "random")
# Simulating diffusion with these mentors
set.seed(09)
diffnet_mentored <- rdiffnet(
seed.graph = diffnet_disease,
seed.nodes = which(mentors$`1`$isleader),
rewire = FALSE,
threshold.dist = diffnet_disease[["real_threshold"]],
name = "Diffusion using Mentors"
)
summary(diffnet_mentored)
```
```{r toa_mat-mentors}
cumulative_adopt_count(diffnet_disease)
cumulative_adopt_count(diffnet_mentored)
```
# Example by changing threshold
The following block of code runs multiple diffnet simulations. Before we proceed, we will generate a scale-free homophilic network:
```{r}
#| label: scale-free-homophilic
# Simulating a scale-free homophilic network
set.seed(1231)
X <- rep(c(1,1,1,1,1,0,0,0,0,0), 50)
net <- rgraph_ba(t = 499, m=4, eta = X)
# Taking a look in igraph
ig <- igraph::graph_from_adjacency_matrix(net)
plot(ig, vertex.color = c("azure", "tomato")[X+1], vertex.label = NA,
vertex.size = sqrt(dgr(net)))
```
Besides of the usual parameters passed to `rdiffnet`, the `rdiffnet_multiple` function requires `R` (number of repetitions/simulations), and `statistic` (a function that returns the statistic of insterst). Optionally, the user may choose to specify the number of clusters to run it in parallel (multiple CPUs):
```{r rdiffnet-multiple}
nsim <- 500L
ans_1and2 <- rdiffnet_multiple(
# Num of sim
R = nsim,
# Statistic
statistic = function(d) cumulative_adopt_count(d)["prop",],
seed.graph = net,
t = 10,
threshold.dist = sample(1:2, 500L, TRUE),
seed.nodes = "random",
seed.p.adopt = .1,
rewire = FALSE,
exposure.args = list(outgoing=FALSE, normalized=FALSE),
# Running on 4 cores
ncpus = 4L
) |> t()
ans_2and3 <- rdiffnet_multiple(
# Num of sim
R = nsim,
# Statistic
statistic = function(d) cumulative_adopt_count(d)["prop",],
seed.graph = net,
t = 10,
threshold.dist = sample(2:3, 500, TRUE),
seed.nodes = "random",
seed.p.adopt = .1,
rewire = FALSE,
exposure.args = list(outgoing=FALSE, normalized=FALSE),
# Running on 4 cores
ncpus = 4L
) |> t()
ans_1and3 <- rdiffnet_multiple(
# Num of sim
R = nsim,
# Statistic
statistic = function(d) cumulative_adopt_count(d)["prop",],
seed.graph = net,
t = 10,
threshold.dist = sample(1:3, 500, TRUE),
seed.nodes = "random",
seed.p.adopt = .1,
rewire = FALSE,
exposure.args = list(outgoing=FALSE, normalized=FALSE),
# Running on 4 cores
ncpus = 4L
) |> t()
```
```{r sim-sim-results}
boxplot(ans_1and2, col="ivory", xlab = "Time", ylab = "Proportion of Adopters")
boxplot(ans_2and3, col="tomato", add=TRUE)
boxplot(ans_1and3, col = "steelblue", add=TRUE)
legend(
"topleft",
fill = c("ivory", "tomato", "steelblue"),
legend = c("1/2", "2/3", "1/3"),
title = "Threshold range",
bty ="n"
)
```
* Example simulating a thousand networks by changing threshold levels.
The final prevalence, or hazard as a function of threshold levels.
# Problems
1. Given the following types of networks: Small-world, Scale-free, Bernoulli,
what set of $n$ initiators maximizes diffusion?
(<a href="files/sim-solutions.r" target="_blank">solution script</a> and <a href="files/sim-solutions.png" target="_blank">solution plot</a>)
# Appendix
The following is example code that can be used to run multiple simulations like it is done using the `rdiffnet_multiple` function. We do not recommend this approach but it may be useful for some users:
```{r sim-sim, cache = TRUE, collapse = TRUE}
# Now, simulating a bunch of diffusion processes
nsim <- 500L
ans_1and2 <- vector("list", nsim)
set.seed(223)
for (i in 1:nsim) {
# We just want the cum adopt count
ans_1and2[[i]] <-
cumulative_adopt_count(
rdiffnet(
seed.graph = net,
t = 10,
threshold.dist = sample(1:2, 500L, TRUE),
seed.nodes = "random",
seed.p.adopt = .10,
exposure.args = list(outgoing = FALSE, normalized = FALSE),
rewire = FALSE
)
)
# Are we there yet?
if (!(i %% 50))
message("Simulation ", i," of ", nsim, " done.")
}
# Extracting prop
ans_1and2 <- do.call(rbind, lapply(ans_1and2, "[", i="prop", j=))
ans_2and3 <- vector("list", nsim)
set.seed(223)
for (i in 1:nsim) {
# We just want the cum adopt count
ans_2and3[[i]] <-
cumulative_adopt_count(
rdiffnet(
seed.graph = net,
t = 10,
threshold.dist = sample(2:3, 500L, TRUE),
seed.nodes = "random",
seed.p.adopt = .10,
exposure.args = list(outgoing = FALSE, normalized = FALSE),
rewire = FALSE
)
)
# Are we there yet?
if (!(i %% 50))
message("Simulation ", i," of ", nsim, " done.")
}
ans_2and3 <- do.call(rbind, lapply(ans_2and3, "[", i="prop", j=))
```