---
date-modified: 2024-06-27
---
# Exponential Random Graph Models
I strongly suggest reading the vignette in the `ergm` R package.
:::{.content-hidden}
{{< include math.tex >}}
:::
```{r}
#| echo: false
if (file.exists("ergm.rda")) {
load("ergm.rda")
chapter_cached <- TRUE
} else {
chapter_cached <- FALSE
}
knitr::opts_chunk$set(collapse = TRUE)
```
```r
vignette("ergm", package="ergm")
```
> The purpose of ERGMs, in a nutshell, is to describe parsimoniously the local selection forces that shape the global structure of a network. To this end, a network dataset, like those depicted in Figure 1, may be considered as the response in a regression model, where the predictors are things like "propensity for individuals of the same sex to form partnerships" or "propensity for individuals to form triangles of partnerships". In Figure 1(b), for example, it is evident that the individual nodes appear to cluster in groups of the same numerical labels (which turn out to be students' grades, 7 through 12); thus, an ERGM can help us quantify the strength of this intra-group effect.
>
> --- [@Hunter2008]
![Source: Hunter et al. (2008)](img/hunter2008.png)
In short, ERGMs give us a parametric model for the distribution of $\mathbf{Y}$, which takes the canonical form:
$$
\Pr{\mathbf{Y}=\mathbf{y}|\theta, \mathcal{Y}} = \frac{\exp{\theta^{\text{T}}\mathbf{g}(\mathbf{y})}}{\kappa\left(\theta, \mathcal{Y}\right)},\quad\mathbf{y}\in\mathcal{Y}
$$ {#eq-main-ergm}
Where $\theta\in\Omega\subset\mathbb{R}^q$ is the vector of model coefficients and $\mathbf{g}(\mathbf{y})$ is a *q*-vector of statistics based on the adjacency matrix $\mathbf{y}$.
Model @eq-main-ergm may be expanded by replacing $\mathbf{g}(\mathbf{y})$ with $\mathbf{g}(\mathbf{y}, \mathbf{X})$ to allow for additional covariate information $\mathbf{X}$ about the network. The denominator $\kappa\left(\theta,\mathcal{Y}\right) = \sum_{\mathbf{y}\in\mathcal{Y}}\exp{\theta^{\text{T}}\mathbf{g}(\mathbf{y})}$ is the normalizing factor that ensures that equation @eq-main-ergm is a legitimate probability distribution. Even after fixing $\mathcal{Y}$ to be all the networks that have size $n$, the size of $\mathcal{Y}$ makes this type of statistical model hard to estimate as there are $N = 2^{n(n-1)}$ possible networks! [@Hunter2008]
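To get a sense of the scale of the problem, here is a minimal sketch (base R only; the value of $\theta$ and the choice of a single `edges`-type statistic are arbitrary illustration choices) that computes the normalizing constant by brute force for a directed network with $n = 3$ nodes:

```r
# Brute-force computation of kappa(theta, Y) for n = 3 (2^6 = 64 possible networks).
# This is only feasible for toy sizes; for n = 20 there are already 2^380 networks.
n     <- 3
dyads <- n * (n - 1)  # number of off-diagonal cells in the adjacency matrix
theta <- -0.5         # arbitrary coefficient for the edge-count statistic

# Edge count of each of the 2^dyads possible networks
all_edges <- sapply(0:(2^dyads - 1), function(i) {
  sum(as.integer(intToBits(i)[1:dyads]))
})

# Normalizing constant and, e.g., the probability of a network with 4 ties
kappa <- sum(exp(theta * all_edges))
exp(theta * 4) / kappa
```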
Later developments include new dependency structures to consider more general neighborhood effects. These models relax the one-step Markovian dependence assumptions, allowing investigation of longer-range configurations, such as longer paths in the network or larger cycles (Pattison and Robins 2002). Models for bipartite (Faust and Skvoretz 1999) and tripartite (Mische and Robins 2000) network structures have been developed. [@Hunter2008 p. 9]
## A naïve example
In the simplest case, an ERGM is equivalent to a logistic regression. By simple, I mean cases with no Markovian terms--motifs involving more than one edge--for example, the Bernoulli graph. In the Bernoulli graph, ties are independent, so the presence/absence of a tie between nodes $i$ and $j$ does not affect the presence/absence of a tie between nodes $k$ and $l$.
Let's fit an ERGM using the `sampson` dataset in the `ergm` package.
```{r}
#| label: part-01-04-loading-data
#| echo: true
#| collapse: true
#| message: false
library(ergm)
library(netplot)
data("sampson")
nplot(samplike)
```
Fitting a Bernoulli graph with `ergm` requires the `edges` term, which counts how many ties there are in the graph:
```{r}
#| label: first-fit
#| echo: true
#| collapse: true
ergm_fit <- ergm(samplike ~ edges)
```
Since this is equivalent to a logistic regression, we can use the `glm` function to fit the same model. First, we need to prepare the data so we can pass it to `glm`:
```{r}
# Vectorizing the adjacency matrix (sorting puts the zeros first)
y <- sort(as.vector(as.matrix(samplike)))
y <- y[-c(1:18)] # Dropping 18 zeros, equivalent to removing the all-zero diagonal
y
```
We can now fit the GLM model:
```{r}
glm_fit <- glm(y~1, family=binomial("logit"))
```
The coefficients of both ERGM and GLM should match:
```{r}
glm_fit
ergm_fit
```
Furthermore, in the case of the Bernoulli graph, we can get the estimate using the Logit function:
```{r}
pr <- mean(y)
# Logit function:
# Alternatively we could have used log(pr) - log(1-pr)
qlogis(pr)
```
Again, the same result. The Bernoulli graph is not the only ERGM that can be fitted using a logistic regression. Moreover, if all the terms in the model are non-Markovian, `ergm` automatically defaults to a logistic regression.
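A quick way to see this in action (a sketch, relying on the `group` vertex attribute that ships with the Sampson data): fitting a model with only dyad-independent terms requires no MCMC simulation, and the resulting fit coincides with a logistic regression on the dyads.

```r
# Only dyad-independent terms: edges and homophily on the monks' faction.
# No MCMC is needed here; the pseudo-likelihood and the MLE coincide.
fit_indep <- ergm(samplike ~ edges + nodematch("group"))
summary(fit_indep)
```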
## Estimation of ERGMs
The ultimate goal is to perform statistical inference on the proposed model. In a *standard* setting, we could use Maximum Likelihood Estimation (MLE), which consists of finding the model parameters $\theta$ that, given the observed data, maximize the likelihood of the model. For the latter, we generally use [Newton's method](https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization). Newton's method requires computing the model's log-likelihood, which can be challenging in ERGMs.
For ERGMs, since part of the likelihood involves a normalizing constant that is a function of all possible networks, this is not as straightforward as in the regular setting. Because of this, most estimation methods rely on simulations.
In `statnet`, the default estimation method is based on a method proposed by [@Geyer1992], Markov-Chain MLE, which uses Markov-Chain Monte Carlo for simulating networks and a modified version of the Newton-Raphson algorithm to estimate the parameters.
The idea of MC-MLE for this family of statistical models is to approximate the ratio of normalizing constants with an expectation that can be estimated using the law of large numbers. In particular, we have the following:
\begin{align*}
\frac{\kappa\left(\theta,\mathcal{Y}\right)}{\kappa\left(\theta_0,\mathcal{Y}\right)} & =
\frac{
\sum_{\mathbf{y}\in\mathcal{Y}}\exp{\theta^{\text{T}}\mathbf{g}(\mathbf{y})}}{
\sum_{\mathbf{y}\in\mathcal{Y}}\exp{\theta_0^{\text{T}}\mathbf{g}(\mathbf{y})}
} \\
& = \sum_{\mathbf{y}\in\mathcal{Y}}\left( %
\frac{1}{%
\sum_{\mathbf{y}'\in\mathcal{Y}}\exp{\theta_0^{\text{T}}\mathbf{g}(\mathbf{y}')}%
} \times %
\exp{\theta^{\text{T}}\mathbf{g}(\mathbf{y})} %
\right) \\
& = \sum_{\mathbf{y}\in\mathcal{Y}}\left( %
\frac{\exp{\theta_0^{\text{T}}\mathbf{g}(\mathbf{y})}}{%
\sum_{\mathbf{y}'\in\mathcal{Y}}\exp{\theta_0^{\text{T}}\mathbf{g}(\mathbf{y}')}%
} \times %
\exp{(\theta - \theta_0)^{\text{T}}\mathbf{g}(\mathbf{y})} %
\right) \\
& = \sum_{\mathbf{y}\in\mathcal{Y}}\left( %
\Pr{\mathbf{Y} = \mathbf{y}|\mathcal{Y}, \theta_0} \times %
\exp{(\theta - \theta_0)^{\text{T}}\mathbf{g}(\mathbf{y})} %
\right) \\
& = \text{E}_{\theta_0}\left(\exp{(\theta - \theta_0)^{\text{T}}\mathbf{g}(\mathbf{Y})} \right)
\end{align*}
In particular, the MC-MLE algorithm uses this fact to maximize the log-likelihood ratio. The objective function itself can be approximated by simulating $m$ networks from the distribution with parameter $\theta_0$:
$$
l(\theta) - l(\theta_0) \approx (\theta - \theta_0)^{\text{T}}\mathbf{g}(\mathbf{y}_{obs}) -
\text{log}{\left[\frac{1}{m}\sum_{i = 1}^m\exp{(\theta-\theta_0)^{\text{T}}}\mathbf{g}(\mathbf{Y}_i)\right]}
$$
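To make the idea concrete, here is a small sketch (assuming the `ergm_fit` object from the naïve example above; $\theta$, $\theta_0$, and the number of simulated networks are arbitrary illustration choices) that approximates $l(\theta) - l(\theta_0)$ for the edges-only model of the Sampson data:

```r
# theta0: MLE of the Bernoulli (edges-only) model; theta: a nearby candidate value
theta0 <- coef(ergm_fit)
theta  <- theta0 + 0.1

# Simulate m networks at theta0, keeping only their sufficient statistics g(Y)
g_sim <- simulate(samplike ~ edges, coef = theta0, nsim = 1000, output = "stats")
g_obs <- summary(samplike ~ edges)  # observed statistic g(y_obs)

# Monte Carlo approximation of l(theta) - l(theta0)
(theta - theta0) * g_obs - log(mean(exp((theta - theta0) * g_sim[, "edges"])))
```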
For more details, see [@Hunter2008]. A sketch of the algorithm follows:
1. Initialize the algorithm with an initial guess of $\theta$, call it $\theta^{(t)}$ (it must be a reasonably good guess).
2. While (no convergence) do:
   a. Using $\theta^{(t)}$, simulate $M$ networks by means of small changes to $\mathbf{Y}_{obs}$ (the observed network). This part is done using an importance-sampling method that weights each proposed network by its likelihood conditional on $\theta^{(t)}$.
   b. With the networks simulated, we can do the Newton step to update the parameter $\theta^{(t)}$ (this is the iteration part in the `ergm` package): $\theta^{(t)}\to\theta^{(t+1)}$.
   c. If convergence has been reached (which usually means that $\theta^{(t)}$ and $\theta^{(t + 1)}$ are not very different), then stop; otherwise, go to step a.
[@lusher2012;@admiraal2006;@Snijders2002;@Wang2009] provide details on the algorithm used by PNet (the same as the one used in `RSiena`), and [@lusher2012] offers a short discussion of the differences between `ergm` and `PNet`.
## The `ergm` package
We now turn to working with the `ergm` R package [@R-ergm].
From the previous section:^[You can download the 03.rda file from [this link](https://github.com/gvegayon/appliedsnar).]
```{r 04-setup, message=FALSE, warning=FALSE}
library(igraph)
library(dplyr)
load("03.rda")
```
In this section, we will use the `ergm` package (from the `statnet` suite of packages [@R-ergm]) and the `intergraph` package [@R-intergraph]. The latter provides functions to go back and forth between `igraph` and `network` objects, the classes from the `igraph` and `network` packages, respectively.^[Yes, the classes have the same names as the packages.]
```{r 03-ergms-setup, message=FALSE}
library(ergm)
library(intergraph)
```
As a rather important side note, the order in which R packages are loaded matters. Why is this important to mention now? Well, it turns out that at least a couple of functions in the `network` package have the same name as some functions in the `igraph` package. When the `ergm` package is loaded, since it depends on `network`, it will load the `network` package first, which will _mask_ some functions in `igraph`. This becomes evident once you load `ergm` after loading `igraph`:
```
The following objects are masked from ‘package:igraph’:
add.edges, add.vertices, %c%, delete.edges, delete.vertices, get.edge.attribute, get.edges,
get.vertex.attribute, is.bipartite, is.directed, list.edge.attributes, list.vertex.attributes, %s%,
set.edge.attribute, set.vertex.attribute
```
What are the implications of this? If you call the function `list.edge.attributes` on an object of class `igraph`, R will return an error, as the first function matching that name comes from the `network` package! To avoid this, you can use the double-colon notation:
```r
igraph::list.edge.attributes(my_igraph_object)
network::list.edge.attributes(my_network_object)
```
Anyway... Using the `asNetwork` function, we can coerce the `igraph` object into a network object so we can use it with the `ergm` function:
```{r 03-ergms-intergraph, cache=TRUE}
# Creating the new network
network_111 <- intergraph::asNetwork(ig_year1_111)
# Running a simple ergm (only fitting edge count)
ergm(network_111 ~ edges)
```
So what happened here? We got a warning. It turns out that our network has loops (didn't think about it before!). Let's take a look at that with the `which_loop` function
```{r 03-ergms-which-loop}
E(ig_year1_111)[which_loop(ig_year1_111)]
```
We can get rid of these using the `igraph::-.igraph` operator (i.e., subtracting the loop edges from the graph). Let's also remove the isolates using the same operator:
```{r 03-ergms-intergraph2, cache=TRUE, eval=TRUE}
# Creating the new network
network_111 <- ig_year1_111
# Removing loops
network_111 <- network_111 - E(network_111)[which(which_loop(network_111))]
# Removing isolates
network_111 <- network_111 - which(degree(network_111, mode = "all") == 0)
# Converting the network
network_111 <- intergraph::asNetwork(network_111)
```
Moreover, since `igraph::simplify()` removes loops (and multi-edges), the loop-removal and coercion steps could also have been written as `asNetwork(simplify(ig_year1_111))` or, using the native pipe, `ig_year1_111 |> simplify() |> asNetwork()`.
A problem with this data is that some vertices have missing values in the variables `hispanic`, `female1`, and `eversmk1`. For now, we will proceed by imputing values based on the averages:
```{r}
#| label: 04-impute-values
for (v in c("hispanic", "female1", "eversmk1")) {
  tmpv <- network_111 %v% v
  # Replace NAs with the most common value (1 if the average exceeds .5, else 0)
  tmpv[is.na(tmpv)] <- mean(tmpv, na.rm = TRUE) > .5
  network_111 %v% v <- tmpv
}
```
Let's take a look at the network
```{r}
#| label: fig-before-big-fit
nplot(
network_111,
vertex.color = ~ hispanic
)
```
## Running ERGMs
Proposed workflow:
1. Estimate the simplest model, adding one variable at a time.
2. After each estimation, run the `mcmc.diagnostics` function to see how well (or poorly) behaved the chains are.
3. Run the `gof` function and verify how well the model matches the network's structural statistics.
What to use:
1. `control.ergm`: Maximum number of iterations, seed for the pseudo-RNG, how many cores to use.
2. `constraints`: Where to sample the network from. Constraining the sample space gives stability and (in some cases) faster convergence, since it reduces the set of networks to sample from.
Here is an example of a couple of models that we could compare:^[Notice that this document may not include the usual messages that the `ergm` command generates during the estimation procedure. This is just to make it more printer-friendly.]
```{r}
#| label: 04-ergms-model0
#| cache: true
#| message: false
ans0 <- ergm(
network_111 ~
edges +
nodematch("hispanic") +
nodematch("female1") +
nodematch("eversmk1") +
mutual,
constraints = ~bd(maxout = 19),
control = control.ergm(
seed = 1,
MCMLE.maxit = 10,
parallel = 4,
CD.maxit = 10
)
)
```
So what are we doing here:
1. The model is controlling for:
a. `edges` Number of edges in the network (as opposed to its density)
b. `nodematch("some-variable-name-here")` Includes a term that controls for homophily/heterophily
c. `mutual` Number of mutual (reciprocated) connections, i.e., pairs where both $(i, j)$ and $(j, i)$ are present. This captures reciprocity in the network.
For more on control parameters, see [@Morris2008].
```{r 04-ergms-model1, cache=TRUE, eval=!chapter_cached, message=FALSE}
ans1 <- ergm(
network_111 ~
edges +
nodematch("hispanic") +
nodematch("female1") +
nodematch("eversmk1")
,
constraints = ~bd(maxout = 19),
control = control.ergm(
seed = 1,
MCMLE.maxit = 10,
parallel = 4,
CD.maxit = 10
)
)
```
This example takes longer to compute
```{r 04-ergms-model2, cache=TRUE, eval=!chapter_cached, message=FALSE}
ans2 <- ergm(
network_111 ~
edges +
nodematch("hispanic") +
nodematch("female1") +
nodematch("eversmk1") +
mutual +
balance
,
constraints = ~bd(maxout = 19),
control = control.ergm(
seed = 1,
MCMLE.maxit = 10,
parallel = 4,
CD.maxit = 10
)
)
```
Now, a nice trick to see all the regressions in the same table: we can use the `texreg` package [@R-texreg], which supports `ergm` outputs!
```{r 04-ergm-tabulation-screen}
library(texreg)
screenreg(list(ans0, ans1, ans2))
```
Or, if you are using R Markdown (or Quarto), you can export the results to LaTeX or HTML. Let's try the latter to see how it looks here:
```{r 04-ergm-tabulation-html, results='asis', eval=knitr::is_html_output(), echo=knitr::is_html_output()}
library(texreg)
htmlreg(list(ans0, ans1, ans2))
```
```{r 04-ergm-tabulation-latex, results='asis', eval=knitr::is_latex_output(), echo=knitr::is_latex_output()}
library(texreg)
texreg(list(ans0, ans1, ans2))
```
```{r ergm-save-image, echo=FALSE, cache=TRUE}
save.image("ergm.rda", compress = TRUE)
```
## Model Goodness-of-Fit
In raw terms, once each chain has reached its stationary distribution, we can say that there are no problems with autocorrelation and that each sample point is i.i.d. The latter implies that, since we are running the model with more than one chain, we can pool all the samples (chains) into a single dataset.
> Recent changes in the ergm estimation algorithm mean that these plots can no longer be used to ensure that the mean statistics from the model match the observed network statistics. For that functionality, please use the GOF command: `gof(object, GOF=~model)`.
>
> ---?ergm::mcmc.diagnostics
Since `ans0` is the best model, let's look at the GOF statistics. First, let's see how the MCMC did. We can use the `mcmc.diagnostics` function included in the package. The function is a wrapper of a couple of functions from the `coda` package [@R-coda], which are called upon the `$sample` object holding the *centered* statistics from the sampled networks. At first, it can be confusing to look at the `$sample` object; it neither matches the observed statistics nor the coefficients.
When calling `mcmc.diagnostics(ans0, centered = FALSE)`, you will see many outputs, including a couple of plots showing the trace and posterior distribution of the *uncentered* statistics (`centered = FALSE`). The following code chunks will reproduce the output from the `mcmc.diagnostics` function step by step using the coda package. First, we need to *uncenter* the sample object:
```{r ergm-uncentering}
# Getting the centered sample
sample_centered <- ans0$sample
# Getting the observed statistics and turning it into a matrix so we can add it
# to the samples
observed <- summary(ans0$formula)
observed <- matrix(
observed,
nrow = nrow(sample_centered[[1]]),
ncol = length(observed),
byrow = TRUE
)
# Now we uncenter the sample
sample_uncentered <- lapply(sample_centered, function(x) {
x + observed
})
# We have to make it an mcmc.list object
sample_uncentered <- coda::mcmc.list(sample_uncentered)
```
Under the hood:
1. _Empirical means and sd, and quantiles_:
```{r coda-summary}
summary(sample_uncentered)
```
2. _Cross correlation_:
```{r coda-corr}
coda::crosscorr(sample_uncentered)
```
3. _Autocorrelation_: For now, we will only look at autocorrelation for chain one. Autocorrelation should be small (in a general MCMC setting). If autocorrelation is high, it means that your sample is not i.i.d. A way to deal with this is *thinning* the sample (a quick sketch follows the chunk below).
```{r coda-autocorr}
coda::autocorr(sample_uncentered)[[1]]
```
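As a quick sketch of thinning (reusing the `sample_uncentered` object created above), we can keep, say, every 10th draw and re-check the autocorrelation:

```r
# Keep every 10th sampled statistic and recompute the autocorrelation for chain one
sample_thinned <- window(sample_uncentered, thin = 10)
coda::autocorr(sample_thinned)[[1]]
```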
4. _Geweke Diagnostic_: From the function's help file:
> "If the samples are drawn from the stationary distribution of the chain, the two means are equal and Geweke's statistic has an asymptotically standard normal distribution. [...]
The Z-score is calculated under the assumption that the two parts of the chain are asymptotically independent, which requires that the sum of frac1 and frac2 be strictly less than 1."
>
> ---?coda::geweke.diag
Let's take a look at a single chain:
```{r coda-geweke.diag}
coda::geweke.diag(sample_uncentered)[[1]]
```
5. _(not included) Gelman Diagnostic_: From the function's help file:
> Gelman and Rubin (1992) propose a general approach to monitoring convergence of MCMC output in which m > 1 parallel chains are run with starting values that are overdispersed relative to the posterior distribution. Convergence is diagnosed when the chains have ‘forgotten’ their initial values, and the output from all chains is indistinguishable. The gelman.diag diagnostic is applied to a single variable from the chain. It is based on a comparison of within-chain and between-chain variances, and is similar to a classical analysis of variance.
> ---?coda::gelman.diag
Unlike the previous diagnostic statistic, this one uses all chains simultaneously:
```{r coda-gelman.diag}
coda::gelman.diag(sample_uncentered)
```
As a rule of thumb, values in the $[0.9, 1.1]$ range are good.
One nice feature of the `mcmc.diagnostics` function is the trace and posterior distribution plots it generates. If you have the R package `latticeExtra` [@R-latticeExtra], the function will override the default plots used by `coda::plot.mcmc` and use lattice instead, creating nicer-looking plots. The next code chunk calls the `mcmc.diagnostics` function, but we suppress the rest of the output (see figure @fig-coda-plots).
```{r}
#| label: fig-coda-plots
#| fig-align: center
#| fig-cap: "Trace and posterior distribution of sampled network statistics."
# [2022-03-13] This line is failing for what it could be an ergm bug
# mcmc.diagnostics(ans0, center = FALSE) # Suppressing all the output
```
If we call the function `mcmc.diagnostics`, this message appears at the end:
> MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
>
> ---`mcmc.diagnostics(ans0)`
Not that bad (although the `mutual` term could do better)!^[The statnet wiki website has a very nice example of (very) bad and good MCMC diagnostics plots [here](https://statnet.org/trac/raw-attachment/wiki/Resources/ergm.fit.diagnostics.pdf).] First, observe that in the figure, we see four different lines; why is that? Since we were running in parallel using four cores, the algorithm ran four chains of the MCMC algorithm. An eyeball test is to see if all the chains moved around roughly the same region; in such a case, we can start thinking about model convergence from the MCMC perspective.
Once we are sure the MCMC algorithm has converged, we can start thinking about how well our model predicts the observed network's properties. Besides the statistics that define our ERGM, the `gof` function's default behavior shows GOF for:
a. the in-degree distribution,
b. the out-degree distribution,
c. edge-wise shared partners, and
d. geodesic distances.
Let's take a look at it
```{r checking-gof, cache = TRUE, fig.pos='!h'}
# Computing and printing GOF statistics
ans_gof <- gof(ans0)
ans_gof
# Plotting GOF statistics
plot(ans_gof)
```
Try the following configuration instead
```r
ans0_bis <- ergm(
network_111 ~
edges +
nodematch("hispanic") +
nodematch("female1") +
mutual +
esp(0:3) +
idegree(0:10)
,
constraints = ~bd(maxout = 19),
control = control.ergm(
seed = 1,
MCMLE.maxit = 15,
parallel = 4,
CD.maxit = 15,
MCMC.samplesize = 2048*4,
MCMC.burnin = 30000,
MCMC.interval = 2048*4
)
)
```
This configuration increases the sample size (so the curves are smoother), uses longer intervals between sampled networks (thinning, which reduces autocorrelation), and a larger burn-in. All of this helps improve the Gelman test statistic. We also added `idegree` from 0 to 10 and `esp` from 0 to 3 to explicitly match those statistics in our model.
![An example of a terrible ERGM (no convergence at all). Also, a good example of why running multiple chains can be useful](img/awful-chains.png){fig-align="center"}
## More on MCMC convergence
For more on this issue, I recommend reviewing [chapter 1](http://www.mcmchandbook.net/HandbookChapter1.pdf) and [chapter 6](http://www.mcmchandbook.net/HandbookChapter6.pdf) from the Handbook of MCMC [@brooks2011]. Both chapters are free to download from the [book's website](http://www.mcmchandbook.net/HandbookSampleChapters.html).
For GOF take a look at section 6 of [ERGM 2016 Sunbelt tutorial](https://statnet.csde.washington.edu/trac/raw-attachment/wiki/Sunbelt2016/ergm_tutorial.html), and for a more technical review, you can take a look at [@HunterJASA2008].
## Mathematical Interpretation
One of the most critical parts of statistical modeling is interpreting the results,
if not the most important. In the case of ERGMs, a key aspect is based on change
statistics. Suppose that we would like to know how likely the tie $y_{ij}$ is to
happen, given the rest of the network. We can compute such probabilities using what
the literature sometimes describes as the Gibbs sampler.
In particular, the log-odds of the $ij$ tie ocurring conditional on the rest of
the network can be written as:
\begin{equation}
\logit{\Pr{y_{ij} = 1|y_{-ij}}} = \transpose{\theta}\Delta\chng{ij},
\end{equation}
\noindent with $\chng{ij}\equiv \snamed{\mathbf{y}}{ij}^+ - \snamed{\mathbf{y}}{ij}^-$ as
the vector of change statistics, in other words, the difference between the
sufficient statistics when $y_{ij}=1$ and its value when $y_{ij} = 0$. To show
this, we write the following:
\begin{align*}
\Pr{y_{ij} = 1|y_{-ij}} & = %
\frac{\Pr{y_{ij} = 1, y_{-ij}}}{%
\Pr{y_{ij} = 1, y_{-ij}} + \Pr{y_{ij} = 0, y_{-ij}}} \\
& = \frac{\exp{\transpose{\theta}\s{\mathbf{y}}^+_{ij}}}{%
\exp{\transpose{\theta}\s{\mathbf{y}}^+_{ij}} + \exp{\transpose{\theta}\s{\mathbf{y}}^-_{ij}}}
\end{align*}
Applying the logit function to the previous equation, we obtain:
\begin{align*}
& = \log{\frac{\exp{\transpose{\theta}\s{\mathbf{y}}^+_{ij}}}{%
\exp{\transpose{\theta}\s{\mathbf{y}}^+_{ij}} + %
\exp{\transpose{\theta}\s{\mathbf{y}}^-_{ij}}}} - %
\log{ %
\frac{\exp{\transpose{\theta}\s{\mathbf{y}}^-_{ij}}}{%
\exp{\transpose{\theta}\s{\mathbf{y}}^+_{ij}} + \exp{\transpose{\theta}\s{\mathbf{y}}^-_{ij}}}%
} \\
& = \log{\exp{\transpose{\theta}\s{\mathbf{y}}^+_{ij}}} - \log{\exp{\transpose{\theta}\s{\mathbf{y}}^-_{ij}}} \\
& = \transpose{\theta}\left(\s{\mathbf{y}}^+_{ij} - \s{\mathbf{y}}^-_{ij}\right) \\
& = \transpose{\theta}\Delta\chng{ij}
\end{align*}
\noindent Hence, the conditional probability of the tie $y_{ij}$ being present can be written as:
\begin{equation}
\Pr{y_{ij} = 1|y_{-ij}} = \frac{1}{1 + \exp{-\transpose{\theta}\Delta\chng{ij}}}
\end{equation}
\noindent i.e., a logistic probability.
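As a rough illustration (a sketch, not part of the original analysis), the change statistics for each dyad can be extracted with `ergmMPLE`, and combined with the coefficients of `ans0` they yield the conditional probability of each tie given the rest of the network:

```r
# Change statistics for every dyad, using the same formula as ans0
chg <- ergmMPLE(
  network_111 ~ edges + nodematch("hispanic") + nodematch("female1") +
    nodematch("eversmk1") + mutual,
  output = "matrix"
)

# Conditional probability Pr(y_ij = 1 | rest of the network) for each dyad
p_ij <- plogis(chg$predictor %*% coef(ans0))
head(p_ij)
```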
## Markov independence
The challenge of analyzing networks is their interdependent nature. Nonetheless, in the absence of such interdependence, ERGMs are equivalent to logistic regression. Conceptually, if all the statistics included in the model do not involve two or more dyads, then the model is non-Markovian in the sense of Markov graphs.
Mathematically, to see this, it suffices to show that the ERGM probability can be written as the product of each dyad's probability.
\begin{equation*}
\Pr{\mathbf{y} | \theta} = \frac{\exp{\transpose{\theta}\s{\mathbf{y}}}}{\sum_{y}\exp{\transpose{\theta}\s{\mathbf{y}}}}
= \frac{\prod_{ij}\exp{\transpose{\theta}\s{\mathbf{y}}_{ij}}}{\sum_{y}\exp{\transpose{\theta}\s{\mathbf{y}}}}
\end{equation*}
Where $\s{}_{ij}$ is a function such that $\s{\mathbf{y}} = \sum_{ij}{\s{\mathbf{y}}_{ij}}$. We now need to deal with the normalizing constant. To see how it can be separated, let's expand the following product:
\newcommand{\thetaS}[1]{\exp{\transpose{\theta}\s{\mathbf{y}}_{#1}}}
\begin{align*}
\prod_{ij}\left(1 + \thetaS{ij}\right) & = \left(1 + \thetaS{11}\right)\left(1 + \thetaS{12}\right)\dots\left(1 + \thetaS{nn}\right) \\
& = 1 + \thetaS{11} + \thetaS{11}\thetaS{12} + \dots + \prod_{ij}\thetaS{ij} \\
& = 1 + \thetaS{11} + \exp{\transpose{\theta}\left(\s{\mathbf{y}}_{11} + \s{\mathbf{y}}_{12}\right)} + \dots + \prod_{ij}\thetaS{ij} \\
& = \sum_{\mathbf{y}\in\mathcal{Y}}\exp{\transpose{\theta}\s{\mathbf{y}}}
\end{align*}
Where the last equality follows from the fact that the expanded product *is* the sum over all possible networks, starting from the empty network ($\exp{0} = 1$) up to the complete network ($\prod_{ij}\thetaS{ij}$). This way, we can now write:
\begin{equation}
\frac{\prod_{ij}\exp{\transpose{\theta}\s{\mathbf{y}}_{ij}}}{\sum_{y}\exp{\transpose{\theta}\s{\mathbf{y}}}} =
\prod_{ij}\frac{\exp{\transpose{\theta}\s{\mathbf{y}}_{ij}}}{1 + \exp{\transpose{\theta}\s{\mathbf{y}}_{ij}}}
\end{equation}
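A tiny numeric check of this factorization (a base R sketch with arbitrary values) for the edges-only case, where each dyad is an independent Bernoulli draw with success probability `plogis(theta)`:

```r
theta     <- 0.3  # arbitrary coefficient for the edge-count statistic
n_dyads   <- 6    # a directed network with n = 3 nodes
edges_obs <- 4    # an arbitrary observed edge count

# ERGM form: exp(theta * g(y)) / kappa, with kappa = (1 + exp(theta))^n_dyads
pr_ergm <- exp(theta * edges_obs) / (1 + exp(theta))^n_dyads

# Product of independent dyad-wise Bernoulli probabilities
p        <- plogis(theta)
pr_dyads <- p^edges_obs * (1 - p)^(n_dyads - edges_obs)

all.equal(pr_ergm, pr_dyads)  # TRUE
```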
Related to this, block-diagonal ERGMs can be estimated as independent models, one per block. To see more about this, read [@Snijders2010margin]. Likewise, since independence depends--pun intended--on partitioning the objective function, as pointed out by Snijders, non-linear functions make the model dependent; e.g., with $\s{\mathbf{y}} = \sqrt{\sum_{ij}y_{ij}}$, the square root of the edge count, the model is no longer a Bernoulli graph.
\renewcommand{\Pr}[1]{\mathbb{P}{#1}}
```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = FALSE)
```