
Commit fbf0e33

Author: Travis CI
Committed: Nov 16, 2016
Fixed an error and added code
There was a compile error in chapter 12 and then I added code to 13
1 parent 2d6cef1 commit fbf0e33

2 files changed: +63 -1 lines changed

‎12_underover.tex

+1 -1

@@ -189,7 +189,7 @@ \subsection{Coding example}
 Catholic         0.1041153 0.03525785 2.952969 5.190079e-03
 Infant.Mortality 1.0770481 0.38171965 2.821568 7.335715e-03
 \end{verbatim}
-Here the increase in variance is \texttt{(0.25387820 / 0.1781971)^2} which is approximately
+Here the increase in variance is \texttt{(0.25387820 / 0.1781971)} squared which is approximately
 2. This is much less than is predicted by the VIF because it involves the estimated
 variance rather than the actual variance.

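As a quick check of the arithmetic in the changed line, the quoted ratio of standard errors can be squared directly in R. This is only a sketch built from the two numbers printed above, not code from either chapter:

\begin{verbatim}
(0.25387820 / 0.1781971)^2   # roughly 2.03, i.e. "approximately 2"
\end{verbatim}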
‎13_penalties.tex

+62

@@ -265,8 +265,70 @@ \subsection{Coding examples}
 However, let's see if we can create these sums of squares manually using our
 approach.

+\begin{verbatim}
+> xtilde = as.matrix(swiss);
+> y = xtilde[,1]
+> x1 = cbind(1, xtilde[,2])
+> x2 = cbind(1, xtilde[,2:4])
+> x3 = cbind(1, xtilde[,-1])
+> makeH = function(x) x %*% solve(t(x) %*% x) %*% t(x)
+> n = length(y); I = diag(n)
+> h1 = makeH(x1)
+> h2 = makeH(x2)
+> h3 = makeH(x3)
+> ssres1 = t(y) %*% (I - h1) %*% y
+> ssres2 = t(y) %*% (I - h2) %*% y
+> ssres3 = t(y) %*% (I - h3) %*% y
+> ssreg2g1 = t(y) %*% (h2 - h1) %*% y
+> ssreg3g2 = t(y) %*% (h3 - h2) %*% y
+> out = rbind( c(n - ncol(x1), ssres1, NA, NA),
+               c(n - ncol(x2), ssres2, ncol(x2) - ncol(x1), ssreg2g1),
+               c(n - ncol(x3), ssres3, ncol(x3) - ncol(x2), ssreg3g2)
+             )
+> out
+     [,1]     [,2] [,3]     [,4]
+[1,]   45 6283.116   NA       NA
+[2,]   43 3180.925    2 3102.191
+[3,]   41 2105.043    2 1075.882
+\end{verbatim}
+It is interesting to note that the F test comparing Model 1 to Model 2 from the \texttt{anova} command
+is obtained by dividing \texttt{3102.191 / 2} (a chi-squared divided by its 2 degrees of freedom)
+by \texttt{2105.043 / 41} (an independent chi-squared divided by its 41 degrees of freedom). The
+denominator of the F statistic is then based on the residual sum of squares from Model 3, not from Model 2.
+
+This is why the following give two different answers for the F statistic:

+\begin{verbatim}
+> anova(fit1, fit2)
+Analysis of Variance Table

+Model 1: Fertility ~ Agriculture
+Model 2: Fertility ~ Agriculture + Examination + Education
+  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
+1     45 6283.1
+2     43 3180.9  2    3102.2 20.968 4.407e-07 ***
+---
+> anova(fit1, fit2, fit3)
+Analysis of Variance Table
+
+Model 1: Fertility ~ Agriculture
+Model 2: Fertility ~ Agriculture + Examination + Education
+Model 3: Fertility ~ Agriculture + Examination + Education + Catholic +
+    Infant.Mortality
+  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
+1     45 6283.1
+2     43 3180.9  2    3102.2 30.211 8.638e-09 ***
+3     41 2105.0  2    1075.9 10.477 0.0002111 ***
+\end{verbatim}
+In the first case, the denominator of the F statistic is
+\texttt{3180.9 / 43}, the residual mean squared error for Model 2,
+whereas in the latter case the denominator is the residual
+mean squared error for Model 3. Of course, under the null hypothesis,
+either approach yields an independent chi-squared statistic in the denominator.
+However, using the Model 3 residual mean squared error reduces the
+denominator degrees of freedom, though it also necessarily reduces the
+residual sum of squared errors (since extra terms in the regression
+model always do that).

 \section{Ridge regression}

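To see the effect of the two denominators, the F statistics in the tables above can be reproduced by hand from the printed sums of squares. This is only a sketch using the rounded values from the manual table, not code from the chapter:

\begin{verbatim}
## F comparing Model 1 and Model 2, using the two possible denominators
(3102.191 / 2) / (3180.925 / 43)   # roughly 20.97, matching the 20.968 from anova(fit1, fit2)
(3102.191 / 2) / (2105.043 / 41)   # roughly 30.21, matching the 30.211 from anova(fit1, fit2, fit3)

## F comparing Model 2 and Model 3 (Model 3 residual mean square in the denominator)
(1075.882 / 2) / (2105.043 / 41)   # roughly 10.48, matching the 10.477 above

## p-value for the Model 1 vs Model 2 test with the Model 3 denominator
pf((3102.191 / 2) / (2105.043 / 41), 2, 41, lower.tail = FALSE)   # roughly 8.6e-09, matching Pr(>F) above
\end{verbatim}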