\documentclass[12pt]{article}
\usepackage{Sweave}
\usepackage{myVignette}
\usepackage[authoryear,round]{natbib}
\bibliographystyle{plainnat}
\DefineVerbatimEnvironment{Sinput}{Verbatim}
{formatcom={\vspace{-1.5ex}},fontshape=sl,
fontfamily=courier,fontseries=b, fontsize=\scriptsize}
\DefineVerbatimEnvironment{Soutput}{Verbatim}
{formatcom={\vspace{-2.5ex}},fontfamily=courier,fontseries=b,%
fontsize=\scriptsize}
%%\VignetteIndexEntry{Examples from Multilevel Software Reviews}
%%\VignetteDepends{lme4}
\begin{document}
\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,width=5,height=3,strip.white=TRUE,keep.source=TRUE}
\SweaveOpts{prefix=TRUE,prefix.string=figs/SoftRev,include=FALSE}
\setkeys{Gin}{width=\textwidth}
\title{Examples from Multilevel Software Comparative Reviews}
\author{Douglas Bates\\R Development Core Team\\\email{Douglas.Bates@R-project.org}}
\date{February 2005, {\small with updates up to \today}}
\maketitle
\begin{abstract}
The Centre for Multilevel Modelling at the Institute of Education,
London maintains a web site of ``Software reviews of multilevel
modelling packages''. The data sets discussed in the reviews are
available at this web site. We have incorporated these data sets
in the \code{mlmRev} package for \RR{} and, in this vignette, provide
the results of fitting several models to these data sets.
\end{abstract}
<<echo=FALSE,results=hide>>=
options(width=80, show.signif.stars = FALSE,
lattice.theme = function() canonical.theme("pdf", color = FALSE))
library(Matrix)
library(lattice)
library(lme4)
library(mlmRev)
set.seed(1234321)
@
\section{Introduction}
\label{sec:Intro}
\RR{} is an Open Source implementation of John Chambers' \Slang{}
language for data analysis and graphics. \RR{} was initially developed
by Ross Ihaka and Robert Gentleman of the University of Auckland and
now is developed and maintained by an international group of
statistical computing experts.
In addition to being Open Source software, which means that anyone can
examine the source code to see exactly how the computations are being
carried out, \RR{} is freely available from a network of archive sites
on the Internet. There are precompiled versions for installation on
the Windows operating system, Mac OS X and several distributions of
the Linux operating system. Because the source code is available
those interested in doing so can compile their own version if they wish.
\RR{} provides an environment for interactive computing with data and for
graphical display of data. Users and developers can extend the
capabilities of \RR{} by writing their own functions in the language
and by creating packages of functions and data sets. Many such
packages are available on the archive network called CRAN
(Comprehensive R Archive Network) for which the parent site is
\url{http://cran.r-project.org}. One such package called \code{lme4}
(along with a companion package called \code{Matrix}) provides
functions to fit and display linear mixed models and generalized
linear mixed models, which are the statisticians' names for the models
called multilevel models or hierarchical linear models in other
disciplines. The \code{lattice} package provides functions to
generate several high level graphics plots that help with the
visualization of the types of data to which such models are fit.
Finally, the \code{mlmRev} package provides the data sets used in the
``Software Reviews of Multilevel Modeling Packages'' from the
Multilevel Modeling Group at the Institute of Education in the UK.
This package also contains several other data sets from the multilevel
modeling literature.
The software reviews mentioned above were intended to provide a
comparison of the speed and accuracy of many different packages for
fitting multilevel models. As such, there is a standard set of
models that were fit to each of the data sets in each of the packages
capable of doing the fit. We will fit these models for
comparative purposes but we will also do some graphical exploration of
the data and, in some cases, discuss alternative models.
We follow the general outline of the previous reviews, beginning with
simpler structures and moving to the more complex structures. Because
the previous reviews were performed on older and considerably slower
computers than the ones on which this vignette will be compiled, the
timings produced by the \code{system.time} function and shown in the
text should not be compared with previous timings given on the web
site. They are an indication of the times required to fit such models
to these data sets on recent computers with processors running at
around 2 GHz or faster.
\section{Two-level normal models}
\label{sec:TwoLevelNormal}
In the multilevel modeling literature a two-level model is one with
two levels of random variation: the per-observation noise term and
the random effects, which are grouped according to the levels of a factor.
We call this factor a \textit{grouping factor}. If the response is
measured on a continuous scale (more or less) our initial models are
based on a normal distribution for the per-observation noise and for
the random effects. Thus such a model is called a ``two-level normal
model'' even though it has only one grouping factor for the random effects.
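Written out, a two-level normal model of the kind just described has the
form (the notation here is only a sketch to fix ideas; the symbols are
not taken from the reviews)
\begin{displaymath}
  y_{ij} = \mathbf{x}_{ij}^\mathsf{T}\boldsymbol{\beta} + b_j + \epsilon_{ij},
  \qquad b_j \sim \mathcal{N}(0,\sigma_b^2),
  \qquad \epsilon_{ij} \sim \mathcal{N}(0,\sigma^2),
\end{displaymath}
where $y_{ij}$ is the response for observation $i$ in group $j$,
$\mathbf{x}_{ij}$ is its vector of covariates, $\boldsymbol{\beta}$ is
the vector of fixed effects, $b_j$ is the random effect for level $j$
of the grouping factor, and the $b_j$ and $\epsilon_{ij}$ are mutually
independent.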
\subsection{The Exam data}
\label{sec:Examdata}
<<echo=FALSE,results=hide>>=
lmer(normexam ~ standLRT + sex + schgend + (1|school), Exam)
@
The data set called \code{Exam} provides the normalized exam scores
attained by 4,059 students from 65 schools in inner London. Some of
the covariates available with this exam score are the school the
student attended, the sex of the student, the school gender (boys,
girls, or mixed) and the student's result on the Standardised London
Reading Test.
The \RR{} functions \code{str} and \code{summary} can be used to
examine the structure of a data set (or, in general, any \RR{} object)
and to provide a summary of an object.
<<>>=
str(Exam)
summary(Exam)
@
\subsection{Model fits and timings}
\label{sec:ExamFits}
The first model to fit to the Exam data incorporates fixed-effects
terms for the pretest score, the student's sex and the school gender.
The only random-effects term is an additive shift associated with the
school.
<<>>=
(Em1 <- lmer(normexam ~ standLRT + sex + schgend + (1|school), Exam))
@
The \code{system.time} function can be used to time the execution of
an \RR{} expression. It returns a vector of five numbers giving the
user time (time spent executing application code), the system time
(time spent executing system functions called by the application
code), the elapsed time, and the user and system time for any child
processes. The first number is what is commonly viewed as the time
required to do the model fit. (The elapsed time is unsuitable because
it can be affected by other processes running on the computer.) These
times are in seconds. On modern computers this fit takes only a
fraction of a second.
<<>>=
system.time(lmer(normexam ~ standLRT + sex + schgend + (1|school), Exam))
@
\subsection{Interpreting the fit}
\label{sec:ExamInterpret}
As can be seen from the output, the default method of fitting a linear
mixed model is restricted maximum likelihood (REML). The estimates of
the variance components correspond to those reported by other packages
as given on the Multilevel Modelling Group's web site. Note that the
estimates of the variance components are given on the scale of the
variance and on the scale of the standard deviation. That is, the
values in the column headed \code{Std.Dev.} are simply the square
roots of the corresponding entry in the \code{Variance} column. They
are \textbf{not} standard errors of the estimate of the variance.
The estimates of the fixed-effects are different from those quoted on
the web site because the terms for \code{sex} and \code{schgend} use a
different parameterization than in the reviews. Here the reference
level of \code{sex} is female (\code{F}) and the coefficient labelled
\code{sexM} represents the difference for males compared to females.
Similarly the reference level of \code{schgend} is \code{mixed} and
the two coefficients represent the change from mixed to boys only and
the change from mixed to girls only. The value of the coefficient
labelled \code{Intercept} is affected by both these changes as is the
value of the REML criterion.
To reproduce the results obtained from other packages, we must change
the reference level for each of these factors.
<<>>=
Exam$sex <- relevel(Exam$sex, "M")
Exam$schgend <- relevel(Exam$schgend, "girls")
(Em2 <- lmer(normexam ~ standLRT + sex + schgend + (1|school), Exam))
@
The coefficients now correspond to those in the tables on the web
site. It happens that the REML criterion at the optimum in this fit
is the same as in the previous fit, but you cannot depend on this
occurring. In general the value of the REML criterion at the optimum
depends on the parameterization used for the fixed effects.
\subsection{Further exploration}
\label{sec:ExamExplore}
\subsubsection{Checking consistency of the data}
\label{sec:consistency}
It is important to check the consistency of data before trying to fit
sophisticated models. One should plot the data in many different ways
to see if it looks reasonable and also to check relationships between
variables.
For example, each observation in these data is associated with a
particular student. The variable \code{student} is not a unique
identifier of the student as it only has 650 unique values. It is
intended to be a unique identifier within a school but it is not. To
show this we create a factor that is the interaction of school and
student, and then drop the unused levels.
<<>>=
Exam <- within(Exam, ids <- factor(school:student))
str(Exam)
@
Notice that there are 4059 observations but only 4055 unique levels of
student within school. We can check the ones that are duplicated
<<>>=
as.character(Exam$ids[which(duplicated(Exam$ids))])
@
One of these cases
<<>>=
subset(Exam, ids == '43:86')
xtabs(~ sex + school, Exam, subset = school %in% c(43, 50, 52), drop = TRUE)
@
is particularly interesting. Notice that one of the students
numbered 86 in school 43 is the only male student out of 61 students
from this school who took the exam. It is quite likely that this
student's score was attributed to the wrong school and that the school
is in fact a girls-only school, not a mixed-sex school.
The causes of the other three cases of duplicate student numbers
within a school are not as clear. It would be necessary to go back
to the original data records to check these.
The cross-tabulation of the students by sex and school for the
mixed-sex schools
<<>>=
xtabs(~ sex + school, Exam, subset = type == "Mxd", drop = TRUE)
@
shows another anomaly. School 47 is similar to school 43 in that,
although it is classified as a mixed-sex school, 81 male students
and only one female student took the exam. It is likely that the
school was misrecorded for this one female student and the school is a
male-only school.
Another school is worth noting. There were only eight students from
school 54 who took the exam so any within-school estimates from this
school will be unreliable.
A mosaic plot (Figure~\ref{fig:ExamMosaic}) produced with
<<eval=FALSE>>=
ExamMxd <- within(subset(Exam, type == "Mxd"), school <- factor(school))
mosaicplot(~ school + sex, ExamMxd)
@
helps to detect mixed-sex schools with unusually large or unusually small ratios
of females to males taking the exam.
<<ExamMosaic,fig=TRUE,echo=FALSE>>=
ExamMxd <- within(subset(Exam, type == "Mxd"), school <- factor(school))
mosaicplot(~ school + sex, ExamMxd)
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-ExamMosaic}
\caption{A mosaic plot of the sex distribution by school. The areas
of the rectangles are proportional to the number of students of
that sex from that school who took the exam. Schools with an
unusually large or unusually small ratio of females to males are
highlighted.}
\label{fig:ExamMosaic}
\end{figure}
\subsubsection{Preliminary graphical displays}
\label{sec:Graphical}
In addition to the pretest score (\code{standLRT}), the predictor
variables used in this model are the student's sex and the school
gender, which is coded as having three levels. There is some
redundancy in these two variables in that all the students in a
boys-only school must be male. For graphical exploration we convert
from \code{schgend} to \code{type}, an indicator of whether the school
is a mixed-sex school or a single-sex school, and plot the response
versus the pretest score for each combination of sex and school type.
<<Examplot1,fig=TRUE,echo=FALSE>>=
print(xyplot(normexam ~ standLRT | sex * type, Exam,
type = c("g", "p", "smooth"), layout = c(2,2),
xlab = "Standardized London Reading Test score",
ylab = "Normalized exam score", aspect = 1.2))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Examplot1}
\caption{Normalized exam score versus pretest
(Standardized London Reading Test) score for 4,059 students from 65 schools
in inner London. The panels on the left show the male students'
scores; those on the right show the females' scores. The top row
of panels shows the scores of students in single-sex schools and
the bottom row shows the scores of students in mixed-sex
schools. A scatterplot smoother line for each panel has been added
to help visualize the trend.}
\label{fig:Examplot1}
\end{figure}
This plot is created with the \code{xyplot} function from the
\code{lattice} package as (essentially)
<<eval=FALSE>>=
xyplot(normexam ~ standLRT | sex * type, Exam, type = c("g", "p", "smooth"))
@
The formula would be read as ``plot \code{normexam} by \code{standLRT}
given \code{sex} by (school) \code{type}''. A few other arguments were
added in the actual call to make the axis annotations more readable.
Figure~\ref{fig:Examplot1} shows that, even after accounting for a
student's sex, pretest score and school type, there is considerable
variation in the response. We may attribute some of this variation to
differences between schools but the fitted model indicates that most
of the variation is unaccounted-for or ``residual'' variation.
In some ways the high level of residual variation obscures the pattern
in the data. By removing the data points and overlaying the
scatterplot smoothers we can concentrate on the relationships between
the covariates.
<<Examplot2,fig=TRUE,echo=FALSE>>=
print(xyplot(normexam ~ standLRT, Exam, groups = sex:type,
type = c("g", "smooth"), xlim = c(-3,3), ylim = c(-2,2),
xlab = "Standardized London Reading Test score",
ylab = "Normalized exam score",
auto.key = list(space = 'right', points = FALSE, lines = TRUE),
aspect = 1))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Examplot2}
\caption{Overlaid scatterplot smoother lines of the normalized test
scores versus the pretest (Standardized London Reading Test) score
for female (F) and male (M) students in single-sex (Sngl) and
mixed-sex (Mxd) schools.}
\label{fig:Examplot2}
\end{figure}
The call to \code{xyplot} is essentially
<<eval=FALSE>>=
xyplot(normexam ~ standLRT, Exam, groups = sex:type, type = c("g", "smooth"))
@
Figure~\ref{fig:Examplot2} is a remarkable plot in that it shows
nearly a perfect ``main effects'' relationship of the response with
the three covariates and almost no interaction. It is rare to see
real data follow a simple theoretical relationship so closely.
To elaborate, we can see that for each of the four groups the smoothed
relationship between the exam score and the pretest score is close to
a straight line and that the lines are close to being parallel. The
only substantial deviation is in the smoothed relationship for the
males in single-sex schools and this is the group with the fewest
observations and hence the least precision in the estimated
relationship. The lack of parallelism for this group is most apparent
in the region of extremely low pretest scores where there are few
observations and a single student who had a low pretest score and a
moderate post-test score can substantially influence the curve. Five
or six such points can be seen in the upper left panel of
Figure~\ref{fig:Examplot1}.
We should also notice the ordering of the lines and the spacing
between the lines. The smoothed relationships for students in
single-sex schools are consistently above those in the mixed-sex
schools and, except for the region of low pretest scores described
above, the relationship for the females in a given type of school is
consistently above that for the males. Furthermore the distance
between the female and male lines in the single-sex schools is
approximately the same as the corresponding distance in the mixed-sex
schools. We would summarize this by saying that there is a positive
effect for females versus males and a positive effect for single-sex
versus mixed-sex and no indication of interaction between these
factors.
\subsubsection{The effect of schools}
\label{sec:schoolEffects}
We can check for patterns within and between schools by plotting the
response versus the pretest by school. Because there appear to be
differences in this relationship for single-sex versus mixed-sex
schools and for females versus males we consider these separately.
<<Examplot3,fig=TRUE,echo=FALSE>>=
print(xyplot(normexam ~ standLRT | school, Exam,
type = c("g", "p", "r"),
xlab = "Standardized London Reading Test score",
ylab = "Normalized exam score",
subset = sex == "F" & type == "Sngl"))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Examplot3}
\caption{Normalized exam scores versus pretest (Standardized London
Reading Test) score by school for female students in single-sex
schools.}
\label{fig:Examplot3}
\end{figure}
In Figure~\ref{fig:Examplot3} we plot the normalized exam scores
versus the pretest score by school for female students in single-sex
schools. The plot is produced as
<<eval=FALSE>>=
xyplot(normexam ~ standLRT | school, Exam,
type = c("g", "p", "r"),
subset = sex == "F" & type == "Sngl")
@
The \code{"r"} in the \code{type} argument adds a simple linear
regression line to each panel.
The first thing we notice in Figure~\ref{fig:Examplot3} is that
school 48 is an anomaly because only two students in this school took
the exam. Because within-school results based on only two students
are unreliable, we will exclude this school from further plots (but we
do include these data when fitting comprehensive models).
Although the regression lines on the panels can help us to look for
variation in the schools, the ordering of the panels is, for our
purposes, random. We recreate this plot in Figure~\ref{fig:Examplot4} using
<<eval=FALSE>>=
xyplot(normexam ~ standLRT | school, Exam, type = c("g", "p", "r"),
subset = sex == "F" & type == "Sngl" & school != 48,
index.cond = function(x, y) coef(lm(y ~ x))[1])
@
<<Examplot4,fig=TRUE,echo=FALSE>>=
print(xyplot(normexam ~ standLRT | school, Exam,
type = c("g", "p", "r"),
xlab = "Standardized London Reading Test score",
ylab = "Normalized exam score",
subset = sex == "F" & type == "Sngl" & school != 48,
index.cond = function(x, y) coef(lm(y ~ x))[1]))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Examplot4}
\caption{Normalized exam scores versus pretest (Standardized London
Reading Test) score by school for female students in single-sex
schools. School 48 where only two students took the exam has been
eliminated and the panels have been ordered by increasing
intercept (predicted normalized score for a pretest score of 0) of
the regression line.}
\label{fig:Examplot4}
\end{figure}
so that the panels are ordered (from left to right starting at the
bottom row) by increasing intercept for the regression line (i.e.{} by
increasing predicted exam score for a student with a pretest score of 0).
Alternatively, we could order the panels by increasing slope of the
within-school regression lines, as in Figure~\ref{fig:Examplot4a}.
<<Examplot4a,fig=TRUE,echo=FALSE>>=
print(xyplot(normexam ~ standLRT | school, Exam,
type = c("g", "p", "r"),
xlab = "Standardized London Reading Test score",
ylab = "Normalized exam score",
subset = sex == "F" & type == "Sngl" & school != 48,
index.cond = function(x, y) coef(lm(y ~ x))[2]))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Examplot4a}
\caption{Normalized exam scores versus pretest (Standardized London
Reading Test) score by school for female students in single-sex
schools. School 48 has been eliminated and the panels have been
ordered by increasing slope of the within-school regression
lines.}
\label{fig:Examplot4a}
\end{figure}
Although it is informative to plot the within-school regression lines,
we need to assess the variability in the estimates of the coefficients
before concluding whether there is ``significant'' variability between
schools. We can obtain the individual regression fits with the
\code{lmList} function
<<>>=
show(ExamFS <- lmList(normexam ~ standLRT | school, Exam,
subset = sex == "F" & type == "Sngl" & school != 48))
@
and compare the confidence intervals on these coefficients.
<<eval=FALSE>>=
plot(confint(ExamFS, pool = TRUE), order = 1)
@
<<Examplot4c,fig=TRUE,echo=FALSE>>=
print(plot(confint(ExamFS, pool = TRUE), order = 1))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Examplot4c}
\caption{Confidence intervals on the coefficients of the
within-school regression lines for female students in single-sex
schools. School 48 has been eliminated and the schools have been
ordered by increasing estimated intercept.}
\label{fig:Examplot4c}
\end{figure}
<<Examplot5,fig=TRUE,echo=FALSE>>=
print(xyplot(normexam ~ standLRT | school, Exam,
type = c("g", "p", "r"),
xlab = "Standardized London Reading Test score",
ylab = "Normalized exam score",
subset = sex == "M" & type == "Sngl", layout = c(5,2),
index.cond = function(x, y) coef(lm(y ~ x))[1]))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Examplot5}
\caption{Normalized exam scores versus pretest (Standardized London
Reading Test) score by school for male students in single-sex
schools.}
\label{fig:Examplot5}
\end{figure}
<<>>=
show(ExamMS <- lmList(normexam ~ standLRT | school, Exam,
subset = sex == "M" & type == "Sngl"))
@
The corresponding plot of the confidence intervals is shown in
Figure~\ref{fig:Examplot5b}.
<<Examplot5b,fig=TRUE,echo=FALSE>>=
print(plot(confint(ExamMS, pool = TRUE), order = 1))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Examplot5b}
\caption{Confidence intervals on the coefficients of the
  within-school regression lines for male students in single-sex
  schools. The schools have been ordered by increasing estimated
  intercept.}
\label{fig:Examplot5b}
\end{figure}
For the mixed-sex schools we can consider the effect of the pretest
score and sex in the plot (Figure~\ref{fig:Examplot6}) and in the
<<Examplot6,fig=TRUE,echo=FALSE>>=
print(xyplot(normexam ~ standLRT | school, Exam, groups = sex,
type = c("g", "p", "r"),
xlab = "Standardized London Reading Test score",
ylab = "Normalized exam score",
subset = !school %in% c(43, 47) & type == "Mxd",
index.cond = function(x, y) coef(lm(y ~ x))[1],
auto.key = list(space = 'top', lines = TRUE,
columns = 2), layout = c(7,5),
aspect = 1.2))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Examplot6}
\caption{Normalized exam scores versus pretest
score by school and sex for students in mixed-sex
schools.}
\label{fig:Examplot6}
\end{figure}
separate model fits for each school.
<<>>=
show(ExamM <- lmList(normexam ~ standLRT + sex| school, Exam,
subset = type == "Mxd" & !school %in% c(43,47,54)))
@
The confidence intervals for these separately fitted models,
<<Examplot6b,fig=TRUE,echo=FALSE>>=
print(plot(confint(ExamM, pool = TRUE), order = 1))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Examplot6b}
\caption{Confidence intervals on the coefficients of the
  within-school regression lines for students in mixed-sex
  schools. Schools 43, 47 and 54 have been eliminated and the schools
  have been ordered by increasing estimated intercept.}
\label{fig:Examplot6b}
\end{figure}
shown in Figure~\ref{fig:Examplot6b} indicate differences in the
intercepts and possibly differences in the slopes with respect to the
pretest scores. However, there is not a strong indication of
variation by school in the effect of sex.
\subsection{Multilevel models for the exam data}
\label{sec:ExamModels}
We begin with a model that has a random effect for the intercept by
school plus additive fixed effects for the pretest score, the
student's sex and the school type.
<<>>=
(Em3 <- lmer(normexam ~ standLRT + sex + type + (1|school), Exam))
@
Our data exploration indicated that the slope with respect to the
pretest score may vary by school. We can fit a model with random
effects by school for both the slope and the intercept as
<<>>=
(Em4 <- lmer(normexam ~ standLRT + sex + type + (standLRT|school), Exam))
@
and compare this fit to the previous fit with
<<>>=
anova(Em3, Em4)
@
There is strong evidence of a significant random effect for the
slope by school, whether judged by AIC, BIC or the p-value for the
likelihood ratio test.
The p-value for the likelihood ratio test is based on a $\chi^2$
distribution with degrees of freedom calculated as the difference in
the number of parameters in the two models. Because one of the
parameters eliminated in going from the full model to the submodel is
on the boundary of the parameter space, the usual asymptotics for the
likelihood ratio test do not apply. However, it can be shown that the
p-value quoted for the test is conservative in the sense that it is an
upper bound on the p-value that would be calculated, say, from a
parametric bootstrap.
Having an upper bound of $1.9\times 10^{-10}$ on the p-value can be
regarded as ``highly significant'' evidence of the utility of the
random effect for the slope by school.
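To see why the quoted p-value is an upper bound, consider the simplest
boundary case: testing whether a single variance component is zero.
There the asymptotic null distribution of the likelihood ratio
statistic is not $\chi^2_1$ but the equal mixture
\begin{displaymath}
  \tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1,
\end{displaymath}
where $\chi^2_0$ denotes a point mass at zero, so the p-value from the
naive $\chi^2_1$ reference distribution is exactly twice the
mixture-based p-value. (This standard result on boundary hypotheses is
due to Self and Liang, 1987, and is quoted here as background; the
two-parameter test above behaves analogously.)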
We could also add a random effect for the student's sex by school
<<>>=
(Em5 <- lmer(normexam ~ standLRT + sex + type + (standLRT + sex|school), Exam))
@
Notice that the estimate of the variance of the \code{sexM} term is
essentially zero so there is no need to test the significance of this
variance component. We proceed with the analysis of \code{Em4}.
\subsection{Markov Chain Monte Carlo methods for assessing parameter variation}
\label{sec:MCMC}
An important part of the analysis of a statistical model is the
assessment of the precision of parameter estimates, either through
confidence intervals and confidence regions or through hypothesis
tests on the parameters. Sometimes such information is condensed and
given as the parameter estimate and a standard error of the estimate,
with the assumption that a confidence interval will be formed as the
estimate plus/minus some multiple of the standard error.
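In symbols, such a symmetric interval has the familiar form (shown
here only to fix notation; the symbols are illustrative and not taken
from the software output)
\begin{displaymath}
  \hat{\theta} \pm z_{1-\alpha/2}\,\widehat{\mathrm{se}}(\hat{\theta}),
\end{displaymath}
where $z_{1-\alpha/2}$ is a standard normal quantile, for example
$z_{0.975}\approx 1.96$ for a nominal 95\% interval.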
Summarizing the variability in parameter estimates with such symmetric
intervals is appropriate if the distribution of the parameters can
be assumed to be reasonably symmetric but this is not the case for
variance components. In general we expect the distribution of a
variance component to be approximately $\chi^2$ and, depending upon the
precision with which the component is estimated, such a distribution
can be quite asymmetric.
To examine the distribution of the parameter estimates we can use a
parametric bootstrap or adopt a Bayesian approach and generate a
sample from the posterior distribution of the parameters using a
technique called Markov Chain Monte Carlo sampling. The
\code{mcmcsamp} function applied to a fitted \code{lmer} model
generates such a sample.
<<>>=
#Es4 <- mcmcsamp(Em4, 10000, trans = FALSE)
@
The parameter estimates in this sample are the fixed effects, the
variance of the random per-observation noise, $\sigma^2$, the variance
of the intercept random effect, the variance of the slope random
effect and their covariance. A plot of the estimated posterior
densities of each of the parameters (Figure~\ref{fig:Examplot7})
<<>>=
#print(densityplot(Es4, plot.points = FALSE, ylab = "",
# scales = list(x = list(axs = "i"))))
@
\begin{figure}[tbp]
\centering
% \includegraphics[width=\textwidth]{figs/SoftRev-Examplot7}
\caption{Plots of the estimated posterior probability densities from
a Markov Chain Monte Carlo sample of the parameters in the model
\code{Em4}.}
\label{fig:Examplot7}
\end{figure}
shows that the posterior densities of the fixed effects are indeed
quite symmetric and close to a normal (or Gaussian) distribution. The
posterior densities of $\sigma^2$ and the covariance parameter are
reasonably symmetric but the posterior densities of the variances of
the random effects are not.
A better way of assessing possible asymmetry is to use normal
probability plots of the samples from each of the parameters, as in
Figure~\ref{fig:Examplot8}.
<<>>=
#print(qqmath(Es4, ylab = "",
# xlab = "Quantiles of a standard normal distribution",
# scales = list(y = list(relation = "free"),
# x = list(axs = "i")), type = c("g", "p"),
# pch = ".", cex = 1.5))
@
\begin{figure}[tbp]
\centering
% \includegraphics[width=\textwidth]{figs/SoftRev-Examplot8}
\caption{Normal probability plots of the Markov Chain Monte Carlo
samples from the posterior densities of the parameters in the
model \code{Em4}.}
\label{fig:Examplot8}
\end{figure}
We can see from Figure~\ref{fig:Examplot8} that the posterior
distributions of the fixed effects are remarkably close to the normal
(or Gaussian) distribution. Except for a few points in each panel,
these panels show almost perfect linearity. The few points that do
deviate from the straight line in each case represent fewer than
a dozen samples out of a total of 10,000 and are not a cause for concern.
The posterior distribution of $\sigma^2$ is satisfactorily close to a
normal distribution. As we will see, it is not always the case that
this parameter is close to normally distributed. It happens that in
this example $\sigma^2$ is very precisely determined and possible
asymmetry in the distribution will not be noticeable over such a small
range of values.
There is noticeable asymmetry in the distribution of the variances of
the random effects and perhaps some asymmetry in the distribution of
the covariance of the intercept and slope random effects.
Using this sample we can examine the distribution of functions of
these parameters. For example, \citet{box73:_bayes_infer_statis_analy} advocate
approximating the distribution of the logarithm of a variance rather
than the variance itself. For a covariance we first convert it to a
correlation and then apply Fisher's z transformation (the inverse
hyperbolic tangent) to the correlation. The density plots and normal
probability plots of these transformed parameters are shown in
Figures~\ref{fig:Examplot9} and \ref{fig:Examplot10}.
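For reference, the transformations just described are (these are the
standard definitions, written out here as a sketch)
\begin{displaymath}
  \rho = \frac{\sigma_{12}}{\sigma_1\sigma_2}, \qquad
  z(\rho) = \operatorname{atanh}(\rho)
          = \tfrac{1}{2}\log\!\left(\frac{1+\rho}{1-\rho}\right),
\end{displaymath}
where $\sigma_{12}$ is the covariance of the intercept and slope
random effects and $\sigma_1$, $\sigma_2$ are their standard
deviations; the variances themselves are examined on the scale
$\log\sigma^2$.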
<<>>=
#Es4a <- Es4[,5:8]
#Es4a[,4] <- atanh(Es4a[,4]/sqrt(Es4a[,2]*Es4a[,3]))
#for (i in 1:3) Es4a[,i] <- log(Es4a[,i])
#print(densityplot(Es4a, plot.points = FALSE, ylab = "",
# scales = list(x = list(axs = "i"))))
@
\begin{figure}[tbp]
\centering
% \includegraphics[width=\textwidth]{figs/SoftRev-Examplot9}
\caption{Estimated density plots of the Markov Chain Monte Carlo
samples from the posterior densities of the variance and
covariance parameters in model \code{Em4}. The variance
parameters are on the log scale and the covariance is expressed as
Fisher's z transformation of the correlation.}
\label{fig:Examplot9}
\end{figure}
<<>>=
#print(qqmath(Es4a,
# ylab = "",
# xlab = "Quantiles of a standard normal distribution",
# layout = c(2,2),
# scales = list(y = list(relation = "free"),
# x = list(axs = "i")), type = c("g", "p"), pch = ".",
# cex = 1.5))
@
\begin{figure}[tbp]
\centering
% \includegraphics[width=\textwidth]{figs/SoftRev-Examplot10}
\caption{Normal probability plots of the Markov Chain Monte Carlo
samples from the posterior densities of the variance and
covariance parameters in model \code{Em4}. The variance
parameters are on the log scale and the covariance is expressed as
Fisher's z transformation of the correlation.}
\label{fig:Examplot10}
\end{figure}
\section{Three-level Normal Models}
\label{sec:three-level}
<<echo=FALSE,results=hide>>=
lmer(score ~ gcsecnt + (1|school) + (1|lea), Chem97)
@
These results are from the 1997 A-level Chemistry exam. The
\code{school} factor is nested within \code{lea} (local education
authority) and has unique levels for each of the 2410 schools. It is
good practice to make the nesting explicit by specifying the grouping
factors as the `outer' factor, \code{lea} in this case, and the
interaction of the outer and inner factors, \code{lea:school} or
\code{school:lea}. This ensures unique levels for each
school-within-\code{lea} combination.
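A small base-\RR{} sketch, with made-up factor labels, shows why the
interaction is needed: the same school labels reused in two LEAs
collapse to two levels, while the interaction distinguishes all four
schools.
<<>>=
sch <- factor(c(1, 2, 1, 2))    # school labels reused across LEAs
lea <- factor(c("A", "A", "B", "B"))
nlevels(sch)                    # 2: labels are not unique
nlevels(interaction(lea, sch, drop = TRUE))  # 4: school within LEA
@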
Data from the 1997 A-level Chemistry exam are available as
\code{Chem97}. To fit the model \code{mC2} we increase the number of
EM iterations from its default of 20 to 40. Without this change the
current version of the \code{optim} function in \RR{} declares
convergence at an incorrect optimum. By increasing the number of EM
iterations we get closer to the optimum before calling \code{optim}
and converge to the correct value. The \code{optim} function will be
patched, so this change will not be needed in future versions of \RR{}.
<<>>=
str(Chem97)
system.time(mC1 <- lmer(score ~ 1+(1|lea:school) + (1|lea), Chem97))
summary(mC1)
system.time(mC2 <- lmer(score ~ gcsecnt + (1|school) + (1|lea), Chem97))
summary(mC2)
@
\section{Two-level models for binary data}
\label{sec:TwolevelBinary}
When the response variable is binary or when it represents a count we
frequently model the data with a \emph{generalized linear model} (glm)
or, if we use random effects in the model, a \emph{generalized linear
mixed model} (glmm). Determining maximum likelihood estimates of
the parameters in such a model is not as easy as for the linear mixed
model because the likelihood for a glmm is expressed as an integral
that must be approximated.
The \code{lmer} function has provision for fitting such models using
one of three approximations. The simplest approximation
is called \emph{penalized quasi-likelihood} (PQL). This method is
generally quite fast but also the least accurate of the three. The
Laplace approximation is more accurate than PQL and the most accurate
approximation is called \emph{adaptive Gauss-Hermite quadrature}
(AGQ). At present AGQ can only be used on models with
one-dimensional random effects associated with a single
grouping factor.
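In outline, and suppressing the details of the particular model, the
quantity that must be approximated is the marginal likelihood of the
observed responses, obtained by integrating out the random effects $b$,
which have covariance matrix $\Sigma(\theta)$,
\[
  L(\beta, \theta) = \int p\bigl(y \mid \beta, b\bigr)\,
  \phi\bigl(b; 0, \Sigma(\theta)\bigr)\, db ,
\]
where $\phi$ denotes the multivariate normal density. PQL, the Laplace
approximation, and AGQ can be regarded as successively more accurate
approximations to this integral.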
\subsection{The contraception use data}
\label{sec:Contraception}
A fertility survey of women in Bangladesh included as a response their
use of artificial contraception. Some of the covariates included the
woman's age (on a centered scale), the number of live children she
had, whether she lived in an urban or rural setting, and the district
in which she lived.
<<>>=
str(Contraception)
@
\subsection{Fitting the model used in the review}
\label{sec:Reviewmodel}
<<eval=FALSE>>=
glmer(use ~ urban+age+livch+(1|district), Contraception, binomial)
@
The ``Software reviews of multilevel modeling packages'' fit a simple
model that includes fixed effects for \code{urban}, \code{age}, and
\code{livch} and a simple additive random effect by district. We
reproduce this fit as
%% keep.source=FALSE: suppress the R comment
<<keep.source=FALSE>>=
system.time(Cm1 <- lmer(use ~ age+urban+livch+(1|district),
                        Contraception, binomial))
Cm1
system.time(Cm3 <- lmer(use ~ age+urban+livch+(1|district),
                        Contraception, binomial, nAGQ = 5))
Cm3
system.time(Cm4 <- lmer(use ~ age+urban+livch+(urban|district),
                        Contraception, binomial))
Cm4
@
\subsection{Data exploration}
\label{sec:ContraExplor}
As with the \code{Exam} data, we examine several data plots when
formulating a model for these data. Because the response is either
\code{"Y"} or \code{"N"}, plots of the data points themselves tend to
be uninformative because of overplotting. However, with a continuous
covariate such as \code{age} it is useful to plot a scatterplot
smoother of the response versus the covariate to see the trend in the
probability.

The model being fit assumes that the logistic transformation of the
probability of a woman using artificial contraception is approximately
linear in \code{age}, given her urban/rural status and the number of
live children she has.
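The logistic transformation and its inverse are available in base
\RR{} as \code{qlogis} and \code{plogis}; a quick check of the mapping
between the probability scale and the linear predictor scale:
<<>>=
qlogis(0.5)            # probability 0.5 maps to 0 on the logit scale
plogis(qlogis(0.75))   # the inverse transformation recovers 0.75
@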
Figure~\ref{fig:Contra1}, produced by
<<Contra1,fig=TRUE>>=
print(xyplot(ifelse(use == "Y", 1, 0) ~ age|urban, Contraception,
             groups = livch, type = c("g", "smooth"),
             auto.key = list(space = "top", points = FALSE,
                             lines = TRUE, columns = 4),
             ylab = "Proportion", xlab = "Centered age"))
@
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figs/SoftRev-Contra1}
\caption{Scatterplot smoother curves of the use of artificial
contraception versus age for women in the Bangladesh fertility
survey. The panel on the left depicts the proportion for rural
women and the panel on the right depicts the proportion for urban
women.}
\label{fig:Contra1}
\end{figure}
shows that this is not the case. Indeed, the proportion of women using
artificial contraception is better described as quadratic in age: very
young women and older women are less likely to use artificial
contraception. We also see that the differences according to the
number of live children are predominantly differences between women
who have live children and those who do not.
\subsection{Reformulated models}
\label{sec:ContraReformulated}
We include a quadratic term in \code{age}
<<>>=
(Cm6 <- lmer(use ~ age + I(age^2) + urban + livch + (1|district),
             Contraception, binomial))
anova(Cm1, Cm6)
@
and we see that, as indicated by the plots, the quadratic term is highly significant.
We can check whether the differences among the numbers of live
children are primarily a difference between no children and any
children by fitting a model with an indicator of any children as one
of the terms in the fixed effects.
<<>>=
Contraception <- within(Contraception,
                        ch <- factor(ifelse(livch == 0, "N", "Y")))
(Cm7 <- lmer(use ~ age + I(age^2) + urban + ch + (1|district),
             Contraception, binomial))
anova(Cm6, Cm7)
@
Based on the change in the log-likelihood the simpler model is
adequate. We check for an interaction with the \code{urban} factor.
<<>>=
(Cm8 <- lmer(use ~ age + I(age^2) + urban * ch + (1|district),
             Contraception, binomial))
@
The interaction is not significant.
We can check whether there is significant variation by district in the
effect of urban versus rural, or in the effect of any children versus
no children, by fitting further models.
<<>>=
(Cm9 <- lmer(use ~ age + I(age^2) + urban + ch + (urban|district),
             Contraception, binomial))
anova(Cm7, Cm9)
(Cm10 <- lmer(use ~ age + I(age^2) + urban + ch + (ch|district),
              Contraception, binomial))
anova(Cm7, Cm10)
@
Variation due to district in the effect of any children versus no
children does not appear to be significant, but there is significant
variation in the effect of urban versus rural. It is interesting that
AIC (Akaike's Information Criterion) and BIC (Schwarz's Bayesian
Information Criterion) lead to different conclusions in the comparison
of \code{Cm7} versus \code{Cm9}. For both criteria the preferred model
is the one with the lower value of the criterion; hence AIC prefers
\code{Cm9} and BIC prefers \code{Cm7}. The BIC criterion puts a
heavier penalty on a greater number of parameters.
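The two criteria differ only in the penalty term: for $k$ parameters
and $n$ observations, $\mathrm{AIC} = -2\ell + 2k$ while
$\mathrm{BIC} = -2\ell + k\log n$, so BIC penalizes each extra
parameter more heavily whenever $\log n > 2$. A base-\RR{} check of
these definitions on a toy linear model (not one of the fits above):
<<>>=
fit <- lm(dist ~ speed, cars)  # toy model; k = 3 (two coefficients + sigma)
n <- nrow(cars)
all.equal(AIC(fit), -2 * c(logLik(fit)) + 2 * 3)
all.equal(BIC(fit), -2 * c(logLik(fit)) + 3 * log(n))
@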
At present the code for creating a Markov Chain Monte Carlo sample
from the distribution of the parameters in a generalized linear mixed
model does not allow for random effects of dimension greater than one,
so we produce a sample from the posterior distribution of the
parameters in model \code{Cm11}.
<<>>=
(Cm11 <- lmer(use ~ age + I(age^2) + urban + ch + (1|district),
              Contraception, binomial))
#Cs11 <- mcmcsamp(Em4, 10000, trans = FALSE) # mcmcsamp for GLMMs in supernodal representation not yet implemented
@
Density plots (Figure~\ref{fig:Contra2})
<<Contra2>>=
#print(densityplot(Cs11, plot.points = FALSE, ylab = "",
# scales = list(x = list(axs = "i"))))
@
\begin{figure}[tbp]
\centering
% \includegraphics[width=\textwidth]{figs/SoftRev-Contra2}
\caption{Plots of the estimated posterior probability densities from
a Markov Chain Monte Carlo sample of the parameters in the model
\code{Cm11} fit to the \code{Contraception} data.}
\label{fig:Contra2}
\end{figure}
and normal probability plots (Figure~\ref{fig:Contra3})
<<Contra3>>=
#print(qqmath(Cs11,
# ylab = "",
# xlab = "Quantiles of a standard normal distribution",
# scales = list(y = list(relation = "free"),
# x = list(axs = "i")), type = c("g", "p"),
# pch = ".", cex = 1.5))
@
\begin{figure}[tbp]
\centering
% \includegraphics[width=\textwidth]{figs/SoftRev-Contra3}
\caption{Normal probability plots of the Markov Chain Monte Carlo
samples from the posterior densities of the parameters in the
model \code{Cm11} fit to the \code{Contraception} data.}
\label{fig:Contra3}
\end{figure}
for this sample show that the fixed-effects
parameters are very well approximated by a normal distribution but the
variance of the random effects shows noticeable skewness.
\section{Growth curve model for repeated measures data}
\label{sec:GrowthCurve}
<<>>=
str(Oxboys)
system.time(mX1 <- lmer(height ~ age + I(age^2) + I(age^3) + I(age^4) +
                        (age + I(age^2)|Subject), Oxboys))
summary(mX1)
system.time(mX2 <- lmer(height ~ poly(age, 4) + (age + I(age^2)|Subject),
                        Oxboys))
summary(mX2)
@
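The fixed-effects coefficients in \code{mX1} and \code{mX2} differ
because \code{poly} generates orthogonal polynomials, but the two
parameterizations span the same column space and produce the same
fitted values. A small check with \code{lm} on toy data:
<<>>=
x <- 1:20
set.seed(1)
y <- 1 + 0.5 * x - 0.02 * x^2 + rnorm(20, sd = 0.1)
f1 <- lm(y ~ x + I(x^2))   # raw powers
f2 <- lm(y ~ poly(x, 2))   # orthogonal polynomials
all.equal(fitted(f1), fitted(f2))
@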
\section{Cross-classification model}
\label{sec:CrossClassified}
In the \code{ScotsSec} data the attainment scores of Scottish
secondary school students are classified by the primary school and the
secondary school that each student attended.
<<>>=
str(ScotsSec)
system.time(mS1 <- lmer(attain ~ sex + (1|primary) + (1|second),
                        ScotsSec))
summary(mS1)
@
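The distinction from the nested case in Section~\ref{sec:three-level}
is that the two grouping factors are crossed: pupils from one primary
school attend several different secondary schools. A toy
cross-tabulation (with invented school labels) of the kind one would
examine to confirm the crossing:
<<>>=
toy <- data.frame(primary = rep(c("p1", "p2"), each = 3),
                  second  = c("s1", "s1", "s2", "s1", "s2", "s2"))
xtabs(~ primary + second, toy)  # each primary feeds both secondaries
@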
\section{Session Info}
<<results=tex>>=
toLatex(sessionInfo())
@
\bibliography{MlmSoftRev}
\end{document}