fontsize=\scriptsize}
%%\VignetteIndexEntry{Implementation Details}
%%\VignetteDepends{Matrix}
%%\VignetteDepends{lattice}
%%\VignetteDepends{lme4}
\begin{document}
\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,strip.white=TRUE}
<<preliminaries,echo=FALSE,print=FALSE>>=
library(lattice)
library(Matrix)
library(lme4)
data("Early", package = "mlmRev")
options(width=80, show.signif.stars = FALSE,
        lattice.theme = function() canonical.theme("pdf", color = FALSE))
@

General formulae for the evaluation of the profiled log-likelihood and
profiled log-restricted-likelihood in a linear mixed model are given
in \citet{bate:debr:2004}.  Here we describe a more general
formulation of the model using sparse matrix decompositions and the
implementation of these methods in the \code{lmer} function for \RR{}.

Because we concentrate on the computational methods and the
representation, the order and style of presentation will be based on
the sequence of calculations, not on the sequence in which the results
would be derived.  We will emphasize ``what'' and not ``why''.  For
the ``why'', refer to the paper cited above.

In \S\ref{sec:Form} we describe the form and representation of the
model.  The calculation of the criteria to be optimized by the
\begin{equation}
  \label{eq:lmeGeneral}
  \by=\bX\bbeta+\bZ\bb+\beps\quad
  \beps\sim\mathcal{N}(\bzer,\sigma^2\bI),
  \bb\sim\mathcal{N}(\bzer,\bSigma),
  \beps\perp\bb
\end{equation}
where $\by$ is the $n$-dimensional response vector, $\bX$ is an
$n\times p$ model matrix for the $p$-dimensional fixed-effects vector
$\bbeta$, $\bZ$ is the $n\times q$ model matrix for the
$q$-dimensional random-effects vector $\bb$, which has a Gaussian
distribution with mean $\bzer$ and variance-covariance matrix
$\bSigma$, and $\beps$ is the random noise, assumed to have a
spherical Gaussian distribution.  The symbol $\perp$ indicates
independence of random variables.

We will assume that $\bX$ has full column rank and that $\bSigma$ is
positive definite.
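The model in (\ref{eq:lmeGeneral}) can be illustrated by simulation.  The following base R sketch (not part of the \code{lme4} implementation; all names and values are hypothetical) generates one realization with $q=2$ random effects:

```r
## Hypothetical simulation from y = X beta + Z b + eps
set.seed(1)
n <- 100; p <- 2; q <- 2
X <- cbind(1, rnorm(n))                 # n x p fixed-effects model matrix
Z <- cbind(1, runif(n))                 # n x q random-effects model matrix
beta <- c(2, 0.5)
Sigma <- matrix(c(1, 0.3, 0.3, 0.5), 2, 2)  # positive-definite Sigma
sigma <- 0.25
b <- drop(t(chol(Sigma)) %*% rnorm(q))  # b ~ N(0, Sigma)
y <- drop(X %*% beta + Z %*% b + sigma * rnorm(n))
length(y)                               # the n-dimensional response
```

Premultiplying a standard Gaussian vector by the lower Cholesky factor of $\bSigma$ is a standard way to obtain $\bb\sim\mathcal{N}(\bzer,\bSigma)$.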

\subsection{Structure of $\bSigma$}
\label{sec:sigmaStructure}

Components of the random effects vector $\bb$ and portions of its
This is called the ``relative'' precision because it is the precision
of $\bb$ ($\bSigma^{-1}$) relative to the precision of $\beps$
($\sigma^{-2}\bI$).

It is easy to establish that $\bOmega$ will have a repeated
block/block diagonal structure like that of $\bSigma$.  That is,
each of the inner diagonal blocks in the $i$th outer block is a copy
of the $q_i\times q_i$ positive-definite, symmetric matrix $\bOmega_i$.

Because $\bOmega$ has a repeated block/block structure we can define
the entire matrix by specifying the symmetric matrices $\bOmega_i, i =
1,\dots,k$ and, because of the symmetry, $\bOmega_i$ has at most
$q_i(q_i+1)/2$ distinct elements.  We will write $\btheta$ for a
parameter vector of length at most $\sum_{i=1}^{k}q_i(q_i+1)/2$ that
determines the $\bOmega_i$.

The outer blocks of $\bb$ are delimited by the group
pointers, stored in the \code{Gp} slot.  Successive differences of the
group pointers are the total number of components of $\bb$ associated
with the $i$th grouping factor.  That is, these differences are $m_i
q_i,i=1,\dots,k$.  The first element of the \code{Gp} slot is always 0.
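The group pointers are computed internally by \code{lmer}; their construction can be sketched in R with hypothetical values of $m_i$ and $q_i$:

```r
## Hypothetical sketch: group pointers for k = 3 grouping factors.
## m: number of levels of each factor; q: columns in each Z_i.
m <- c(103, 27, 5)
q <- c(2, 1, 2)
Gp <- c(0L, cumsum(m * q))  # first element is always 0
Gp
diff(Gp)                    # recovers m_i * q_i
```
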

\subsection{Examples}

Mm1 <- lmer(math ~ gr+sx*eth+cltype+(yrs|id)+(1|tch)+(yrs|sch), star,
            control = list(niterEM = 0, gradient = FALSE))
@

Model \code{Sm1} has a single grouping factor with $18$ levels.  The
\code{Omega} slot is a list of length one containing a $2\times 2$
symmetric matrix.
<<Sm1grp>>=
str(Sm1@flist)
show(Sm1@Omega)
show(Sm1@nc)
show(Sm1@Gp)
diff(Sm1@Gp)/Sm1@nc
@

Model \code{Cm1} has two grouping factors: the \code{school} factor
with 2410 levels and the \code{lea} factor (local education authority,
similar to a school district in the U.S.A.) with 131 levels.

\section{Likelihood and restricted likelihood}
\label{sec:likelihood}

Components of $\bb$ associated with the same level of a
given grouping factor have the same variance-covariance matrix, say
$\sigma^2\bOmega_i^{-1}$.
For convenience we assume that components of $\bb$ are ordered so that
the components associated with grouping factor $\bff_1$ are first,
then those associated with grouping factor $\bff_2$ are next, and so on.

The random effects vector $\bb$ and the columns of $\bZ$ are divided
into $k$ outer blocks associated with grouping factors $\bff_i,
i=1,\dots,k$ in the data.  Because so many of the expressions that we
will use involve the grouping factors and quantities associated with
them, we will reserve the letter $i$ to index the grouping factors and
related quantities.  We will not explicitly state the range of $i$,
which will always be $i=1,\dots,k$.

The outer blocks in $\bb$ and the columns of $\bZ$ are further
subdivided into $m_i$ inner blocks of size $q_i$ where $m_i$ is the
number of levels of $\bff_i$ (i.e.{} the number of distinct values in
$\bff_i$).  Each grouping factor has associated with it an $n\times
q_i$ model matrix $\bZ_i$.  The full model matrix $\bZ$ is derived
from the grouping factors $\bff_i$ and these submodel matrices
$\bZ_i$.

Random effects in different outer blocks are independent.  Within each
of these blocks, the inner blocks of random effects are independent
and identically distributed with mean $\bzer$ and $q_i\times q_i$
variance-covariance matrix $\sigma^2\bOmega_i\inv$.  Thus $\bOmega$ is
block diagonal in $k$ blocks of size $m_i q_i\times m_i q_i$ and each
of these blocks is itself block diagonal in $m_i$ blocks, each of
which is the positive definite symmetric $q_i\times q_i$ matrix
$\bOmega_i$.  We call this a \emph{repeated block/block diagonal}
matrix.
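A repeated block/block diagonal matrix is easy to sketch in R with \code{kronecker}.  This is a hypothetical illustration of the structure only, not the storage format used internally:

```r
## Hypothetical sketch: Omega for k = 2 grouping factors,
## with m = c(3, 2) levels and q = c(2, 1) columns.
Omega1 <- matrix(c(2, 0.5, 0.5, 1), 2, 2)  # q_1 x q_1, positive definite
Omega2 <- matrix(4, 1, 1)                  # q_2 x q_2
blk1 <- kronecker(diag(3), Omega1)         # m_1 copies of Omega1 on the diagonal
blk2 <- kronecker(diag(2), Omega2)
## assemble the block diagonal Omega of size sum(m * q) = 8
Omega <- matrix(0, 8, 8)
Omega[1:6, 1:6] <- blk1
Omega[7:8, 7:8] <- blk2
dim(Omega)
```
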
|
|
|
|
|
\subsection{Model specification and representation}
\label{sec:Representation}

Like most model fitting functions in the S language, the \code{lme}
function in the \code{lme4} package uses a formula/data specification.
The formula specifies how to evaluate the response $\by$ and the
fixed-effects model matrix $\bX$ in the data frame provided as the
\code{data} argument.  An additional argument named \code{random}
specifies the names of the grouping factors and the formulae for
evaluation of the model matrices $\bZ_i$.

Standard optional arguments, such as \code{na.action} and
\code{subset}, are passed through to the \code{model.frame} function
that returns the model frame in which to evaluate all the formulae of
the model.  The \code{drop.unused.levels} optional argument is set to
\code{TRUE} so that unused levels in any factors are dropped during
creation of the model frame.  Thus there is no ambiguity regarding
the number of levels $m_i$ of each of the $\bff_i$.  Every level in
each of the $\bff_i$ must occur at least once in the model frame.
|
|
|
|
|
\subsubsection{Pairwise crosstabulation}

We begin by ordering the grouping factors so that $m_1\ge m_2\ge \dots
\ge m_k$ and, if $k>1$, forming their \emph{pairwise crosstabulation}.
This is the crossproduct matrix, $\bT=\tilde{\bZ}\trans\tilde{\bZ}$,
for the \emph{variance components model} determined by these grouping
factors.  (In a variance components model $q_1=q_2=\dots=q_k=1$ and
each of the $\tilde{\bZ}_i$ is the $n\times 1$ matrix of $1$s.  The
$i$th outer block of columns in $\tilde{\bZ}$ is the set of indicator
columns for the levels of $\bff_i$.  The name ``variance components''
reflects the fact that, in this model, each of the $\bOmega_i$, which
are scalars, is the relative precision of the component of the
variation in the response attributed to factor $\bff_i$.)

In fact it is not necessary to form and store $\bT$.  All that we need
are the locations of the nonzeros in the off-diagonal blocks of the
representation
\begin{equation}
  \label{eq:PairwiseCrosstab}
  \bT=\begin{bmatrix}
    \bT_{(1,1)}&\bT_{(2,1)}\trans&\dots&\bT_{(k,1)}\trans\\
    \bT_{(2,1)}&\bT_{(2,2)}&\dots&\bT_{(k,2)}\trans\\
    \vdots&\vdots&\ddots&\vdots\\
    \bT_{(k,1)}&\bT_{(k,2)}&\dots&\bT_{(k,k)}
  \end{bmatrix} .
\end{equation}
(Because the $i$th outer block of columns in $\tilde{\bZ}$ is a set of
indicators, the diagonal blocks will themselves be diagonal and their
patterns of nonzeros are trivial.)  The blocks in the strict lower
triangle, $\bT_{(i,j)}, i>j$, are stored as a list of compressed,
sparse, column-oriented matrices (see Appendix~\ref{app:Sparse} for
details).  Only the column pointers and the row indices of these
sparse matrices are used.

These off-diagonal blocks are easily calculated because the integer
representations of the factors $\bff_i$ and $\bff_j$ form the row and
column indices in a (possibly redundant) triplet representation of
$\bT_{(i,j)}$.  All that we need to do is convert the triplet
representation to the compressed, sparse, column-oriented
representation.  This common conversion is easily accomplished with
standard software.
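The conversion can be sketched with the \code{Matrix} package (already loaded by this vignette): \code{sparseMatrix} accepts the possibly redundant triplets built from the integer codes of two hypothetical factors and returns a compressed, column-oriented matrix, summing duplicated $(i,j)$ entries along the way.

```r
library(Matrix)
## Two hypothetical grouping factors observed on n = 6 cases
fi <- factor(c("a", "a", "b", "b", "c", "c"))  # 3 levels (rows)
fj <- factor(c("u", "u", "u", "v", "v", "v"))  # 2 levels (columns)
## Integer codes give a redundant triplet representation of T_(i,j);
## duplicated (row, column) pairs are summed during the conversion.
Tij <- sparseMatrix(i = as.integer(fi), j = as.integer(fj),
                    x = 1, dims = c(nlevels(fi), nlevels(fj)))
Tij          # compressed, sparse, column-oriented storage
Tij@p        # column pointers
Tij@i        # 0-based row indices
```
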
|
|
|
|
|
We check whether each of the matrices $\bT_{(j,j-1)},j=2,\dots,k$ has
exactly one nonzero in each column.  If so, the grouping factors form
a \emph{nested sequence} in that each level of factor $\bff_i$ occurs
with exactly one level of factor $\bff_j, i<j\le k$.  In the
compressed, sparse, column-oriented format it is easy to determine the
number of nonzeros in each column because these are the successive
differences in the column pointers.
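With a block stored as a column-oriented sparse matrix, the nestedness check reduces to differencing the column-pointer slot.  A self-contained hypothetical sketch:

```r
library(Matrix)
## Hypothetical T_(j,j-1) for nested factors: each column (level of the
## finer factor) crosses exactly one level of the coarser factor.
Tjj1 <- sparseMatrix(i = c(1, 1, 2, 2), j = 1:4, x = 1, dims = c(2, 4))
nested <- all(diff(Tjj1@p) == 1)  # exactly one nonzero per column?
nested
```
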
|
|
|
|
|
If the grouping factors form a nested sequence (and a single grouping
factor is trivially a nested sequence) there is no need for further
symbolic analysis.  If not, which is to say that we have multiple,
non-nested grouping factors, we continue the symbolic analysis for the
unit lower triangular factor $\tL$ in the LDL form of the Cholesky
decomposition of $\bT$~\citep{Davis:2004}.

This symbolic analysis is performed on rows of blocks in the lower
triangle of $\tL$, where the blocks of $\tL$ correspond to those
of $\bT$
\begin{equation}
  \label{eq:blockwiseL}
  \tL=\begin{bmatrix}
    \bI&\bzer&\dots&\bzer\\
    \tL_{(2,1)}&\tL_{(2,2)}&\dots&\bzer\\
    \vdots&\vdots&\ddots&\vdots\\
    \tL_{(k,1)}&\tL_{(k,2)}&\dots&\tL_{(k,k)}
  \end{bmatrix} .
\end{equation}
We emphasize that we are only determining the positions of the
nonzeros in the blocks of $\tL$, not performing an actual
decomposition.  (The decomposition would fail for $k>1$ because $\bT$
is only positive semidefinite, not positive definite.  It is composed
of multiple blocks of indicators and the row sums within each block
are always unity.)

Because $\bT_{(1,1)}$ is diagonal, the block in the $(1,1)$ position
of $\tL$ will be both diagonal and unit, which is to say that it is
the identity matrix of size $m_1$.  A consequence of the $(1,1)$ block
being the identity is that the structure of the first column of blocks
of $\tL$ is the same as the corresponding block of $\bT$.  That is,
the nonzeros in $\tL_{(j,1)}$ are in the same positions as those in
$\bT_{(j,1)}$ for $1<j\le k$.

The off-diagonal nonzero positions in $\tL_{(2,2)}$ are determined
from the union of the off-diagonal nonzero positions in
$\bT_{(2,2)}$, of which there are none, and those in
$\tL_{(2,1)}\tL_{(2,1)}\trans=\bT_{(2,1)}\bT_{(2,1)}\trans$.  The
number of nonzeros in $\tL_{(2,2)}$ can be changed by permuting the
levels of $\bff_2$, which causes a permutation of the rows of the
blocks $\bT_{(2,i)}$ and the columns of the blocks $\bT_{(i,2)}$.  We
determine a fill-reducing permutation of the levels of $\bff_2$ using
routines from the Metis~\citep{Metis} graph-partitioning package
applied to the incidence graph of $\bT_{(2,1)}\bT_{(2,1)}\trans$.
(This is the graph of $m_2$ nodes in which nodes $s$ and $t$ are
connected by an edge if and only if
$\left\{\bT_{(2,1)}\bT_{(2,1)}\trans\right\}_{s,t}$ is nonzero.)  We
apply this permutation to the columns of all blocks $\bT_{(i,2)}$ and
the rows of all blocks $\bT_{(2,i)}$.

The symbolic analysis function from the LDL package applied to
$\bT_{(2,1)}\bT_{(2,1)}\trans$ (after permuting the rows of
$\bT_{(2,1)}$) provides the positions of the nonzeros in
$\tL_{(2,2)}$, from which we determine the positions of the nonzeros
in $\tL_{(2,2)}\inv$.  The positions of the nonzeros in $\tL_{(3,2)}$
are those of $\bT_{(3,2)}\tL_{(2,2)}\inv$.  The next step in the
iteration is to form the union of the nonzero positions of
$\tL_{(3,2)}\tL_{(3,2)}\trans$ and the nonzero positions in
$\bT_{(3,1)}\bT_{(3,1)}\trans$ from which we determine a fill-reducing
permutation for the levels of $\bff_3$.  The process is continued
until a fill-reducing permutation for the levels of $\bff_k$ and the
structure of $\tL_{(k,k)}$ and $\tL_{(k,k)}\inv$ are determined.

As part of the symbolic analysis of each diagonal block, the
elimination tree~\citep{Davis:2004} of the block is determined and
stored as an integer array of length $m_i$.  It is straightforward to
check the number of terminal nodes in this tree.  If the elimination
tree for $\tL_{(i,i)}$ is found to have only one terminal node then
$\tL_{(i,i)}\inv$ will be dense.  Furthermore, all subsequent diagonal
blocks will be dense so there is no purpose in checking for a
fill-reducing permutation of the levels of $\bff_j,i<j\le k$, and the
symbolic analysis can be terminated.
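The effect of a fill-reducing permutation can be illustrated in base R on a hypothetical ``arrow''-shaped symmetric matrix.  This is only an analogy for the idea; the implementation uses Metis orderings on the block structure, not dense factorizations:

```r
## Hypothetical arrow-shaped symmetric matrix: dense first row/column.
n <- 8
A <- diag(10, n); A[1, ] <- A[, 1] <- 1; diag(A) <- 10
nnz <- function(M) sum(abs(M) > 1e-12)
L0 <- t(chol(A))        # natural order: the lower factor fills in completely
p <- c(2:n, 1)          # permutation moving the dense row/column last
L1 <- t(chol(A[p, p]))  # permuted order: no fill beyond the pattern of A
c(nnz(L0), nnz(L1))     # 36 versus 15 nonzeros
```
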
|
|
|
|
|
\subsubsection{Allocating storage}

In the numeric representation of the model we will write the augmented
model matrix of size $n\times(p+1)$ obtained by appending $\by$ to
$\bX$ as $\tilde{\bX}=\left[\bX,\by\right]$.  We can allocate the
storage for the sparse matrix representation of the model using the
results of the symbolic analysis and the numbers of columns in the
model matrices, $q_i,i=1,\dots,k$ and $(p+1)$.

We will store $\bOmega$, $\bZ\trans\bZ$, $\bZ\trans\tilde{\bX}$,
$\tilde{\bX}\trans\tilde{\bX}$ and components $\RZZ$, $\tRZX$ and $\tRXX$ of
the Cholesky decomposition
\begin{equation}
  \label{eq:tildeCrossProdGen}
  \begin{bmatrix}
    \bZ\trans\bZ+\bOmega & \bZ\trans\tilde{\bX} \\
    \tilde{\bX}\trans\bZ & \tilde{\bX}\trans\tilde{\bX}
  \end{bmatrix}=\bR\trans\bR\quad\text{where}\quad\bR=
  \begin{bmatrix}
    \RZZ & \tRZX\\
    \bzer& \tRXX
  \end{bmatrix}
\end{equation}
in one of four possible formats: as a dense matrix, as a block/block
sparse matrix, as a block/block diagonal matrix, or as a repeated
block/block diagonal matrix.

For example, we have already indicated that $\bOmega$ is repeated
block/block diagonal so we store it in the \code{Omega} slot as a list
of $k$ symmetric matrices of sizes $q_i\times q_i$ (only the upper
triangles of the symmetric matrices are used).
|
|
|
|
|
The matrices $\bZ\trans\tilde{\bX}$ and
$\tilde{\bX}\trans\tilde{\bX}$ and the decomposition components
$\tRZX$ and $\tRXX$ are stored as dense matrices in slots
named \code{ZtX}, \code{XtX}, \code{RZX} and \code{RXX}, respectively.

The matrix $\bZ\trans\bZ$ is stored in a symmetric, sparse
column-oriented format like that of $\bT$ in
(\ref{eq:PairwiseCrosstab}) except that $\bZ\trans\bZ$ has both inner
and outer blocks.  The $k\times k$ grid of outer blocks
$(\bZ\trans\bZ)_{(i,j)}$ is determined by the grouping factors.  These
outer blocks correspond to the blocks $\bT_{(i,j)}$ in $\bT$.  The
diagonal outer block $(\bZ\trans\bZ)_{(i,i)}$ is itself block diagonal
in $m_i$ blocks of size $q_i\times q_i$.  We store the upper triangles
of these inner blocks in an array of size $q_i\times q_i\times m_i$.
Block $(\bZ\trans\bZ)_{(i,j)}$ for $j<i\le k$ is subdivided into inner
blocks of size $q_i\times q_j$ corresponding to the levels of grouping
factors $i$ and $j$.  Because an inner block of
$(\bZ\trans\bZ)_{(i,j)}$ is nonzero if and only if the corresponding
element of $\bT_{(i,j)}$ is nonzero, we can use the column pointers
and row indices from $\bT_{(i,j)}$ for $(\bZ\trans\bZ)_{(i,j)}$ with
the convention that they index the inner blocks, not the individual
elements, of $(\bZ\trans\bZ)_{(i,j)}$.  The inner blocks are stored in
a dense array of dimension $q_i\times q_j\times n_{(i,j)}$ where
$n_{(i,j)}$ is the number of nonzeros in $\bT_{(i,j)}$.  This is the
block/block sparse matrix format.
|
|
|
|
|
Instead of calculating $\RZZ$ we calculate the LDL form
\begin{equation}
  \label{eq:bigLDL}
  \bZ\trans\bZ+\bOmega=\bL\bD\bL\trans
\end{equation}
where $\bL$ is block/block unit lower triangular (i.e.{} block/block lower
triangular with all the inner diagonal blocks in the $i$th outer
diagonal block being the $q_i\times q_i$ identity matrix) and $\bD$ is
block/block diagonal.

Just as the column pointers and row indices of the blocks of $\bT$ can
be used for the outer blocks of $\bZ\trans\bZ$, we can use the column
pointers and row indices of $\tL$, which we have determined in the
symbolic analysis, for the outer blocks of
\begin{equation}
  \label{eq:Lform}
  \bL=
  \begin{bmatrix}
    \bI & \bzer & \dots & \bzer\\
    \bL_{(2,1)} & \bL_{(2,2)} & \dots & \bzer\\
    \vdots & \vdots & \ddots & \vdots\\
    \bL_{(k,1)} & \bL_{(k,2)} & \dots & \bL_{(k,k)}
  \end{bmatrix} .
\end{equation}
That is, the same structure as $\tL_{(i,j)}$
applies to the blocks $\bL_{(i,i)}, 1<i\le k$,
and $\bL_{(i,j)}, 1\le j<i\le k$, except that each nonzero in
$\tL_{(i,j)}$ corresponds to a block of size $q_i\times q_j$ in
$\bL_{(i,j)}$.
|
|
|
|
|
From the symbolic analysis we also have the column pointers and row
indices for $\tL^{-1}_{(i,i)}$, corresponding to the diagonal outer
blocks of $\bL^{-1}$.  We allocate storage for these diagonal outer
blocks only.  Note that $\bL^{-1}$ is block/block unit lower
triangular just like $\bL$.  Because the inner diagonal blocks of
these matrices are always the identity, these inner blocks are not
explicitly stored.  Neither $\bL_{(1,1)}=\bI$ nor
$\left(\bL^{-1}\right)_{(1,1)}=\bI$ requires any storage to be
allocated.

We have used fill-reducing permutations of the levels of the grouping
factors $\bff_j,j=2,\dots,k$.  These can decrease, sometimes
dramatically, the amount of storage required for the $\tL_{(i,i)}$ but
generally they do not result in compact storage of
$\tL_{(i,i)}^{-1}$.  In the worst case the matrix $\tL_{(2,2)}^{-1}$
will be dense or nearly dense, resulting in a storage requirement of
approximately $q_2^2 m_2(m_2+1)/2$ elements for the array holding the
numeric values for the $(2,2)$ outer block of $\bL^{-1}$.

The matrix $\bD$ is block/block diagonal.  It is stored as a list of
$k$ arrays of sizes $q_i\times q_i\times m_i$.

After allocating the storage we evaluate $\by$, $\bX$ and the $\bZ_i$,
then update the contents of the \code{XtX}, \code{ZtX}, and \code{ZtZ}
slots.  We allocate the storage and update the contents of the storage
in separate steps so we can update the numeric values without
reallocating storage during iterative algorithms for fitting
generalized linear mixed models or nonlinear mixed models.
|
|
|
|
|
\section{Evaluation of the objective}
\label{sec:Cholesky}

If prior estimates of the $\bOmega_i$ are available, we set the
\code{Omega} slot accordingly.  Otherwise we form initial estimates
from the matrices $\bZ_i$, as described in
\citet[ch.~3]{pinh:bate:2000}.  We then begin iterative optimization
of the estimation criterion with respect to the $\bOmega_i$ or with
respect to the unconstrained parameter $\btheta$ that determines the
$\bOmega_i$, as described in \S\ref{sec:Unconstrained}.

In this section we will describe the steps in evaluating the objective
function (i.e.{} the function to be optimized w.r.t.{} the $\bOmega_i$
or w.r.t.{} $\btheta$) given the current values of the $\bOmega_i$.
Recall that after setting values of the $\bOmega_i$ we form the
Cholesky decomposition (\ref{eq:tildeCrossProdGen})
\begin{equation*}
  \begin{bmatrix}
    \bZ\trans\bZ+\bOmega & \bZ\trans\tilde{\bX} \\
    \tilde{\bX}\trans\bZ & \tilde{\bX}\trans\tilde{\bX}
  \end{bmatrix}=\bR\trans\bR\quad\text{where}\quad\bR=
  \begin{bmatrix}
    \RZZ & \tRZX\\
    \bzer& \tRXX
  \end{bmatrix} .
\end{equation*}
Here $\tilde{\bX}$ incorporates both $\bX$ and $\by$.  In some cases
it will be convenient to consider $\bX$ and $\by$ separately and we will
write the components of the Cholesky decomposition as if they were
\begin{equation}
  \label{eq:CrossProdGen}
  \begin{bmatrix}
    \bZ\trans\bZ+\bOmega & \bZ\trans\bX & \bZ\trans\by \\
    \bX\trans\bZ & \bX\trans\bX & \bX\trans\by \\
    \by\trans\bZ & \by\trans\bX & \by\trans\by
  \end{bmatrix}=\bR\trans\bR\quad\text{where}\quad\bR=
  \begin{bmatrix}
    \RZZ & \RZX & \rZy \\
    \bzer & \RXX & \rXy \\
    \bzer & \bzer & \ryy
  \end{bmatrix}
\end{equation}
even though they are stored in the form of (\ref{eq:tildeCrossProdGen}).

Recall also that $\RZZ$ is calculated and stored in the LDL form.
Instead of storing the symmetric positive definite inner blocks of the
block/block diagonal $\bD$, we calculate and store their upper
Cholesky factors.  We write the block/block diagonal matrix of upper
Cholesky factors as $\bD^{1/2}$ and its transpose as
$\bD^{\mathsf{T}/2}$.  (Note that transposing a block/block diagonal
matrix only requires transposing the inner blocks.)  The quantity
$\log\left|\bD\right|$ is evaluated as the sum of the logarithms of
the squares of the diagonal elements of the inner diagonal blocks of
$\bD^{1/2}$.
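The evaluation of $\log\left|\bD\right|$ from the diagonals of $\bD^{1/2}$ can be sketched in base R, with a hypothetical $3\times 3$ positive-definite block standing in for one inner block of $\bD$:

```r
## One hypothetical inner block of D and its upper Cholesky factor
Dblk <- matrix(c(4, 1, 0,
                 1, 3, 1,
                 0, 1, 2), 3, 3)
Dhalf <- chol(Dblk)                   # upper triangular; Dblk = t(Dhalf) %*% Dhalf
logdet1 <- sum(log(diag(Dhalf)^2))    # sum of logs of squared diagonal elements
logdet2 <- determinant(Dblk)$modulus  # direct log-determinant, for comparison
all.equal(logdet1, as.numeric(logdet2))
```
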
|
|
|
|
|
The next step in the factorization is to solve for $\tRZX$ in
\begin{equation}
  \label{eq:RZX}
  \RZZ\trans\tRZX=\bL\bD^{\mathsf{T}/2}\tRZX=\bZ\trans\tilde{\bX}
\end{equation}
using blocked sparse matrix techniques (see
Appendix~\ref{app:blocked}), storing the result in the \code{RZX}
slot.  Finally, dense matrix operations are used to downdate the
densely stored $\tilde{\bX}\trans\tilde{\bX}$ and obtain the upper
Cholesky factor $\tRXX$ that satisfies
\begin{equation}
  \label{eq:RXX}
  \tRXX\trans\tRXX=
  \tilde{\bX}\trans\tilde{\bX}-\tRZX\trans\tRZX ,
\end{equation}
and provides $\log\left(\left|\RXX\right|^2\right)$ and $\ryy$.
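A dense base R sketch of (\ref{eq:RZX}) and (\ref{eq:RXX}) with small hypothetical matrices (the implementation instead works with the blocked sparse structure):

```r
set.seed(17)
n <- 50; p <- 2; q <- 3
Z <- matrix(rnorm(n * q), n, q)
Xt <- cbind(1, rnorm(n), rnorm(n))      # augmented [X, y], p + 1 columns
Omega <- diag(q)
A <- crossprod(Z) + Omega               # Z'Z + Omega
RZZ <- chol(A)                          # upper Cholesky factor
## solve RZZ' RZXt = Z' Xt (forward solve with the lower factor)
RZXt <- forwardsolve(t(RZZ), crossprod(Z, Xt))
## downdate Xt'Xt and factor the remainder
RXXt <- chol(crossprod(Xt) - crossprod(RZXt))
## check: crossprod of the assembled R recovers the full crossproduct matrix
R <- rbind(cbind(RZZ, RZXt), cbind(matrix(0, p + 1, q), RXXt))
M <- rbind(cbind(A, crossprod(Z, Xt)), cbind(crossprod(Xt, Z), crossprod(Xt)))
all.equal(crossprod(R), M)
```
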
|
|
|
|
|
The \code{status} slot is a pair of logical values called
\code{factored} and \code{inverted}.  It is set to \code{(TRUE,
  FALSE)} indicating that the factorization is current and that it has
not been inverted.

In a separate calculation the logarithm of the determinant
\begin{equation}
  \label{eq:OmegaDet}
  \log |\bOmega|=\sum_{i=1}^k m_i \log |\bOmega_i|
\end{equation}
is evaluated from the Cholesky factors of the $\bOmega_i$.  The
function to be minimized w.r.t.{} $\btheta$, called the \emph{profiled
  deviance}, is
\begin{equation}
  \label{eq:ProfiledLogLik}
  f(\btheta)=\log\left|\bD\right|-\log\left|\bOmega\right|
  + n\left[1+\log\left(\frac{2\pi\ryy^2}{n}\right)\right]
\end{equation}
for ML estimation or
\begin{equation}
  \label{eq:ProfiledLogRestLik}
  f_R(\btheta)=
  \log\left|\bD\right|+\log\left(\left|\RXX\right|^2\right)
  - \log\left|\bOmega\right|
  + (n-p)\left[1+\log\left(\frac{2\pi\ryy^2}{n-p}\right)\right]
\end{equation}
for REML.
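Given the component log-determinants, evaluating (\ref{eq:ProfiledLogLik}) and (\ref{eq:ProfiledLogRestLik}) is a few lines of arithmetic.  The values below are hypothetical placeholders, not the output of an actual fit:

```r
## Hypothetical components of the profiled deviance
n <- 100; p <- 3
logDetD     <- 12.4   # log|D|
logDetOmega <- 10.1   # log|Omega|
logDetRXX2  <- 5.7    # log(|RXX|^2)
ryy         <- 8.3    # final diagonal element of R
fML   <- logDetD - logDetOmega + n * (1 + log(2 * pi * ryy^2 / n))
fREML <- logDetD + logDetRXX2 - logDetOmega +
  (n - p) * (1 + log(2 * pi * ryy^2 / (n - p)))
c(ML = fML, REML = fREML)
```
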
|
|
|
|
|
\subsection{Inverting the factorization}
\label{ssec:Inverting}

Evaluation of the objective function does not require inversion of
the factorization, nor does evaluation of many other quantities of
interest, such as the conditional variance estimates
$\widehat{\sigma^2}=\ryy^2/n$ and $\widehat{\sigma}_R^2=\ryy^2/(n-p)$,
the conditional fixed-effects estimates, $\widehat{\bbeta}$, which satisfy
$\RXX\widehat{\bbeta}=\rXy$,
and the conditional modes of
the random effects, $\widehat{\bb}$, which satisfy
\begin{equation}
  \label{eq:bbhat}
  \bD^{1/2}\bL\trans\widehat{\bb}=\rZy-\RZX\widehat{\bbeta}
\end{equation}
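Continuing the dense hypothetical sketch, $\widehat{\bbeta}$ and $\widehat{\bb}$ are obtained by two triangular back-solves, and they agree with a direct solution of the joint normal equations:

```r
set.seed(42)
n <- 40; p <- 2; q <- 3
X <- cbind(1, rnorm(n)); Z <- matrix(rnorm(n * q), n, q)
y <- rnorm(n)
Omega <- diag(q)
RZZ <- chol(crossprod(Z) + Omega)
RZX <- forwardsolve(t(RZZ), crossprod(Z, X))
rZy <- forwardsolve(t(RZZ), crossprod(Z, y))
RXX <- chol(crossprod(X) - crossprod(RZX))
rXy <- forwardsolve(t(RXX), crossprod(X, y) - crossprod(RZX, rZy))
beta.hat <- backsolve(RXX, rXy)                  # RXX beta.hat = rXy
b.hat <- backsolve(RZZ, rZy - RZX %*% beta.hat)  # RZZ b.hat = rZy - RZX beta.hat
## compare with the joint system solved directly
M <- rbind(cbind(crossprod(Z) + Omega, crossprod(Z, X)),
           cbind(crossprod(X, Z), crossprod(X)))
rhs <- c(crossprod(Z, y), crossprod(X, y))
sol <- solve(M, rhs)
all.equal(c(b.hat, beta.hat), sol, check.attributes = FALSE)
```
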
|
|
|
|
|
However, if we wish to evaluate the ECME step or the gradient and/or
the Hessian of the objective, it is convenient to invert some parts of
the decomposition.  We invert the dense upper triangular matrix
$\tRXX$ in place, producing $\RXX^{-1}$ in the first $p$ rows and
columns, $1/\ryy$ in the $(p+1,p+1)$ position, and
$-\widehat{\bbeta}/\ryy$ in the first $p$ rows of the $(p+1)$st column.
We also invert each of the triangular inner blocks of $\bD^{1/2}$ in
place, producing $\bD^{-1/2}$.  The inverses of the outer blocks on the
diagonal of $\bL$ are also determined and stored separately.  (In
general the diagonal outer blocks cannot be inverted in place.  The
number of nonzero inner blocks in the inverse of an outer block on the
diagonal can be different from the number of nonzero inner blocks in
the outer diagonal block itself.)  Notice that this last operation
is trivial in the case of a single grouping factor or multiple, nested
grouping factors because all the outer diagonal blocks in $\bL$ are
identity matrices.
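The claimed structure of ${\tRXX}^{-1}$ is easy to verify numerically.  A hypothetical $p=2$ example in base R:

```r
p <- 2
RXX <- matrix(c(2, 0, 1, 3), 2, 2)   # upper triangular, 2 x 2
rXy <- c(4, 6); ryy <- 5
RXXt <- rbind(cbind(RXX, rXy), c(0, 0, ryy))  # augmented (p+1) x (p+1) factor
RXXt.inv <- backsolve(RXXt, diag(p + 1))      # inverse of the triangular factor
beta.hat <- backsolve(RXX, rXy)               # RXX beta.hat = rXy
## the last column of the inverse is c(-beta.hat, 1) / ryy
RXXt.inv[, p + 1]
c(-beta.hat / ryy, 1 / ryy)
```
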
|
|
|
|
|
The matrix $\tRZX{\tRXX}^{-1}$ is evaluated as a dense matrix product,
then each column of
$-\bL\invtrans\bD^{-1/2}\left(\tRZX{\tRXX}^{-1}\right)$ is evaluated
using blocked sparse matrix operations (see Appendix~\ref{app:blocked}
for details) and stored in the \code{RZX} slot.  The first $p$ columns
of the result, which are $-\bL\invtrans\bD^{-1/2}\RZX\RXX^{-1}$, are
used to create products involving the matrix $\vb$, which is the
unscaled, conditional, REML variance-covariance of the random effects
\begin{equation}
  \label{eq:VbDef}
  \begin{aligned}
    \vb&= \begin{bmatrix}\bI&\bzer\end{bmatrix}
    \left(
      \begin{bmatrix}\RZZ\trans & \bzer \\ \RZX\trans & \RXX\trans\end{bmatrix}
      \begin{bmatrix}\RZZ & \RZX \\ \bzer & \RXX\end{bmatrix}
    \right)^{-1}
    \begin{bmatrix} \bI\\\bzer \end{bmatrix}\\
    &= \begin{bmatrix}\bI&\bzer\end{bmatrix}
    \begin{bmatrix}\RZZ\inv&-\RZZ\inv\RZX\RXX\inv\\\bzer&\RXX\inv\end{bmatrix}
    \begin{bmatrix}\RZZ\invtrans&\bzer\\
      -\RXX\invtrans\RZX\trans\RZZ\invtrans&\RXX\invtrans\end{bmatrix}
    \begin{bmatrix} \bI\\\bzer \end{bmatrix}\\
    &=\begin{bmatrix}\RZZ\inv&-\RZZ\inv\RZX\RXX\inv\end{bmatrix}
    \begin{bmatrix}\RZZ\invtrans\\
      -\RXX\invtrans\RZX\trans\RZZ\invtrans\end{bmatrix}\\
    &=\RZZ\inv\RZZ\invtrans
    +\RZZ\inv\RZX\RXX\inv\RXX\invtrans\RZX\trans\RZZ\invtrans\\
    &=\left(\bZ\trans\bZ+\bOmega\right)\inv
    +\RZZ\inv\RZX\RXX\inv\RXX\invtrans\RZX\trans\RZZ\invtrans\\
    &=\left(\bZ\trans\bZ+\bOmega\right)\inv
    +\bL\invtrans\bD^{-1/2}\RZX\RXX\inv
    \RXX\invtrans\RZX\trans\bD^{-\mathsf{T}/2}\bL\inv
  \end{aligned}
\end{equation}
The $(p+1)$st column of the result is $-\widehat{\bb}/\ryy$.

The \code{status} flag is changed to \code{(TRUE, TRUE)}, indicating
that the factorization is current, that it is the inverted
components that are stored in the \code{RZX}, \code{RXX}, and
\code{Dfac} slots, and that the \code{LIx} slot is current.
|
|
|
|
|
|
|
|
\section{Examples}
\label{sec:Examples}

To illustrate this representation and these operations we consider
some examples of common types of mixed-effects models.

\subsection{Single grouping factor}
\label{sec:SingleGrouping}

The \code{Early} data set in the \code{lme4} package is from a
longitudinal study of an early childhood intervention program. These
data are described in \citet[ch.~3]{Sing:Will:2003}. The response is
a cognitive development score, \code{cog}, measured at ages 1, 1.5,
and 2 years (\code{age}) on 103 infants (identified by \code{id}), 58
of whom were in the treatment group and 45 in the control group
(identified by \code{trt}). We convert the \code{age} measurement to
``time on study'', \code{tos}, as the treatment began at age 0.5 years.
|
|
<<EarlyDataLoad>>=
Early$tos <- Early$age - 0.5
str(Early)
@
<<EarlyData,fig=TRUE,echo=FALSE,height=7.2,width=7>>=
print(xyplot(cog ~ age | id, Early, type = 'b', aspect = 'xy',
             layout = c(23,5), between = list(y = c(0,0,0.5)),
             skip = rep(c(0,1), c(58,11)),
             xlab = "Age (yr)",
             ylab = "Cognitive development score",
             scales = list(x = list(tick.number = 3, alternating = TRUE,
                 labels = c("1","","2"), at = c(1,1.5,2))),
             par.strip.text = list(cex = 0.7)))
@
A lattice plot of the longitudinal scores is given in Figure~\ref{fig:EarlyData}.
\begin{figure}[tbp]
  \centering
  \includegraphics[width=\textwidth]{figs/Example-EarlyData}
  \caption[Cognitive development scores]{Cognitive development scores
    for 103 infants; 45 in the control group (identification numbers
    above 900) and 58 in the treatment group. The data from
    each infant are displayed in separate panels. Those for the
    infants in the treatment group are in the lower three rows of
    panels; the control group are in the upper two rows. Within each
    group the panels are ordered by increasing initial cognitive
    score.}
  \label{fig:EarlyData}
\end{figure}
|
|
|
|
|
These data are grouped by subject, as is common in longitudinal data.
The only grouping factor for the random effects is \code{id}. (The
subjects are themselves grouped into a treatment group and a control
group but that dichotomy would be modeled with fixed effects, not
random effects.) As an initial model an analyst may fit an
\emph{unconditional growth} model~\cite[ch.~4]{Sing:Will:2003}
<<EarlyUncGrowth>>=
fm1 <- lmer(cog ~ tos + (tos|id), Early)
@
or proceed immediately to a model that allows for the effects of the
treatment
<<EarlyTrt>>=
fm2 <- lmer(cog ~ tos * trt + (tos|id), Early)
@
|
|
The structure of the resulting object is
<<EarlyTrtStr>>=
str(fm2)
@
|
|
\section{Frechet derivatives}
\label{sec:Frechet}

We will denote the Frechet derivative of the matrix $\bOmega$ with
respect to $\bOmega_i$ by $\der_{\bOmega_i}\bOmega$. This is a
four-dimensional array that can be regarded as an indexed set of $q_i^2$
matrices, each the same size as $\bOmega$. These matrices, which we
call the $q_i^2$ ``faces'' of the array, are indexed by the rows $r$
and the columns $c$, $r,c=1,\dots,q_i$, of $\bOmega_i$. The $(r,c)$th
face, which we write as $\der_{i:r,c}\bOmega$, is the derivative of
$\bOmega$ with respect to the $(r,c)$th element of $\bOmega_i$. It is
a repeated block/block diagonal matrix in which all of the outer diagonal
blocks are zero except for the $(i,i)$ diagonal block, which is block
diagonal in $m_i$ blocks of $\bbe_r\bbe_c\trans$, where $\bbe_j$ is
the $j$th column of the identity matrix of size $q_i$.
|
|
|
|
|
An expression of the form
$\tr\left[\left(\der_{\bOmega_i}\bOmega\right)\bM\right]$, where $\bM$
is a matrix of the same size as $\bOmega$, evaluates to $q_i^2$ scalars
indexed by $r$ and $c$, $r,c=1,\dots,q_i$, which we convert to a
$q_i\times q_i$ matrix in the obvious way. This type of expression
can be simplified as
\begin{equation}
  \label{eq:derOmegai}
  \tr\left[(\der_{\bOmega_i}\bOmega)\bM\right]
  = \sum_{j=1}^{m_i}\tr\left[(\der_{\bOmega_i}\bOmega_i)\bM_{i,i,j,j}\right]
  = \sum_{j=1}^{m_i}\bM_{i,i,j,j}\trans
\end{equation}
where $\bM_{i,i,j,j}$ is the $j$th inner diagonal block in the $i$th
outer diagonal block of $\bM$. To establish the last equality in
(\ref{eq:derOmegai}) note that for any $q_i\times q_i$ matrix $\bA$
the entry in row $r$ and column $c$ of
$\tr\left[(\der_{\bOmega_i}\bOmega_i)\bA\right]$ is
\begin{equation}
  \label{eq:rowRcolCentry}
  \left\{\tr\left[(\der_{\bOmega_i}\bOmega_i)\bA\right]\right\}_{r,c}
  = \tr\left[\bbe_r\bbe_c\trans\bA\right]
  = \tr\left[\bbe_c\trans\bA\bbe_r\right]
  = \bbe_c\trans\bA\bbe_r
  = \bA_{c,r} .
\end{equation}
If $\bM$ is symmetric, as is the case in all the expressions of this
form that we consider, then
$\tr\left[\left(\der_{\bOmega_i}\bOmega\right)\bM\right]$ will be
symmetric.
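The identity (\ref{eq:rowRcolCentry}) is easy to verify numerically.
The following sketch (in Python with NumPy, purely illustrative and
not part of the package sources) forms one face
$\bbe_r\bbe_c\trans$ of the derivative array and checks that the trace
picks out element $(c,r)$ of $\bA$:

```python
import numpy as np

# Illustrative check of tr[e_r e_c' A] = A_{c,r} for an arbitrary A.
q = 4
rng = np.random.default_rng(0)
A = rng.standard_normal((q, q))

r, c = 1, 3                        # 0-based indices of an example face
I = np.eye(q)
face = np.outer(I[:, r], I[:, c])  # the face e_r e_c'

assert np.isclose(np.trace(face @ A), A[c, r])
```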
|
Expressions of this form that we will require include
\begin{equation}
  \label{eq:derOmegaInv}
  \begin{aligned}
    \tr\left[\left(\der_{\bOmega_i}\bOmega\right)\bOmega^{-1}\right]
    &= \sum_{j=1}^{m_i}\left(\bOmega_{i,i,j,j}\right)\inv\\
    &= m_i\bOmega_i\inv ,
  \end{aligned}
\end{equation}
\begin{equation}
  \label{eq:bbDer}
  \tr\left[\left(\der_{\bOmega_i}\bOmega\right) \left(
      \frac{\widehat{\bb}}{\ryy} \frac{\widehat{\bb}\trans}{\ryy}
    \right)\right] = \bB_i \bB_i\trans ,
\end{equation}
where $\bB_i$ is the $q_i\times m_i$ matrix whose $j$th column is
$\widehat{\bb}_{i,j}/\ryy$, $j=1,\dots,m_i$ (recall that these vectors
are in the $(p+1)$st column of the \code{RZX} slot after the inversion
step), and $\tr \left[ \left( \der_{\bOmega_i} \bOmega \right)
  (\bZ\trans\bZ+\bOmega)^{-1} \right]$.
|
The last term requires evaluation of the
inner diagonal blocks of
\begin{equation}
  \label{eq:ZtZinv}
  \left(\bZ\trans\bZ +\bOmega\right)^{-1}=\left(\RZZ\trans\RZZ\right)\inv
  =\bL\invtrans\bD^{-1/2}\bD^{-\mathsf{T}/2}\bL^{-1} .
\end{equation}
When $k=1$, $\bL$ is an identity
matrix and the result is simply
\begin{equation*}
  \tr\left[\left(\der_{\bOmega_i}\bOmega\right)
    \bD^{-1/2}\bD^{-\mathsf{T}/2}\right]=
  \sum_{j=1}^{m_1}\bD_{1:j}^{-1/2}\bD_{1:j}^{-\mathsf{T}/2} .
\end{equation*}
When $k>1$ we must evaluate the crossproduct of each of the $m_i$
inner blocks of $q_i$ columns in the $i$th outer block of columns of
$\bD^{-\mathsf{T}/2}\bL^{-1}=\RZZ\invtrans$, which we do using blocked, sparse
matrix techniques. See Appendix~\ref{ssec:InverseBlocks} for details.
|
For REML results we also need
\begin{multline}
  \label{eq:Vbterm}
  \tr\left[\left(\der_{\bOmega_i}\bOmega\right)\vb\right] = \\
  \tr\left[\left(\der_{\bOmega_i}\bOmega\right)
    \RZZ\inv\RZZ\invtrans\right]
  + \tr\left[\left(\der_{\bOmega_i}\bOmega\right)
    \RZZ\inv\RZX\RXX\inv\RXX\invtrans\RZX\trans\RZZ\invtrans\right] .
\end{multline}
We have just described how to calculate the first term in
(\ref{eq:Vbterm}). The second term is evaluated as the crossproducts
of the $m_i$ inner blocks of $q_i$ rows in the $i$th outer block of rows
of $-\RZZ\inv\RZX\RXX\inv$, which is calculated and stored in the
\code{RZX} slot during the inversion of the factorization, as
described in \S\ref{ssec:Inverting}.
|
|
|
|
|
\section{ECME updates}
\label{sec:ECME}

ECME iterations provide a stable, but only linearly convergent,
optimization algorithm for the objective functions $f$ and $f_R$. We
use a moderate number (the default is 20) of them to refine our
initial estimates $\bOmega_i^{(0)}$ and bring us into the region of
the optimum before switching to Newton iterations on the unconstrained
parameter vector $\btheta$.
|
|
|
|
|
The ECME iterations can be defined in terms of the $\bOmega_i$. At
the $\nu$th iteration, $\bOmega^{(\nu+1)}$ is defined to be the
repeated block/block diagonal, symmetric, positive-definite matrix
that satisfies the $k$ systems of equations
\begin{equation}
  \label{eq:theta1}
  \tr\left[\der_{\bOmega_i}\bOmega\left(
      \frac{\widehat{\bb}^{(\nu)}}{\widehat{\sigma}^{(\nu)}}
      \frac{\left.{\widehat{\bb}}^{(\nu)}\right.\trans}{\widehat{\sigma}^{(\nu)}}+
      \left(\bZ\trans\bZ+\bOmega^{(\nu)}\right)^{-1}
      -\left(\bOmega^{(\nu+1)}\right)^{-1}\right)\right]=\bzer
\end{equation}
for ML estimates or the $k$ systems
\begin{equation}
  \label{eq:theta1R}
  \tr\left[\der_{\bOmega_i}\bOmega\left(
      \frac{\widehat{\bb}^{(\nu)}}{\widehat{\sigma}_R^{(\nu)}}
      \frac{\left.{\widehat{\bb}}^{(\nu)}\right.\trans}{\widehat{\sigma}_R^{(\nu)}}+
      \vb^{(\nu)}
      -\left(\bOmega^{(\nu+1)}\right)^{-1}\right)\right]=\bzer
\end{equation}
for REML.
|
The results of \S\ref{sec:Frechet} provide
\begin{equation}
  \label{eq:theta1soln}
  \bOmega_i^{(\nu+1)}=m_i\left\{
    \tr \left[ \left( \der_{\bOmega_i} \bOmega \right)
      (\bZ\trans\bZ+\bOmega^{(\nu)})^{-1}
    \right]+n\bB_i^{(\nu)}{\bB_i^{(\nu)}}\trans\right\}^{-1}
\end{equation}
for ML and
\begin{equation}
  \label{eq:theta1Rsoln}
  \bOmega_i^{(\nu+1)}=m_i\left\{
    \tr \left[ \left( \der_{\bOmega_i} \bOmega \right)\vb^{(\nu)}
    \right]+(n-p)\bB_i^{(\nu)}{\bB_i^{(\nu)}}\trans\right\}^{-1}
\end{equation}
for REML.
|
|
\subsection{Gradient evaluations}
\label{ssec:Gradient}

The gradients of (\ref{eq:ProfiledLogLik}) and (\ref{eq:ProfiledLogRestLik})
with respect to $\bOmega_i$ are
\begin{align}
  \label{eq:gradDev}
  \nabla_{\bOmega_i}f&=\tr\left[\der_{\bOmega_i}\bOmega\left(
      (\bZ\trans\bZ+\bOmega)^{-1}-\bOmega^{-1}+
      \frac{\widehat{\bb}}{\widehat{\sigma}}
      \frac{{\widehat{\bb}}\trans}{\widehat{\sigma}}\right)\right]\\
  \label{eq:gradDevRest}
  \nabla_{\bOmega_i} f_R&=\tr\left[\der_{\bOmega_i}\bOmega\left(
      \vb-\bOmega^{-1}+
      \frac{\widehat{\bb}}{\widehat{\sigma}_R}
      \frac{{\widehat{\bb}}\trans}{\widehat{\sigma}_R}\right)\right]
\end{align}
and we have already described how to calculate all of these terms.
|
|
|
|
|
\section{Evaluation of the Hessian}
\label{sec:Hessian}

The Hessians of the scalar functions $f$ and $f_R$ are symmetric
matrices of second derivatives. As for the calculation of the
gradient, we first evaluate the Hessian with respect to the
$Q=\sum_{i=1}^k q_i^2$ dimensional vector
$\bomega=\left[\bomega_1\trans,\dots,\bomega_k\trans\right]\trans$
where $\bomega_i=\ovec\bOmega_i$ and `$\ovec$' is the vectorization
function that produces a column vector from a matrix by concatenating
the columns. That is, we temporarily ignore the fact that the
$\bOmega_i$ must be positive definite and symmetric and consider each of the
elements in these matrices separately. In \S\ref{sec:Unconstrained}
we describe an unconstrained parameter vector $\btheta$ of length
$\sum_{i=1}^k\binom{q_i+1}{2}$ and the conversion of the gradient and
Hessian for $\bomega$ to the corresponding quantities for $\btheta$.

For convenience we index the $Q$ elements of $\bomega$ by triplets
$i:r,c$ denoting the element at row $r$ and column $c$ of
$\bOmega_i$. \citet{bate:debr:2004} show that the element in row
$i:r,c$ and column $j:s,t$ of the Hessian is
|
|
\begin{equation}
  \label{eq:HessianML}
  \begin{aligned}
    \left\{\nabla_{\bomega}^2 f\right\}_{i:r,c;j:s,t}=
    &\tr\left[\der_{i:r,c}\left(\der_{j:s,t}\bOmega\right)
      \left(\left(\bZ\trans\bZ+\bOmega\right)\inv-\bOmega\inv+
        \frac{\widehat{\bb}}{\widehat{\sigma}}
        \frac{{\widehat{\bb}}\trans}{\widehat{\sigma}}\right)\right]\\
    &-\tr\left[\der_{i:r,c}\bOmega\left(\bZ\trans\bZ+\bOmega\right)\inv
      \der_{j:s,t}\bOmega\left(\bZ\trans\bZ+\bOmega\right)\inv\right]\\
    &+\tr\left[\left(\der_{i:r,c}\bOmega\right)\bOmega\inv
      \left(\der_{j:s,t}\bOmega\right)\bOmega\inv\right]\\
    &-2\frac{{\widehat{\bb}}\trans}{\widehat{\sigma}}
    \left(\der_{i:r,c}\bOmega\right)\vb
    \left(\der_{j:s,t}\bOmega\right)\frac{\widehat{\bb}}{\widehat{\sigma}}\\
    &-\frac{1}{n}
    \left(\frac{{\widehat{\bb}}\trans}{\widehat{\sigma}}
      \left(\der_{i:r,c}\bOmega\right)
      \frac{\widehat{\bb}}{\widehat{\sigma}}\right)
    \left(\frac{{\widehat{\bb}}\trans}{\widehat{\sigma}}
      \left(\der_{j:s,t}\bOmega\right)
      \frac{\widehat{\bb}}{\widehat{\sigma}}\right)
  \end{aligned}
\end{equation}
|
|
for ML estimates, and
\begin{equation}
  \label{eq:HessianREML}
  \begin{aligned}
    \left\{\nabla_{\bomega}^2 f_R\right\}_{i:r,c;j:s,t}=
    &\tr\left[\der_{i:r,c}\left(\der_{j:s,t}\bOmega\right)
      \left(\vb-\bOmega\inv+
        \frac{\widehat{\bb}}{\widehat{\sigma}_R}
        \frac{{\widehat{\bb}}\trans}{\widehat{\sigma}_R}\right)\right]\\
    &-\tr\left[\der_{i:r,c}\bOmega\vb
      \der_{j:s,t}\bOmega\vb\right]\\
    &+\tr\left[\left(\der_{i:r,c}\bOmega\right)\bOmega\inv
      \left(\der_{j:s,t}\bOmega\right)\bOmega\inv\right]\\
    &-2\frac{{\widehat{\bb}}\trans}{\widehat{\sigma}_R}
    \left(\der_{i:r,c}\bOmega\right)\vb
    \left(\der_{j:s,t}\bOmega\right)\frac{\widehat{\bb}}{\widehat{\sigma}_R}\\
    &-\frac{1}{n}
    \left(\frac{{\widehat{\bb}}\trans}{\widehat{\sigma}_R}
      \left(\der_{i:r,c}\bOmega\right)
      \frac{\widehat{\bb}}{\widehat{\sigma}_R}\right)
    \left(\frac{{\widehat{\bb}}\trans}{\widehat{\sigma}_R}
      \left(\der_{j:s,t}\bOmega\right)
      \frac{\widehat{\bb}}{\widehat{\sigma}_R}\right)
  \end{aligned}
\end{equation}
for REML.
|
|
|
The matrix $\der_{j:s,t}\bOmega$ is constant so the first term in both
(\ref{eq:HessianML}) and (\ref{eq:HessianREML}) is zero.
\section{Unconstrained parameterization}
\label{sec:Unconstrained}

The vector
$\btheta=\left[\btheta_1\trans,\dots,\btheta_k\trans\right]\trans$ is
an unconstrained parameterization of $\bOmega$ based on the LDL form
of the Cholesky decomposition of the $\bOmega_i$,
\begin{equation}
  \label{eq:OmegaLDL}
  \bOmega_i=\bL_i\bD_i\bL_i\trans ,
\end{equation}
where $\bL_i$ is a $q_i\times q_i$ unit lower triangular matrix and
$\bD_i$ is a $q_i\times q_i$ diagonal matrix with positive diagonal
elements. We use the $q_i$ logarithms of the diagonal elements of
$\bD_i$, written $\bdelta_i$, and, when $q_i>1$, the row-wise
concatenation of the $\binom{q_i}{2}$ elements in the strict lower
triangle of $\bL_i$, written $\blambda_i$, as the $\binom{q_i+1}{2}$
dimensional unconstrained parameter
$\btheta_i=\left[\bdelta_i\trans,\blambda_i\trans\right]\trans$.
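To make the parameterization concrete, here is a small sketch (in
Python with NumPy; the function name \code{omega\_from\_theta} is our
own invention, not part of the package code) that maps $\btheta_i$ to
$\bOmega_i$ and confirms that any real-valued $\btheta_i$ yields a
symmetric positive-definite matrix:

```python
import numpy as np

def omega_from_theta(delta, lam, q):
    """Map theta_i = (delta_i, lambda_i) to Omega_i = L D L'."""
    L = np.eye(q)
    L[np.tril_indices(q, -1)] = lam   # strict lower triangle, row-wise
    D = np.diag(np.exp(delta))        # diagonal is positive by construction
    return L @ D @ L.T

Omega = omega_from_theta(np.array([0.5, -1.0]), np.array([0.3]), 2)
assert np.allclose(Omega, Omega.T)              # symmetric
assert np.all(np.linalg.eigvalsh(Omega) > 0)    # positive definite
```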
|
|
|
|
|
|
|
|
Let $g$ be any scalar function of the $\bOmega_i$ such that the
$q_i\times q_i$ gradient matrices $\nabla_{\bOmega_i}g$ are symmetric.
(Both $f$, the objective for ML estimation, and $f_R$, the objective
for REML estimation, are such functions.) The components
$\nabla_{\bdelta_i}g$ and $\nabla_{\blambda_i}g$ of the gradient
vector $\nabla_{\btheta_i}g$ can be evaluated from the symmetric
gradient matrix $\nabla_{\bOmega_i}g$ and the
derivative of the product (\ref{eq:OmegaLDL}). For example,
\begin{equation}
  \label{eq:deltaPartial}
  \begin{aligned}
    \left\{\nabla_{\bdelta_i}g\right\}_j
    &=\tr\left[\left(\nabla_{\bOmega_i}g\right)
      \frac{\partial \bOmega_i}{\partial \left\{\bdelta_i\right\}_j}\right]\\
    &=\tr\left[\left(\nabla_{\bOmega_i}g\right)\bL_i\frac{\partial{\bD_i}}
      {\partial\left\{\bdelta_i\right\}_j}\bL_i\trans\right]\\
    &=\exp{\left\{\bdelta_i\right\}_j}\,\bbe_j\trans
    \bL_i\trans\left(\nabla_{\bOmega_i}g\right)\bL_i\bbe_j\\
    &=\left\{\bR_i\left(\nabla_{\bOmega_i}g\right)\bR_i\trans\right\}_{j,j}
  \end{aligned}
\end{equation}
for $j=1,\dots,q_i$, where $\bR_i$ is the upper Cholesky factor of $\bOmega_i$.
|
|
The partial derivative with respect to
element $t$ of $\blambda_i$, which determines the $(r,c)$th element of
$\bL_i$, is
\begin{equation}
  \label{eq:lambdaPartial}
  \begin{aligned}
    \left\{\nabla_{\blambda_i}g\right\}_t
    &=\tr\left[\left(\nabla_{\bOmega_i}g\right)
      \frac{\partial\bOmega_i}{\partial\left\{\blambda_i\right\}_t}\right]\\
    &=\tr\left[\left(\nabla_{\bOmega_i}g\right)
      \bbe_r\bbe_c\trans\bD_i\bL_i\trans
      +\left(\nabla_{\bOmega_i}g\right)
      \bL_i\bD_i\bbe_c\bbe_r\trans\right]\\
    &=\bbe_c\trans\bD_i\bL_i\trans\left(\nabla_{\bOmega_i}g\right)\bbe_r
    +\bbe_r\trans\left(\nabla_{\bOmega_i}g\right)\bL_i\bD_i\bbe_c\\
    &=2\left\{\left(\nabla_{\bOmega_i}g\right)\bL_i\bD_i\right\}_{r,c}\\
    &=2\left\{\left(\nabla_{\bOmega_i}g\right)\bR_i\trans\bD_i^{1/2}\right\}_{r,c}
  \end{aligned}
\end{equation}
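Formulas such as (\ref{eq:deltaPartial}) can be checked against finite
differences. The sketch below (in Python with NumPy, illustrative
only) uses the scalar function $g(\bOmega_i)=\tr(\bOmega_i\bA)$ with
$\bA$ symmetric, so that $\nabla_{\bOmega_i}g=\bA$, and compares
$\left\{\bR_i\bA\bR_i\trans\right\}_{j,j}$ to a numerical derivative
in $\left\{\bdelta_i\right\}_j$:

```python
import numpy as np

q = 3
delta = np.array([0.2, -0.4, 0.1])
lam = np.array([0.5, -0.3, 0.8])

def omega(d):
    L = np.eye(q)
    L[np.tril_indices(q, -1)] = lam
    return L @ np.diag(np.exp(d)) @ L.T

rng = np.random.default_rng(1)
B = rng.standard_normal((q, q))
A = B + B.T                                   # symmetric gradient matrix
g = lambda d: np.trace(omega(d) @ A)          # g(Omega_i) = tr(Omega_i A)

R = np.linalg.cholesky(omega(delta)).T        # upper Cholesky factor R_i
analytic = np.diag(R @ A @ R.T)

h = 1e-6                                      # central finite differences
numeric = np.array([(g(delta + h * e) - g(delta - h * e)) / (2 * h)
                    for e in np.eye(q)])
assert np.allclose(analytic, numeric, atol=1e-5)
```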
|
\section{Acknowledgements}
\label{sec:Ack}

This work was supported by U.S.{} Army Medical Research and Materiel
Command under Contract No.{} DAMD17-02-C-0119. The views, opinions
and/or findings contained in this report are those of the authors and
should not be construed as an official Department of the Army
position, policy or decision unless so designated by other
documentation.
\appendix

\section{Sparse matrix formats}
\label{app:Sparse}

The basic form of sparse matrix storage, called the \emph{triplet}
form, represents the matrix as three arrays of length \code{nz}, which
is the number of nonredundant nonzero elements in the matrix. The
\code{i} array contains the row indices, the \code{j} array the column
indices, and the \code{x} array the values of the nonzero elements. A
\emph{column-oriented} triplet form requires that the elements of
these arrays be ordered so that \code{j} is nondecreasing. A sorted,
column-oriented triplet form is such that the elements of \code{i}
corresponding to the same \code{j} index are in increasing order. That
is, the arrays are ordered by column and, within column, by row.
|
In column-oriented storage, multiple elements in the same column
produce runs in \code{j}. A \emph{compressed}, column-oriented
storage form replaces \code{j} by \code{p}, an array of pointers to
the indices (in \code{i} and \code{x}) of the first element in each
column. If there are $n$ columns in the matrix, this array is of
length $n+1$, with the last element of \code{p} being one greater than
the index of the last element in the last column. The differences of
successive elements of \code{p} then give the number of nonzeros in
each column.

In the implementation in the \code{Matrix} package for R, the indices
\code{i}, \code{j}, and \code{p} are \code{0}-based (i.e.{} the first
element of an array has index \code{0}), as in the C language, not
\code{1}-based, as in the S language. Thus the first element of
\code{p} is always zero.
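The same layout can be inspected in code. The sketch below (in Python
with SciPy rather than R, purely for illustration) builds a small
matrix and examines the \code{i} (row index) and \code{p} (column
pointer) arrays of its compressed, column-oriented form; note that the
indices are 0-based, as described above:

```python
import numpy as np
from scipy import sparse

A = np.array([[10., 0., 3.],
              [ 4., 5., 0.],
              [ 0., 0., 6.]])
C = sparse.csc_matrix(A)          # compressed, column-oriented storage

assert list(C.data) == [10., 4., 5., 3., 6.]  # x: values, column by column
assert list(C.indices) == [0, 1, 1, 0, 2]     # i: 0-based row indices
assert list(C.indptr) == [0, 2, 3, 5]         # p: length ncol+1, starts at 0
assert list(np.diff(C.indptr)) == [2, 1, 2]   # nonzeros in each column
```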
\section{Blocked sparse matrix operations}
\label{app:blocked}

We take advantage of the blocked sparse structure of $\bZ\trans\bZ$,
$\bOmega$, $\bL$ and $\bD$ in operations involving these matrices.
Consider, for example, the operation of solving for $\tRZX$ in the
system (\ref{eq:RZX}),
$\bL\bD^{\mathsf{T}/2}\tRZX=\bZ\trans\tilde{\bX}$. Let $\bu$ be a
column of $\bD^{\mathsf{T}/2}\tRZX$ and $\bv$ be the corresponding
column of $\bZ\trans\tilde{\bX}$. The corresponding column of the
result, which is $\bD^{-\mathsf{T}/2}\bu$, is easily evaluated from
$\bu$. (Recall that $\bD^{-1/2}$ is block/block diagonal, i.e.{} block
diagonal in $k$ outer blocks of sizes $m_i q_i\times m_i q_i$, each of
which is itself block diagonal in $m_i$ blocks of size $q_i\times
q_i$, and that $\bD^{-1/2}$ is calculated and stored during the
inversion step.) Both $\bu$ and $\bv$ can be divided into $k$ outer
blocks of sizes $m_i q_i$, which we denote $\bu_i$ and $\bv_i$, and
each of the outer blocks can be divided into $m_i$ inner blocks of
size $q_i$, which we denote $\bu_{i:j}$ and $\bv_{i:j}$.
|
|
|
|
|
Because $\bL$ is lower triangular, the blocks of $\bu$ can be
evaluated in a blockwise forward solve. That is, the blocked
equations are solved in the order
\begin{equation}
  \label{eq:blockwiseForward}
  \begin{aligned}
    \bL_{(1,1)}\bu_1 &= \bv_1\quad\Longrightarrow\quad \bI\bu_1 =
    \bu_1 = \bv_1\\
    \bL_{(2,2)}\bu_2 &= \bv_2-\bL_{(2,1)}\bu_1\\
    &\vdots\\
    \bL_{(k,k)}\bu_k &= \bv_k-\bL_{(k,k-1)}\bu_{k-1}-\dots-\bL_{(k,1)}\bu_1
  \end{aligned}
\end{equation}
Notice that the solution in the first block does not require any
calculation because $\bL_{(1,1)}$ is always the identity. In many
applications $k=1$ and the operation of solving $\bL\bu=\bv$ can
be skipped entirely. In most other applications the size of $\bu_1$
dominates the sizes of the other components, so being able to set
$\bu_1=\bv_1$ represents a tremendous savings in computational effort.
|
|
|
|
|
We take advantage of the blocked, sparse columns of the outer blocks
of $\bL$ in evaluations of the form $\bv_2-\bL_{(2,1)}\bu_{1}$. That
is, we evaluate $\bL_{(2,1)}\bu_{1}$ using the blocks of size
$q_2\times q_1$, skipping blocks that we know will be zero, as we update
the inner blocks of $\bu_2$ in place. The solutions of systems of the
form $\bL_{(i,i)}\bu_i=\tilde{\bv}_i$, where $\tilde{\bv}_i$ is the
updated $\bv_i$, are also done in inner blocks, taking advantage of the
sparsity of $\bL_{(i,i)}^{-1}$. Recall that in the case of nested grouping
factors the $\bL_{(i,i)}$ are all identity matrices and this step is
skipped.
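The blockwise forward solve (\ref{eq:blockwiseForward}) can be
sketched as follows (in Python with NumPy; dense blocks stand in for
the blocked sparse storage, and the function name is our own). It
assumes, as in the text, that the leading diagonal block
$\bL_{(1,1)}$ is the identity, so $\bu_1=\bv_1$ requires no
computation:

```python
import numpy as np

def block_forward_solve(Lblocks, vblocks):
    """Solve L u = v where Lblocks[i][j] is the (i+1, j+1) block of L
    and the (1,1) block of L is the identity."""
    k = len(vblocks)
    u = [vblocks[0].copy()]              # u_1 = v_1, no solve needed
    for i in range(1, k):
        rhs = vblocks[i].copy()
        for j in range(i):               # subtract the L_(i,j) u_j updates
            rhs -= Lblocks[i][j] @ u[j]
        u.append(np.linalg.solve(Lblocks[i][i], rhs))
    return u

rng = np.random.default_rng(2)
L21 = rng.standard_normal((2, 3))
L22 = np.tril(rng.standard_normal((2, 2)), -1) + np.diag([2.0, 3.0])
v = rng.standard_normal(5)
u = block_forward_solve([[np.eye(3)], [L21, L22]], [v[:3], v[3:]])

# agrees with a dense forward solve on the assembled matrix
Ldense = np.block([[np.eye(3), np.zeros((3, 2))], [L21, L22]])
assert np.allclose(np.concatenate(u), np.linalg.solve(Ldense, v))
```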
|
\subsection{Accumulating diagonal blocks of the inverse} |
\section{Likelihood and restricted likelihood} |
355 |
\label{ssec:InverseBlocks} |
\label{sec:likelihood} |

In general the \emph{maximum likelihood estimates} of the parameters
in a statistical model are those values of the parameters that
maximize the likelihood function, which has the same numerical value
as the probability density of $\by$ given the parameters but is
regarded as a function of the parameters given $\by$, not as a
function of $\by$ given the parameters.

For model (\ref{eq:lmePerm}) the parameters are $\bbeta$, $\sigma^2$
and $\btheta$ (as described in \S\ref{sec:relativePrecision},
$\btheta$ and $\sigma^2$ jointly determine $\bSigma$), so we evaluate
the likelihood $L(\bbeta, \sigma^2, \btheta|\by)$ as
\begin{equation}
  \label{eq:Likelihood1}
  L(\bbeta, \sigma^2, \btheta|\by) = f_{\by|\bbeta,\sigma^2,\btheta}(\by|\bbeta,\sigma^2,\btheta)
\end{equation}
where $f_{\by|\bbeta,\sigma^2,\btheta}(\by|\bbeta,\sigma^2,\btheta)$
is the marginal probability density for $\by$ given the parameters.
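To make the distinction concrete in the simplest possible case (a
single normal observation, not the mixed model itself): if $y$ has a
normal distribution with mean $\mu$ and variance $\sigma^2$, then
\begin{equation*}
  L(\mu,\sigma^2|y)=\frac{\exp\left\{-(y-\mu)^2/\left(2\sigma^2\right)\right\}}
  {\left(2\pi\sigma^2\right)^{1/2}}
\end{equation*}
is the normal density with the observed $y$ held fixed.  For any fixed
$\sigma^2$ it is maximized over $\mu$ at $\widehat{\mu}=y$, a
statement about the parameter that would be meaningless if the
expression were read as a density in $y$.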

Because we will need to write several different marginal and
conditional probability densities in this section, and expressions
like $f_{\by|\bbeta,\sigma^2,\btheta}(\by|\bbeta,\sigma^2,\btheta)$
are difficult to read, we adopt a convention sometimes used in the
Bayesian inference literature: a conditional expression in square
brackets denotes the probability density of the quantity on the left
of the $|$ given the quantities on the right of the $|$.  That is,
\begin{equation}
  \label{eq:Bayesnotation}
  \left[\by|\bbeta,\sigma^2,\btheta\right]=f_{\by|\bbeta,\sigma^2,\btheta}(\by|\bbeta,\sigma^2,\btheta)
\end{equation}

Model (\ref{eq:lmePerm}) specifies the conditional distributions
\begin{equation}
  \label{eq:ycond}
  \left[\by|\bbeta,\sigma^2,\bb\right]=
  \frac{\exp\left\{-\|\by-\bX\bbeta-\bZ\bP\trans\bP\bb\|^2/\left(2\sigma^2\right)\right\}}
  {\left(2\pi\sigma^2\right)^{n/2}}
\end{equation}
and
\begin{equation}
  \label{eq:bmarg}
  \begin{split}
    \left[\bb|\btheta,\sigma^2\right]&=
    \frac{\exp\left\{-\bb\trans\bSigma^{-1}\bb/2\right\}}
    {|\bSigma|^{1/2}\left(2\pi\right)^{q/2}}\\
    &=\frac{|\bOmega|^{1/2}\exp\left\{-\bb\trans\bOmega\bb/\left(2\sigma^2\right)\right\}}
    {\left(2\pi\sigma^2\right)^{q/2}}
  \end{split}
\end{equation}
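The step between the two forms in (\ref{eq:bmarg}), left implicit
above, follows from the relative precision relation
$\bOmega=\sigma^2\bSigma^{-1}$ of \S\ref{sec:relativePrecision}:
\begin{equation*}
  |\bSigma|^{1/2}=\left|\sigma^2\bOmega^{-1}\right|^{1/2}
  =\sigma^q|\bOmega|^{-1/2}
  \quad\Rightarrow\quad
  \frac{1}{|\bSigma|^{1/2}\left(2\pi\right)^{q/2}}
  =\frac{|\bOmega|^{1/2}}{\left(2\pi\sigma^2\right)^{q/2}}
\end{equation*}
and, in the exponent,
$\bb\trans\bSigma^{-1}\bb/2=\bb\trans\bOmega\bb/\left(2\sigma^2\right)$.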
from which we can derive the marginal distribution
\begin{equation}
  \label{eq:ymarg}
  \begin{split}
    \left[\by|\bbeta,\sigma^2,\btheta\right]&=
    \int_{\bb}\left[\by|\bbeta,\sigma^2,\bb\right]\left[\bb|\btheta,\sigma^2\right]\,d\bb\\
    &=\frac{|\bOmega|^{1/2}}{\left(2\pi\sigma^2\right)^{n/2}}
    \int_{\bb}
    \frac{\exp\left\{-\left[\|\by-\bX\bbeta-\bZ\bP\trans\bP\bb\|^2
          +\bb\trans\bOmega\bb\right]/\left(2\sigma^2\right)\right\}}
    {\left(2\pi\sigma^2\right)^{q/2}}\,d\bb
  \end{split}
\end{equation}
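As a sketch of how the integral in (\ref{eq:ymarg}) can be evaluated
in closed form (the symbols $\tilde{\bZ}$, $\widehat{\bb}$ and
$r^2(\bbeta)$ are introduced here only for this sketch): writing
$\tilde{\bZ}=\bZ\bP\trans\bP$, the exponent decomposes by completing
the square in $\bb$ as
\begin{equation*}
  \|\by-\bX\bbeta-\tilde{\bZ}\bb\|^2+\bb\trans\bOmega\bb
  = r^2(\bbeta)
  +\left(\bb-\widehat{\bb}\right)\trans
  \left(\tilde{\bZ}\trans\tilde{\bZ}+\bOmega\right)
  \left(\bb-\widehat{\bb}\right)
\end{equation*}
where
$\widehat{\bb}=\left(\tilde{\bZ}\trans\tilde{\bZ}+\bOmega\right)^{-1}
\tilde{\bZ}\trans\left(\by-\bX\bbeta\right)$ and $r^2(\bbeta)$ is the
minimum of the penalized residual sum of squares.  The Gaussian
integral over $\bb$ then contributes
$\left(2\pi\sigma^2\right)^{q/2}
\left|\tilde{\bZ}\trans\tilde{\bZ}+\bOmega\right|^{-1/2}$, so that
\begin{equation*}
  \left[\by|\bbeta,\sigma^2,\btheta\right]=
  \frac{|\bOmega|^{1/2}}
  {\left|\tilde{\bZ}\trans\tilde{\bZ}+\bOmega\right|^{1/2}}
  \,\frac{\exp\left\{-r^2(\bbeta)/\left(2\sigma^2\right)\right\}}
  {\left(2\pi\sigma^2\right)^{n/2}} .
\end{equation*}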
|
\bibliography{lme4}
\end{document}
|
|
|
|
|
|
|