%% /pkg/inst/doc/Comparisons.Rnw
\documentclass{article}
\usepackage{myVignette}
\usepackage[authoryear,round]{natbib}
\bibliographystyle{plainnat}
%%\VignetteIndexEntry{Comparisons of Least Squares calculation speeds}
%%\VignetteDepends{Matrix}
\begin{document}
\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,width=5,height=3,strip.white=TRUE}
\setkeys{Gin}{width=\textwidth}
\title{Comparing Least Squares Calculations}
\author{Douglas Bates\\R Development Core Team\\\email{Douglas.Bates@R-project.org}}
\date{\today}
\maketitle
\begin{abstract}
  Many statistical methods require one or more least squares problems
  to be solved.  There are several ways to perform this calculation,
  using objects from the base R system and using objects in the
  classes defined in the \code{Matrix} package.

  We compare the speed of some of these methods on a very small
  example and on an example for which the model matrix is large and
  sparse.
\end{abstract}
<<>>=
options(width = 75)
sysgc.time <- function(expr, gcFirst = TRUE)
{
    ## Time an expression, optionally garbage-collecting first so the
    ## timing is not distorted by collections triggered earlier.
    if (!exists("proc.time"))
        return(rep(as.numeric(NA), 5))
    loc.frame <- parent.frame()
    if (gcFirst)
        gc(FALSE)
    on.exit(cat("Timing stopped at:", proc.time() - time, "\n"))
    expr <- substitute(expr)
    time <- proc.time()
    eval(expr, envir = loc.frame)
    new.time <- proc.time()
    on.exit()
    if (length(new.time) == 3)
        new.time <- c(new.time, 0, 0)
    if (length(time) == 3)
        time <- c(time, 0, 0)
    new.time - time
}
@

\section{Linear least squares calculations}
\label{sec:LeastSquares}

Many statistical techniques require least squares solutions
\begin{equation}
  \label{eq:LeastSquares}
  \widehat{\bm{\beta}} =
  \arg\min_{\bm{\beta}}\left\|\bm{y} - \bX\bm{\beta}\right\|^2
\end{equation}
where $\bX$ is an $n\times p$ model matrix ($p\leq n$), $\bm{y}$ is
$n$-dimensional and
$\bm{\beta}$ is $p$-dimensional.  Most statistics
texts state that the solution to (\ref{eq:LeastSquares}) is
\begin{equation}
  \label{eq:XPX}
  \widehat{\bm{\beta}} = \left(\bX\trans\bX\right)^{-1}\bX\trans\bm{y}
\end{equation}
when $\bX$ has full column rank (i.e.\ the columns of $\bX$ are
linearly independent) and all too frequently it is calculated in
exactly this way.

\subsection{A small example}
\label{sec:smallLSQ}

As an example, let's create a model matrix, \code{mm}, and corresponding
response vector, \code{y}, for a simple linear regression model using
the \code{Formaldehyde} data.
<<>>=
data(Formaldehyde)
str(Formaldehyde)
print(mm <- cbind(1, Formaldehyde$carb))
print(y <- Formaldehyde$optden)
@
Using \code{t} to evaluate the transpose, \code{solve} to take an
inverse, and the \code{\%*\%} operator for matrix multiplication, we
can translate (\ref{eq:XPX}) into the \Slang{} as
<<>>=
solve(t(mm) %*% mm) %*% t(mm) %*% y
@

On modern computers this calculation is performed so quickly that it
cannot be timed accurately in \RR{}
<<>>=
sysgc.time(solve(t(mm) %*% mm) %*% t(mm) %*% y)
@
and it provides essentially the same results as the standard
\code{lm.fit} function that is called by \code{lm}.
<<>>=
dput(c(solve(t(mm) %*% mm) %*% t(mm) %*% y))
dput(lm.fit(mm, y)$coefficients)
@

\subsection{A large example}
\label{sec:largeLSQ}

For a large, ill-conditioned least squares problem, such as that
described in \citet{koen:ng:2003}, the literal translation of
(\ref{eq:XPX}) does not perform well.
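One reason the literal translation misbehaves: forming $\bX\trans\bX$
squares the condition number of the problem, so the normal equations
lose roughly twice as many significant digits as a method that works
with $\bX$ directly.  A minimal sketch of this effect, using a small
near-collinear matrix invented for illustration (not the data of
\citet{koen:ng:2003}):

```r
## Hypothetical illustration: a nearly collinear model matrix.
set.seed(1)
x <- 1:20
X <- cbind(1, x, x + rnorm(20, sd = 1e-4)) # third column nearly duplicates the second
## kappa(, exact = TRUE) computes the 2-norm condition number from the
## singular values.
kX   <- kappa(X, exact = TRUE)
kXtX <- kappa(crossprod(X), exact = TRUE)
c(kX = kX, kXtX = kXtX, ratio = kXtX / kX^2)
```

The ratio is essentially 1: the cross-product matrix is ill-conditioned
as the square of the condition number of $\bX$.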
<<>>=
library(Matrix)
data(mm, package = "Matrix")
data(y, package = "Matrix")
mm <- as(mm, "matrix")
dim(mm)
sysgc.time(naive.sol <- solve(t(mm) %*% mm) %*% t(mm) %*% y)
@

Because the calculation of a ``cross-product'' matrix, such as
$\bX\trans\bX$ or $\bX\trans\bm{y}$, is a common operation in
statistics, the \code{crossprod} function has been provided to do
this efficiently.  In the single argument form \code{crossprod(mm)}
calculates $\bX\trans\bX$, taking advantage of the symmetry of the
product.  That is, instead of calculating the $712^2=506944$ elements of
$\bX\trans\bX$ separately, it only calculates the $(712\cdot
713)/2=253828$ elements in the upper triangle and replicates them in
the lower triangle.  Furthermore, there is no need to calculate the
inverse of a matrix explicitly when solving a linear system of
equations.  When the two argument form of the \code{solve} function
is used the linear system
\begin{equation}
  \label{eq:LSQsol}
  \left(\bX\trans\bX\right) \widehat{\bm{\beta}} = \bX\trans\by
\end{equation}
is solved directly.

Combining these optimizations we obtain
<<>>=
sysgc.time(cpod.sol <- solve(crossprod(mm), crossprod(mm, y)))
all.equal(naive.sol, cpod.sol)
@

On this computer (2.0 GHz Pentium-4, 1 GB Memory, Goto's BLAS) the
crossprod form of the calculation is about four times as fast as the
naive calculation.  In fact, the entire crossprod solution is
faster than simply calculating $\bX\trans\bX$ the naive way.
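The symmetry shortcut can be checked directly: \code{crossprod(m)}
agrees with \code{t(m) \%*\% m} while computing only one triangle of
the product.  A minimal sketch on a small random matrix invented for
illustration (not the vignette's \code{mm}):

```r
## Hypothetical illustration: crossprod() matches the naive product.
set.seed(42)
m <- matrix(rnorm(1000 * 50), nrow = 1000, ncol = 50)
p.naive <- t(m) %*% m   # computes all 50^2 = 2500 elements
p.cross <- crossprod(m) # computes (50 * 51)/2 = 1275 elements, mirrors the rest
all.equal(p.naive, p.cross)
isSymmetric(p.cross)
```

Both checks return \code{TRUE}; the two results can differ at most in
the last floating-point bits, because the symmetric routine accumulates
the products in a different order.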
<<>>=
sysgc.time(t(mm) %*% mm)
@

\subsection{Least squares calculations with Matrix classes}
\label{sec:MatrixLSQ}

The \code{crossprod} function applied to a single matrix takes
advantage of symmetry when calculating the product but does not retain
the information that the product is symmetric (and positive
semidefinite).  As a result the solution of (\ref{eq:LSQsol}) is
performed using a general linear system solver based on an LU
decomposition when it would be faster, and more stable numerically, to
use a Cholesky decomposition.  The Cholesky decomposition could be
used explicitly, but the code is rather awkward
<<>>=
sysgc.time(ch <- chol(crossprod(mm)))
sysgc.time(chol.sol <- backsolve(ch,
    forwardsolve(ch, crossprod(mm, y), upper = TRUE, trans = TRUE)))
all.equal(chol.sol, naive.sol)
@

The \code{Matrix} package uses the S4 class system
\citep{R:Chambers:1998} to retain information on the structure of
matrices from the intermediate calculations.  A general matrix in
dense storage, created by the \code{Matrix} function, has class
\code{"geMatrix"} but its cross-product has class \code{"poMatrix"}.
The \code{solve} methods for the \code{"poMatrix"} class use the
Cholesky decomposition.
<<>>=
data(mm, package = "Matrix")
mm <- as(mm, "geMatrix")
class(crossprod(mm))
sysgc.time(Mat.sol <- solve(crossprod(mm), crossprod(mm, y)))
all.equal(naive.sol, as(Mat.sol, "matrix"))
@

Furthermore, any method that calculates a decomposition or
factorization stores the resulting factorization with the original
object so that it can be reused without recalculation.
<<>>=
xpx <- crossprod(mm)
xpy <- crossprod(mm, y)
sysgc.time(solve(xpx, xpy))
sysgc.time(solve(xpx, xpy))
@

The model matrix \code{mm} is sparse; that is, most of the elements of
\code{mm} are zero.  The \code{Matrix} package incorporates special
methods for sparse matrices, which produce the fastest results of all.
<<>>=
data(mm, package = "Matrix")
class(mm)
sysgc.time(sparse.sol <- solve(crossprod(mm), crossprod(mm, y)))
all.equal(naive.sol, as(sparse.sol, "matrix"))
@

As with other classes in the \code{Matrix} package, the
\code{sscMatrix} class retains any factorization that has been
calculated although, in this case, the decomposition is so fast that
it is difficult to determine the difference in the solution times.
<<>>=
xpx <- crossprod(mm)
xpy <- crossprod(mm, y)
sysgc.time(solve(xpx, xpy))
sysgc.time(solve(xpx, xpy))
@

\bibliography{Matrix}
\end{document}