SCM

SCM Repository

[matrix] View of /pkg/Matrix/TODO
ViewVC logotype

View of /pkg/Matrix/TODO

Parent Directory Parent Directory | Revision Log Revision Log


Revision 2677 - (download) (annotate)
Sat Jun 25 19:18:12 2011 UTC (7 years, 8 months ago) by mmaechler
File size: 11268 byte(s)
finally a working C level Csparse_subassign() - to solve the memory-explode issue .. *BUT* it`s still too slow, even if I never remove but use drop0(.) in the end ... commit now before South America ... in case I get hit by a truck :-)
- Arm twisting ... and other convincing arguments by R core fellows:
  ==>  R/zzz.R --- do NOT redefine base, but define an S3 method
       ------- for as.array() etc

- Check for DimNames propagation in coercion and other operations.

- Report the problem in the Linux ldexp manual page.  The second and
  third calls in the Synopsis should be to ldexpf and ldexpl.

- provide methods for "dspMatrix" and "dppMatrix"!

- implement (more) methods for supporting "packed" (symmetric / triangular)
  matrices; particularly something like pack() and unpack()  [to/from our
  classes from/to "numeric"] --- have already man/unpack.Rd but no method yet!

  (have some dtr* <-> dtp*)

- combine the C functions for multiplication by special forms and
  solution wrt special forms by using a 'right' argument and a
  'classed' argument.
   [done with dgeMatrix_matrix_mm();  not yet for other classes;
    and for _crossprod()]

-----

- "Math2" , "Math", "Arith":
   keep triangular and symmetric Matrices when appropriate:
   particularly desirable for  "Math2": round(), signif()

- For triangular matrices, make sure the four rules of
  "triangular matrix algebra" (Golub+Van Loan 1996, 3.1.8, p.93) are fulfilled.
  - since 2008-03-06 ok for Csparse
  - since 2010-07-23 ok for <dtr> %*% <dtr>
  TODO: e.g. for  <ltr> %*% <dtC>

- "d" <-> "l" coercion for all "[TCR]" sparse matrices is really trivial:
  "d" -> "l" : drops the 'x' slot
  "l" -> "d" : construct an 'x' slot of all '1'
  We currently have many of these conversions explicitly, e.g.
   setAs("dsTMatrix", "lsTMatrix",
      function(from) new("lsTMatrix", i = from@i, j = from@j, uplo = from@uplo,
                         Dim = from@Dim, Dimnames = from@Dimnames))
  but I would rather want to automatically construct all these coercion
  methods at once by a ``method constructor'', i.e.,
  for all  "dsparse*" -> "lsparse*" and vice versa.
  How can one do this {in a documented way} ?

- Think of constructing  setAs(...) calls automatically in order to
  basically enable all ``sensible'' as(fromMatrix, toMatrix)  calls,
  possibly using canCoerce(.)

- When we have a packed matrix, it's a waste to go through "full" to "sparse":
  ==> implement
	setAs("dspMatrix", "sparseMatrix")
	setAs("dppMatrix", "sparseMatrix")
	setAs("dtpMatrix", "sparseMatrix")
  and the same for "lsp" , "ltp"  and  "nsp" , "ntp" !

- tcrossprod(x, y) : do provide methods for y != NULL
  calling Lapack's DGEMM for "dense"
  [2005-12-xx: done for dgeMatrix at least]

- Factorizations: LU done; also Schur()  for  *sparse*  Matrices.

- use  .Call(Csparse_drop, M, tol) in more places,
  both with 'tol = 0.' to drop "values that happen to be 0" and for
  zapsmall() methods for Csparse*

- implement .Call(Csparse_scale, ....) interfacing to cholmod_scale()
  in src/CHOLMOD/Include/cholmod_matrixops.h : for another function
  specifically for multiplying a cholmod_sparse object by a diagonal matrix.
  Use it in %*% and [t]crossprod methods.

- chol() should ``work'': proper result or "good" error message.

- make sure *all* group methods have (maybe "bail-out") setMethod for "Matrix".
  e.g. zapsmall(<pMatrix>) fails "badly"

- rbind2(<sparse>, <dense>) does not work  (e.g. <dgC>, <dge>)

- <sparse> %*% <dense>  {also in crossprod/tcrossprod}  currently always
  returns <dense>, since --> Csparse_dense_prod --> cholmod_sdmult
  and that does only return dense.
  When the sparse matrix is very sparse, i.e. has many rows with only zero
  entries, it would make much sense to return sparse.

- ! <symmetricMatrix>  loses symmetry, both for dense and sparse matrices.
  !M  where M is "sparseMatrix", currently always gives dense. This only
  makes sense when M is ``really sparse''.

- example(Cholesky, echo=FALSE) ; cm <- chol(mtm); str(cm); str(mtm)

  shows that chol() does not seem to make use of an already
  present factorization and rather uses one with more '0' in x slot.

- diag(m) <- val    currently automatically works via  m[cbind(i,i)] <- val
  This (`[<-` method) is now "smart" for diagonalMatrix, but needs also to
  be for triangularMatrix, and probably also "dense*general*Matrix" since the
  above currently goes via "matrix" and back instead of using the 'x' slot
  directly; in particular, the triangular* "class property" is lost!

  Note that 'diag(M[,-1]) <- val' is deadly slow (*) for large sparse M,
  but that's because of the 2nd line assignment in the equivalent
    tmpM <- `diag<-`(M[,-1], val)
    M[,-1] <- tmpM
  (*): gives *error* {about negative integer} when
       prod(dim(M)) > .Machine$integer.max, e.g. for square (n x n) M when
       n >= 46341 == ceiling(2^15.5)
  This is "the same" as Ashley Ford's report (25 Feb 2010),
  MM @ ~/R/MM/Pkg-ex/Matrix/nsp-2col-index-bug.R

- examples for solve( Cholesky(.), b, system = c("A", "LDLt"....))
  probably rather in man/CHMfactor-class.Rd than man/Cholesky.Rd

- LDL(<CHMsimpl>) looks relatively easy; via  "tCsparse_diag()"
   {diagonal entries of *triangular* Csparse}
  --> see comment in determinant(<dsC>) in R/dsCMatrix.R, will give
  faster determinant

- tr(A %*% B) {and even  tr(A %*% B %*% C) ...} are also needed
  frequently in some computations {conditional normal distr. ...}.
  Since this can be done faster than by
    sum(diag(A %*% B))  even for traditional matrices, e.g.
    	       sum(A * t(B)) or {even faster for "full" mat}
	       crossprod(as.vector(A), as.vector(B))
  and even more so for, e.g.  <sparse> %*% <dense>
  {used in Soeren's 'gR' computations},
  we should also provide a generic and methods.
- diag(A %*% B) might look like a "generalization" of tr(A %*% B),
  but as the above tricks show, is not really.
  Still, it's well worth to provide  diag.prod(A, B)

- qr.R(qr(x)) may differ for the "same" matrix, depending on it being
  sparse or dense:
    "qr.R(<sparse>) may differ from qr.R(<dense>) because of permutations"

  This is not really acceptable and currently influences  rcond() as well.

- eigen() should become generic, and get a method at least for diagonal,
  but also for symmetric -> dsyMatrix  [LAPACK dsyev() uses UPLO !],
  but also simply for dgeMatrix (without going via tradition matrices).
  What about Sparse?  There's fill-in, but it may still be sensible, e.g.
  mlist <- list(1, 2:3, diag(x=5:3), 27, cbind(1,3:6), 100:101)
  ee <- eigen(tcrossprod(bdiag(lapply(mlist, as.matrix))))
  Matrix( signif(ee$vectors, 3) )

- facmul() has no single method defined;  it looks like a good idea though
  (instead of the infamous qr.qy, qr.qty,.... functions)

- symmpart() and skewpart()  for *sparse* matrices still use (x +/- t(x))/2
  and could be made more efficient.
  Consider going via  asTuniq() or something very close to
  .Arith.Csparse() in R/Ops.R
  For a traditional "matrix" object, we should speedup, using C code ..

- many setAs(*, "[dl]..Matrix") are still needed, as long as e.g.
  replCmat() uses as_CspClass() and drop0(.) which itself call
  as_CspClass() quite a bit.  --> try to replace these by
  as(*, "CsparseMatrix"); forceSymmetric, etc.

- implement fast diag(<triangularCsparse>) via calling new
  src/Csparse.c's diag_tC_ptr()

- add examples (and tests!) for update(<CHMfactor>, ..) and
  Cholesky(......, Imult), also tests for hidden {hence no examples}
  ldetL2up() { R/CHMfactor.R }

- chol(<nsCMatrix>)  gives "temporarily disabled"
  but should give the *symbolic* factorization;
  similarly Cholesky(.) is not enabled

- writeMM(obj, file=stdout()) creates file "1" since file is silently
  assumed to be a string, i.e. cannot be a connection.
  An R (instead of C) version should be pretty simple, and would work with
  connections automatically ["lsparse" become either "real" or
  "pattern", "depending if they have NAs or not].

- <diagMatrix> o <ddenseMatrix> still works via sparse in some cases, but
  could return <diagMatrix> in the same cases where <diagMatrix> o <numeric> does.

- look at solve.QP.compact() in \pkg{quadprog} and how to do that using
  our sparse matrices.  Maybe this needs to be re-implemented using CHOLMOD
  routines.

- We allow "over-allocated" (i,x)-slots for CsparseMatrix objects,
  as per Csparse_validate() and the tests in tests/validObj.R. This is as
  in CHOLMOD/CSparse, where nzmax (>= .@p[n]) corresponds to length(.@i),
  and makes sense e.g. for M[.,.] <- v  assignments which could allocate in
  chunks and would not need to re-allocate anything in many cases.
  HOWEVER, replCmat() in R/Csparse.R is still far from making use of that.

- advertize  rbind2() / cbind2() and (rather?) rBind() / cBind()
  	     ------                            -----
  in all vignettes / talks / ... !!
  People erronously try rbind/cbind see that they don't work and then
  reinvent the wheel!

  --> Consider using the new  'dotMethods' functionality to define
      cbind() and rbind() versions that work with Matrix.
      The "Rmpfr" package does that now.

- In  all(M1 == M2)  for sparse large matrices M1, M2 (e.g. M2 <- M1 !),
  the intermediate 'M1 == M2' typically is dense, hence potentially using
  humongous amount of memory.
  We should/could devise something like   allCompare(M1, M2, `==`)
  which would remain sparse in all its computations.

--------

- Reconsider the linkages in the include files for the SuiteSparse
  packages.  It may be better simply to add all the src/<nm>/Include
  directories to the include path for all compilations.  I don't think
  there is a big overhead.  Right now we need to modify the include
  file src/SPQR/Include/SuiteSparseQR_C.h so that it does not expect
  to have src/UFsparse and src/CHOLMOD/Include on the include path.
  Maybe just those two should be added to the include path.

- (systematically check that LAPACK-calling functions check for
   0-dimensional input themselves; LAPACK gives an integer error code)

- the  f[,5762] <- thisCol # now ...
  line in tests/indexing.R uses very large objects unnecessarily;
  Improve  replTmat()  in R/Tsparse.R, making use of new "abIndex"
  vectors.

- {IS THIS CURRENT?}
  Sept. 2009:
  Subject: chol2inv() |-> solve(<CHMfactor>)

  when testing and documenting  chol2inv(),
  I found that it's pretty simple to also define a method for
  "CHMfactor" objects, namely simply the solve(*, Diagonal(.) "A")
  method.
  This is not particularly exciting, and also does *not*, I think
  help for defining a chol2inv() method for  *sparse* (upper)
  triangular matrices.

- sort(<sparseVector>, partial=..), needed, for mean(*, trim = .) or median().
  Note that defining xtfrm() does not "help" (as sort() then goes via dense
  index).  See "mean" in R/Matrix.R

- rcond(<sparseMatrix>) - for square currently goes via *dense* -- BAD --
  can we go via qr() in any case?
  In some cases, e.g. lmer()'s  "Lambda" (block triangular, small blocks)
       rcond(L) :=  1 / (norm(L) * norm(solve(L)))
  is simple {and remains sparse, as solve(L) is still block triangular}

- How can we ensure that     inst/include/cholmod.h		 remains
  correct and equivalent to  src/CHOLMOD/Include/cholmod_core.h  and  siblings ???
  {currently need to do this manually (Emacs M-x compare-windows) for the   typedefs}

- finalize and activate the new *unused* code in src/sparseVector.c

R-Forge@R-project.org
ViewVC Help
Powered by ViewVC 1.0.0  
Thanks to:
Vienna University of Economics and Business University of Wisconsin - Madison Powered By FusionForge