[tm] Annotation of /trunk/tm/man/TextDocCol.Rd

Revision 825

\name{Corpus}
\docType{methods}
\alias{Corpus}
\alias{Corpus,Source-method}
\title{Text document collection}
\description{
Constructs a text document collection (corpus).
}
\usage{
\S4method{Corpus}{Source}(object, readerControl = list(reader = object@DefaultReader,
language = "en_US", load = TRUE), dbControl = list(useDb = FALSE, dbName = "",
dbType = "DB1"), ...)
}
\arguments{
\item{object}{a \code{Source} object.}
\item{readerControl}{a list with the named components \code{reader},
a reading function capable of handling the file format
found in \code{object}; \code{language}, giving the text's language
(preferably in \acronym{ISO} 639-1 format); and
\code{load}, a logical value indicating whether the
documents should be loaded into memory immediately (\code{load = TRUE}) or
only when needed (\code{load = FALSE}). Loading on demand minimizes memory
demands for large document collections. If \code{object} does not
support loading on demand, the corpus is loaded immediately,
i.e., this argument is overruled.}
\item{dbControl}{a list with the named components \code{useDb},
indicating whether database support should be activated; \code{dbName},
giving the name of the file that holds the sourced-out objects (i.e., the
database); and \code{dbType}, a valid database type
supported by \pkg{filehash}. With database support activated,
the \pkg{tm} package tries to keep as few resources as possible
in memory by using the database.}
\item{...}{optional arguments for the \code{reader}.}
}
\value{
An S4 object of class \code{Corpus}, which extends class
\code{list} and contains a collection of text documents.
}
\examples{
txt <- system.file("texts", "txt", package = "tm")
# Plain-text corpus with database support enabled (not run).
\dontrun{(Corpus(DirSource(txt),
        readerControl = list(reader = readPlain, language = "en_US",
                             load = TRUE),
        dbControl = list(useDb = TRUE, dbName = "oviddb",
                         dbType = "DB1")))}
reut21578 <- system.file("texts", "reut21578", package = "tm")
# Reuters-21578 XML corpus; documents are loaded on demand.
Corpus(DirSource(reut21578),
       readerControl = list(reader = readReut21578XML, language = "en_US",
                            load = FALSE))
}
\author{Ingo Feinerer}
\keyword{methods}
