[tm] View of /trunk/tm/man/termFreq.Rd
Revision 795
Sat Oct 27 09:14:35 2007 UTC (11 years, 9 months ago) by feinerer
File size: 1716 byte(s)
Updated documentation
\title{Term Frequency Vector}
\description{
  Generate a term frequency vector from a text document.
}
\usage{
termFreq(doc, control = list())
}
\arguments{
  \item{doc}{an object inheriting from \code{TextDocument}.}
  \item{control}{a list of control options. Possible settings are
    \itemize{
      \item \code{tolower}: a function converting characters to lower
      case. Defaults to \code{base::tolower}.
      \item \code{tokenize}: a function splitting a document into
      individual tokens. Defaults to \code{function(x) unlist(strsplit(gsub("[^[:alnum:]]+", " ", x), " ", fixed = TRUE))}.
      \item \code{stemming}: a Boolean value indicating whether tokens
      should be stemmed. Defaults to \code{FALSE}.
      \item \code{stopwords}: either a Boolean value indicating stopword
      removal using the default language-specific stopword lists shipped
      with this package, or a character vector holding custom stopwords.
      \item \code{dictionary}: a character vector to be tabulated
      against. No other terms will be listed in the result. Defaults to
      no action (i.e., all terms are considered).
      \item \code{minDocFreq}: an integer value. Tokens that occur fewer
      than this many times in \code{doc} are discarded. Defaults to
      \code{1} (i.e., every token will be used).
      \item \code{minWordLength}: an integer value. Tokens with fewer
      characters than this number are discarded. Defaults to \code{3}.
    }
  }
}
\value{
  A named integer vector with term frequencies as values and tokens as
  names.
}
\examples{
data("crude")
termFreq(crude[[1]], control = list(stemming = TRUE, minWordLength = 4))
}
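
To make the interplay of the control options more concrete, the following is a minimal base-R sketch of a tokenize, filter, and tabulate pipeline. It is not the package's implementation: \code{sketch_term_freq} and its argument names are hypothetical, the order of the steps is an assumption, and stemming and stopword removal are left out. Only the tokenizer expression and the default values are taken from the argument descriptions above.

```r
# Hypothetical helper, NOT part of tm: approximates the documented
# tolower / tokenize / minWordLength / dictionary / minDocFreq options
# using base R only. The order of the steps is an assumption.
sketch_term_freq <- function(text, min_word_length = 3L, min_doc_freq = 1L,
                             dictionary = NULL) {
  # Convert to lower case (documented default: base::tolower).
  text <- tolower(text)

  # Documented default tokenizer: collapse runs of non-alphanumeric
  # characters to a single space, then split on spaces.
  tokens <- unlist(strsplit(gsub("[^[:alnum:]]+", " ", text), " ", fixed = TRUE))

  # Discard tokens shorter than min_word_length (also drops empty strings).
  tokens <- tokens[nchar(tokens) >= min_word_length]

  # Assumption: a dictionary simply restricts which tokens are counted.
  if (!is.null(dictionary)) {
    tokens <- tokens[tokens %in% dictionary]
  }

  # Count occurrences and discard tokens seen fewer than min_doc_freq times.
  counts <- table(tokens)
  counts <- counts[counts >= min_doc_freq]

  # Return a named integer vector: frequencies as values, tokens as names.
  structure(as.integer(counts), names = names(counts))
}

tf <- sketch_term_freq("Crude oil prices rose; oil output fell.")
print(tf)
```

Raising \code{min_word_length} to 4 in this sketch removes the three-letter token "oil", while setting \code{min_doc_freq} to 2 keeps only "oil", since it is the only token that occurs twice.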