2007-03-21  Ingo Feinerer  <>

	* R/textdoccol.R (sFilter): Simplified sFilter significantly by
	using prescindMeta().

2007-03-18  Ingo Feinerer  <>

	* R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <>

	* R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

	* R/resolve.R (resolveISOcode): Extracts the language from an ISO
	language code.

	* R/textdoccol.R (TextDocCol): Refactored several parser arguments
	into parserControl argument.

	* R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <>

	* Work/tmDataSetup.R: The datasets acq and crude can now be
	created on the fly.

	* R/stopwords.R: Introduced a function returning the stopwords for
	a given language (English, German and French at the moment).

	* R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
	otherwise it falls back to the Snowball package.

2007-01-30  Ingo Feinerer  <>

	* man/dissimilarity-methods.Rd: Make clear that any method offered
	by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <>

	* inst/doc/tm.Rnw: Fixed a bug where quotes appeared as boxes,
	following Kurt's LaTeX suggestion. Removed dots and underscores
	from variable names for consistent naming.

	* DESCRIPTION: Update to version 0.1-2.

	* man/TextRepository.Rd: Fixed bug in documentation.

2007-01-12  Ingo Feinerer  <>

	* DESCRIPTION: Update to version 0.1-1.

2007-01-09  Ingo Feinerer  <>

	* R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of

2007-01-06  Ingo Feinerer  <>

	* R/: Changes due to Kurt's review.

2006-12-31  Ingo Feinerer  <>

	* R/: Implemented improvements based upon comments by David

2006-12-17  Ingo Feinerer  <>

	* inst/doc/: Rewrote vignette.

	* man/: Improved documentation.

2006-12-16  Ingo Feinerer  <>

	* man/: Updated documentation.

	* DESCRIPTION: Changed package name to "tm". Updated version to
	0.1 for first CRAN release.

	* inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
	list archive example.

	* inst/texts/ RSS Gmane R mailing list
	archive example.

	* R/preprocess.R (convert_mbox_eml): A simple e-mail converter
	from mbox format (several mails per file) to eml format (a single
	mail per file).

2006-12-08  Ingo Feinerer  <>

	* data/crude.rda: Rebuilt.

	* data/acq.rda: Rebuilt.

	* R/reader.R: Factored out reader and parser methods from

	* R/source.R: Factored out Source methods from aobjects.R and
	(GmaneRSource): Encapsulates Gmane R mailing list archive RSS

	* R/textdoccol.R (DirSource): Added support for recursive
	traversal of directories.

2006-12-07  Ingo Feinerer  <>

	* R/textdoccol.R ([[): Loads the document corpus automatically
	into memory upon access.
	(tm_transform, tm_filter): Removed several checks whether the
	document is already loaded ([[ ensures this now).
	(gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
	mailing list archive.

2006-12-06  Ingo Feinerer  <>

	* R/aobjects.R (TextDocument): Is now a virtual class.
	(Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <>

	* R/textdoccol.R (c): Support for an arbitrary number of document

2006-11-26  Ingo Feinerer  <>

	* R/textrepo.R: Updated TextRepository (constructor), append_elem,
	append_meta and remove_meta.

	* R/textdoccol.R: Removed modify_metadata method.

	* R/textrepo.R: Removed modify_metadata method.

	* R/textdoccol.R (remove_meta): Supports removal of document
	collection metadata and of per-document metadata (stored in the
	data frame).

2006-11-23  Ingo Feinerer  <>

	* R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

	* data/crude.rda: Rebuilt.

	* data/acq.rda: Rebuilt.

	* inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

	* R/textdoccol.R ([): Bug fix for subsetting a document
	collection's data frame.

2006-11-22  Ingo Feinerer  <>

	* R/textdoccol.R: Bug fixes in s_filter. Added full query support
	to s_filter.

	* R/textdoccol.R: Local text documents' metadata can now be copied
	to a document collection's data frame with prescind_meta.

2006-11-21  Ingo Feinerer  <>

	* R/: Text documents' slot metadata is now accessible in s_filter.

	* R/: Rewrote s_filter function (it still has some restrictions).

2006-11-20  Ingo Feinerer  <>

	* R/: Various fixes in handling metadata.

	* R/: Added update mechanism for text document collections.

2006-11-19  Ingo Feinerer  <>

	* R/: Merging of document collections now creates a binary tree
	for reconstructing merged document collections.

	* R/: Redesign of metadata for document collections.

2006-11-07  Ingo Feinerer  <>

	* R/: Messages now use \code{ngettext}.

2006-11-03  Ingo Feinerer  <>

	* R/: Added functions for modifying and removing metadata.

2006-11-01  Ingo Feinerer  <>

	* man/: Updated some documentation.

	* R/: Corrected some connection issues.

	* inst/doc: Worked on the vignette.

2006-10-31  Ingo Feinerer  <>

	* inst/: Added texts and started vignette.

	* R/: Final changes based upon David's comments.

2006-10-29  Ingo Feinerer  <>

	* NAMESPACE: Corrected exports (generic methods need exportMethods

2006-10-26  Ingo Feinerer  <>

	* R/: Modified the TextDocCol constructor and various parsers. It
	is now modular and supports various file formats via plugins (see
	the new "Source" class).

2006-10-24  Ingo Feinerer  <>

	* man/: Revised documentation after previous code changes.

2006-10-23  Ingo Feinerer  <>

	* R/: Remaining changes as discussed with David.

2006-10-22  Ingo Feinerer  <>

	* R/: Some changes as suggested by David. The rest will follow
	within the next days.

2006-09-26  Ingo Feinerer  <>

	* man/: Finished documentation.

2006-09-25  Ingo Feinerer  <>

	* man/: Wrote some documentation.

2006-09-24  Ingo Feinerer  <>

	* R/: Further syntactic sugar in the form of additional assignment
	and accessor methods.

2006-09-13  Ingo Feinerer  <>

	* R/: Syntactic sugar in the form of "length", "show" and "summary"

2006-08-24  Ingo Feinerer  <>

	* R/: Various updates, mainly to default operators ("[" or "c")
	and dissimilarities.

2006-08-12  Ingo Feinerer  <>

	* R/: Added similarity functions.

	* data/: Added English stopwords.

2006-08-07  Ingo Feinerer  <>

	* data/: Recompiled examples for new features.

	* R/: Changes due to new structure.

	* NAMESPACE: Corrected namespace to reflect new structure.

	* R/termdocmatrix.R: Adapted for new naming scheme.

2006-08-06  Ingo Feinerer  <>

	* R/textdoccol.R: Adapted code for new class structure. Wrote
	several transform and filter functions operating on text document
	collections (also known as text document databases).

	* R/aobjects.R: Adapted class structure with inheritance,
	repositories and additional metadata. Loading files on demand is
	now possible.

2006-07-13  Ingo Feinerer  <>

	* R/: Some cosmetic cleanups.

	* inst/: Removed vignette on clustering. That and much more is now
	described in the JSS paper on text mining. Based upon that
	article, a more elaborate vignette will be incorporated in the future.

2006-07-01  Ingo Feinerer  <>

	* R/: Updated generic S4 methods to comply with signature changes
	in newer versions of R (> 2.3).

2006-03-12  Ingo Feinerer  <>

	* ext/R/importRIS.R: Automatic RIS import is now possible.

2006-02-14  Ingo Feinerer  <>

	* R/textdoccol.R: Added RIS HTML input format.

2006-01-21  Ingo Feinerer  <>

	* R/textdoccol.R: Fixed a bug that caused invalid text document
	collections when handling many input files.

2006-01-11  Ingo Feinerer  <>

	* R/textdoccol.R: Restructured and extended file import

	* inst/doc/clustering.Rnw: Adapted vignette for use with

	* man/ReutNews.Rd: Documentation for ReutNews.rda.

	* data/ReutNews.rda: A tiny Reuters21578 example data set.

2005-12-22  Ingo Feinerer  <>

	* inst/doc/clustering.Rnw: Wrote a small vignette to present the
	clustering facilities of this package.

2005-12-15  Ingo Feinerer  <>

	* R/aobjects.R: Changed package document structure to avoid class
	dependency problems.

2005-12-06  Ingo Feinerer  <>

	* Wrote a script for the ModLewis split of the Reuters-21578 XML
	data set.

	* Finished documentation and reordered directory structure. Now "R
	CMD check textmin" works without errors.

2005-12-04  Ingo Feinerer  <>

	* src/: Various splits can now be easily created for the
	Reuters21578 data set.

2005-12-03  Ingo Feinerer  <>

	* Updated documentation.

2005-11-30  Ingo Feinerer  <>

	* Wrote R documentation for some classes and methods.

2005-11-19  Ingo Feinerer  <>

	* R/textdoccol.R: The textdoccol constructor allows importing CSV
	files; see the questionnaire data/Umfrage.csv for an example.
	Files in Reuters-21578 XML format can now be imported as well.

	* Changed class interfaces in various files. Weighting of the text
	matrix is now possible.

2005-11-08  Ingo Feinerer  <>

	* R/textdoccol.R: Term-document matrices can be built when
	necessary (with buildTDM(...)) and stored in the tdm field of a
	text document collection.

	* R/textmatrix.R: Wrote S4 class for term-document matrices.

2005-11-06  Ingo Feinerer  <>

	* R/textdoccol.R: We can now read in a whole XML file with several
	news items.

2005-11-05  Ingo Feinerer  <>

	* R/textdoccol.R: Set up an S4 class for a collection of text
	documents. A first attempt to read in XML input (like the RCV1
	set) was made.

	* R/textdocument.R: Set up an S4 class for text documents. Wrote
	some accessor functions.

	* data/newsitem.xml: Added this XML file for testing purposes. It
	contains a single news item from the Reuters Corpus Volume 1
	(RCV1) XML set.

2005-10-07  Ingo Feinerer  <>

	* R/textmatrix.R (textmatrix): Removed the transpose of the original
	textmatrix, as k-means clustering provided by R (kmeans) now works on
	this textmatrix. The result is a k-means text clustering with a
	similarity measure based upon word frequencies.

2005-10-05  Ingo Feinerer  <>

	* R/textmatrix.R: Adapted the preprocessing code from the R
	package "lsa" written by Fridolin Wild to build a document text matrix.

2005-10-02  Ingo Feinerer  <>

	* Set up the R Text Mining Package infrastructure.
