[tm] View of /pkg/ChangeLog

Revision 972, Fri Jul 3 16:16:59 2009 UTC, by feinerer

2009-07-03  Ingo Feinerer  <>

	* R/transform.R: Move removeCitation, removeMultipart, and
	removeSignature to the tau package since they are mainly utility
	functions (for handling e-mails) and not very framework specific.

2009-06-28  Ingo Feinerer  <>

	* man/: Fix documentation.

2009-06-26  Ingo Feinerer  <>

	* R/reader.R (readReut21578XMLasPlain): New reader which returns a
	plain text document instead of an XML document for texts of the
	Reuters-21578 dataset.

	* R/sparse.R: Removed since the slam package is now available on

	* DESCRIPTION (Depends): Add slam package.

2009-06-17  Ingo Feinerer  <>

	* R/transform.R (stemDoc): Fix character(0) handling.

2009-06-12  Ingo Feinerer  <>

	* R/doc.R (show): Pretty print.

2009-05-27  Ingo Feinerer  <>

	* R/matrix.R (print.TermDocumentMatrix): Handle empty matrices

2009-05-13  Ingo Feinerer  <>

	* R/corpus.R: Make corpus virtual. Implement corpus with standard
	and permanent storage semantics.

	* DESCRIPTION: New major release. A *lot* of improvements.

2009-05-04  Ingo Feinerer  <>

	* NAMESPACE: Export some simple_triplet_matrix functions.

2009-04-28  Ingo Feinerer  <>

	* R/weight.R: Adapt tf-idf to new matrix format.

2009-04-27  Ingo Feinerer  <>

	* R/matrix.R: Create two distinct classes for term-document and
	document-term matrices.

2009-04-26  Ingo Feinerer  <>

	* R/termdocmatrix.R: No longer use Matrix package. This reduces
	package start-up time significantly.

2009-04-11  Ingo Feinerer  <>

	* inst/doc/tm.Rnw: Fix code/documentation mismatch.

2009-04-04  Ingo Feinerer  <>

	* R/transform.R (tmReduce): Combine multiple maps into one

2009-04-03  Ingo Feinerer  <>

	* R/weight.R: Remove weightLogical since it does not return a

	* R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
	or TermDocumentMatrix instead.

2009-03-28  Ingo Feinerer  <>

	* inst/doc/extensions.Rnw: Finished vignette.

2009-03-27  Ingo Feinerer  <>

	* R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
	DocumentTermMatrix representations.

2009-03-23  Ingo Feinerer  <>

	* R/reader.R (readXML): New reader for arbitrary XML files.

2009-03-22  Ingo Feinerer  <>

	* R/source.R (CSVSource): Defunct (use DataframeSource instead).
	(XMLSource): New XMLSource class for arbitrary XML files.
	(Source): New slot Vectorized.

2009-03-21  Ingo Feinerer  <>

	* R/reader.R (readTabular): Experimental reader for tabular data
	structures which can be customized via user-defined mappings.

	* R/reader.R: Always use UTC time zone.

	* R/AAA.R (.onLoad): No longer try to start an MPI cluster.

2009-03-20  Ingo Feinerer  <>

	* R/reader.R (readDOC): Options can be passed over to antiword.

	* R/reader.R (readPDF): Options can be passed over to pdfinfo and

2009-03-10  Ingo Feinerer  <>

	* R/source.R (DirSource): Add pattern and arguments
	which are internally passed over to list.files().

2009-03-02  Ingo Feinerer  <>

	* inst/doc/tm.Rnw: Suppress pointless loading message.

2009-01-29  Ingo Feinerer  <>

	* DESCRIPTION: Speed up package loading (via moving packages not
	strictly necessary for normal operation to Suggests instead of

2009-01-08  Ingo Feinerer  <>

	* R/reader.R (readNewsgroup): The date format is now configurable.

2008-12-20  Ingo Feinerer  <>

	* R/preprocess.R (convertMboxEml): Fix off-by-one error.

2008-12-16  Ingo Feinerer  <>

	* R/termdocmatrix.R (TermDocMatrix): Sort row indices.

2008-12-06  Ingo Feinerer  <>

	* R/source.R (DataframeSource): New source class for data frames.

	* R/source.R: Fixed non-standard call evaluation.

2008-11-29  Ingo Feinerer  <>

	* R/source.R (URISource): New source class for a single document.

2008-11-27  Ingo Feinerer  <>

	* R/source.R: Refactoring.

2008-11-25  Ingo Feinerer  <>

	* R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
	Rmpi installations more gracefully.

2008-11-08  Ingo Feinerer  <>

	* R/source.R (Source): Add Length slot.

2008-11-06  Ingo Feinerer  <>

	* R/AAA.R: Unify duplicated .onLoad function.

2008-11-03  Ingo Feinerer  <>

	* DESCRIPTION (Suggests): Added Rmpi.

2008-11-02  Ingo Feinerer  <>

	* R/source.R (getElem): Fix 'no visible binding' warning.

	* man/WeightFunction.Rd: Fix signature.

2008-08-03  Ingo Feinerer  <>

	* R/weight.R: Introduce name abbreviations for weighting functions.

2008-07-24  Ingo Feinerer  <>

	* R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.

	* R/cluster.R: Provide convenience functions for using an MPI

	* R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if

	* R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if

2008-07-17  Ingo Feinerer  <>

	* R/textdoccol.R (lapply): Removed debug print out.

2008-06-06  Ingo Feinerer  <>

	* R/reader.R (readRCV1): Improved meta data extraction from
	Reuters Corpus Volume 1 documents.

2008-05-25  Ingo Feinerer  <>

	* R/transform.R: Ensure that all mappings preserve multiline

2008-05-24  Ingo Feinerer  <>

	* R/filter.R: Every filter now has an attribute indicating whether
	it should be applied at document level (doclevel).

	* R/textdoccol.R (tmFilter): Set searchFullText as new default

2008-04-23  Ingo Feinerer  <>

	* R/transform.R (replacePatterns): Replaced removeWords by
	replacePatterns. Suggested by Christian Buchta.

	* R/textdoccol.R (inspect): Improved formatting.

2008-04-19  Ingo Feinerer  <>

	* inst/CITATION: Updated JSS article information.

	* R/textdoccol.R (setAs): Added coerce method from list to

	* R/meta.R (meta): Improved meta data handling.

2008-03-21  Ingo Feinerer  <>

	* R/textdoccol.R (materialize, tmMap): Improvements suggested by
	Christian Buchta.

	* inst/CITATION: Added template to include JSS article reference.

2008-03-12  Ingo Feinerer  <>

	* R/textdoccol.R (tmMap): Introduced lazy mapping.

	* R/source.R: Added VectorSource.

2008-02-23  Ingo Feinerer  <>

	* man/: Language codes should be in ISO 639-1 format.

	* R/textdoccol.R (asPlain): Preserve local meta data.

2008-01-31  Ingo Feinerer  <>

	* R/textdoccol.R (writeCorpus): Function for writing a corpus
	containing plain text documents to disk.

2008-01-30  Ingo Feinerer  <>

	* R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
	always set correctly.

	* R/textdoccol.R: Set load = TRUE as default for load on demand
	since in most cases this is the wanted behaviour.

2008-01-24  Ingo Feinerer  <>

	* R/: Renamed TextDocCol to Corpus, and Corpus to Content.

	* DESCRIPTION: Updated Version to 0.3 due to core name changes.

2008-01-22  Ingo Feinerer  <>

	* R/meta.R (meta): New function for consistent access to meta data
	of document collections, repositories, and texts.

2008-01-21  Ingo Feinerer  <>

	* R/: Better support for encodings.

2008-01-13  Ingo Feinerer  <>

	* R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
	selection when no reader argument is given.

2008-01-05  Ingo Feinerer  <>

	* R/source.R (CSVSource): Now uses read.csv instead of scan

2008-01-02  Ingo Feinerer  <>

	* R/reader.R (getReaders): Returns available reader functions.

	* R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
	as default.

2007-12-02  Ingo Feinerer  <>

	* R/stopwords.R (stopwords): Shortened code, removed codetools
	variable warnings.

	* man/: Documentation for showMeta, added an example for tmMap.

	* inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
	some minor typos fixed.

2007-12-01  Ingo Feinerer  <>

	* R/aobjects.R (showMeta): Added method for pretty printing a
	text document's meta data.

2007-11-29  Ingo Feinerer  <>

	* R/textdoccol.R (TextDocCol): Better handling of empty

	* NAMESPACE: Exported readDOC.

	* man/completeStems.Rd: Added an example.

2007-11-18  Ingo Feinerer  <>

	* R/stopwords.R (stopwords): Look up .dat files at every
	call. Allows users to modify stopword .dat files interactively.

2007-11-06  Ingo Feinerer  <>

	* R/termdocmatrix.R (termFreq): Correct processing of empty

2007-10-27  Ingo Feinerer  <>

	* man/: Updated documentation.

2007-10-21  Ingo Feinerer  <>

	* R/complete.R (completeStems): Completes (heuristically) word

	* R/termdocmatrix.R (TermDocMatrix2): New modular

	* NAMESPACE: Exported termFreq.

2007-10-16  Ingo Feinerer  <>

	* R/reader.R (readDOC): Added MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <>

	* R/weight.R: Weighting functions for TermDocMatrix.

2007-10-13  Ingo Feinerer  <>

	* R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
	functions for accessing dimension, column, and row names.

	* R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <>

	* man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.

2007-08-28  Ingo Feinerer  <>

	* R/fungen.R: Use S4 class for function generators instead of S3 attributes.

2007-07-29  Ingo Feinerer  <>

	* R/reader.R (readPDF): Removed manual checks for pdftotext and
	pdfinfo. The system call gives a warning anyway.

2007-07-28  Ingo Feinerer  <>

	* R/textdoccol.R (asPlain): Conversion from
	StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <>

	* R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
	for accessing term-document matrices.

	* inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
	are installed.

2007-07-17  Ingo Feinerer  <>

	* R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
	Christian Buchta.

2007-07-15  Ingo Feinerer  <>

	* inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <>

	* R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.

	* R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <>

	* DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.

	* inst/stopwords/english.dat: Added the term "yes" to stopwords.

	* R/termdocmatrix.R (dim): dim function for TermDocMatrix.

	* R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <>

	* R/distmeasure.R (dissimilarity): Replaced dists call from
	package cba by new dist call from package proxy.

2007-07-10  Ingo Feinerer  <>

	* inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <>

	* R/termdocmatrix.R: require() uses the quietly option to suppress
	loading messages.

2007-06-12  Ingo Feinerer  <>

	* R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <>

	* R/aobjects.R: Added classes for Reuters21578 XML and RCV1
	documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <>

	* inst/doc/tm.Rnw: Fixed some typos in vignette.

2007-06-03  Ingo Feinerer  <>

	* R/textdoccol.R (replaceWords): Added method to replace a set of
	words by a single word. Useful for synonyms.

2007-05-22  Ingo Feinerer  <>

	* man/TermDocMatrix.Rd: Fixed documentation on Data slot.

2007-05-19  Ingo Feinerer  <>

	* R/termdocmatrix.R (textvector): Small fix for dealing with empty
	vectors. Thanks to Ariel Maguyon for his error report.
	(removeSparseTerms): New function to remove columns from a
	term-document matrix exceeding a sparse factor.

2007-05-15  Ingo Feinerer  <>

	* man/tmUpdate.Rd: Corrected documentation on readerControl parameter.

2007-05-11  Ingo Feinerer  <>

	* man/sFilter.Rd: Corrected documentation on statement format (use
	'==' instead of '=').

2007-05-08  Ingo Feinerer  <>

	* R/aobjects.R (StructuredTextDocument): Inherits from

2007-05-04  Ingo Feinerer  <>

	* R/termdocmatrix.R (findFreqTerms): Perform efficient computation
	on sparse matrices as proposed by Martin Maechler.

2007-04-27  Ingo Feinerer  <>

	* R/textdoccol.R: Removed \code{dbDisconnect} calls since the
	latest \pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <>

	* R/termdocmatrix.R (textvector): Stemming is now performed before
	erasing stopwords.
	(weightMatrix): Adapted to handle sparse matrices.
	(TermDocMatrix): Sparse matrix is now efficiently built by
	direct stepwise insertion of row values into it.

2007-04-21  Ingo Feinerer  <>

	* DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
	due to ongoing problems. For our purposes the latter is as useful
	as the replaced package.

2007-04-20  Ingo Feinerer  <>

	* man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.

	* man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.

2007-04-15  Ingo Feinerer  <>

	* R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
	languages with available stopwords.

2007-04-14  Ingo Feinerer  <>

	* inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <>

	* DESCRIPTION: Update to version 0.2, since a lot of new features
	have been integrated.

	* inst/stopwords: Updated existing stopwords and added stopwords
	for various other languages.

2007-04-10  Ingo Feinerer  <>

	* man/: Updated documentation.

	* Work/testDb.R: Script to test database stuff.

	* R/: Fixed various database related bugs. Seems to be rather
	usable now, i.e., consider it alpha status for now.

2007-04-08  Ingo Feinerer  <>

	* R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <>

	* man/: Added a lot of examples to the manuals.

2007-04-05  Ingo Feinerer  <>

	* man/: Updated parts of the documentation.

	* R/textdoccol.R (asPlain): Added conversion from newsgroup
	documents to plain text documents.

2007-04-01  Ingo Feinerer  <>

	* R/textdoccol.R: Finished experimental database support. Not yet
	intensively tested.

	* R/source.R: Now each source has a default reader.

	* R/reader.R: \code{FunctionGenerator} is now an attribute, not a
	class anymore.

	* R/plaintextdoc.R: Custom show method for plain text documents.

	* R/aobjects.R: Added a class for structured text documents.

	* R/reader.R: Replaced remaining \code{parser} occurrences with

	* R/textdoccol.R (summary): Indent tags. 

	* R/textdoccol.R (removePunctuation): Transform method to remove
	punctuation marks.

2007-03-21  Ingo Feinerer  <>

	* R/textdoccol.R (sFilter): Simplified sFilter significantly by
	using prescindMeta().

2007-03-18  Ingo Feinerer  <>

	* R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <>

	* R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

	* R/resolve.R (resolveISOcode): Extracts the language from an ISO
	language code.

	* R/textdoccol.R (TextDocCol): Refactored several parser arguments
	into parserControl argument.

	* R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <>

	* Work/tmDataSetup.R: The datasets acq and crude can now be
	created on the fly.

	* R/stopwords.R: Introduced a function returning the stopwords for
	a given language (English, German and French at the moment).

	* R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
	otherwise falls back to Snowball package.

2007-01-30  Ingo Feinerer  <>

	* man/dissimilarity-methods.Rd: Make clear that any method offered
	by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <>

	* inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes bug according
	to Kurt's LaTeX suggestion. Removed points and underscores in
	variable names for consistent naming.

	* DESCRIPTION: Update to version 0.1-2.

	* man/TextRepository.Rd: Fixed bug in documentation.

2007-01-12  Ingo Feinerer  <>

	* DESCRIPTION: Update to version 0.1-1.

2007-01-09  Ingo Feinerer  <>

	* R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of

2007-01-06  Ingo Feinerer  <>

	* R/: Changes due to Kurt's review.

2006-12-31  Ingo Feinerer  <>

	* R/: Implemented improvements based upon comments by David

2006-12-17  Ingo Feinerer  <>

	* inst/doc/: Rewrote vignette.

	* man/: Improved documentation.

2006-12-16  Ingo Feinerer  <>

	* man/: Updated documentation.

	* DESCRIPTION: Changed package name to "tm". Updated version to
	0.1 for first CRAN release.

	* inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
	list archive example.

	* inst/texts/ RSS Gmane R mailing list
	archive example.

	* R/preprocess.R (convert_mbox_eml): A simple e-mail converter
	from (several mails per box) mbox format to (single mail per file)
	eml format.

2006-12-08  Ingo Feinerer  <>

	* data/crude.rda: Rebuilt.

	* data/acq.rda: Rebuilt.

	* R/reader.R: Factored out reader and parser methods from

	* R/source.R: Factored out Source methods from aobjects.R and
	(GmaneRSource): Encapsulates Gmane R mailing list archive RSS

	* R/textdoccol.R (DirSource): Added support for recursive
	traversal of directories.

2006-12-07  Ingo Feinerer  <>

	* R/textdoccol.R ([[): Loads the document corpus automatically
	into memory upon access.
	(tm_transform, tm_filter): Removed several checks whether the
	document is already loaded ([[ ensures this now).
	(gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
	mailing list archive.

2006-12-06  Ingo Feinerer  <>

	* R/aobjects.R (TextDocument): Is now a virtual class.
	(Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <>

	* R/textdoccol.R (c): Support for an arbitrary number of document

2006-11-26  Ingo Feinerer  <>

	* R/textrepo.R: Updated TextRepository (constructor), append_elem,
	append_meta and remove_meta.

	* R/textdoccol.R: Removed modify_metadata method.

	* R/textrepo.R: Removed modify_metadata method.

	* R/textdoccol.R (remove_meta): Supports removal of document
	collection metadata and document (= in data frame) metadata.

2006-11-23  Ingo Feinerer  <>

	* R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

	* data/crude.rda: Rebuilt.

	* data/acq.rda: Rebuilt.

	* inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

	* R/textdoccol.R ([): Bug fix for subsetting a document
	collection's data frame.

2006-11-22  Ingo Feinerer  <>

	* R/textdoccol.R: Bug fixes in s_filter. Added full query support
	to s_filter.

	* R/textdoccol.R: Local text documents' metadata can now be copied
	to a document collection's data frame with prescind_meta.

2006-11-21  Ingo Feinerer  <>

	* R/: Text documents' slot metadata is now accessible in s_filter.

	* R/: Rewrote s_filter function (still has some restrictions).

2006-11-20  Ingo Feinerer  <>

	* R/: Various fixes in handling metadata.

	* R/: Added update mechanism for text document collections.

2006-11-19  Ingo Feinerer  <>

	* R/: Merging of document collections now creates a binary tree
	for reconstructing merged document collections.

	* R/: Redesign of metadata for document collections.

2006-11-07  Ingo Feinerer  <>

	* R/: Messages now use \code{ngettext}.

2006-11-03  Ingo Feinerer  <>

	* R/: Added functions for modifying and removing metadata.

2006-11-01  Ingo Feinerer  <>

	* man/: Updated some documentation.

	* R/: Corrected some connection issues.

	* inst/doc: Worked on the vignette.

2006-10-31  Ingo Feinerer  <>

	* inst/: Added texts and started vignette.

	* R/: Final changes based upon David's comments.

2006-10-29  Ingo Feinerer  <>

	* NAMESPACE: Corrected exports (generic methods need exportMethods

2006-10-26  Ingo Feinerer  <>

	* R/: Modified the TextDocCol constructor and various parsers. It
	is now modular and supports various file formats via plugins (see
	the new "Source" class).

2006-10-24  Ingo Feinerer  <>

	* man/: Revised documentation after previous code changes.

2006-10-23  Ingo Feinerer  <>

	* R/: Remaining changes as discussed with David.

2006-10-22  Ingo Feinerer  <>

	* R/: Some changes as suggested by David. The rest will follow
	within the next days.

2006-09-26  Ingo Feinerer  <>

	* man/: Finished documentation.

2006-09-25  Ingo Feinerer  <>

	* man/: Wrote some documentation.

2006-09-24  Ingo Feinerer  <>

	* R/: Further syntactic sugar in form of additional assignment and
	accessor methods.

2006-09-13  Ingo Feinerer  <>

	* R/: Syntactic sugar in form of "length", "show" and "summary"

2006-08-24  Ingo Feinerer  <>

	* R/: Various updates. Mainly on default operators ("[" or "c")
	and dissimilarities.

2006-08-12  Ingo Feinerer  <>

	* R/: Added similarity functions.

	* data/: Added English stopwords.

2006-08-07  Ingo Feinerer  <>

	* data/: Examples compiled for new features.

	* R/: Changes due to new structure.

	* NAMESPACE: Corrected namespace to reflect new structure.

	* R/termdocmatrix.R: Adapted for new naming scheme.

2006-08-06  Ingo Feinerer  <>

	* R/textdoccol.R: Adapted code for new class structure. Wrote
	several transform and filter functions operating on text document
	collections (alias text document databases).

	* R/aobjects.R: Adapted class structure with inheritance,
	repositories and additional meta data. Loading files on demand is
	now possible.

2006-07-13  Ingo Feinerer  <>

	* R/: Some cosmetic cleanups.

	* inst/: Removed vignette on clustering. That and much more is now
	described in the JSS paper on text mining. A more elaborate
	vignette based upon that article will be incorporated in the future.

2006-07-01  Ingo Feinerer  <>

	* R/: Updated generic S4 methods to comply with signature changes
	in newer versions of R (> 2.3).

2006-03-12  Ingo Feinerer  <>

	* ext/R/importRIS.R: Automatic RIS import is now possible.

2006-02-14  Ingo Feinerer  <>

	* R/textdoccol.R: Added RIS HTML input format.

2006-01-21  Ingo Feinerer  <>

	* R/textdoccol.R: Removed bug that caused invalid text document
	collections when handling many input files.

2006-01-11  Ingo Feinerer  <>

	* R/textdoccol.R: Restructured and extended file import

	* inst/doc/clustering.Rnw: Adapted vignette for use with

	* man/ReutNews.Rd: Documentation for ReutNews.rda

	* data/ReutNews.rda: A tiny Reuters21578 example data set.

2005-12-22  Ingo Feinerer  <>

	* inst/doc/clustering.Rnw: Wrote a small vignette to present the
	clustering facilities of this package.

2005-12-15  Ingo Feinerer  <>

	* R/aobjects.R: Changed package document structure to avoid class
	dependency problems.

2005-12-06  Ingo Feinerer  <>

	*  Wrote a script for the ModLewis Split for the Reuters-21578 XML
	data set.

	*  Finished documentation and reordered directory structure. Now "R
	CMD check textmin" works without errors.

2005-12-04  Ingo Feinerer  <>

	* src/: Various splits can now be easily created for the
	Reuters21578 data set.

2005-12-03  Ingo Feinerer  <>

	*  Updated documentation

2005-11-30  Ingo Feinerer  <>

	*  Wrote R documentation for some classes and methods.

2005-11-19  Ingo Feinerer  <>

	* R/textdoccol.R: Constructor of textdoccol allows import of CSV
	files. See the questionnaire data/Umfrage.csv for such an example.
	We are now able to import files in Reuters-21578 XML format.

	*  Changed class interfaces in various files. Weighting of the text
	matrix is now possible.

2005-11-08  Ingo Feinerer  <>

	* R/textdoccol.R: One can build term-document matrices if
	necessary (with buildTDM(...)) and fill the field tdm of a text
	document collection with them.

	* R/textmatrix.R: Wrote S4 class for term-document matrices.

2005-11-06  Ingo Feinerer  <>

	* R/textdoccol.R: We can now read in a whole XML file with several
	news items.

2005-11-05  Ingo Feinerer  <>

	* R/textdoccol.R: Set up an S4 class for a collection of text
	documents. A first attempt to read in XML input (like the RCV1
	set) was made.

	* R/textdocument.R: Set up an S4 class for text documents. Wrote
	some accessor functions.

	* data/newsitem.xml: Added this XML file for testing purposes. It
	contains a single news item from the Reuters Corpus Volume 1
	(RCV1) XML set.

2005-10-07  Ingo Feinerer  <>

	* R/textmatrix.R (textmatrix): Removed the transpose of the original
	textmatrix as k-means clustering provided by R (kmeans) now works on
	this textmatrix. The result is a k-means text clustering with a
	similarity measure based upon word frequencies.

2005-10-05  Ingo Feinerer  <>

	* R/textmatrix.R: Adapted the preprocessing code from the R
	package "lsa" written by Fridolin Wild to build a document text matrix.

2005-10-02  Ingo Feinerer  <>

	*  Set up the R Text Mining Package infrastructure.