[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
trunk/tm/ChangeLog revision 807, Sat Jan 5 10:35:53 2008 UTC
# Line 1  Line 1 
1    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/source.R (CSVSource): Now uses read.csv instead of scan
4            internally.
5    
6    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
7    
8            * R/reader.R (getReaders): Returns available reader functions.
9    
10            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
11            as default.
12    
13    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
14    
15            * R/stopwords.R (stopwords): Shortened code, removed codetools
16            variable warnings.
17    
18            * man/: Documentation for showMeta, added an example for tmMap.
19    
20            * inst/doc/tm.Rnw: Updated vignette, comments on MS Word reader,
21            some minor typos fixed.
22    
23    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
24    
25            * R/aobjects.R (showMeta): Added method for pretty printing a
26            text document's meta data.
27    
28    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
29    
30            * R/textdoccol.R (TextDocCol): Better handling of empty
31            arguments.
32    
33            * NAMESPACE: Exported readDOC.
34    
35            * man/completeStems.Rd: Added an example.
36    
37    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
38    
39            * R/stopwords.R (stopwords): Look up .dat files at every
40            call. Allows users to modify stopword .dat files interactively.
41    
42    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
43    
44            * R/termdocmatrix.R (termFreq): Correct processing of empty
45            documents.
46    
47    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
48    
49            * man/: Updated documentation.
50    
51    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
52    
53            * R/complete.R (completeStems): Completes (heuristically) word
54            stems.
55    
56            * R/termdocmatrix.R (TermDocMatrix2): New modular
57            constructor.
58    
59            * NAMESPACE: Exported termFreq.
60    
61    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
62    
63            * R/reader.R (readDOC): Added MS Word reader (using antiword).
64    
65    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
66    
67            * R/weight.R: Weighting functions for TermDocMatrix.
68    
69    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
70    
71            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
72            functions for accessing dimension, column, and row names.
73    
74            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
75    
76    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
77    
78            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
79    
80    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
81    
82            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
83    
84    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
85    
86            * R/reader.R (readPDF): Removed manual checks for pdftotext and
87            pdfinfo. The system call gives a warning anyway.
88    
89    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
90    
91            * R/textdoccol.R (asPlain): Conversion from
92            StructuredTextDocuments to PlainTextDocuments.
93    
94    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
95    
96            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
97            for accessing term-document matrices.
98    
99            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
100            are installed.
101    
102    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
103    
104            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
105            Christian Buchta.
106    
107    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
108    
109            * inst/doc/tm.Rnw: Updated vignette (readPDF, readHTML, preprocessReut21578XML).
110    
111    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
112    
113            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
114    
115            * R/reader.R (readPDF): Added PDF reader.
116    
117    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
118    
119            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
120    
121            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
122    
123            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
124    
125            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
126    
127    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
128    
129            * R/distmeasure.R (dissimilarity): Replaced dists call from
130            package cba by new dist call from package proxy.
131    
132    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
133    
134            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
135    
136    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
137    
138            * R/termdocmatrix.R: require() uses the quietly option to suppress
139            loading messages.
140    
141    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
142    
143            * R/dictionary.R: Added dictionary support.
144    
145    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
146    
147            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
148            documents. This simplifies some functions, e.g., asPlain.
149    
150    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
151    
152            * inst/doc/tm.Rnw: Fixed some typos in vignette.
153    
154    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
155    
156            * R/textdoccol.R (replaceWords): Added method to replace a set of
157            words by a single word. Useful for synonyms.
158    
159    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
160    
161            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
162    
163    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
164    
165            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
166            vectors. Thanks to Ariel Maguyon for his error report.
167            (removeSparseTerms): New function to remove columns from a
168            term-document matrix exceeding a sparse factor.
169    
170    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
171    
172            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
173    
174    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
175    
176            * man/sFilter.Rd: Corrected documentation on statement format (use
177            '==' instead of '=').
178    
179    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
180    
181            * R/aobjects.R (StructuredTextDocument): Inherits from
182            TextDocument.
183    
184    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
185    
186            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
187            on sparse matrices as proposed by Martin Maechler.
188    
189    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
190    
191            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
192            \pkg{filehash} version deprecates them.
193    
194    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
195    
196            * R/termdocmatrix.R (textvector): Stemming is now performed before
197            erasing stopwords.
198            (weightMatrix): Adapted to handle sparse matrices.
199            (TermDocMatrix): Sparse matrix is now efficiently built by
200            direct stepwise insertion of row values into it.
201    
202    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
203    
204            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
205            due to ongoing problems. For our purposes the latter is as useful
206            as the replaced package.
207    
208    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
209    
210            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
211    
212            * man/TermDocMatrix.Rd: Removed deprecated \code{language} argument.
213    
214    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
215    
216            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
217            languages with available stopwords.
218    
219    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
220    
221            * inst/doc/tm.Rnw: Minor corrections in the vignette.
222    
223    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
224    
225            * DESCRIPTION: Update to version 0.2, since a lot of new features
226            have been integrated.
227    
228            * inst/stopwords: Updated existing stopwords and added stopwords
229            for various other languages.
230    
231    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
232    
233            * man/: Updated documentation.
234    
235            * Work/testDb.R: Script to test database stuff.
236    
237            * R/: Fixed various database-related bugs. Database support seems
238            fairly usable now, but should be considered alpha status.
239    
240    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
241    
242            * R/: Fixed some bugs related to database support.
243    
244    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
245    
246            * man/: Added a lot of examples to the manuals.
247    
248    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
249    
250            * man/: Updated parts of the documentation.
251    
252            * R/textdoccol.R (asPlain): Added conversion from newsgroup
253            documents to plain text documents.
254    
255    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
256    
257            * R/textdoccol.R: Finished experimental database support. Not yet
258            intensively tested.
259    
260            * R/source.R: Now each source has a default reader.
261    
262            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
263            class anymore.
264    
265            * R/plaintextdoc.R: Custom show method for plain text documents.
266    
267            * R/aobjects.R: Added a class for structured text documents.
268    
269            * R/reader.R: Replaced remaining \code{parser} occurrences with
270            \code{reader}.
271    
272            * R/textdoccol.R (summary): Indent tags.
273    
274            * R/textdoccol.R (removePunctuation): Transform method to remove
275            punctuation marks.
276    
277    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
278    
279            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
280            using prescindMeta().
281    
282    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
283    
284            * R/textdoccol.R: Improved database support.
285    
286    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
287    
288            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
289    
290            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
291            language code.
292    
293            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
294            into parserControl argument.
295    
296            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
297    
298    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
299    
300            * Work/tmDataSetup.R: The datasets acq and crude can now be
301            created on the fly.
302    
303            * R/stopwords.R: Introduced a function returning the stopwords for
304            a given language (English, German and French at the moment).
305    
306            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
307            otherwise falls back to Snowball package.
308    
309    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
310    
311            * man/dissimilarity-methods.Rd: Make clear that any method offered
312            by "dists" from package "cba" can be used.
313    
314    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
315    
316            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes bug according
317            to Kurt's LaTeX suggestion. Removed points and underscores in
318            variable names for consistent naming.
319    
320            * DESCRIPTION: Update to version 0.1-2.
321    
322            * man/TextRepository.Rd: Fixed bug in documentation.
323    
324    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
325    
326            * DESCRIPTION: Update to version 0.1-1.
327    
328    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
329    
330            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
331            wordStem.
332    
333    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
334    
335            * R/: Changes due to Kurt's review.
336    
337    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
338    
339            * R/: Implemented improvements based upon comments by David
340            Meyer.
341    
342    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
343    
344            * inst/doc/: Rewrote vignette.
345    
346            * man/: Improved documentation.
347    
348    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
349    
350            * man/: Updated documentation.
351    
352            * DESCRIPTION: Changed package name to "tm". Updated version to
353            0.1 for first CRAN release.
354    
355            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
356            list archive example.
357    
358            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
359            archive example.
360    
361            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
362            from (several mails per box) mbox format to (single mail per file)
363            eml format.
364    
365    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
366    
367            * data/crude.rda: Rebuilt.
368    
369            * data/acq.rda: Rebuilt.
370    
371            * R/reader.R: Factored out reader and parser methods from
372            textdoccol.R.
373    
374            * R/source.R: Factored out Source methods from aobjects.R and
375            textdoccol.R.
376            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
377            feeds.
378    
379            * R/textdoccol.R (DirSource): Added support for recursive
380            traversal of directories.
381    
382    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
383    
384            * R/textdoccol.R ([[): Loads the document corpus automatically
385            into memory upon access.
386            (tm_transform, tm_filter): Removed several checks whether the
387            document is already loaded ([[ ensures this now).
388            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
389            mailing list archive.
390    
391    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
392    
393            * R/aobjects.R (TextDocument): Is now a virtual class.
394            (Source): Is now a virtual class.
395    
396    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
397    
398            * R/textdoccol.R (c): Support for an arbitrary number of document
399            collections.
400    
401    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
402    
403            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
404            append_meta and remove_meta.
405    
406            * R/textdoccol.R: Removed modify_metadata method.
407    
408            * R/textrepo.R: Removed modify_metadata method.
409    
410            * R/textdoccol.R (remove_meta): Supports removal of document
411            collection metadata and document (= in data frame) metadata.
412    
413    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
414    
415            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
416    
417            * data/crude.rda: Rebuilt.
418    
419            * data/acq.rda: Rebuilt.
420    
421            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
422    
423            * R/textdoccol.R ([): Bug fix for subsetting a document
424            collection's data frame.
425    
426    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
427    
428            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
429            to s_filter.
430    
431            * R/textdoccol.R: Local text documents' metadata can now be copied
432            to a document collection's data frame with prescind_meta.
433    
434    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
435    
436            * R/: Text documents' slot metadata is now accessible in s_filter.
437    
438            * R/: Rewrote s_filter function (still has some restrictions).
439    
440    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
441    
442            * R/: Various fixes in handling metadata.
443    
444            * R/: Added update mechanism for text document collections.
445    
446    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
447    
448            * R/: Merging of document collections now creates a binary tree
449            for reconstructing merged document collections.
450    
451            * R/: Redesign of metadata for document collections.
452    
453    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/: Messages now use \code{ngettext}.
456    
457    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
458    
459            * R/: Added functions for modifying and removing metadata.
460    
461    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
462    
463            * man/: Updated some documentation.
464    
465            * R/: Corrected some connection issues.
466    
467            * inst/doc: Worked on the vignette.
468    
469    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
470    
471            * inst/: Added texts and started vignette.
472    
473            * R/: Final changes based upon David's comments.
474    
475    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
476    
477            * NAMESPACE: Corrected exports (generic methods need exportMethods
478            directives!).
479    
480    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
481    
482            * R/: Modified the TextDocCol constructor and various parsers. It
483            is now modular and supports various file formats via plugins (see
484            the new "Source" class).
485    
486    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
487    
488            * man/: Revised documentation after previous code changes.
489    
490    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
491    
492            * R/: Remaining changes as discussed with David.
493    
494    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * R/: Some changes as suggested by David. The rest will follow
497            within the next few days.
498    
499    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
500    
501            * man/: Finished documentation.
502    
503    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
504    
505            * man/: Wrote some documentation.
506    
507    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
508    
509            * R/: Further syntactic sugar in the form of additional assignment and
510            accessor methods.
511    
512    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
513    
514            * R/: Syntactic sugar in the form of "length", "show" and "summary"
515            operators.
516    
517    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * R/: Various updates, mainly to default operators ("[" or "c")
520            and dissimilarities.
521    
522    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
523    
524            * R/: Added similarity functions.
525    
526            * data/: Added English stopwords.
527    
528    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
529    
530            * data/: Examples compiled for new features.
531    
532            * R/: Changes due to new structure.
533    
534            * NAMESPACE: Corrected namespace to reflect new structure.
535    
536            * R/termdocmatrix.R: Adapted for new naming scheme.
537    
538    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
539    
540            * R/textdoccol.R: Adapted code for new class structure. Wrote
541            several transform and filter functions operating on text document
542            collections (alias text document databases).
543    
544            * R/aobjects.R: Adapted class structure with inheritance,
545            repositories and additional meta data. Loading files on demand is
546            now possible.
547    
548    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
549    
550            * R/: Some cosmetic cleanups.
551    
552            * inst/: Removed vignette on clustering. That and much more is now
553            described in the JSS paper on text mining. Based upon that
554            article a more elaborate vignette will be incorporated in the future.
555    
556    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
557    
558            * R/: Updated generic S4 methods to comply with signature changes
559            in newer versions of R (> 2.3).
560    
561    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
562    
563            * ext/R/importRIS.R: Automatic RIS import is now possible.
564    
565    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
566    
567            * R/textdoccol.R: Added RIS HTML input format.
568    
569    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
570    
571            * R/textdoccol.R: Removed bug that caused invalid text document
572            collections when handling many input files.
573    
574    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
575    
576            * R/textdoccol.R: Restructured and extended file import
577            mechanism.
578    
579            * inst/doc/clustering.Rnw: Adapted vignette for use with
580            ReutNews.rda.
581    
582            * man/ReutNews.Rd: Documentation for ReutNews.rda.
583    
584            * data/ReutNews.rda: A tiny Reuters21578 example data set.
585    
586    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
587    
588            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
589            clustering facilities of this package.
590    
591    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
592    
593            * R/aobjects.R: Changed package document structure to avoid class
594            dependency problems.
595    
596    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
597    
598            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
599            data set.
600    
601            * Finished documentation and reordered directory structure. Now "R
602            CMD check textmin" works without errors.
603    

