Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
trunk/tm/ChangeLog revision 806, Wed Jan 2 10:29:14 2008 UTC
1    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/reader.R (getReaders): Returns available reader functions.
4    
5            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
6            as default.
7    
8    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
9    
10            * R/stopwords.R (stopwords): Shortened code, removed codetools
11            variable warnings.
12    
13            * man/: Documentation for showMeta, added an example for tmMap.
14    
15            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
16            some minor typos fixed.
17    
18    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
19    
20            * R/aobjects.R (showMeta): Added method for pretty printing a
21            text document's meta data.
22    
23    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
24    
25            * R/textdoccol.R (TextDocCol): Better handling of empty
26            arguments.
27    
28            * NAMESPACE: Exported readDOC.
29    
30            * man/completeStems.Rd: Added an example.
31    
32    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
33    
34            * R/stopwords.R (stopwords): Look up .dat files at every
35            call. Allows users to modify stopword .dat files interactively.
36    
37    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
38    
39            * R/termdocmatrix.R (termFreq): Correct processing of empty
40            documents.
41    
42    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
43    
44            * man/: Updated documentation.
45    
46    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
47    
48            * R/complete.R (completeStems): Completes (heuristically) word
49            stems.
50    
51            * R/termdocmatrix.R (TermDocMatrix2): New modular
52            constructor.
53    
54            * NAMESPACE: Exported termFreq.
55    
56    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
57    
58            * R/reader.R (readDOC): Added MS Word reader (using antiword).
59    
60    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
61    
62            * R/weight.R: Weighting functions for TermDocMatrix.
63    
64    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
65    
66            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
67            functions for accessing dimension, column, and row names.
68    
69            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
70    
71    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
72    
73            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
74    
75    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
76    
77            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
78    
79    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
80    
81            * R/reader.R (readPDF): Removed manual checks for pdftotext and
82            pdfinfo. The system call gives a warning anyway.
83    
84    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
85    
86            * R/textdoccol.R (asPlain): Conversion from
87            StructuredTextDocuments to PlainTextDocuments.
88    
89    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
90    
91            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
92            for accessing term-document matrices.
93    
94            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
95            are installed.
96    
97    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
98    
99            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
100            Christian Buchta.
101    
102    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
103    
104            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
105    
106    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
107    
108            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
109    
110            * R/reader.R (readPDF): Added PDF reader.
111    
112    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
113    
114            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
115    
116            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
117    
118            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
119    
120            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
121    
122    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
123    
124            * R/distmeasure.R (dissimilarity): Replaced dists call from
125            package cba by new dist call from package proxy.
126    
127    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
128    
129            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
130    
131    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
132    
133            * R/termdocmatrix.R: require() uses the quietly option to suppress
134            loading messages.
135    
136    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
137    
138            * R/dictionary.R: Added dictionary support.
139    
140    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
141    
142            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
143            documents. This simplifies some functions, e.g., asPlain.
144    
145    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
146    
147            * inst/doc/tm.Rnw: Fixed some typos in vignette.
148    
149    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
150    
151            * R/textdoccol.R (replaceWords): Added method to replace a set of
152            words by a single word. Useful for synonyms.
153    
154    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
155    
156            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
157    
158    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
159    
160            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
161            vectors. Thanks to Ariel Maguyon for his error report.
162            (removeSparseTerms): New function to remove columns from a
163            term-document matrix exceeding a sparse factor.
164    
165    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
166    
167            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
168    
169    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
170    
171            * man/sFilter.Rd: Corrected documentation on statement format (use
172            '==' instead of '=').
173    
174    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
175    
176            * R/aobjects.R (StructuredTextDocument): Inherits from
177            TextDocument.
178    
179    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
180    
181            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
182            on sparse matrices as proposed by Martin Maechler.
183    
184    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
185    
186            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
187            \pkg{filehash} version deprecates them.
188    
189    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
190    
191            * R/termdocmatrix.R (textvector): Stemming is now performed before
192            erasing stopwords.
193            (weightMatrix): Adapted to handle sparse matrices.
194            (TermDocMatrix): Sparse matrix is now efficiently built by
195            direct stepwise insertion of row values into it.
196    
197    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
198    
199            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
200            due to ongoing problems. For our purposes the latter is as useful
201            as the replaced package.
202    
203    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
204    
205            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
206    
207            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
208    
209    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
210    
211            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
212            languages with available stopwords.
213    
214    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
215    
216            * inst/doc/tm.Rnw: Minor corrections in the vignette.
217    
218    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
219    
220            * DESCRIPTION: Update to version 0.2, since a lot of new features
221            have been integrated.
222    
223            * inst/stopwords: Updated existing stopwords and added stopwords
224            for various other languages.
225    
226    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
227    
228            * man/: Updated documentation.
229    
230            * Work/testDb.R: Script to test database stuff.
231    
232            * R/: Fixed various database related bugs. Database support seems
233            fairly usable now, but consider it alpha status.
234    
235    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
236    
237            * R/: Fixed some bugs related to database support.
238    
239    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
240    
241            * man/: Added a lot of examples to the manuals.
242    
243    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
244    
245            * man/: Updated parts of the documentation.
246    
247            * R/textdoccol.R (asPlain): Added conversion from newsgroup
248            documents to plain text documents.
249    
250    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
251    
252            * R/textdoccol.R: Finished experimental database support. Not yet
253            intensively tested.
254    
255            * R/source.R: Now each source has a default reader.
256    
257            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
258            class anymore.
259    
260            * R/plaintextdoc.R: Custom show method for plain text documents.
261    
262            * R/aobjects.R: Added a class for structured text documents.
263    
264            * R/reader.R: Replaced remaining \code{parser} occurrences with
265            \code{reader}.
266    
267            * R/textdoccol.R (summary): Indent tags.
268    
269            * R/textdoccol.R (removePunctuation): Transform method to remove
270            punctuation marks.
271    
272    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
273    
274            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
275            using prescindMeta().
276    
277    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
278    
279            * R/textdoccol.R: Improved database support.
280    
281    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
282    
283            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
284    
285            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
286            language code.
287    
288            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
289            into parserControl argument.
290    
291            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
292    
293    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
294    
295            * Work/tmDataSetup.R: The datasets acq and crude can now be
296            created on the fly.
297    
298            * R/stopwords.R: Introduced a function returning the stopwords for
299            a given language (English, German and French at the moment).
300    
301            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
302            otherwise falls back to Snowball package.
303    
304    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
305    
306            * man/dissimilarity-methods.Rd: Make clear that any method offered
307            by "dists" from package "cba" can be used.
308    
309    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
310    
311            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
312            to Kurt's latex suggestion. Removed points and underscores in
313            variable names for consistent naming.
314    
315            * DESCRIPTION: Update to version 0.1-2.
316    
317            * man/TextRepository.Rd: Fixed bug in documentation.
318    
319    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
320    
321            * DESCRIPTION: Update to version 0.1-1.
322    
323    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
324    
325            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
326            wordStem.
327    
328    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
329    
330            * R/: Changes due to Kurt's review.
331    
332    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
333    
334            * R/: Implemented improvements based upon comments by David
335            Meyer.
336    
337    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
338    
339            * inst/doc/: Rewrote vignette.
340    
341            * man/: Improved documentation.
342    
343    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
344    
345            * man/: Updated documentation.
346    
347            * DESCRIPTION: Changed package name to "tm". Updated version to
348            0.1 for first CRAN release.
349    
350            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
351            list archive example.
352    
353            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
354            archive example.
355    
356            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
357            from (several mails per box) mbox format to (single mail per file)
358            eml format.
359    
360    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
361    
362            * data/crude.rda: Rebuilt.
363    
364            * data/acq.rda: Rebuilt.
365    
366            * R/reader.R: Factored out reader and parser methods from
367            textdoccol.R.
368    
369            * R/source.R: Factored out Source methods from aobjects.R and
370            textdoccol.R.
371            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
372            feeds.
373    
374            * R/textdoccol.R (DirSource): Added support for recursive
375            traversal of directories.
376    
377    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
378    
379            * R/textdoccol.R ([[): Loads the document corpus automatically
380            into memory upon access.
381            (tm_transform, tm_filter): Removed several checks whether the
382            document is already loaded ([[ ensures this now).
383            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
384            mailing list archive.
385    
386    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
387    
388            * R/aobjects.R (TextDocument): Is now a virtual class.
389            (Source): Is now a virtual class.
390    
391    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
392    
393            * R/textdoccol.R (c): Support for an arbitrary number of document
394            collections.
395    
396    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
397    
398            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
399            append_meta and remove_meta.
400    
401            * R/textdoccol.R: Removed modify_metadata method.
402    
403            * R/textrepo.R: Removed modify_metadata method.
404    
405            * R/textdoccol.R (remove_meta): Supports removal of document
406            collection metadata and document (= in data frame) metadata.
407    
408    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
409    
410            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
411    
412            * data/crude.rda: Rebuilt.
413    
414            * data/acq.rda: Rebuilt.
415    
416            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
417    
418            * R/textdoccol.R ([): Bug fix for subsetting a document
419            collection's data frame.
420    
421    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
422    
423            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
424            to s_filter.
425    
426            * R/textdoccol.R: Local text documents' metadata can now be copied
427            to a document collection's data frame with prescind_meta.
428    
429    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
430    
431            * R/: Text documents' slot metadata is now accessible in s_filter.
432    
433            * R/: Rewrote s_filter function (still has some restrictions).
434    
435    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
436    
437            * R/: Various fixes in handling metadata.
438    
439            * R/: Added update mechanism for text document collections.
440    
441    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
442    
443            * R/: Merging of document collections now creates a binary tree
444            for reconstructing merged document collections.
445    
446            * R/: Redesign of metadata for document collections.
447    
448    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
449    
450            * R/: Messages now use \code{ngettext}.
451    
452    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
453    
454            * R/: Added functions for modifying and removing metadata.
455    
456    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
457    
458            * man/: Updated some documentation.
459    
460            * R/: Corrected some connection issues.
461    
462            * inst/doc: Worked on the vignette.
463    
464    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
465    
466            * inst/: Added texts and started vignette.
467    
468            * R/: Final changes based upon David's comments.
469    
470    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
471    
472            * NAMESPACE: Corrected exports (generic methods need exportMethods
473            directives!).
474    
475    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
476    
477            * R/: Modified the TextDocCol constructor and various parsers. It
478            is now modular and supports various file formats via plugins (see
479            the new "Source" class).
480    
481    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * man/: Revised documentation after previous code changes.
484    
485    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
486    
487            * R/: Remaining changes as discussed with David.
488    
489    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
490    
491            * R/: Some changes as suggested by David. The rest will follow
492            within the next few days.
493    
494    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * man/: Finished documentation.
497    
498    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
499    
500            * man/: Wrote some documentation.
501    
502    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
503    
504            * R/: Further syntactic sugar in the form of additional assignment and
505            accessor methods.
506    
507    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
508    
509            * R/: Syntactic sugar in the form of "length", "show" and "summary"
510            operators.
511    
512    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
513    
514            * R/: Diverse updates. Mainly on default operators ("[" or "c")
515            and dissimilarities.
516    
517    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * R/: Added similarity functions.
520    
521            * data/: Added English stopwords.
522    
523    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
524    
525            * data/: Examples compiled for new features.
526    
527            * R/: Changes due to new structure.
528    
529            * NAMESPACE: Corrected namespace to reflect new structure.
530    
531            * R/termdocmatrix.R: Adapted for new naming scheme.
532    
533    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
534    
535            * R/textdoccol.R: Adapted code for new class structure. Wrote
536            several transform and filter functions operating on text document
537            collections (alias text document databases).
538    
539            * R/aobjects.R: Adapted class structure with inheritance,
540            repositories and additional meta data. Loading files on demand is
541            now possible.
542    
543    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
544    
545            * R/: Some cosmetic cleanups.
546    
547            * inst/: Removed vignette on clustering. That and much more is now
548            described in the JSS paper on text mining. An elaborate vignette
549            based upon that article will be incorporated in the future.
550    
551    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
552    
553            * R/: Updated generic S4 methods to comply with signature changes
554            in newer versions of R (> 2.3).
555    
556    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
557    
558            * ext/R/importRIS.R: Automatic RIS import is now possible.
559    
560    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
561    
562            * R/textdoccol.R: Added RIS HTML input format.
563    
564    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
565    
566            * R/textdoccol.R: Removed bug that caused invalid text document
567            collections when handling many input files.
568    
569    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
570    
571            * R/textdoccol.R: Restructured and extended file import
572            mechanism.
573    
574            * inst/doc/clustering.Rnw: Adapted vignette for use with
575            ReutNews.rda
576    
577            * man/ReutNews.Rd: Documentation for ReutNews.rda
578    
579            * data/ReutNews.rda: A tiny Reuters21578 example data set.
580    
581    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
582    
583            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
584            clustering facilities of this package.
585    
586    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
587    
588            * R/aobjects.R: Changed package document structure to avoid class
589            dependency problems.
590    
591    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
592    
593            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
594            data set.
595    
596            * Finished documentation and reordered directory structure. Now "R
597            CMD check textmin" works without errors.
598    
