[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
trunk/tm/ChangeLog revision 797, Sun Nov 18 09:54:14 2007 UTC
1    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/stopwords.R (stopwords): Look up .dat files at every
4            call. Allows users to modify stopword .dat files interactively.
5    
6    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
7    
8            * R/termdocmatrix.R (termFreq): Correct processing of empty
9            documents.
10    
11    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
12    
13            * man/: Updated documentation.
14    
15    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
16    
17            * R/complete.R (completeStems): Completes (heuristically) word
18            stems.
19    
20            * R/termdocmatrix.R (TermDocMatrix2): New modular
21            constructor.
22    
23            * NAMESPACE: Exported termFreq.
24    
25    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
26    
27            * R/reader.R (readDOC): Added MS Word reader (using antiword).
28    
29    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
30    
31            * R/weight.R: Weighting functions for TermDocMatrix.
32    
33    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
34    
35            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
36            functions for accessing dimension, column, and row names.
37    
38            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
39    
40    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
41    
42            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
43    
44    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
45    
46            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
47    
48    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
49    
50            * R/reader.R (readPDF): Removed manual checks for pdftotext and
51            pdfinfo. The system call gives a warning anyway.
52    
53    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
54    
55            * R/textdoccol.R (asPlain): Conversion from
56            StructuredTextDocuments to PlainTextDocuments.
57    
58    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
59    
60            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
61            for accessing term-document matrices.
62    
63            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
64            are installed.
65    
66    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
67    
68            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
69            Christian Buchta.
70    
71    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
72    
73            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
74    
75    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
76    
77            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
78    
79            * R/reader.R (readPDF): Added PDF reader.
80    
81    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
82    
83            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
84    
85            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
86    
87            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
88    
89            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
90    
91    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
92    
93            * R/distmeasure.R (dissimilarity): Replaced dists call from
94            package cba by new dist call from package proxy.
95    
96    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
97    
98            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
99    
100    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
101    
102            * R/termdocmatrix.R: require() uses the quietly option to suppress
103            loading messages.
104    
105    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
106    
107            * R/dictionary.R: Added dictionary support.
108    
109    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
110    
111            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
112            documents. This simplifies some functions, e.g., asPlain.
113    
114    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
115    
116            * inst/doc/tm.Rnw: Fixed some typos in vignette.
117    
118    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
119    
120            * R/textdoccol.R (replaceWords): Added method to replace a set of
121            words by a single word. Useful for synonyms.
122    
123    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
124    
125            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
126    
127    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
128    
129            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
130            vectors. Thanks to Ariel Maguyon for his error report.
131            (removeSparseTerms): New function to remove columns from a
132            term-document matrix exceeding a sparse factor.
133    
134    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
135    
136            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
137    
138    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
139    
140            * man/sFilter.Rd: Corrected documentation on statement format (use
141            '==' instead of '=').
142    
143    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
144    
145            * R/aobjects.R (StructuredTextDocument): Inherits from
146            TextDocument.
147    
148    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
149    
150            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
151            on sparse matrices as proposed by Martin Maechler.
152    
153    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
154    
155            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
156            \pkg{filehash} version deprecates them.
157    
158    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
159    
160            * R/termdocmatrix.R (textvector): Stemming is now performed before
161            erasing stopwords.
162            (weightMatrix): Adapted to handle sparse matrices.
163            (TermDocMatrix): Sparse matrix is now efficiently built by
164            direct stepwise insertion of row values into it.
165    
166    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
167    
168            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
169            due to ongoing problems. For our purposes the latter is as useful
170            as the replaced package.
171    
172    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
173    
174            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
175    
176            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
177    
178    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
179    
180            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
181            languages with available stopwords.
182    
183    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
184    
185            * inst/doc/tm.Rnw: Minor corrections in the vignette.
186    
187    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
188    
189            * DESCRIPTION: Update to version 0.2, since a lot of new features
190            have been integrated.
191    
192            * inst/stopwords: Updated existing stopwords and added stopwords
193            for various other languages.
194    
195    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
196    
197            * man/: Updated documentation.
198    
199            * Work/testDb.R: Script to test database stuff.
200    
201            * R/: Fixed various database-related bugs. Seems to be rather
202            usable now, i.e., consider it alpha status for now.
203    
204    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
205    
206            * R/: Fixed some bugs related to database support.
207    
208    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
209    
210            * man/: Added a lot of examples to the manuals.
211    
212    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
213    
214            * man/: Updated parts of the documentation.
215    
216            * R/textdoccol.R (asPlain): Added conversion from newsgroup
217            documents to plain text documents.
218    
219    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
220    
221            * R/textdoccol.R: Finished experimental database support. Not yet
222            intensively tested.
223    
224            * R/source.R: Now each source has a default reader.
225    
226            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
227            class anymore.
228    
229            * R/plaintextdoc.R: Custom show method for plain text documents.
230    
231            * R/aobjects.R: Added a class for structured text documents.
232    
233            * R/reader.R: Replaced remaining \code{parser} occurrences with
234            \code{reader}.
235    
236            * R/textdoccol.R (summary): Indent tags.
237    
238            * R/textdoccol.R (removePunctuation): Transform method to remove
239            punctuation marks.
240    
241    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
242    
243            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
244            using prescindMeta().
245    
246    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
247    
248            * R/textdoccol.R: Improved database support.
249    
250    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
251    
252            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
253    
254            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
255            language code.
256    
257            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
258            into parserControl argument.
259    
260            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
261    
262    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
263    
264            * Work/tmDataSetup.R: The datasets acq and crude can now be
265            created on the fly.
266    
267            * R/stopwords.R: Introduced a function returning the stopwords for
268            a given language (English, German and French at the moment).
269    
270            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
271            otherwise falls back to Snowball package.
272    
273    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
274    
275            * man/dissimilarity-methods.Rd: Make clear that any method offered
276            by "dists" from package "cba" can be used.
277    
278    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
279    
280            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
281            to Kurt's LaTeX suggestion. Removed points and underscores in
282            variable names for consistent naming.
283    
284            * DESCRIPTION: Update to version 0.1-2.
285    
286            * man/TextRepository.Rd: Fixed bug in documentation.
287    
288    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
289    
290            * DESCRIPTION: Update to version 0.1-1.
291    
292    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
293    
294            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
295            wordStem.
296    
297    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
298    
299            * R/: Changes due to Kurt's review.
300    
301    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
302    
303            * R/: Implemented improvements based upon comments by David
304            Meyer.
305    
306    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
307    
308            * inst/doc/: Rewrote vignette.
309    
310            * man/: Improved documentation.
311    
312    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
313    
314            * man/: Updated documentation.
315    
316            * DESCRIPTION: Changed package name to "tm". Updated version to
317            0.1 for first CRAN release.
318    
319            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
320            list archive example.
321    
322            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
323            archive example.
324    
325            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
326            from (several mails per box) mbox format to (single mail per file)
327            eml format.
328    
329    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
330    
331            * data/crude.rda: Rebuilt.
332    
333            * data/acq.rda: Rebuilt.
334    
335            * R/reader.R: Factored out reader and parser methods from
336            textdoccol.R.
337    
338            * R/source.R: Factored out Source methods from aobjects.R and
339            textdoccol.R.
340            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
341            feeds.
342    
343            * R/textdoccol.R (DirSource): Added support for recursive
344            traversal of directories.
345    
346    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
347    
348            * R/textdoccol.R ([[): Loads the document corpus automatically
349            into memory upon access.
350            (tm_transform, tm_filter): Removed several checks whether the
351            document is already loaded ([[ ensures this now).
352            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
353            mailing list archive.
354    
355    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
356    
357            * R/aobjects.R (TextDocument): Is now a virtual class.
358            (Source): Is now a virtual class.
359    
360    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
361    
362            * R/textdoccol.R (c): Support for an arbitrary number of document
363            collections.
364    
365    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
366    
367            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
368            append_meta and remove_meta.
369    
370            * R/textdoccol.R: Removed modify_metadata method.
371    
372            * R/textrepo.R: Removed modify_metadata method.
373    
374            * R/textdoccol.R (remove_meta): Supports removal of document
375            collection metadata and document (= in data frame) metadata.
376    
377    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
378    
379            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
380    
381            * data/crude.rda: Rebuilt.
382    
383            * data/acq.rda: Rebuilt.
384    
385            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
386    
387            * R/textdoccol.R ([): Bug fix for subsetting a document
388            collection's data frame.
389    
390    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
391    
392            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
393            to s_filter.
394    
395            * R/textdoccol.R: Local text documents' metadata can now be copied
396            to a document collection's data frame with prescind_meta.
397    
398    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
399    
400            * R/: Text documents' slot metadata is now accessible in s_filter.
401    
402            * R/: Rewrote s_filter function (still has some restrictions).
403    
404    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
405    
406            * R/: Various fixes in handling metadata.
407    
408            * R/: Added update mechanism for text document collections.
409    
410    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
411    
412            * R/: Merging of document collections now creates a binary tree
413            for reconstructing merged document collections.
414    
415            * R/: Redesign of metadata for document collections.
416    
417    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
418    
419            * R/: Messages now use \code{ngettext}.
420    
421    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
422    
423            * R/: Added functions for modifying and removing metadata.
424    
425    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
426    
427            * man/: Updated some documentation.
428    
429            * R/: Corrected some connection issues.
430    
431            * inst/doc: Worked on the vignette.
432    
433    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
434    
435            * inst/: Added texts and started vignette.
436    
437            * R/: Final changes based upon David's comments.
438    
439    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
440    
441            * NAMESPACE: Corrected exports (generic methods need exportMethods
442            directives!).
443    
444    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
445    
446            * R/: Modified the TextDocCol constructor and various parsers. It
447            is now modular and supports various file formats via plugins (see
448            the new "Source" class).
449    
450    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
451    
452            * man/: Revised documentation after previous code changes.
453    
454    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * R/: Remaining changes as discussed with David.
457    
458    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/: Some changes as suggested by David. The rest will follow
461            within the next few days.
462    
463    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * man/: Finished documentation.
466    
467    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
468    
469            * man/: Wrote some documentation.
470    
471    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
472    
473            * R/: Further syntactic sugar in form of additional assignment and
474            accessor methods.
475    
476    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
477    
478            * R/: Syntactic sugar in form of "length", "show" and "summary"
479            operators.
480    
481    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * R/: Diverse updates. Mainly on default operators ("[" or "c")
484            and dissimilarities.
485    
486    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
487    
488            * R/: Added similarity functions.
489    
490            * data/: Added English stopwords.
491    
492    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
493    
494            * data/: Examples compiled for new features.
495    
496            * R/: Changes due to new structure.
497    
498            * NAMESPACE: Corrected namespace to reflect new structure.
499    
500            * R/termdocmatrix.R: Adapted for new naming scheme.
501    
502    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
503    
504            * R/textdoccol.R: Adapted code for new class structure. Wrote
505            several transform and filter functions operating on text document
506            collections (alias text document databases).
507    
508            * R/aobjects.R: Adapted class structure with inheritance,
509            repositories and additional meta data. Loading files on demand is
510            now possible.
511    
512    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
513    
514            * R/: Some cosmetic cleanups.
515    
516            * inst/: Removed vignette on clustering. That and much more is now
517            described in the JSS paper on text mining. A more elaborate
518            vignette based on that article will be incorporated in the future.
519    
520    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
521    
522            * R/: Updated generic S4 methods to comply with signature changes
523            in newer versions of R (> 2.3).
524    
525    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
526    
527            * ext/R/importRIS.R: Automatic RIS import is now possible.
528    
529    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
530    
531            * R/textdoccol.R: Added RIS HTML input format.
532    
533    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
534    
535            * R/textdoccol.R: Removed bug that caused invalid text document
536            collections when handling many input files.
537    
538    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
539    
540            * R/textdoccol.R: Restructured and extended file import
541            mechanism.
542    
543            * inst/doc/clustering.Rnw: Adapted vignette for use with
544            ReutNews.rda
545    
546            * man/ReutNews.Rd: Documentation for ReutNews.rda
547    
548            * data/ReutNews.rda: A tiny Reuters21578 example data set.
549    
550    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
551    
552            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
553            clustering facilities of this package.
554    
555    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
556    
557            * R/aobjects.R: Changed package document structure to avoid class
558            dependency problems.
559    
560    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
561    
562            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
563            data set.
564    
565            * Finished documentation and reordered directory structure. Now "R
566            CMD check textmin" works without errors.
567    
