[tm] Diff of /trunk/tm/ChangeLog

From: trunk/R/trunk/ChangeLog, revision 20 (Tue Nov 8 16:40:52 2005 UTC)
To:   trunk/tm/ChangeLog, revision 795 (Sat Oct 27 09:14:35 2007 UTC)

2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/complete.R (completeStems): Completes (heuristically) word
        stems.

        * R/termdocmatrix.R (TermDocMatrix2): New modular
        constructor.

        * NAMESPACE: Exported termFreq.

2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readDOC): Added MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/weight.R: Weighting functions for TermDocMatrix.

2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
        functions for accessing dimension, column, and row names.

        * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.

2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/fungen.R: Use S4 class for function generators instead of S3 attributes.

2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readPDF): Removed manual checks for pdftotext and
        pdfinfo. The system call gives a warning anyway.

2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (asPlain): Conversion from
        StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
        for accessing term-document matrices.

        * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
        are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
        Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.

        * R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.

        * inst/stopwords/english.dat: Added the term "yes" to stopwords.

        * R/termdocmatrix.R (dim): dim function for TermDocMatrix.

        * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/distmeasure.R (dissimilarity): Replaced dists call from
        package cba by new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: require() uses the quietly option to suppress
        loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
        documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed some typos in vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (replaceWords): Added method to replace a set of
        words by a single word. Useful for synonyms.

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TermDocMatrix.Rd: Fixed documentation on Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Small fix for dealing with empty
        vectors. Thanks to Ariel Maguyon for his error report.
        (removeSparseTerms): New function to remove columns from a
        term-document matrix exceeding a sparse factor.

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/sFilter.Rd: Corrected documentation on statement format (use
        '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (StructuredTextDocument): Inherits from
        TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
        on sparse matrices as proposed by Martin Maechler.

2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
        latest \pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Stemming is now performed before
        erasing stopwords.
        (weightMatrix): Adapted to handle sparse matrices.
        (TermDocMatrix): Sparse matrix is now efficiently built by
        direct stepwise insertion of row values into it.

2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
        due to ongoing problems. For our purposes the latter is as useful
        as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.

        * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
        languages with available stopwords.

2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.2, since a lot of new features
        have been integrated.

        * inst/stopwords: Updated existing stopwords and added stopwords
        for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * Work/testDb.R: Script to test database stuff.

        * R/: Fixed various database related bugs. Seems to be rather
        usable now, i.e., consider it alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Added a lot of examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated parts of the documentation.

        * R/textdoccol.R (asPlain): Added conversion from newsgroup
        documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Finished experimental database support. Not yet
        intensively tested.

        * R/source.R: Now each source has a default reader.

        * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
        class anymore.

        * R/plaintextdoc.R: Custom show method for plain text documents.

        * R/aobjects.R: Added a class for structured text documents.

        * R/reader.R: Replaced remaining \code{parser} occurrences with
        \code{reader}.

        * R/textdoccol.R (summary): Indent tags.

        * R/textdoccol.R (removePunctuation): Transform method to remove
        punctuation marks.

2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (sFilter): Simplified sFilter significantly by
        using prescindMeta().

2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

        * R/resolve.R (resolveISOcode): Extracts the language from an ISO
        language code.

        * R/textdoccol.R (TextDocCol): Refactored several parser arguments
        into parserControl argument.

        * R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Work/tmDataSetup.R: The datasets acq and crude can now be
        created on the fly.

        * R/stopwords.R: Introduced a function returning the stopwords for
        a given language (English, German and French at the moment).

        * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
        otherwise falls back to Snowball package.

2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/dissimilarity-methods.Rd: Make clear that any method offered
        by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
        to Kurt's LaTeX suggestion. Removed points and underscores in
        variable names for consistent naming.

        * DESCRIPTION: Update to version 0.1-2.

        * man/TextRepository.Rd: Fixed bug in documentation.

2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.1-1.

2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
        wordStem.

2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Changes due to Kurt's review.

2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Implemented improvements based upon comments by David
        Meyer.

2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/: Rewrote vignette.

        * man/: Improved documentation.

2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * DESCRIPTION: Changed package name to "tm". Updated version to
        0.1 for first CRAN release.

        * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
        list archive example.

        * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
        archive example.

        * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
        from (several mails per box) mbox format to (single mail per file)
        eml format.

2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * R/reader.R: Factored out reader and parser methods from
        textdoccol.R.

        * R/source.R: Factored out Source methods from aobjects.R and
        textdoccol.R.
        (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
        feeds.

        * R/textdoccol.R (DirSource): Added support for recursive
        traversal of directories.

2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R ([[): Loads the document corpus automatically
        into memory upon access.
        (tm_transform, tm_filter): Removed several checks whether the
        document is already loaded ([[ ensures this now).
        (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
        mailing list archive.

2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (TextDocument): Is now a virtual class.
        (Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (c): Support for an arbitrary number of document
        collections.

2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textrepo.R: Updated TextRepository (constructor), append_elem,
        append_meta and remove_meta.

        * R/textdoccol.R: Removed modify_metadata method.

        * R/textrepo.R: Removed modify_metadata method.

        * R/textdoccol.R (remove_meta): Supports removal of document
        collection metadata and document (= in data frame) metadata.

2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

        * R/textdoccol.R ([): Bug fix for subsetting a document
        collection's data frame.

2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Bug fixes in s_filter. Added full query support
        to s_filter.

        * R/textdoccol.R: Local text documents' metadata can now be copied
        to a document collection's data frame with prescind_meta.

2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Text documents' slot metadata is now accessible in s_filter.

        * R/: Rewrote s_filter function (still has some restrictions).

2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various fixes in handling metadata.

        * R/: Added update mechanism for text document collections.

2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Merging of document collections now creates a binary tree
        for reconstructing merged document collections.

        * R/: Redesign of metadata for document collections.

2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Messages now use \code{ngettext}.

2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added functions for modifying and removing metadata.

2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated some documentation.

        * R/: Corrected some connection issues.

        * inst/doc: Worked on the vignette.

2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/: Added texts and started vignette.

        * R/: Final changes based upon David's comments.

2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * NAMESPACE: Corrected exports (generic methods need exportMethods
        directives!).

2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Modified the TextDocCol constructor and various parsers. It
        is now modular and supports various file formats via plugins (see
        the new "Source" class).

2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Revised documentation after previous code changes.

2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Remaining changes as discussed with David.

2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some changes as suggested by David. The rest will follow
        within the next few days.

2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Finished documentation.

2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Wrote some documentation.

2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Further syntactic sugar in the form of additional assignment
        and accessor methods.

2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Syntactic sugar in the form of "length", "show" and
        "summary" operators.

2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various updates, mainly to default operators ("[" or "c")
        and dissimilarities.

2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added similarity functions.

        * data/: Added English stopwords.

2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/: Examples compiled for new features.

        * R/: Changes due to new structure.

        * NAMESPACE: Corrected namespace to reflect new structure.

        * R/termdocmatrix.R: Adapted for new naming scheme.

2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Adapted code for new class structure. Wrote
        several transform and filter functions operating on text document
        collections (alias text document databases).

        * R/aobjects.R: Adapted class structure with inheritance,
        repositories and additional meta data. Loading files on demand is
        now possible.

2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some cosmetic cleanups.

        * inst/: Removed vignette on clustering. That and much more is now
        described in the JSS paper on text mining. Based upon that
        article, a more elaborate vignette will be incorporated in the
        future.

2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Updated generic S4 methods to comply with signature changes
        in newer versions of R (> 2.3).

2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * ext/R/importRIS.R: Automatic RIS import is now possible.

2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Added RIS HTML input format.

2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Removed bug that caused invalid text document
        collections when handling many input files.

2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Restructured and extended file import
        mechanism.

        * inst/doc/clustering.Rnw: Adapted vignette for use with
        ReutNews.rda.

        * man/ReutNews.Rd: Documentation for ReutNews.rda.

        * data/ReutNews.rda: A tiny Reuters21578 example data set.

2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/clustering.Rnw: Wrote a small vignette to present the
        clustering facilities of this package.

2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Changed package document structure to avoid class
        dependency problems.

2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Wrote a script for the ModLewis Split for the Reuters-21578 XML
        data set.

        * Finished documentation and reordered directory structure. Now "R
        CMD check textmin" works without errors.

2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * src/: Various splits can now be easily created for the
        Reuters21578 data set.

2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Updated documentation.

2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Wrote R documentation for some classes and methods.

2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Constructor of textdoccol allows import of CSV
        files. See the questionnaire data/Umfrage.csv for such an example.
        We are now able to import files in Reuters-21578 XML format.

        * Changed class interfaces in various files. Weighting of the text
        matrix is now possible.

2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: One can build term-document matrices if
