Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
trunk/tm/ChangeLog revision 801, Sat Dec 1 09:27:24 2007 UTC
# Line 1  Line 1 
1    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/aobjects.R (showMeta): Added method for pretty printing a
4            text document's meta data.
5    
6    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
7    
8            * R/textdoccol.R (TextDocCol): Better handling of empty
9            arguments.
10    
11            * NAMESPACE: Exported readDOC.
12    
13            * man/completeStems.Rd: Added an example.
14    
15    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
16    
17            * R/stopwords.R (stopwords): Look up .dat files at every
18            call. Allows users to modify stopword .dat files interactively.
19    
20    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
21    
22            * R/termdocmatrix.R (termFreq): Correct processing of empty
23            documents.
24    
25    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
26    
27            * man/: Updated documentation.
28    
29    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
30    
31            * R/complete.R (completeStems): Completes (heuristically) word
32            stems.
33    
34            * R/termdocmatrix.R (TermDocMatrix2): New modular
35            constructor.
36    
37            * NAMESPACE: Exported termFreq.
38    
39    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
40    
41            * R/reader.R (readDOC): Added MS Word reader (using antiword).
42    
43    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
44    
45            * R/weight.R: Weighting functions for TermDocMatrix.
46    
47    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
48    
49            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
50            functions for accessing dimension, column, and row names.
51    
52            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
53    
54    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
55    
56            * man/removePunctuation.Rd: Added documentation. Function is also exported in NAMESPACE.
57    
58    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
59    
60            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
61    
62    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
63    
64            * R/reader.R (readPDF): Removed manual checks for pdftotext and
65            pdfinfo. The system call gives a warning anyway.
66    
67    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
68    
69            * R/textdoccol.R (asPlain): Conversion from
70            StructuredTextDocuments to PlainTextDocuments.
71    
72    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
73    
74            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
75            for accessing term-document matrices.
76    
77            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
78            are installed.
79    
80    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
81    
82            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
83            Christian Buchta.
84    
85    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
86    
87            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
88    
89    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
90    
91            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
92    
93            * R/reader.R (readPDF): Added PDF reader.
94    
95    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
96    
97            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
98    
99            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
100    
101            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
102    
103            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
104    
105    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
106    
107            * R/distmeasure.R (dissimilarity): Replaced dists call from
108            package cba by new dist call from package proxy.
109    
110    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
111    
112            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
113    
114    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
115    
116            * R/termdocmatrix.R: require() uses the quietly option to suppress
117            loading messages.
118    
119    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
120    
121            * R/dictionary.R: Added dictionary support.
122    
123    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
124    
125            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
126            documents. This simplifies some functions, e.g., asPlain.
127    
128    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
129    
130            * inst/doc/tm.Rnw: Fixed some typos in vignette.
131    
132    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
133    
134            * R/textdoccol.R (replaceWords): Added method to replace a set of
135            words by a single word. Useful for synonyms.
136    
137    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
138    
139            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
140    
141    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
142    
143            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
144            vectors. Thanks to Ariel Maguyon for his error report.
145            (removeSparseTerms): New function to remove columns from a
146            term-document matrix exceeding a sparse factor.
147    
148    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
149    
150            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
151    
152    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
153    
154            * man/sFilter.Rd: Corrected documentation on statement format (use
155            '==' instead of '=').
156    
157    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
158    
159            * R/aobjects.R (StructuredTextDocument): Inherits from
160            TextDocument.
161    
162    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
163    
164            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
165            on sparse matrices as proposed by Martin Maechler.
166    
167    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
168    
169            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
170            latest \pkg{filehash} version deprecates them.
171    
172    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
173    
174            * R/termdocmatrix.R (textvector): Stemming is now performed before
175            erasing stopwords.
176            (weightMatrix): Adapted to handle sparse matrices.
177            (TermDocMatrix): Sparse matrix is now efficiently built by
178            direct stepwise insertion of row values into it.
179    
180    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
181    
182            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
183            due to ongoing problems. For our purposes the latter is as useful
184            as the replaced package.
185    
186    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
187    
188            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
189    
190            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
191    
192    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
193    
194            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
195            languages with available stopwords.
196    
197    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
198    
199            * inst/doc/tm.Rnw: Minor corrections in the vignette.
200    
201    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
202    
203            * DESCRIPTION: Update to version 0.2, since a lot of new features
204            have been integrated.
205    
206            * inst/stopwords: Updated existing stopwords and added stopwords
207            for various other languages.
208    
209    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
210    
211            * man/: Updated documentation.
212    
213            * Work/testDb.R: Script to test database stuff.
214    
215            * R/: Fixed various database-related bugs. Seems to be rather
216            usable now, i.e., consider it alpha status for now.
217    
218    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
219    
220            * R/: Fixed some bugs related to database support.
221    
222    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
223    
224            * man/: Added a lot of examples to the manuals.
225    
226    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
227    
228            * man/: Updated parts of the documentation.
229    
230            * R/textdoccol.R (asPlain): Added conversion from newsgroup
231            documents to plain text documents.
232    
233    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
234    
235            * R/textdoccol.R: Finished experimental database support. Not yet
236            intensively tested.
237    
238            * R/source.R: Now each source has a default reader.
239    
240            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
241            class anymore.
242    
243            * R/plaintextdoc.R: Custom show method for plain text documents.
244    
245            * R/aobjects.R: Added a class for structured text documents.
246    
247            * R/reader.R: Replaced remaining \code{parser} occurrences with
248            \code{reader}.
249    
250            * R/textdoccol.R (summary): Indent tags.
251    
252            * R/textdoccol.R (removePunctuation): Transform method to remove
253            punctuation marks.
254    
255    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
256    
257            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
258            using prescindMeta().
259    
260    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
261    
262            * R/textdoccol.R: Improved database support.
263    
264    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
265    
266            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
267    
268            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
269            language code.
270    
271            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
272            into parserControl argument.
273    
274            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
275    
276    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
277    
278            * Work/tmDataSetup.R: The datasets acq and crude can now be
279            created on the fly.
280    
281            * R/stopwords.R: Introduced a function returning the stopwords for
282            a given language (English, German and French at the moment).
283    
284            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
285            otherwise falls back to Snowball package.
286    
287    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
288    
289            * man/dissimilarity-methods.Rd: Make clear that any method offered
290            by "dists" from package "cba" can be used.
291    
292    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
293    
294            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
295            to Kurt's LaTeX suggestion. Removed points and underscores in
296            variable names for consistent naming.
297    
298            * DESCRIPTION: Update to version 0.1-2.
299    
300            * man/TextRepository.Rd: Fixed bug in documentation.
301    
302    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
303    
304            * DESCRIPTION: Update to version 0.1-1.
305    
306    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
307    
308            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
309            wordStem.
310    
311    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
312    
313            * R/: Changes due to Kurt's review.
314    
315    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
316    
317            * R/: Implemented improvements based upon comments by David
318            Meyer.
319    
320    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
321    
322            * inst/doc/: Rewrote vignette.
323    
324            * man/: Improved documentation.
325    
326    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
327    
328            * man/: Updated documentation.
329    
330            * DESCRIPTION: Changed package name to "tm". Updated version to
331            0.1 for first CRAN release.
332    
333            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
334            list archive example.
335    
336            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
337            archive example.
338    
339            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
340            from (several mails per box) mbox format to (single mail per file)
341            eml format.
342    
343    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
344    
345            * data/crude.rda: Rebuilt.
346    
347            * data/acq.rda: Rebuilt.
348    
349            * R/reader.R: Factored out reader and parser methods from
350            textdoccol.R.
351    
352            * R/source.R: Factored out Source methods from aobjects.R and
353            textdoccol.R.
354            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
355            feeds.
356    
357            * R/textdoccol.R (DirSource): Added support for recursive
358            traversal of directories.
359    
360    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
361    
362            * R/textdoccol.R ([[): Loads the document corpus automatically
363            into memory upon access.
364            (tm_transform, tm_filter): Removed several checks whether the
365            document is already loaded ([[ ensures this now).
366            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
367            mailing list archive.
368    
369    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
370    
371            * R/aobjects.R (TextDocument): Is now a virtual class.
372            (Source): Is now a virtual class.
373    
374    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
375    
376            * R/textdoccol.R (c): Support for an arbitrary number of document
377            collections.
378    
379    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
380    
381            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
382            append_meta and remove_meta.
383    
384            * R/textdoccol.R: Removed modify_metadata method.
385    
386            * R/textrepo.R: Removed modify_metadata method.
387    
388            * R/textdoccol.R (remove_meta): Supports removal of document
389            collection metadata and document (= in data frame) metadata.
390    
391    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
392    
393            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
394    
395            * data/crude.rda: Rebuilt.
396    
397            * data/acq.rda: Rebuilt.
398    
399            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
400    
401            * R/textdoccol.R ([): Bug fix for subsetting a document
402            collection's data frame.
403    
404    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
405    
406            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
407            to s_filter.
408    
409            * R/textdoccol.R: Local text documents' metadata can now be copied
410            to a document collection's data frame with prescind_meta.
411    
412    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
413    
414            * R/: Text documents' slot metadata is now accessible in s_filter.
415    
416            * R/: Rewrote s_filter function (still has some restrictions).
417    
418    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
419    
420            * R/: Various fixes in handling metadata.
421    
422            * R/: Added update mechanism for text document collections.
423    
424    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
425    
426            * R/: Merging of document collections now creates a binary tree
427            for reconstructing merged document collections.
428    
429            * R/: Redesign of metadata for document collections.
430    
431    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
432    
433            * R/: Messages now use \code{ngettext}.
434    
435    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
436    
437            * R/: Added functions for modifying and removing metadata.
438    
439    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
440    
441            * man/: Updated some documentation.
442    
443            * R/: Corrected some connection issues.
444    
445            * inst/doc: Worked on the vignette.
446    
447    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
448    
449            * inst/: Added texts and started vignette.
450    
451            * R/: Final changes based upon David's comments.
452    
453    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * NAMESPACE: Corrected exports (generic methods need exportMethods
456            directives!).
457    
458    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/: Modified the TextDocCol constructor and various parsers. It
461            is now modular and supports various file formats via plugins (see
462            the new "Source" class).
463    
464    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
465    
466            * man/: Revised documentation after previous code changes.
467    
468    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * R/: Remaining changes as discussed with David.
471    
472    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
473    
474            * R/: Some changes as suggested by David. The rest will follow
475            within the next few days.
476    
477    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
478    
479            * man/: Finished documentation.
480    
481    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * man/: Wrote some documentation.
484    
485    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
486    
487            * R/: Further syntactic sugar in the form of additional assignment and
488            accessor methods.
489    
490    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
491    
492            * R/: Syntactic sugar in the form of "length", "show" and "summary"
493            operators.
494    
495    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
496    
497            * R/: Various updates, mainly to default operators ("[" or "c")
498            and dissimilarities.
499    
500    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
501    
502            * R/: Added similarity functions.
503    
504            * data/: Added English stopwords.
505    
506    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * data/: Examples compiled for new features.
509    
510            * R/: Changes due to new structure.
511    
512            * NAMESPACE: Corrected namespace to reflect new structure.
513    
514            * R/termdocmatrix.R: Adapted for new naming scheme.
515    
516    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/textdoccol.R: Adapted code for new class structure. Wrote
519            several transform and filter functions operating on text document
520            collections (alias text document databases).
521    
522            * R/aobjects.R: Adapted class structure with inheritance,
523            repositories and additional meta data. Loading files on demand is
524            now possible.
525    
526    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * R/: Some cosmetic cleanups.
529    
530            * inst/: Removed vignette on clustering. That and much more is now
531            described in the JSS paper on text mining. Based upon that
532            article an elaborated vignette will be incorporated in the future.
533    
534    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
535    
536            * R/: Updated generic S4 methods to comply with signature changes
537            in newer versions of R (> 2.3).
538    
539    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
540    
541            * ext/R/importRIS.R: Automatic RIS import is now possible.
542    
543    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
544    
545            * R/textdoccol.R: Added RIS HTML input format.
546    
547    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
548    
549            * R/textdoccol.R: Removed bug that caused invalid text document
550            collections when handling many input files.
551    
552    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
553    
554            * R/textdoccol.R: Restructured and extended file import
555            mechanism.
556    
557            * inst/doc/clustering.Rnw: Adapted vignette for use with
558            ReutNews.rda
559    
560            * man/ReutNews.Rd: Documentation for ReutNews.rda
561    
562            * data/ReutNews.rda: A tiny Reuters21578 example data set.
563    
564    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
565    
566            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
567            clustering facilities of this package.
568    
569    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
570    
571            * R/aobjects.R: Changed package document structure to avoid class
572            dependency problems.
573    
574    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
575    
576            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
577            data set.
578    
579            * Finished documentation and reordered directory structure. Now "R
580            CMD check textmin" works without errors.
581    
