[tm] Diff of /pkg/ChangeLog

Comparison of trunk/R/trunk/ChangeLog revision 28 (Tue Dec 6 13:46:33 2005 UTC) with trunk/tm/ChangeLog revision 802 (Sun Dec 2 09:28:41 2007 UTC).

2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Shortened code, removed codetools
        variable warnings.

        * man/: Added documentation for showMeta and an example for tmMap.

        * inst/doc/tm.Rnw: Updated vignette: added comments on the MS Word
        reader and fixed some minor typos.

2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (showMeta): Added method for pretty printing a
        text document's metadata.

2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Better handling of empty
        arguments.

        * NAMESPACE: Exported readDOC.

        * man/completeStems.Rd: Added an example.

2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Look up .dat files on every call.
        This allows users to modify the stopword .dat files interactively.

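Per the 2007-11-18 entry, the stopword list is read from disk on every call rather than cached, so edits to a .dat file take effect immediately. Below is a minimal Python sketch of that idea, not tm's R implementation; the function name `stopwords`, the `directory` argument and the one-word-per-line layout of `<language>.dat` are assumptions made for illustration.

```python
from pathlib import Path


def stopwords(language: str, directory: str) -> list[str]:
    """Read the stopword list for `language` from disk on every call.

    Assumes a hypothetical layout of one word per line in
    <directory>/<language>.dat. Nothing is cached, so edits to the
    file are visible on the next call.
    """
    path = Path(directory) / f"{language}.dat"
    with path.open(encoding="utf-8") as fh:
        # Skip blank lines and strip surrounding whitespace.
        return [line.strip() for line in fh if line.strip()]
```

The design trades one file read per call for never serving a stale list.
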
2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (termFreq): Correct processing of empty
        documents.

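The termFreq fix above concerns empty documents. As a rough illustration (Python rather than R, with a hypothetical `term_freq` function), term-frequency counting can simply return an empty table for an empty or whitespace-only document; the lowercase whitespace tokenization is a simplifying assumption, not tm's tokenizer.

```python
from collections import Counter


def term_freq(text: str) -> dict[str, int]:
    """Count lowercase whitespace-separated tokens in one document.

    An empty (or whitespace-only) document yields an empty dict rather
    than an error; the tokenization is a simplifying assumption.
    """
    tokens = text.lower().split()
    return dict(Counter(tokens))
```

For example, `term_freq("")` returns `{}` and `term_freq("a b A")` returns `{"a": 2, "b": 1}`.
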
2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/complete.R (completeStems): Heuristically completes word
        stems.

        * R/termdocmatrix.R (TermDocMatrix2): New modular
        constructor.

        * NAMESPACE: Exported termFreq.

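The entry does not spell out the completion heuristic. One plausible rule, sketched here in Python under that assumption, maps each stem to the most frequent corpus word that begins with it and leaves the stem unchanged if nothing matches; `complete_stems` is a hypothetical name and this is not necessarily what tm's completeStems does.

```python
from collections import Counter


def complete_stems(stems: list[str], corpus_words: list[str]) -> list[str]:
    """Heuristically map each stem back to a full word.

    Assumed heuristic: pick the most frequent corpus word that starts with
    the stem; ties are broken alphabetically, and a stem with no match is
    returned unchanged.
    """
    counts = Counter(corpus_words)
    completed = []
    for stem in stems:
        candidates = [w for w in counts if w.startswith(stem)]
        if not candidates:
            completed.append(stem)
            continue
        # Highest count first; alphabetical order breaks ties.
        completed.append(min(candidates, key=lambda w: (-counts[w], w)))
    return completed
```

For example, with the corpus words compute, computer, computer, running and run, the stems comput, run and xyz complete to computer, run and xyz.
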
2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readDOC): Added MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/weight.R: Weighting functions for TermDocMatrix.

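The entry does not list which weighting schemes R/weight.R provides. As one common example, here is a Python sketch of tf-idf weighting on a dense count matrix; the documents-by-terms orientation, the log2(N / df) idf variant and the name `weight_tf_idf` are all assumptions, not a description of tm's weighting functions.

```python
import math


def weight_tf_idf(counts: list[list[int]]) -> list[list[float]]:
    """Apply tf-idf weighting to a documents-by-terms count matrix.

    Rows are documents and columns are terms (an assumed orientation).
    idf uses log2(N / df); a term occurring in no document keeps weight 0.
    """
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log2(n_docs / d) if d > 0 else 0.0 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in counts]
```

A term that occurs in every document gets idf log2(1) = 0, so its weight vanishes; for example, [[1, 2], [0, 1]] becomes [[1.0, 0.0], [0.0, 0.0]].
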
2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
        functions for accessing dimension, column, and row names.

        * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/removePunctuation.Rd: Added documentation. Function also
        exported to NAMESPACE.

2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/fungen.R: Use S4 class for function generators instead of S3
        attributes.

2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readPDF): Removed manual checks for pdftotext and
        pdfinfo. The system call gives a warning anyway.

2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (asPlain): Conversion from
        StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
        for accessing term-document matrices.

        * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
        are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
        Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML,
        preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readHTML): Added a very simple HTML reader to obtain
        StructuredTextDocuments.

        * R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Moved proxy from Depends to Imports to avoid name
        clashes.

        * inst/stopwords/english.dat: Added the term "yes" to stopwords.

        * R/termdocmatrix.R (dim): dim function for TermDocMatrix.

        * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/distmeasure.R (dissimilarity): Replaced dists call from
        package cba by new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: require() uses the quietly option to suppress
        loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
        documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed some typos in vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (replaceWords): Added method to replace a set of
        words by a single word. Useful for synonyms.

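A minimal Python sketch of the replaceWords idea, collapsing a set of synonyms into one word. The whole-word, case-sensitive regex matching and the name `replace_words` are assumptions of this sketch; tm's method operates on its own document objects.

```python
import re


def replace_words(text: str, words: list[str], by: str) -> str:
    """Replace every whole-word occurrence of any word in `words` with `by`.

    Matching is case-sensitive and uses regex word boundaries; both are
    simplifying assumptions.
    """
    if not words:
        return text
    # Escape each word so characters such as '.' are matched literally.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    # A function replacement keeps backslashes in `by` from being interpreted.
    return re.sub(pattern, lambda _match: by, text)
```

For example, `replace_words("the car and the automobile", ["car", "automobile"], "vehicle")` returns "the vehicle and the vehicle", while "cart" is left alone because of the word boundaries.
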
2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TermDocMatrix.Rd: Fixed documentation on Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Small fix for dealing with empty
        vectors. Thanks to Ariel Maguyon for his error report.
        (removeSparseTerms): New function to remove columns from a
        term-document matrix whose sparsity exceeds a given factor.

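A Python sketch of removeSparseTerms as described above, taking a term's sparsity to be the fraction of documents in which it does not occur and dropping columns whose sparsity exceeds `sparse`. The plain list-of-rows matrix, the strict "exceeds" boundary and the name `remove_sparse_terms` are assumptions; tm works on its own sparse matrix class.

```python
def remove_sparse_terms(counts, terms, sparse):
    """Drop term columns whose sparsity exceeds `sparse`.

    `counts` is a documents-by-terms matrix given as a list of rows and
    `terms` holds the column names. The sparsity of a term is the fraction
    of documents in which its count is zero.
    """
    if not 0 < sparse < 1:
        raise ValueError("sparse must lie strictly between 0 and 1")
    n_docs = len(counts)
    if n_docs == 0:
        return counts, list(terms)
    keep = [
        j
        for j in range(len(terms))
        if sum(1 for row in counts if row[j] == 0) / n_docs <= sparse
    ]
    return [[row[j] for j in keep] for row in counts], [terms[j] for j in keep]
```

With four documents and sparse = 0.5, a term missing from three of them (sparsity 0.75) is dropped, while terms missing from only one (sparsity 0.25) are kept.
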
2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/tmUpdate.Rd: Corrected documentation on readerControl
        parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/sFilter.Rd: Corrected documentation on statement format (use
        '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (StructuredTextDocument): Inherits from
        TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
        on sparse matrices as proposed by Martin Maechler.

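The point of the findFreqTerms change is to avoid touching zero cells. A Python sketch of that idea, assuming (for illustration) that the matrix is stored as (document, term, count) triplets of its non-zero entries; `find_freq_terms` and its `lowfreq`/`highfreq` bounds are modeled loosely on tm's function, not copied from it.

```python
from collections import defaultdict


def find_freq_terms(entries, terms, lowfreq=0, highfreq=float("inf")):
    """Return terms whose total frequency lies in [lowfreq, highfreq].

    `entries` holds only the non-zero cells as (doc_index, term_index, count)
    triplets, so the work is proportional to the number of non-zeros rather
    than to documents x terms. Terms without any non-zero entry are never
    returned.
    """
    totals = defaultdict(int)
    for _doc, term, count in entries:
        totals[term] += count
    # Report matching terms in column order.
    return [terms[j] for j in sorted(totals) if lowfreq <= totals[j] <= highfreq]
```

For triplets giving totals a = 3, b = 1 and c = 5, a lower bound of 2 selects a and c, and adding an upper bound of 4 leaves only a.
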
2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
        \pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Stemming is now performed before
        removing stopwords.
        (weightMatrix): Adapted to handle sparse matrices.
        (TermDocMatrix): Sparse matrix is now efficiently built by
        direct stepwise insertion of row values into it.

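Building the sparse matrix by inserting each document's non-zero values directly, as the 2007-04-22 entry describes, avoids ever materializing dense rows. A hypothetical Python sketch (`build_sparse_tdm` is an illustrative name) that produces (document, term, count) triplets one document at a time, growing the vocabulary as new terms appear; the lowercase whitespace tokenizer is a simplifying assumption.

```python
from collections import Counter


def build_sparse_tdm(docs):
    """Build a sparse documents-by-terms matrix one document (row) at a time.

    Returns (entries, terms) where entries are (doc, term, count) triplets
    for the non-zero cells only.
    """
    term_index = {}
    terms = []
    entries = []
    for i, doc in enumerate(docs):
        # Insert this row's non-zero values directly; no dense row is built.
        for term, count in Counter(doc.lower().split()).items():
            if term not in term_index:
                term_index[term] = len(terms)
                terms.append(term)
            entries.append((i, term_index[term], count))
    return entries, terms
```

For the documents "a b a", "" and "b c" it yields the triplets (0, 0, 2), (0, 1, 1), (2, 1, 1) and (2, 2, 1) with terms a, b and c; the empty document contributes no entries.
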
2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
        due to ongoing problems. For our purposes the latter is as useful
        as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TextDocCol.Rd: Replaced \code{readPlain} with
        \code{object@DefaultReader}.

        * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
        languages with available stopwords.

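resolveISOCode maps ISO 639-1 codes to the languages for which stopwords exist. A minimal Python sketch of such a lookup; the table is a hypothetical subset limited to the three languages named in the 2007-03-14 entry, and `resolve_iso_code` is an illustrative name rather than tm's function.

```python
# Illustrative subset only: English, German and French are the stopword
# languages named in the 2007-03-14 entry; the full table is not reproduced.
ISO_639_1 = {"en": "english", "de": "german", "fr": "french"}


def resolve_iso_code(code: str) -> str:
    """Map an ISO 639-1 code such as 'en' to a language name."""
    try:
        return ISO_639_1[code.lower()]
    except KeyError:
        raise ValueError(f"unsupported ISO 639-1 code: {code!r}") from None
```

Lookups are case-insensitive in this sketch, so "DE" resolves to "german"; unknown codes raise a ValueError instead of silently returning nothing.
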
2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.2, since many new features
        have been integrated.

        * inst/stopwords: Updated existing stopwords and added stopwords
        for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * Work/testDb.R: Script to test database functionality.

        * R/: Fixed various database-related bugs. Database support now
        seems rather usable, but should be considered alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Added a lot of examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated parts of the documentation.

        * R/textdoccol.R (asPlain): Added conversion from newsgroup
        documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Finished experimental database support. Not yet
        thoroughly tested.

        * R/source.R: Now each source has a default reader.

        * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
        class anymore.

        * R/plaintextdoc.R: Custom show method for plain text documents.

        * R/aobjects.R: Added a class for structured text documents.

        * R/reader.R: Replaced remaining \code{parser} occurrences with
        \code{reader}.

        * R/textdoccol.R (summary): Indent tags.

        * R/textdoccol.R (removePunctuation): Transform method to remove
        punctuation marks.

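A minimal Python sketch of a punctuation-removing transformation like removePunctuation. It deletes only the ASCII characters in Python's `string.punctuation`, which is an assumption of this sketch; tm's own definition of punctuation may differ, and `remove_punctuation` is an illustrative name.

```python
import string

# Translation table that deletes ASCII punctuation characters only.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Strip ASCII punctuation marks (string.punctuation) from `text`.

    Non-ASCII punctuation such as typographic quotes is left untouched,
    and whitespace is not normalized.
    """
    return text.translate(_PUNCT_TABLE)
```

For example, `remove_punctuation("Hello, world! (tm)")` returns "Hello world tm".
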
2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (sFilter): Simplified sFilter significantly by
        using prescindMeta().

2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

        * R/resolve.R (resolveISOcode): Extracts the language from an ISO
        language code.

        * R/textdoccol.R (TextDocCol): Refactored several parser arguments
        into a single parserControl argument.

        * R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Work/tmDataSetup.R: The datasets acq and crude can now be
        created on the fly.

        * R/stopwords.R: Introduced a function returning the stopwords for
        a given language (English, German and French at the moment).

        * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
        otherwise falls back to the Snowball package.

2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/dissimilarity-methods.Rd: Make clear that any method offered
        by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed the bug where quotes appeared as boxes,
        following Kurt's LaTeX suggestion. Removed dots and underscores
        in variable names for consistent naming.

        * DESCRIPTION: Update to version 0.1-2.

        * man/TextRepository.Rd: Fixed bug in documentation.

2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.1-1.

2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
        wordStem.

2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Changes due to Kurt's review.

2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Implemented improvements based upon comments by David
        Meyer.

2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/: Rewrote vignette.

        * man/: Improved documentation.

2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * DESCRIPTION: Changed package name to "tm". Updated version to
        0.1 for first CRAN release.

        * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
        list archive example.

        * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
        archive example.

        * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
        from mbox format (several mails per file) to eml format (a single
        mail per file).

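The mbox-to-eml conversion splits one mailbox file into one file per message, and the 2007-07-13 entry adds gzipped input. Below is a Python sketch of that process, not tm's R code. It assumes messages start at lines beginning with "From ", does not undo ">From " quoting, names the output files 1.eml, 2.eml and so on, and reads files ending in .gz through gzip; the name `convert_mbox_eml` only mirrors the R function for readability.

```python
import gzip
from pathlib import Path


def convert_mbox_eml(mbox_path: str, out_dir: str) -> int:
    """Split an mbox file into one .eml file per message.

    Returns the number of messages written.
    """
    opener = gzip.open if mbox_path.endswith(".gz") else open
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    messages = []
    current = None
    with opener(mbox_path, "rb") as fh:
        for line in fh:
            if line.startswith(b"From "):
                # A separator line opens a new message; the envelope line
                # itself is not part of the .eml output.
                current = []
                messages.append(current)
            elif current is not None:
                current.append(line)

    for number, lines in enumerate(messages, start=1):
        (out / f"{number}.eml").write_bytes(b"".join(lines))
    return len(messages)
```

For real mailboxes, Python's standard `mailbox.mbox` class handles more of the format's edge cases than this line-based split.
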
2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * R/reader.R: Factored out reader and parser methods from
        textdoccol.R.

        * R/source.R: Factored out Source methods from aobjects.R and
        textdoccol.R.
        (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
        feeds.

        * R/textdoccol.R (DirSource): Added support for recursive
        traversal of directories.

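Recursive traversal in DirSource means files in subdirectories are picked up as documents too. A Python sketch with a hypothetical `dir_source` function, using `os.walk` when recursion is requested; returning sorted paths is an assumption made here for a stable document order.

```python
import os


def dir_source(directory: str, recursive: bool = False) -> list[str]:
    """List file paths in `directory`, optionally descending into subdirectories.

    Paths are returned sorted so the document order is stable.
    """
    if not recursive:
        return sorted(
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))
        )
    paths = []
    for root, _dirs, files in os.walk(directory):
        paths.extend(os.path.join(root, name) for name in files)
    return sorted(paths)
```

Without recursion, files inside subdirectories are skipped, and the subdirectories themselves are never returned as documents.
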
2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R ([[): Loads the document corpus automatically
        into memory upon access.
        (tm_transform, tm_filter): Removed several checks whether the
        document is already loaded ([[ ensures this now).
        (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
        mailing list archive.

2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (TextDocument): Is now a virtual class.
        (Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (c): Support for an arbitrary number of document
        collections.

2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textrepo.R: Updated TextRepository (constructor), append_elem,
        append_meta and remove_meta.

        * R/textdoccol.R: Removed modify_metadata method.

        * R/textrepo.R: Removed modify_metadata method.

        * R/textdoccol.R (remove_meta): Supports removal of document
        collection metadata and of document metadata (stored in the data
        frame).

2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

        * R/textdoccol.R ([): Bug fix for subsetting a document
        collection's data frame.

2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Bug fixes in s_filter. Added full query support
        to s_filter.

        * R/textdoccol.R: Local text documents' metadata can now be copied
        to a document collection's data frame with prescind_meta.

2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Text documents' slot metadata is now accessible in s_filter.

        * R/: Rewrote s_filter function (still has some restrictions).

2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various fixes in handling metadata.

        * R/: Added update mechanism for text document collections.

2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Merging of document collections now creates a binary tree
        for reconstructing merged document collections.

        * R/: Redesign of metadata for document collections.

2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Messages now use \code{ngettext}.

2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added functions for modifying and removing metadata.

2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated some documentation.

        * R/: Corrected some connection issues.

        * inst/doc: Worked on the vignette.

2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/: Added texts and started vignette.

        * R/: Final changes based upon David's comments.

2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * NAMESPACE: Corrected exports (generic methods need exportMethods
        directives!).

2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Modified the TextDocCol constructor and various parsers. It
        is now modular and supports various file formats via plugins (see
        the new "Source" class).

2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Revised documentation after previous code changes.

2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Remaining changes as discussed with David.

2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some changes as suggested by David. The rest will follow
        in the next few days.

2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Finished documentation.

2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Wrote some documentation.

2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Further syntactic sugar in the form of additional assignment
        and accessor methods.

2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Syntactic sugar in the form of "length", "show" and "summary"
        operators.

2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various updates, mainly to default operators ("[" and "c")
        and dissimilarities.

2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added similarity functions.

        * data/: Added English stopwords.

2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/: Examples compiled for new features.

        * R/: Changes due to new structure.

        * NAMESPACE: Corrected namespace to reflect new structure.

        * R/termdocmatrix.R: Adapted for new naming scheme.

2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Adapted code for new class structure. Wrote
        several transform and filter functions operating on text document
        collections (also known as text document databases).

        * R/aobjects.R: Adapted class structure with inheritance,
        repositories and additional metadata. Loading files on demand is
        now possible.

2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some cosmetic cleanups.

        * inst/: Removed vignette on clustering. That and much more is now
        described in the JSS paper on text mining. Based upon that
        article, a more elaborate vignette will be incorporated in the
        future.

2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Updated generic S4 methods to comply with signature changes
        in newer versions of R (> 2.3).

2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * ext/R/importRIS.R: Automatic RIS import is now possible.

2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Added RIS HTML input format.

2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Fixed a bug that caused invalid text document
        collections when handling many input files.

2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Restructured and extended file import
        mechanism.

        * inst/doc/clustering.Rnw: Adapted vignette for use with
        ReutNews.rda.

        * man/ReutNews.Rd: Documentation for ReutNews.rda.

        * data/ReutNews.rda: A tiny Reuters21578 example data set.

2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/clustering.Rnw: Wrote a small vignette to present the
        clustering facilities of this package.

2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Changed package document structure to avoid class
        dependency problems.

2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Wrote a script for the ModLewis Split for the Reuters-21578 XML
        data set.

        * Finished documentation and reordered directory structure. Now
        "R CMD check textmin" works without errors.
