[tm] Diff of /pkg/ChangeLog

From trunk/R/trunk/ChangeLog revision 34 (Thu Dec 22 15:18:10 2005 UTC) to trunk/tm/ChangeLog revision 808 (Sun Jan 13 16:18:27 2008 UTC)

2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Fixed a bug in default reader
        selection when no reader argument is given.

2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/source.R (CSVSource): Now uses read.csv instead of scan
        internally.

2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (getReaders): Returns available reader functions.

        * R/termdocmatrix.R (TermDocMatrix): Set the new modular
        constructor as the default.

2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Shortened code, removed codetools
        variable warnings.

        * man/: Added documentation for showMeta and an example for tmMap.

        * inst/doc/tm.Rnw: Updated vignette (comments on the MS Word
        reader) and fixed some minor typos.

2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (showMeta): Added a method for pretty-printing a
        text document's metadata.

2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Better handling of empty
        arguments.

        * NAMESPACE: Exported readDOC.

        * man/completeStems.Rd: Added an example.

2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Look up .dat files at every
        call. Allows users to modify stopword .dat files interactively.

2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (termFreq): Correct processing of empty
        documents.

2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/complete.R (completeStems): Heuristically completes word
        stems.

        * R/termdocmatrix.R (TermDocMatrix2): New modular
        constructor.

        * NAMESPACE: Exported termFreq.

2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readDOC): Added an MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/weight.R: Weighting functions for TermDocMatrix.

2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
        functions for accessing dimension, column, and row names.

        * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/removePunctuation.Rd: Added documentation. The function is
        also exported in NAMESPACE.

2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/fungen.R: Use an S4 class for function generators instead of
        S3 attributes.

2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readPDF): Removed manual checks for pdftotext and
        pdfinfo. The system call issues a warning anyway.

2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (asPlain): Conversion from
        StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
        for accessing term-document matrices.

        * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
        are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
        Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Updated vignette (readPDF, readHTML,
        preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readHTML): Added a very simple HTML reader to obtain
        StructuredTextDocuments.

        * R/reader.R (readPDF): Added a PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Moved proxy from Depends to Imports to avoid name
        clashes.

        * inst/stopwords/english.dat: Added the term "yes" to the stopword
        list.

        * R/termdocmatrix.R (dim): Added a dim method for TermDocMatrix.

        * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/distmeasure.R (dissimilarity): Replaced the dists call from
        package cba with the new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: require() uses the quietly option to suppress
        loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
        documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed some typos in the vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (replaceWords): Added a method to replace a set of
        words with a single word. Useful for synonyms.

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TermDocMatrix.Rd: Fixed documentation on the Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Small fix for dealing with empty
        vectors. Thanks to Ariel Maguyon for his error report.
        (removeSparseTerms): New function to remove columns from a
        term-document matrix whose sparsity exceeds a given sparse factor.

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/tmUpdate.Rd: Corrected documentation on the readerControl
        parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/sFilter.Rd: Corrected documentation on statement format (use
        '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (StructuredTextDocument): Inherits from
        TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
        on sparse matrices as proposed by Martin Maechler.

2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
        \pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Stemming is now performed before
        removing stopwords.
        (weightMatrix): Adapted to handle sparse matrices.
        (TermDocMatrix): The sparse matrix is now built efficiently by
        inserting row values into it directly, step by step.

2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
        due to ongoing problems. For our purposes, \pkg{filehash} is just
        as useful as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TextDocCol.Rd: Replaced \code{readPlain} with
        \code{object@DefaultReader}.

        * man/TermDocMatrix.Rd: Removed the deprecated \code{language}
        argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
        languages with available stopwords.

2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Updated to version 0.2, since many new features
        have been integrated.

        * inst/stopwords: Updated existing stopwords and added stopwords
        for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * Work/testDb.R: Script to test database functionality.

        * R/: Fixed various database-related bugs. Database support now
        seems reasonably usable; consider it alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Added many examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated parts of the documentation.

        * R/textdoccol.R (asPlain): Added conversion from newsgroup
        documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Finished experimental database support. Not yet
        intensively tested.

        * R/source.R: Now each source has a default reader.

        * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
        class anymore.

        * R/plaintextdoc.R: Custom show method for plain text documents.

        * R/aobjects.R: Added a class for structured text documents.

        * R/reader.R: Replaced remaining \code{parser} occurrences with
        \code{reader}.

        * R/textdoccol.R (summary): Indent tags.

        * R/textdoccol.R (removePunctuation): Transform method to remove
        punctuation marks.

2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (sFilter): Simplified sFilter significantly by
        using prescindMeta().

2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

        * R/resolve.R (resolveISOcode): Extracts the language from an ISO
        language code.

        * R/textdoccol.R (TextDocCol): Refactored several parser arguments
        into a parserControl argument.

        * R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Work/tmDataSetup.R: The datasets acq and crude can now be
        created on the fly.

        * R/stopwords.R: Introduced a function returning the stopwords for
        a given language (English, German and French at the moment).

        * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
        otherwise falls back to the Snowball package.

2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/dissimilarity-methods.Rd: Made clear that any method offered
        by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed the bug where quotes appeared as boxes,
        following Kurt's LaTeX suggestion. Removed dots and underscores
        from variable names for consistent naming.

        * DESCRIPTION: Updated to version 0.1-2.

        * man/TextRepository.Rd: Fixed a bug in the documentation.

2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Updated to version 0.1-1.

2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
        wordStem.

2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Changes based on Kurt's review.

2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Implemented improvements based upon comments by David
        Meyer.

2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/: Rewrote vignette.

        * man/: Improved documentation.

2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * DESCRIPTION: Changed package name to "tm". Updated version to
        0.1 for the first CRAN release.

        * inst/texts/gmane.comp.lang.r.general.mbox: Example Gmane R
        mailing list archive in mbox format.

        * inst/texts/gmane.comp.lang.r.gr.rdf: Example Gmane R mailing
        list archive as an RSS feed.

        * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
        from mbox format (several mails per file) to eml format (one mail
        per file).

2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * R/reader.R: Factored out reader and parser methods from
        textdoccol.R.

        * R/source.R: Factored out Source methods from aobjects.R and
        textdoccol.R.
        (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
        feeds.

        * R/textdoccol.R (DirSource): Added support for recursive
        traversal of directories.

2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R ([[): Loads the document corpus automatically
        into memory upon access.
        (tm_transform, tm_filter): Removed several checks on whether the
        document is already loaded ([[ now ensures this).
        (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
        mailing list archive.

2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (TextDocument): Is now a virtual class.
        (Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (c): Support for an arbitrary number of document
        collections.

2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textrepo.R: Updated TextRepository (constructor), append_elem,
        append_meta and remove_meta.

        * R/textdoccol.R: Removed modify_metadata method.

        * R/textrepo.R: Removed modify_metadata method.

        * R/textdoccol.R (remove_meta): Supports removal of both document
        collection metadata and document-level metadata (stored in the
        data frame).

2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

        * R/textdoccol.R ([): Bug fix for subsetting a document
        collection's data frame.

2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Bug fixes in s_filter. Added full query support
        to s_filter.

        * R/textdoccol.R: Local text documents' metadata can now be copied
        to a document collection's data frame with prescind_meta.

2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Text documents' slot metadata is now accessible in s_filter.

        * R/: Rewrote s_filter function (still has some restrictions).

2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various fixes in handling metadata.

        * R/: Added update mechanism for text document collections.

2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Merging of document collections now creates a binary tree
        for reconstructing merged document collections.

        * R/: Redesign of metadata for document collections.

2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Messages now use \code{ngettext}.

2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added functions for modifying and removing metadata.

2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated some documentation.

        * R/: Corrected some connection issues.

        * inst/doc: Worked on the vignette.

2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/: Added texts and started vignette.

        * R/: Final changes based upon David's comments.

2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * NAMESPACE: Corrected exports (generic methods need exportMethods
        directives!).

2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Modified the TextDocCol constructor and various parsers. The
        constructor is now modular and supports various file formats via
        plugins (see the new "Source" class).

2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Revised documentation after previous code changes.

2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Remaining changes as discussed with David.

2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some changes as suggested by David. The rest will follow
        in the next few days.

2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Finished documentation.

2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Wrote some documentation.

2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Further syntactic sugar in the form of additional assignment
        and accessor methods.

2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Syntactic sugar in the form of "length", "show" and "summary"
        methods.

2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various updates, mainly to default operators ("[" or "c")
        and dissimilarities.

2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added similarity functions.

        * data/: Added English stopwords.

2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/: Compiled examples for new features.

        * R/: Changes due to the new structure.

        * NAMESPACE: Corrected namespace to reflect the new structure.

        * R/termdocmatrix.R: Adapted for the new naming scheme.

2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Adapted code for the new class structure. Wrote
        several transform and filter functions operating on text document
        collections (also known as text document databases).

        * R/aobjects.R: Adapted the class structure with inheritance,
        repositories and additional metadata. Loading files on demand is
        now possible.

2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some cosmetic cleanups.

        * inst/: Removed the vignette on clustering. This and much more is
        now described in the JSS paper on text mining. Based upon that
        article, a more elaborate vignette will be incorporated in the
        future.

2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Updated generic S4 methods to comply with signature changes
        in newer versions of R (> 2.3).

2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * ext/R/importRIS.R: Automatic RIS import is now possible.

2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Added RIS HTML input format.

2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Fixed a bug that caused invalid text document
        collections when handling many input files.

2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Restructured and extended the file import
        mechanism.

        * inst/doc/clustering.Rnw: Adapted the vignette for use with
        ReutNews.rda.

        * man/ReutNews.Rd: Documentation for ReutNews.rda.

        * data/ReutNews.rda: A tiny Reuters21578 example data set.

2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/clustering.Rnw: Wrote a small vignette to present the
