[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
trunk/tm/ChangeLog revision 796, Tue Nov 6 15:22:34 2007 UTC
1    2007-11-06  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/termdocmatrix.R (termFreq): Correct processing of empty
4            documents.
5    
6    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
7    
8            * man/: Updated documentation.
9    
10    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
11    
12            * R/complete.R (completeStems): Completes (heuristically) word
13            stems.
14    
15            * R/termdocmatrix.R (TermDocMatrix2): New modular
16            constructor.
17    
18            * NAMESPACE: Exported termFreq.
19    
20    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
21    
22            * R/reader.R (readDOC): Added MS Word reader (using antiword).
23    
24    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
25    
26            * R/weight.R: Weighting functions for TermDocMatrix.
27    
28    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
29    
30            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
31            functions for accessing dimension, column, and row names.
32    
33            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
34    
35    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
36    
37            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
38    
39    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
40    
41            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
42    
43    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
44    
45            * R/reader.R (readPDF): Removed manual checks for pdftotext and
46            pdfinfo. The system call gives a warning anyway.
47    
48    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
49    
50            * R/textdoccol.R (asPlain): Conversion from
51            StructuredTextDocuments to PlainTextDocuments.
52    
53    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
54    
55            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
56            for accessing term-document matrices.
57    
58            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
59            are installed.
60    
61    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
62    
63            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
64            Christian Buchta.
65    
66    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
67    
68            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
69    
70    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
71    
72            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
73    
74            * R/reader.R (readPDF): Added PDF reader.
75    
76    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
77    
78            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
79    
80            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
81    
82            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
83    
84            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
85    
86    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
87    
88            * R/distmeasure.R (dissimilarity): Replaced dists call from
89            package cba by new dist call from package proxy.
90    
91    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
92    
93            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
94    
95    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
96    
97            * R/termdocmatrix.R: require() uses the quietly option to suppress
98            loading messages.
99    
100    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
101    
102            * R/dictionary.R: Added dictionary support.
103    
104    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
105    
106            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
107            documents. This simplifies some functions, e.g., asPlain.
108    
109    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
110    
111            * inst/doc/tm.Rnw: Fixed some typos in vignette.
112    
113    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
114    
115            * R/textdoccol.R (replaceWords): Added method to replace a set of
116            words by a single word. Useful for synonyms.
117    
118    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
119    
120            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
121    
122    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
123    
124            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
125            vectors. Thanks to Ariel Maguyon for his error report.
126            (removeSparseTerms): New function to remove columns from a
127            term-document matrix exceeding a sparse factor.
128    
129    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
130    
131            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
132    
133    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
134    
135            * man/sFilter.Rd: Corrected documentation on statement format (use
136            '==' instead of '=').
137    
138    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
139    
140            * R/aobjects.R (StructuredTextDocument): Inherits from
141            TextDocument.
142    
143    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
144    
145            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
146            on sparse matrices as proposed by Martin Maechler.
147    
148    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
149    
150            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
151            \pkg{filehash} version deprecates them.
152    
153    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
154    
155            * R/termdocmatrix.R (textvector): Stemming is now performed before
156            erasing stopwords.
157            (weightMatrix): Adapted to handle sparse matrices.
158            (TermDocMatrix): Sparse matrix is now efficiently built by
159            direct stepwise insertion of row values into it.
160    
161    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
162    
163            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
164            due to ongoing problems. For our purposes the latter is as useful
165            as the replaced package.
166    
167    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
168    
169            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
170    
171            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
172    
173    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
174    
175            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
176            languages with available stopwords.
177    
178    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
179    
180            * inst/doc/tm.Rnw: Minor corrections in the vignette.
181    
182    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
183    
184            * DESCRIPTION: Update to version 0.2, since a lot of new features
185            have been integrated.
186    
187            * inst/stopwords: Updated existing stopwords and added stopwords
188            for various other languages.
189    
190    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
191    
192            * man/: Updated documentation.
193    
194            * Work/testDb.R: Script to test database stuff.
195    
196            * R/: Fixed various database-related bugs. Seems to be rather
197            usable now, i.e., consider it alpha status for now.
198    
199    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
200    
201            * R/: Fixed some bugs related to database support.
202    
203    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
204    
205            * man/: Added a lot of examples to the manuals.
206    
207    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
208    
209            * man/: Updated parts of the documentation.
210    
211            * R/textdoccol.R (asPlain): Added conversion from newsgroup
212            documents to plain text documents.
213    
214    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
215    
216            * R/textdoccol.R: Finished experimental database support. Not yet
217            intensively tested.
218    
219            * R/source.R: Now each source has a default reader.
220    
221            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
222            class anymore.
223    
224            * R/plaintextdoc.R: Custom show method for plain text documents.
225    
226            * R/aobjects.R: Added a class for structured text documents.
227    
228            * R/reader.R: Replaced remaining \code{parser} occurrences with
229            \code{reader}.
230    
231            * R/textdoccol.R (summary): Indent tags.
232    
233            * R/textdoccol.R (removePunctuation): Transform method to remove
234            punctuation marks.
235    
236    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
237    
238            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
239            using prescindMeta().
240    
241    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
242    
243            * R/textdoccol.R: Improved database support.
244    
245    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
246    
247            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
248    
249            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
250            language code.
251    
252            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
253            into parserControl argument.
254    
255            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
256    
257    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
258    
259            * Work/tmDataSetup.R: The datasets acq and crude can now be
260            created on the fly.
261    
262            * R/stopwords.R: Introduced a function returning the stopwords for
263            a given language (English, German and French at the moment).
264    
265            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
266            otherwise falls back to Snowball package.
267    
268    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
269    
270            * man/dissimilarity-methods.Rd: Make clear that any method offered
271            by "dists" from package "cba" can be used.
272    
273    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
274    
275            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
276            to Kurt's LaTeX suggestion. Removed points and underscores in
277            variable names for consistent naming.
278    
279            * DESCRIPTION: Update to version 0.1-2.
280    
281            * man/TextRepository.Rd: Fixed bug in documentation.
282    
283    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
284    
285            * DESCRIPTION: Update to version 0.1-1.
286    
287    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
288    
289            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
290            wordStem.
291    
292    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
293    
294            * R/: Changes due to Kurt's review.
295    
296    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
297    
298            * R/: Implemented improvements based upon comments by David
299            Meyer.
300    
301    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
302    
303            * inst/doc/: Rewrote vignette.
304    
305            * man/: Improved documentation.
306    
307    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
308    
309            * man/: Updated documentation.
310    
311            * DESCRIPTION: Changed package name to "tm". Updated version to
312            0.1 for first CRAN release.
313    
314            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
315            list archive example.
316    
317            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
318            archive example.
319    
320            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
321            from (several mails per box) mbox format to (single mail per file)
322            eml format.
323    
324    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
325    
326            * data/crude.rda: Rebuilt.
327    
328            * data/acq.rda: Rebuilt.
329    
330            * R/reader.R: Factored out reader and parser methods from
331            textdoccol.R.
332    
333            * R/source.R: Factored out Source methods from aobjects.R and
334            textdoccol.R.
335            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
336            feeds.
337    
338            * R/textdoccol.R (DirSource): Added support for recursive
339            traversal of directories.
340    
341    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
342    
343            * R/textdoccol.R ([[): Loads the document corpus automatically
344            into memory upon access.
345            (tm_transform, tm_filter): Removed several checks whether the
346            document is already loaded ([[ ensures this now).
347            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
348            mailing list archive.
349    
350    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
351    
352            * R/aobjects.R (TextDocument): Is now a virtual class.
353            (Source): Is now a virtual class.
354    
355    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
356    
357            * R/textdoccol.R (c): Support for an arbitrary number of document
358            collections.
359    
360    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
361    
362            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
363            append_meta and remove_meta.
364    
365            * R/textdoccol.R: Removed modify_metadata method.
366    
367            * R/textrepo.R: Removed modify_metadata method.
368    
369            * R/textdoccol.R (remove_meta): Supports removal of document
370            collection metadata and document (= in data frame) metadata.
371    
372    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
373    
374            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
375    
376            * data/crude.rda: Rebuilt.
377    
378            * data/acq.rda: Rebuilt.
379    
380            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
381    
382            * R/textdoccol.R ([): Bug fix for subsetting a document
383            collection's data frame.
384    
385    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
386    
387            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
388            to s_filter.
389    
390            * R/textdoccol.R: Local text documents' metadata can now be copied
391            to a document collection's data frame with prescind_meta.
392    
393    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
394    
395            * R/: Text documents' slot metadata is now accessible in s_filter.
396    
397            * R/: Rewrote s_filter function (still has some restrictions).
398    
399    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
400    
401            * R/: Various fixes in handling metadata.
402    
403            * R/: Added update mechanism for text document collections.
404    
405    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
406    
407            * R/: Merging of document collections now creates a binary tree
408            for reconstructing merged document collections.
409    
410            * R/: Redesign of metadata for document collections.
411    
412    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
413    
414            * R/: Messages now use \code{ngettext}.
415    
416    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
417    
418            * R/: Added functions for modifying and removing metadata.
419    
420    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
421    
422            * man/: Updated some documentation.
423    
424            * R/: Corrected some connection issues.
425    
426            * inst/doc: Worked on the vignette.
427    
428    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
429    
430            * inst/: Added texts and started vignette.
431    
432            * R/: Final changes based upon David's comments.
433    
434    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
435    
436            * NAMESPACE: Corrected exports (generic methods need exportMethods
437            directives!).
438    
439    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
440    
441            * R/: Modified the TextDocCol constructor and various parsers. It
442            is now modular and supports various file formats via plugins (see
443            the new "Source" class).
444    
445    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
446    
447            * man/: Revised documentation after previous code changes.
448    
449    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
450    
451            * R/: Remaining changes as discussed with David.
452    
453    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/: Some changes as suggested by David. The rest will follow
456            within the next few days.
457    
458    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * man/: Finished documentation.
461    
462    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
463    
464            * man/: Wrote some documentation.
465    
466    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
467    
468            * R/: Further syntactic sugar in form of additional assignment and
469            accessor methods.
470    
471    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
472    
473            * R/: Syntactic sugar in form of "length", "show" and "summary"
474            operators.
475    
476    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
477    
478            * R/: Various updates, mainly to default operators ("[" or "c")
479            and dissimilarities.
480    
481    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * R/: Added similarity functions.
484    
485            * data/: Added English stopwords.
486    
487    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
488    
489            * data/: Examples compiled for new features.
490    
491            * R/: Changes due to new structure.
492    
493            * NAMESPACE: Corrected namespace to reflect new structure.
494    
495            * R/termdocmatrix.R: Adapted for new naming scheme.
496    
497    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
498    
499            * R/textdoccol.R: Adapted code for new class structure. Wrote
500            several transform and filter functions operating on text document
501            collections (alias text document databases).
502    
503            * R/aobjects.R: Adapted class structure with inheritance,
504            repositories and additional meta data. Loading files on demand is
505            now possible.
506    
507    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
508    
509            * R/: Some cosmetic cleanups.
510    
511            * inst/: Removed vignette on clustering. That and much more is now
512            described in the JSS paper on text mining. Based upon that
513            article, a more elaborate vignette will be incorporated in the future.
514    
515    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
516    
517            * R/: Updated generic S4 methods to comply with signature changes
518            in newer versions of R (> 2.3).
519    
520    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
521    
522            * ext/R/importRIS.R: Automatic RIS import is now possible.
523    
524    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
525    
526            * R/textdoccol.R: Added RIS HTML input format.
527    
528    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
529    
530            * R/textdoccol.R: Removed bug that caused invalid text document
531            collections when handling many input files.
532    
533    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
534    
535            * R/textdoccol.R: Restructured and extended file import
536            mechanism.
537    
538            * inst/doc/clustering.Rnw: Adapted vignette for use with
539            ReutNews.rda
540    
541            * man/ReutNews.Rd: Documentation for ReutNews.rda
542    
543            * data/ReutNews.rda: A tiny Reuters21578 example data set.
544    
545    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
546    
547            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
548            clustering facilities of this package.
549    
550    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
551    
552            * R/aobjects.R: Changed package document structure to avoid class
553            dependency problems.
554    
555    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
556    
557            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
558            data set.
559    
560            * Finished documentation and reordered directory structure. Now "R
561            CMD check textmin" works without errors.
562    
