Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 36, Wed Jan 11 15:42:56 2006 UTC
trunk/tm/ChangeLog revision 790, Sun Oct 21 08:27:13 2007 UTC
# Line 1  Line 1 
1    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/termdocmatrix.R (TermDocMatrix2): New modular
4            constructor.
5    
6            * NAMESPACE: Exported termFreq.
7    
8    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
9    
10            * R/reader.R (readDOC): Added MS Word reader (using antiword).
11    
12    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
13    
14            * R/weight.R: Weighting functions for TermDocMatrix.
15    
16    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
17    
18            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
19            functions for accessing dimension, column, and row names.
20    
21            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
22    
23    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
24    
25            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
26    
27    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
28    
29            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
30    
31    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
32    
33            * R/reader.R (readPDF): Removed manual checks for pdftotext and
34            pdfinfo. The system call gives a warning anyway.
35    
36    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
37    
38            * R/textdoccol.R (asPlain): Conversion from
39            StructuredTextDocuments to PlainTextDocuments.
40    
41    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
42    
43            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
44            for accessing term-document matrices.
45    
46            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
47            are installed.
48    
49    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
50    
51            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
52            Christian Buchta.
53    
54    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
55    
56            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
57    
58    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
59    
60            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
61    
62            * R/reader.R (readPDF): Added PDF reader.
63    
64    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
65    
66            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
67    
68            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
69    
70            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
71    
72            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
73    
74    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
75    
76            * R/distmeasure.R (dissimilarity): Replaced dists call from
77            package cba by new dist call from package proxy.
78    
79    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
80    
81            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
82    
83    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
84    
85            * R/termdocmatrix.R: require() uses the quietly option to suppress
86            loading messages.
87    
88    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
89    
90            * R/dictionary.R: Added dictionary support.
91    
92    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
93    
94            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
95            documents. This simplifies some functions, e.g., asPlain.
96    
97    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
98    
99            * inst/doc/tm.Rnw: Fixed some typos in vignette.
100    
101    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
102    
103            * R/textdoccol.R (replaceWords): Added method to replace a set of
104            words by a single word. Useful for synonyms.
105    
106    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
107    
108            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
109    
110    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
111    
112            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
113            vectors. Thanks to Ariel Maguyon for his error report.
114            (removeSparseTerms): New function to remove columns from a
115            term-document matrix exceeding a sparse factor.
116    
117    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
118    
119            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
120    
121    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
122    
123            * man/sFilter.Rd: Corrected documentation on statement format (use
124            '==' instead of '=').
125    
126    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
127    
128            * R/aobjects.R (StructuredTextDocument): Inherits from
129            TextDocument.
130    
131    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
132    
133            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
134            on sparse matrices as proposed by Martin Maechler.
135    
136    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
137    
138            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
139            \pkg{filehash} version deprecates them.
140    
141    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
142    
143            * R/termdocmatrix.R (textvector): Stemming is now performed before
144            erasing stopwords.
145            (weightMatrix): Adapted to handle sparse matrices.
146            (TermDocMatrix): Sparse matrix is now efficiently built by
147            direct stepwise insertion of row values into it.
148    
149    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
150    
151            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
152            due to ongoing problems. For our purposes the latter is as useful
153            as the replaced package.
154    
155    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
156    
157            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
158    
159            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
160    
161    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
162    
163            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
164            languages with available stopwords.
165    
166    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
167    
168            * inst/doc/tm.Rnw: Minor corrections in the vignette.
169    
170    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
171    
172            * DESCRIPTION: Update to version 0.2, since a lot of new features
173            have been integrated.
174    
175            * inst/stopwords: Updated existing stopwords and added stopwords
176            for various other languages.
177    
178    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
179    
180            * man/: Updated documentation.
181    
182            * Work/testDb.R: Script to test database stuff.
183    
184            * R/: Fixed various database-related bugs. Seems to be rather
185            usable now, i.e., consider it alpha status for now.
186    
187    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
188    
189            * R/: Fixed some bugs related to database support.
190    
191    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
192    
193            * man/: Added a lot of examples to the manuals.
194    
195    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
196    
197            * man/: Updated parts of the documentation.
198    
199            * R/textdoccol.R (asPlain): Added conversion from newsgroup
200            documents to plain text documents.
201    
202    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
203    
204            * R/textdoccol.R: Finished experimental database support. Not yet
205            intensively tested.
206    
207            * R/source.R: Now each source has a default reader.
208    
209            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
210            class anymore.
211    
212            * R/plaintextdoc.R: Custom show method for plain text documents.
213    
214            * R/aobjects.R: Added a class for structured text documents.
215    
216            * R/reader.R: Replaced remaining \code{parser} occurrences with
217            \code{reader}.
218    
219            * R/textdoccol.R (summary): Indent tags.
220    
221            * R/textdoccol.R (removePunctuation): Transform method to remove
222            punctuation marks.
223    
224    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
225    
226            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
227            using prescindMeta().
228    
229    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
230    
231            * R/textdoccol.R: Improved database support.
232    
233    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
234    
235            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
236    
237            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
238            language code.
239    
240            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
241            into parserControl argument.
242    
243            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
244    
245    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
246    
247            * Work/tmDataSetup.R: The datasets acq and crude can now be
248            created on the fly.
249    
250            * R/stopwords.R: Introduced a function returning the stopwords for
251            a given language (English, German and French at the moment).
252    
253            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
254            otherwise falls back to Snowball package.
255    
256    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
257    
258            * man/dissimilarity-methods.Rd: Make clear that any method offered
259            by "dists" from package "cba" can be used.
260    
261    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
262    
263            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
264            to Kurt's latex suggestion. Removed points and underscores in
265            variable names for consistent naming.
266    
267            * DESCRIPTION: Update to version 0.1-2.
268    
269            * man/TextRepository.Rd: Fixed bug in documentation.
270    
271    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
272    
273            * DESCRIPTION: Update to version 0.1-1.
274    
275    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
276    
277            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
278            wordStem.
279    
280    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
281    
282            * R/: Changes due to Kurt's review.
283    
284    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
285    
286            * R/: Implemented improvements based upon comments by David
287            Meyer.
288    
289    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
290    
291            * inst/doc/: Rewrote vignette.
292    
293            * man/: Improved documentation.
294    
295    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
296    
297            * man/: Updated documentation.
298    
299            * DESCRIPTION: Changed package name to "tm". Updated version to
300            0.1 for first CRAN release.
301    
302            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
303            list archive example.
304    
305            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
306            archive example.
307    
308            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
309            from (several mails per box) mbox format to (single mail per file)
310            eml format.
311    
312    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
313    
314            * data/crude.rda: Rebuilt.
315    
316            * data/acq.rda: Rebuilt.
317    
318            * R/reader.R: Factored out reader and parser methods from
319            textdoccol.R.
320    
321            * R/source.R: Factored out Source methods from aobjects.R and
322            textdoccol.R.
323            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
324            feeds.
325    
326            * R/textdoccol.R (DirSource): Added support for recursive
327            traversal of directories.
328    
329    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
330    
331            * R/textdoccol.R ([[): Loads the document corpus automatically
332            into memory upon access.
333            (tm_transform, tm_filter): Removed several checks whether the
334            document is already loaded ([[ ensures this now).
335            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
336            mailing list archive.
337    
338    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
339    
340            * R/aobjects.R (TextDocument): Is now a virtual class.
341            (Source): Is now a virtual class.
342    
343    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
344    
345            * R/textdoccol.R (c): Support for an arbitrary number of document
346            collections.
347    
348    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
349    
350            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
351            append_meta and remove_meta.
352    
353            * R/textdoccol.R: Removed modify_metadata method.
354    
355            * R/textrepo.R: Removed modify_metadata method.
356    
357            * R/textdoccol.R (remove_meta): Supports removal of document
358            collection metadata and document (= in data frame) metadata.
359    
360    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
361    
362            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
363    
364            * data/crude.rda: Rebuilt.
365    
366            * data/acq.rda: Rebuilt.
367    
368            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
369    
370            * R/textdoccol.R ([): Bug fix for subsetting a document
371            collection's data frame.
372    
373    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
374    
375            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
376            to s_filter.
377    
378            * R/textdoccol.R: Local text documents' metadata can now be copied
379            to a document collection's data frame with prescind_meta.
380    
381    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
382    
383            * R/: Text documents' slot metadata is now accessible in s_filter.
384    
385            * R/: Rewrote s_filter function (still has some restrictions).
386    
387    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
388    
389            * R/: Various fixes in handling metadata.
390    
391            * R/: Added update mechanism for text document collections.
392    
393    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
394    
395            * R/: Merging of document collections now creates a binary tree
396            for reconstructing merged document collections.
397    
398            * R/: Redesign of metadata for document collections.
399    
400    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
401    
402            * R/: Messages now use \code{ngettext}.
403    
404    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
405    
406            * R/: Added functions for modifying and removing metadata.
407    
408    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
409    
410            * man/: Updated some documentation.
411    
412            * R/: Corrected some connection issues.
413    
414            * inst/doc: Worked on the vignette.
415    
416    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
417    
418            * inst/: Added texts and started vignette.
419    
420            * R/: Final changes based upon David's comments.
421    
422    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
423    
424            * NAMESPACE: Corrected exports (generic methods need exportMethods
425            directives!).
426    
427    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
428    
429            * R/: Modified the TextDocCol constructor and various parsers. It
430            is now modular and supports various file formats via plugins (see
431            the new "Source" class).
432    
433    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
434    
435            * man/: Revised documentation after previous code changes.
436    
437    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
438    
439            * R/: Remaining changes as discussed with David.
440    
441    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
442    
443            * R/: Some changes as suggested by David. The rest will follow
444            within the next few days.
445    
446    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
447    
448            * man/: Finished documentation.
449    
450    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
451    
452            * man/: Wrote some documentation.
453    
454    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * R/: Further syntactic sugar in form of additional assignment and
457            accessor methods.
458    
459    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
460    
461            * R/: Syntactic sugar in form of "length", "show" and "summary"
462            operators.
463    
464    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
465    
466            * R/: Various updates, mainly to default operators ("[" or "c")
467            and dissimilarities.
468    
469    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
470    
471            * R/: Added similarity functions.
472    
473            * data/: Added English stopwords.
474    
475    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
476    
477            * data/: Examples compiled for new features.
478    
479            * R/: Changes due to new structure.
480    
481            * NAMESPACE: Corrected namespace to reflect new structure.
482    
483            * R/termdocmatrix.R: Adapted for new naming scheme.
484    
485    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
486    
487            * R/textdoccol.R: Adapted code for new class structure. Wrote
488            several transform and filter functions operating on text document
489            collections (alias text document databases).
490    
491            * R/aobjects.R: Adapted class structure with inheritance,
492            repositories and additional meta data. Loading files on demand is
493            now possible.
494    
495    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
496    
497            * R/: Some cosmetic cleanups.
498    
499            * inst/: Removed vignette on clustering. That and much more is now
500            described in the JSS paper on text mining. Based upon that
501            article an elaborated vignette will be incorporated in the future.
502    
503    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
504    
505            * R/: Updated generic S4 methods to comply with signature changes
506            in newer versions of R (> 2.3).
507    
508    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * ext/R/importRIS.R: Automatic RIS import is now possible.
511    
512    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
513    
514            * R/textdoccol.R: Added RIS HTML input format.
515    
516    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/textdoccol.R: Removed bug that caused invalid text document
519            collections when handling many input files.
520    
521    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * R/textdoccol.R: Restructured and extended file import
524            mechanism.
525    
526            * inst/doc/clustering.Rnw: Adapted vignette for use with
527            ReutNews.rda
528    
