Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 20, Tue Nov 8 16:40:52 2005 UTC
trunk/tm/ChangeLog revision 770, Tue Jul 17 12:41:04 2007 UTC
1    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
4            Christian Buchta.
5    
6    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
7    
8            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
9    
10    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
11    
12            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
13    
14            * R/reader.R (readPDF): Added PDF reader.
15    
16    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
17    
18            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
19    
20            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
21    
22            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
23    
24            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
25    
26    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
27    
28            * R/distmeasure.R (dissimilarity): Replaced dists call from
29            package cba by new dist call from package proxy.
30    
31    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
32    
33            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
34    
35    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
36    
37            * R/termdocmatrix.R: require() uses the quietly option to suppress
38            loading messages.
39    
40    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
41    
42            * R/dictionary.R: Added dictionary support.
43    
44    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
45    
46            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
47            documents. This simplifies some functions, e.g., asPlain.
48    
49    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
50    
51            * inst/doc/tm.Rnw: Fixed some typos in vignette.
52    
53    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
54    
55            * R/textdoccol.R (replaceWords): Added method to replace a set of
56            words by a single word. Useful for synonyms.
57    
58    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
59    
60            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
61    
62    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
63    
64            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
65            vectors. Thanks to Ariel Maguyon for his error report.
66            (removeSparseTerms): New function to remove columns from a
67            term-document matrix exceeding a sparse factor.
68    
69    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
70    
71            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
72    
73    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
74    
75            * man/sFilter.Rd: Corrected documentation on statement format (use
76            '==' instead of '=').
77    
78    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
79    
80            * R/aobjects.R (StructuredTextDocument): Inherits from
81            TextDocument.
82    
83    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
84    
85            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
86            on sparse matrices as proposed by Martin Maechler.
87    
88    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
89    
90            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
91            \pkg{filehash} version deprecates them.
92    
93    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
94    
95            * R/termdocmatrix.R (textvector): Stemming is now performed before
96            erasing stopwords.
97            (weightMatrix): Adapted to handle sparse matrices.
98            (TermDocMatrix): Sparse matrix is now efficiently built by
99            direct stepwise insertion of row values into it.
100    
101    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
102    
103            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
104            due to ongoing problems. For our purposes the latter is as useful
105            as the replaced package.
106    
107    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
108    
109            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
110    
111            * man/TermDocMatrix.Rd: Removed deprecated \code{language} argument.
112    
113    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
114    
115            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
116            languages with available stopwords.
117    
118    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
119    
120            * inst/doc/tm.Rnw: Minor corrections in the vignette.
121    
122    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
123    
124            * DESCRIPTION: Update to version 0.2, since a lot of new features
125            have been integrated.
126    
127            * inst/stopwords: Updated existing stopwords and added stopwords
128            for various other languages.
129    
130    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
131    
132            * man/: Updated documentation.
133    
134            * Work/testDb.R: Script to test database stuff.
135    
136            * R/: Fixed various database-related bugs. Seems to be rather
137            usable now, i.e., consider as alpha status for now.
138    
139    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
140    
141            * R/: Fixed some bugs related to database support.
142    
143    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
144    
145            * man/: Added a lot of examples to the manuals.
146    
147    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
148    
149            * man/: Updated parts of the documentation.
150    
151            * R/textdoccol.R (asPlain): Added conversion from newsgroup
152            documents to plain text documents.
153    
154    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
155    
156            * R/textdoccol.R: Finished experimental database support. Not yet
157            intensively tested.
158    
159            * R/source.R: Now each source has a default reader.
160    
161            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
162            class anymore.
163    
164            * R/plaintextdoc.R: Custom show method for plain text documents.
165    
166            * R/aobjects.R: Added a class for structured text documents.
167    
168            * R/reader.R: Replaced remaining \code{parser} occurrences with
169            \code{reader}.
170    
171            * R/textdoccol.R (summary): Indent tags.
172    
173            * R/textdoccol.R (removePunctuation): Transform method to remove
174            punctuation marks.
175    
176    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
177    
178            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
179            using prescindMeta().
180    
181    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
182    
183            * R/textdoccol.R: Improved database support.
184    
185    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
186    
187            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
188    
189            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
190            language code.
191    
192            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
193            into parserControl argument.
194    
195            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
196    
197    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
198    
199            * Work/tmDataSetup.R: The datasets acq and crude can now be
200            created on the fly.
201    
202            * R/stopwords.R: Introduced a function returning the stopwords for
203            a given language (English, German and French at the moment).
204    
205            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
206            otherwise falls back to Snowball package.
207    
208    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
209    
210            * man/dissimilarity-methods.Rd: Make clear that any method offered
211            by "dists" from package "cba" can be used.
212    
213    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
214    
215            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes bug according
216            to Kurt's LaTeX suggestion. Removed points and underscores in
217            variable names for consistent naming.
218    
219            * DESCRIPTION: Update to version 0.1-2.
220    
221            * man/TextRepository.Rd: Fixed bug in documentation.
222    
223    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
224    
225            * DESCRIPTION: Update to version 0.1-1.
226    
227    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
228    
229            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
230            wordStem.
231    
232    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
233    
234            * R/: Changes due to Kurt's review.
235    
236    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
237    
238            * R/: Implemented improvements based upon comments by David
239            Meyer.
240    
241    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
242    
243            * inst/doc/: Rewrote vignette.
244    
245            * man/: Improved documentation.
246    
247    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
248    
249            * man/: Updated documentation.
250    
251            * DESCRIPTION: Changed package name to "tm". Updated version to
252            0.1 for first CRAN release.
253    
254            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
255            list archive example.
256    
257            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
258            archive example.
259    
260            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
261            from (several mails per box) mbox format to (single mail per file)
262            eml format.
263    
264    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
265    
266            * data/crude.rda: Rebuilt.
267    
268            * data/acq.rda: Rebuilt.
269    
270            * R/reader.R: Factored out reader and parser methods from
271            textdoccol.R.
272    
273            * R/source.R: Factored out Source methods from aobjects.R and
274            textdoccol.R.
275            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
276            feeds.
277    
278            * R/textdoccol.R (DirSource): Added support for recursive
279            traversal of directories.
280    
281    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
282    
283            * R/textdoccol.R ([[): Loads the document corpus automatically
284            into memory upon access.
285            (tm_transform, tm_filter): Removed several checks whether the
286            document is already loaded ([[ ensures this now).
287            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
288            mailing list archive.
289    
290    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
291    
292            * R/aobjects.R (TextDocument): Is now a virtual class.
293            (Source): Is now a virtual class.
294    
295    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
296    
297            * R/textdoccol.R (c): Support for an arbitrary number of document
298            collections.
299    
300    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
301    
302            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
303            append_meta and remove_meta.
304    
305            * R/textdoccol.R: Removed modify_metadata method.
306    
307            * R/textrepo.R: Removed modify_metadata method.
308    
309            * R/textdoccol.R (remove_meta): Supports removal of document
310            collection metadata and document (= in data frame) metadata.
311    
312    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
313    
314            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
315    
316            * data/crude.rda: Rebuilt.
317    
318            * data/acq.rda: Rebuilt.
319    
320            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
321    
322            * R/textdoccol.R ([): Bug fix for subsetting a document
323            collection's data frame.
324    
325    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
326    
327            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
328            to s_filter.
329    
330            * R/textdoccol.R: Local text documents' metadata can now be copied
331            to a document collection's data frame with prescind_meta.
332    
333    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
334    
335            * R/: Text documents' slot metadata is now accessible in s_filter.
336    
337            * R/: Rewrote s_filter function (still has some restrictions).
338    
339    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
340    
341            * R/: Various fixes in handling metadata.
342    
343            * R/: Added update mechanism for text document collections.
344    
345    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
346    
347            * R/: Merging of document collections now creates a binary tree
348            for reconstructing merged document collections.
349    
350            * R/: Redesign of metadata for document collections.
351    
352    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
353    
354            * R/: Messages now use \code{ngettext}.
355    
356    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
357    
358            * R/: Added functions for modifying and removing metadata.
359    
360    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
361    
362            * man/: Updated some documentation.
363    
364            * R/: Corrected some connection issues.
365    
366            * inst/doc: Worked on the vignette.
367    
368    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
369    
370            * inst/: Added texts and started vignette.
371    
372            * R/: Final changes based upon David's comments.
373    
374    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
375    
376            * NAMESPACE: Corrected exports (generic methods need exportMethods
377            directives!).
378    
379    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
380    
381            * R/: Modified the TextDocCol constructor and various parsers. It
382            is now modular and supports various file formats via plugins (see
383            the new "Source" class).
384    
385    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
386    
387            * man/: Revised documentation after previous code changes.
388    
389    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
390    
391            * R/: Remaining changes as discussed with David.
392    
393    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
394    
395            * R/: Some changes as suggested by David. The rest will follow
396            within the next few days.
397    
398    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
399    
400            * man/: Finished documentation.
401    
402    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
403    
404            * man/: Wrote some documentation.
405    
406    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
407    
408            * R/: Further syntactic sugar in the form of additional assignment and
409            accessor methods.
410    
411    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
412    
413            * R/: Syntactic sugar in the form of "length", "show" and "summary"
414            operators.
415    
416    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
417    
418            * R/: Various updates, mainly to default operators ("[" or "c")
419            and dissimilarities.
420    
421    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
422    
423            * R/: Added similarity functions.
424    
425            * data/: Added English stopwords.
426    
427    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
428    
429            * data/: Examples compiled for new features.
430    
431            * R/: Changes due to new structure.
432    
433            * NAMESPACE: Corrected namespace to reflect new structure.
434    
435            * R/termdocmatrix.R: Adapted for new naming scheme.
436    
437    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
438    
439            * R/textdoccol.R: Adapted code for new class structure. Wrote
440            several transform and filter functions operating on text document
441            collections (alias text document databases).
442    
443            * R/aobjects.R: Adapted class structure with inheritance,
444            repositories and additional meta data. Loading files on demand is
445            now possible.
446    
447    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
448    
449            * R/: Some cosmetic cleanups.
450    
451            * inst/: Removed vignette on clustering. That and much more is now
452            described in the JSS paper on text mining. Based upon that
453            article a more elaborate vignette will be incorporated in the future.
454    
455    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
456    
457            * R/: Updated generic S4 methods to comply with signature changes
458            in newer versions of R (> 2.3)
459    
460    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
461    
462            * ext/R/importRIS.R: Automatic RIS import is now possible.
463    
464    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
465    
466            * R/textdoccol.R: Added RIS HTML input format.
467    
468    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * R/textdoccol.R: Removed bug that caused invalid text document
471            collections when handling many input files.
472    
473    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/textdoccol.R: Restructured and extended file import
476            mechanism.
477    
478            * inst/doc/clustering.Rnw: Adapted vignette for use with
479            ReutNews.rda
480    
481            * man/ReutNews.Rd: Documentation for ReutNews.rda
482    
483            * data/ReutNews.rda: A tiny Reuters21578 example data set.
484    
485    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
486    
487            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
488            clustering facilities of this package.
489    
490    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
491    
492            * R/aobjects.R: Changed package document structure to avoid class
493            dependency problems.
494    
495    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
496    
497            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
498            data set.
499    
500            * Finished documentation and reordered directory structure. Now "R
501            CMD check textmin" works without errors.
502    
503    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
504    
505            * src/: Various splits can now be easily created for the
506            Reuters21578 data set.
507    
508    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * Updated documentation
511    
512    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
513    
514            * Wrote R documentation for some classes and methods.
515    
516    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
519            files. See the questionnaire data/Umfrage.csv for such an example.
520            We are now able to import files in Reuters-21578 XML format.
521    
522            * Changed class interfaces in various files. Weighting of the text
523            matrix is now possible.
524    
525    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
526    
527            * R/textdoccol.R: One can build term-document matrices if
