[tm] Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 33, Thu Dec 15 13:29:17 2005 UTC
trunk/tm/ChangeLog revision 768, Sun Jul 15 08:33:56 2007 UTC
# Line 1  Line 1 
1    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
4    
5    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
6    
7            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
8    
9            * R/reader.R (readPDF): Added PDF reader.
10    
11    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
12    
13            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
14    
15            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
16    
17            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
18    
19            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
20    
21    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
22    
23            * R/distmeasure.R (dissimilarity): Replaced dists call from
24            package cba by new dist call from package proxy.
25    
26    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
27    
28            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
29    
30    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
31    
32            * R/termdocmatrix.R: require() uses the quietly option to suppress
33            loading messages.
34    
35    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
36    
37            * R/dictionary.R: Added dictionary support.
38    
39    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
40    
41            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
42            documents. This simplifies some functions, e.g., asPlain.
43    
44    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
45    
46            * inst/doc/tm.Rnw: Fixed some typos in vignette.
47    
48    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
49    
50            * R/textdoccol.R (replaceWords): Added method to replace a set of
51            words by a single word. Useful for synonyms.
52    
53    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
54    
55            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
56    
57    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
58    
59            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
60            vectors. Thanks to Ariel Maguyon for his error report.
61            (removeSparseTerms): New function to remove columns from a
62            term-document matrix exceeding a sparse factor.
63    
64    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
65    
66            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
67    
68    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
69    
70            * man/sFilter.Rd: Corrected documentation on statement format (use
71            '==' instead of '=').
72    
73    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
74    
75            * R/aobjects.R (StructuredTextDocument): Inherits from
76            TextDocument.
77    
78    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
79    
80            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
81            on sparse matrices as proposed by Martin Maechler.
82    
83    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
84    
85            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
86            \pkg{filehash} version deprecates them.
87    
88    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
89    
90            * R/termdocmatrix.R (textvector): Stemming is now performed before
91            erasing stopwords.
92            (weightMatrix): Adapted to handle sparse matrices.
93            (TermDocMatrix): Sparse matrix is now efficiently built by
94            direct stepwise insertion of row values into it.
95    
96    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
97    
98            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
99            due to ongoing problems. For our purposes the latter is as useful
100            as the replaced package.
101    
102    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
103    
104            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
105    
106            * man/TermDocMatrix.Rd: Removed deprecated \code{language} argument.
107    
108    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
109    
110            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
111            languages with available stopwords.
112    
113    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
114    
115            * inst/doc/tm.Rnw: Minor corrections in the vignette.
116    
117    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
118    
119            * DESCRIPTION: Update to version 0.2, since a lot of new features
120            have been integrated.
121    
122            * inst/stopwords: Updated existing stopwords and added stopwords
123            for various other languages.
124    
125    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
126    
127            * man/: Updated documentation.
128    
129            * Work/testDb.R: Script to test database stuff.
130    
131            * R/: Fixed various database-related bugs. Seems to be fairly
132            usable now; consider it alpha status for now.
133    
134    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
135    
136            * R/: Fixed some bugs related to database support.
137    
138    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
139    
140            * man/: Added a lot of examples to the manuals.
141    
142    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
143    
144            * man/: Updated parts of the documentation.
145    
146            * R/textdoccol.R (asPlain): Added conversion from newsgroup
147            documents to plain text documents.
148    
149    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
150    
151            * R/textdoccol.R: Finished experimental database support. Not yet
152            extensively tested.
153    
154            * R/source.R: Now each source has a default reader.
155    
156            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
157            class anymore.
158    
159            * R/plaintextdoc.R: Custom show method for plain text documents.
160    
161            * R/aobjects.R: Added a class for structured text documents.
162    
163            * R/reader.R: Replaced remaining \code{parser} occurrences with
164            \code{reader}.
165    
166            * R/textdoccol.R (summary): Indent tags.
167    
168            * R/textdoccol.R (removePunctuation): Transform method to remove
169            punctuation marks.
170    
171    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
172    
173            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
174            using prescindMeta().
175    
176    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
177    
178            * R/textdoccol.R: Improved database support.
179    
180    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
181    
182            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
183    
184            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
185            language code.
186    
187            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
188            into a single parserControl argument.
189    
190            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
191    
192    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
193    
194            * Work/tmDataSetup.R: The datasets acq and crude can now be
195            created on the fly.
196    
197            * R/stopwords.R: Introduced a function returning the stopwords for
198            a given language (English, German and French at the moment).
199    
200            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
201            otherwise falls back to the Snowball package.
202    
203    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
204    
205            * man/dissimilarity-methods.Rd: Make clear that any method offered
206            by "dists" from package "cba" can be used.
207    
208    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
209    
210            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes bug following
211            Kurt's LaTeX suggestion. Removed points and underscores in
212            variable names for consistent naming.
213    
214            * DESCRIPTION: Update to version 0.1-2.
215    
216            * man/TextRepository.Rd: Fixed bug in documentation.
217    
218    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
219    
220            * DESCRIPTION: Update to version 0.1-1.
221    
222    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
223    
224            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
225            wordStem.
226    
227    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
228    
229            * R/: Changes due to Kurt's review.
230    
231    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
232    
233            * R/: Implemented improvements based upon comments by David
234            Meyer.
235    
236    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
237    
238            * inst/doc/: Rewrote vignette.
239    
240            * man/: Improved documentation.
241    
242    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
243    
244            * man/: Updated documentation.
245    
246            * DESCRIPTION: Changed package name to "tm". Updated version to
247            0.1 for the first CRAN release.
248    
249            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
250            list archive example.
251    
252            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
253            archive example.
254    
255            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
256            from mbox format (several mails per box) to eml format (a single
257            mail per file).
258    
259    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
260    
261            * data/crude.rda: Rebuilt.
262    
263            * data/acq.rda: Rebuilt.
264    
265            * R/reader.R: Factored out reader and parser methods from
266            textdoccol.R.
267    
268            * R/source.R: Factored out Source methods from aobjects.R and
269            textdoccol.R.
270            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
271            feeds.
272    
273            * R/textdoccol.R (DirSource): Added support for recursive
274            traversal of directories.
275    
276    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
277    
278            * R/textdoccol.R ([[): Loads the document corpus automatically
279            into memory upon access.
280            (tm_transform, tm_filter): Removed several checks whether the
281            document is already loaded ([[ ensures this now).
282            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
283            mailing list archive.
284    
285    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
286    
287            * R/aobjects.R (TextDocument): Is now a virtual class.
288            (Source): Is now a virtual class.
289    
290    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
291    
292            * R/textdoccol.R (c): Support for an arbitrary number of document
293            collections.
294    
295    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
296    
297            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
298            append_meta and remove_meta.
299    
300            * R/textdoccol.R: Removed modify_metadata method.
301    
302            * R/textrepo.R: Removed modify_metadata method.
303    
304            * R/textdoccol.R (remove_meta): Supports removal of document
305            collection metadata and document (= in data frame) metadata.
306    
307    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
308    
309            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
310    
311            * data/crude.rda: Rebuilt.
312    
313            * data/acq.rda: Rebuilt.
314    
315            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
316    
317            * R/textdoccol.R ([): Bug fix for subsetting a document
318            collection's data frame.
319    
320    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
321    
322            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
323            to s_filter.
324    
325            * R/textdoccol.R: Local text documents' metadata can now be copied
326            to a document collection's data frame with prescind_meta.
327    
328    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
329    
330            * R/: Text documents' slot metadata is now accessible in s_filter.
331    
332            * R/: Rewrote s_filter function (still has some restrictions).
333    
334    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
335    
336            * R/: Various fixes in handling metadata.
337    
338            * R/: Added update mechanism for text document collections.
339    
340    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
341    
342            * R/: Merging of document collections now creates a binary tree
343            for reconstructing merged document collections.
344    
345            * R/: Redesign of metadata for document collections.
346    
347    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
348    
349            * R/: Messages now use \code{ngettext}.
350    
351    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
352    
353            * R/: Added functions for modifying and removing metadata.
354    
355    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
356    
357            * man/: Updated some documentation.
358    
359            * R/: Corrected some connection issues.
360    
361            * inst/doc: Worked on the vignette.
362    
363    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
364    
365            * inst/: Added texts and started vignette.
366    
367            * R/: Final changes based upon David's comments.
368    
369    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
370    
371            * NAMESPACE: Corrected exports (generic methods need exportMethods
372            directives!).
373    
374    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
375    
376            * R/: Modified the TextDocCol constructor and various parsers. It
377            is now modular and supports various file formats via plugins (see
378            the new "Source" class).
379    
380    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
381    
382            * man/: Revised documentation after previous code changes.
383    
384    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
385    
386            * R/: Remaining changes as discussed with David.
387    
388    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
389    
390            * R/: Some changes as suggested by David. The rest will follow
391            within the next few days.
392    
393    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
394    
395            * man/: Finished documentation.
396    
397    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
398    
399            * man/: Wrote some documentation.
400    
401    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
402    
403            * R/: Further syntactic sugar in the form of additional assignment and
404            accessor methods.
405    
406    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
407    
408            * R/: Syntactic sugar in the form of "length", "show" and "summary"
409            operators.
410    
411    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
412    
413            * R/: Various updates, mainly to default operators ("[" or "c")
414            and dissimilarities.
415    
416    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
417    
418            * R/: Added similarity functions.
419    
420            * data/: Added English stopwords.
421    
422    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
423    
424            * data/: Examples compiled for new features.
425    
426            * R/: Changes due to new structure.
427    
428            * NAMESPACE: Corrected namespace to reflect new structure.
429    
430            * R/termdocmatrix.R: Adapted for new naming scheme.
431    
432    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
433    
434            * R/textdoccol.R: Adapted code for new class structure. Wrote
435            several transform and filter functions operating on text document
436            collections (alias text document databases).
437    
438            * R/aobjects.R: Adapted class structure with inheritance,
439            repositories and additional metadata. Loading files on demand is
440            now possible.
441    
442    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
443    
444            * R/: Some cosmetic cleanups.
445    
446            * inst/: Removed vignette on clustering. That and much more is now
447            described in the JSS paper on text mining. Based upon that
448            article, a more elaborate vignette will be incorporated in the future.
449    
450    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
451    
452            * R/: Updated generic S4 methods to comply with signature changes
453            in newer versions of R (> 2.3).
454    
455    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
456    
457            * ext/R/importRIS.R: Automatic RIS import is now possible.
458    
459    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
460    
461            * R/textdoccol.R: Added RIS HTML input format.
462    
463    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/textdoccol.R: Fixed bug that caused invalid text document
466            collections when handling many input files.
467    
468    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * R/textdoccol.R: Restructured and extended file import
471            mechanism.
472    
473            * inst/doc/clustering.Rnw: Adapted vignette for use with
474            ReutNews.rda.
475    
476            * man/ReutNews.Rd: Documentation for ReutNews.rda
477    
478            * data/ReutNews.rda: A tiny Reuters21578 example data set.
479    
480    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
481    
482            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
483            clustering facilities of this package.
484    
485  2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
486    
487          * R/aobjects.R: Changed package document structure to avoid class
