Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 26, Sat Dec 3 15:20:17 2005 UTC
trunk/tm/ChangeLog revision 767, Sat Jul 14 16:50:44 2007 UTC
# Line 1  Line 1 
1    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
4    
5            * R/reader.R (readPDF): Added PDF reader.
6    
7    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
8    
9            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
10    
11            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
12    
13            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
14    
15            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
16    
17    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
18    
19            * R/distmeasure.R (dissimilarity): Replaced dists call from
20            package cba by new dist call from package proxy.
21    
22    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
23    
24            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
25    
26    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
27    
28            * R/termdocmatrix.R: require() uses the quietly option to suppress
29            loading messages.
30    
31    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
32    
33            * R/dictionary.R: Added dictionary support.
34    
35    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
36    
37            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
38            documents. This simplifies some functions, e.g., asPlain.
39    
40    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
41    
42            * inst/doc/tm.Rnw: Fixed some typos in vignette.
43    
44    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
45    
46            * R/textdoccol.R (replaceWords): Added method to replace a set of
47            words by a single word. Useful for synonyms.
48    
49    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
50    
51            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
52    
53    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
54    
55            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
56            vectors. Thanks to Ariel Maguyon for his error report.
57            (removeSparseTerms): New function to remove columns from a
58            term-document matrix exceeding a sparse factor.
59    
60    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
61    
62            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
63    
64    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
65    
66            * man/sFilter.Rd: Corrected documentation on statement format (use
67            '==' instead of '=').
68    
69    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
70    
71            * R/aobjects.R (StructuredTextDocument): Inherits from
72            TextDocument.
73    
74    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
75    
76            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
77            on sparse matrices as proposed by Martin Maechler.
78    
79    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
80    
81            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
82            latest \pkg{filehash} version deprecates them.
83    
84    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
85    
86            * R/termdocmatrix.R (textvector): Stemming is now performed before
87            erasing stopwords.
88            (weightMatrix): Adapted to handle sparse matrices.
89            (TermDocMatrix): Sparse matrix is now efficiently built by
90            direct stepwise insertion of row values into it.
91    
92    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
93    
94            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
95            due to ongoing problems. For our purposes the latter is as useful
96            as the replaced package.
97    
98    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
99    
100            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
101    
102            * man/TermDocMatrix.Rd: Removed deprecated \code{language} argument.
103    
104    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
105    
106            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
107            languages with available stopwords.
108    
109    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
110    
111            * inst/doc/tm.Rnw: Minor corrections in the vignette.
112    
113    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
114    
115            * DESCRIPTION: Update to version 0.2, since a lot of new features
116            have been integrated.
117    
118            * inst/stopwords: Updated existing stopwords and added stopwords
119            for various other languages.
120    
121    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
122    
123            * man/: Updated documentation.
124    
125            * Work/testDb.R: Script to test database stuff.
126    
127            * R/: Fixed various database-related bugs. Seems to be rather
128            usable now, i.e., consider it alpha status for now.
129    
130    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
131    
132            * R/: Fixed some bugs related to database support.
133    
134    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
135    
136            * man/: Added a lot of examples to the manuals.
137    
138    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
139    
140            * man/: Updated parts of the documentation.
141    
142            * R/textdoccol.R (asPlain): Added conversion from newsgroup
143            documents to plain text documents.
144    
145    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
146    
147            * R/textdoccol.R: Finished experimental database support. Not yet
148            intensively tested.
149    
150            * R/source.R: Now each source has a default reader.
151    
152            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
153            class anymore.
154    
155            * R/plaintextdoc.R: Custom show method for plain text documents.
156    
157            * R/aobjects.R: Added a class for structured text documents.
158    
159            * R/reader.R: Replaced remaining \code{parser} occurrences with
160            \code{reader}.
161    
162            * R/textdoccol.R (summary): Indent tags.
163    
164            * R/textdoccol.R (removePunctuation): Transform method to remove
165            punctuation marks.
166    
167    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
168    
169            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
170            using prescindMeta().
171    
172    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
173    
174            * R/textdoccol.R: Improved database support.
175    
176    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
177    
178            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
179    
180            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
181            language code.
182    
183            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
184            into parserControl argument.
185    
186            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
187    
188    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
189    
190            * Work/tmDataSetup.R: The datasets acq and crude can now be
191            created on the fly.
192    
193            * R/stopwords.R: Introduced a function returning the stopwords for
194            a given language (English, German and French at the moment).
195    
196            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
197            otherwise falls back to Snowball package.
198    
199    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
200    
201            * man/dissimilarity-methods.Rd: Make clear that any method offered
202            by "dists" from package "cba" can be used.
203    
204    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
205    
206            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
207            to Kurt's LaTeX suggestion. Removed points and underscores in
208            variable names for consistent naming.
209    
210            * DESCRIPTION: Update to version 0.1-2.
211    
212            * man/TextRepository.Rd: Fixed bug in documentation.
213    
214    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
215    
216            * DESCRIPTION: Update to version 0.1-1.
217    
218    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
219    
220            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
221            wordStem.
222    
223    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
224    
225            * R/: Changes due to Kurt's review.
226    
227    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
228    
229            * R/: Implemented improvements based upon comments by David
230            Meyer.
231    
232    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
233    
234            * inst/doc/: Rewrote vignette.
235    
236            * man/: Improved documentation.
237    
238    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
239    
240            * man/: Updated documentation.
241    
242            * DESCRIPTION: Changed package name to "tm". Updated version to
243            0.1 for first CRAN release.
244    
245            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
246            list archive example.
247    
248            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
249            archive example.
250    
251            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
252            from (several mails per box) mbox format to (single mail per file)
253            eml format.
254    
255    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
256    
257            * data/crude.rda: Rebuilt.
258    
259            * data/acq.rda: Rebuilt.
260    
261            * R/reader.R: Factored out reader and parser methods from
262            textdoccol.R.
263    
264            * R/source.R: Factored out Source methods from aobjects.R and
265            textdoccol.R.
266            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
267            feeds.
268    
269            * R/textdoccol.R (DirSource): Added support for recursive
270            traversal of directories.
271    
272    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
273    
274            * R/textdoccol.R ([[): Loads the document corpus automatically
275            into memory upon access.
276            (tm_transform, tm_filter): Removed several checks whether the
277            document is already loaded ([[ ensures this now).
278            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
279            mailing list archive.
280    
281    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
282    
283            * R/aobjects.R (TextDocument): Is now a virtual class.
284            (Source): Is now a virtual class.
285    
286    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
287    
288            * R/textdoccol.R (c): Support for an arbitrary number of document
289            collections.
290    
291    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
292    
293            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
294            append_meta and remove_meta.
295    
296            * R/textdoccol.R: Removed modify_metadata method.
297    
298            * R/textrepo.R: Removed modify_metadata method.
299    
300            * R/textdoccol.R (remove_meta): Supports removal of document
301            collection metadata and document (= in data frame) metadata.
302    
303    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
304    
305            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
306    
307            * data/crude.rda: Rebuilt.
308    
309            * data/acq.rda: Rebuilt.
310    
311            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
312    
313            * R/textdoccol.R ([): Bug fix for subsetting a document
314            collection's data frame.
315    
316    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
317    
318            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
319            to s_filter.
320    
321            * R/textdoccol.R: Local text documents' metadata can now be copied
322            to a document collection's data frame with prescind_meta.
323    
324    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
325    
326            * R/: Text documents' slot metadata is now accessible in s_filter.
327    
328            * R/: Rewrote s_filter function (still has some restrictions).
329    
330    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
331    
332            * R/: Various fixes in handling metadata.
333    
334            * R/: Added update mechanism for text document collections.
335    
336    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
337    
338            * R/: Merging of document collections now creates a binary tree
339            for reconstructing merged document collections.
340    
341            * R/: Redesign of metadata for document collections.
342    
343    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
344    
345            * R/: Messages now use \code{ngettext}.
346    
347    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
348    
349            * R/: Added functions for modifying and removing metadata.
350    
351    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
352    
353            * man/: Updated some documentation.
354    
355            * R/: Corrected some connection issues.
356    
357            * inst/doc: Worked on the vignette.
358    
359    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
360    
361            * inst/: Added texts and started vignette.
362    
363            * R/: Final changes based upon David's comments.
364    
365    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
366    
367            * NAMESPACE: Corrected exports (generic methods need exportMethods
368            directives!).
369    
370    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
371    
372            * R/: Modified the TextDocCol constructor and various parsers. It
373            is now modular and supports various file formats via plugins (see
374            the new "Source" class).
375    
376    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
377    
378            * man/: Revised documentation after previous code changes.
379    
380    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
381    
382            * R/: Remaining changes as discussed with David.
383    
384    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
385    
386            * R/: Some changes as suggested by David. The rest will follow
387            within the next few days.
388    
389    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
390    
391            * man/: Finished documentation.
392    
393    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
394    
395            * man/: Wrote some documentation.
396    
397    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
398    
399            * R/: Further syntactic sugar in form of additional assignment and
400            accessor methods.
401    
402    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
403    
404            * R/: Syntactic sugar in form of "length", "show" and "summary"
405            operators.
406    
407    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
408    
409            * R/: Diverse updates. Mainly on default operators ("[" or "c")
410            and dissimilarities.
411    
412    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
413    
414            * R/: Added similarity functions.
415    
416            * data/: Added English stopwords.
417    
418    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
419    
420            * data/: Examples compiled for new features.
421    
422            * R/: Changes due to new structure.
423    
424            * NAMESPACE: Corrected namespace to reflect new structure.
425    
426            * R/termdocmatrix.R: Adapted for new naming scheme.
427    
428    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
429    
430            * R/textdoccol.R: Adapted code for new class structure. Wrote
431            several transform and filter functions operating on text document
432            collections (alias text document databases).
433    
434            * R/aobjects.R: Adapted class structure with inheritance,
435            repositories and additional meta data. Loading files on demand is
436            now possible.
437    
438    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
439    
440            * R/: Some cosmetic cleanups.
441    
442            * inst/: Removed vignette on clustering. That and much more is now
443            described in the JSS paper on text mining. Based upon that
444            article, a more elaborate vignette will be incorporated in the future.
445    
446    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
447    
448            * R/: Updated generic S4 methods to comply with signature changes
449            in newer versions of R (> 2.3).
450    
451    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
452    
453            * ext/R/importRIS.R: Automatic RIS import is now possible.
454    
455    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
456    
457            * R/textdoccol.R: Added RIS HTML input format.
458    
459    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
460    
461            * R/textdoccol.R: Removed bug that caused invalid text document
462            collections when handling many input files.
463    
464    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
465    
466            * R/textdoccol.R: Restructured and extended file import
467            mechanism.
468    
469            * inst/doc/clustering.Rnw: Adapted vignette for use with
470            ReutNews.rda
471    
472            * man/ReutNews.Rd: Documentation for ReutNews.rda
473    
474            * data/ReutNews.rda: A tiny Reuters21578 example data set.
475    
476    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
477    
478            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
479            clustering facilities of this package.
480    
481    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * R/aobjects.R: Changed package document structure to avoid class
484            dependency problems.
485    
486    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
487    
488            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
489            data set.
490    
491            * Finished documentation and reordered directory structure. Now "R
492            CMD check textmin" works without errors.
493    
494    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * src/: Various splits can now be easily created for the
497            Reuters21578 data set.
498    
499    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
500    
501            * Updated documentation
