Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 27, Sun Dec 4 15:30:18 2005 UTC
trunk/tm/ChangeLog revision 779, Tue Sep 11 05:52:39 2007 UTC
# Line 1  Line 1 
1    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
4    
5    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
6    
7            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
8    
9    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
10    
11            * R/reader.R (readPDF): Removed manual checks for pdftotext and
12            pdfinfo. The system call gives a warning anyway.
13    
14    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
15    
16            * R/textdoccol.R (asPlain): Conversion from
17            StructuredTextDocuments to PlainTextDocuments.
18    
19    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
20    
21            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
22            for accessing term-document matrices.
23    
24            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
25            are installed.
26    
27    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
28    
29            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
30            Christian Buchta.
31    
32    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
33    
34            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
35    
36    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
37    
38            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
39    
40            * R/reader.R (readPDF): Added PDF reader.
41    
42    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
43    
44            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
45    
46            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
47    
48            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
49    
50            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
51    
52    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
53    
54            * R/distmeasure.R (dissimilarity): Replaced dists call from
55            package cba by new dist call from package proxy.
56    
57    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
58    
59            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
60    
61    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
62    
63            * R/termdocmatrix.R: require() uses the quietly option to suppress
64            loading messages.
65    
66    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
67    
68            * R/dictionary.R: Added dictionary support.
69    
70    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
71    
72            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
73            documents. This simplifies some functions, e.g., asPlain.
74    
75    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
76    
77            * inst/doc/tm.Rnw: Fixed some typos in vignette.
78    
79    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
80    
81            * R/textdoccol.R (replaceWords): Added method to replace a set of
82            words by a single word. Useful for synonyms.
83    
84    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
85    
86            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
87    
88    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
89    
90            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
91            vectors. Thanks to Ariel Maguyon for his error report.
92            (removeSparseTerms): New function to remove columns from a
93            term-document matrix exceeding a sparse factor.
94    
95    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
96    
97            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
98    
99    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
100    
101            * man/sFilter.Rd: Corrected documentation on statement format (use
102            '==' instead of '=').
103    
104    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
105    
106            * R/aobjects.R (StructuredTextDocument): Inherits from
107            TextDocument.
108    
109    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
110    
111            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
112            on sparse matrices as proposed by Martin Maechler.
113    
114    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
115    
116            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
117            latest \pkg{filehash} version deprecates them.
118    
119    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
120    
121            * R/termdocmatrix.R (textvector): Stemming is now performed before
122            erasing stopwords.
123            (weightMatrix): Adapted to handle sparse matrices.
124            (TermDocMatrix): Sparse matrix is now efficiently built by
125            direct stepwise insertion of row values into it.
126    
127    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
128    
129            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
130            due to ongoing problems. For our purposes the latter is as useful
131            as the replaced package.
132    
133    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
134    
135            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
136    
137            * man/TermDocMatrix.Rd: Removed deprecated \code{language} argument.
138    
139    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
140    
141            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
142            languages with available stopwords.
143    
144    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
145    
146            * inst/doc/tm.Rnw: Minor corrections in the vignette.
147    
148    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
149    
150            * DESCRIPTION: Update to version 0.2, since a lot of new features
151            have been integrated.
152    
153            * inst/stopwords: Updated existing stopwords and added stopwords
154            for various other languages.
155    
156    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
157    
158            * man/: Updated documentation.
159    
160            * Work/testDb.R: Script to test database stuff.
161    
162            * R/: Fixed various database related bugs. Seems to be rather
163            useable now, i.e., consider as alpha status for now.
164    
165    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
166    
167            * R/: Fixed some bugs related to database support.
168    
169    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
170    
171            * man/: Added a lot of examples to the manuals.
172    
173    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
174    
175            * man/: Updated parts of the documentation.
176    
177            * R/textdoccol.R (asPlain): Added conversion from newsgroup
178            documents to plain text documents.
179    
180    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
181    
182            * R/textdoccol.R: Finished experimental database support. Not yet
183            intensively tested.
184    
185            * R/source.R: Now each source has a default reader.
186    
187            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
188            class anymore.
189    
190            * R/plaintextdoc.R: Custom show method for plain text documents.
191    
192            * R/aobjects.R: Added a class for structured text documents.
193    
194            * R/reader.R: Replaced remaining \code{parser} occurrences with
195            \code{reader}.
196    
197            * R/textdoccol.R (summary): Indent tags.
198    
199            * R/textdoccol.R (removePunctuation): Transform method to remove
200            punctuation marks.
201    
202    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
203    
204            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
205            using prescindMeta().
206    
207    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
208    
209            * R/textdoccol.R: Improved database support.
210    
211    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
212    
213            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
214    
215            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
216            language code.
217    
218            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
219            into parserControl argument.
220    
221            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
222    
223    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
224    
225            * Work/tmDataSetup.R: The datasets acq and crude can now be
226            created on the fly.
227    
228            * R/stopwords.R: Introduced a function returning the stopwords for
229            a given language (English, German and French at the moment).
230    
231            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
232            otherwise falls back to Snowball package.
233    
234    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
235    
236            * man/dissimilarity-methods.Rd: Make clear that any method offered
237            by "dists" from package "cba" can be used.
238    
239    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
240    
241            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
242            to Kurt's LaTeX suggestion. Removed points and underscores in
243            variable names for consistent naming.
244    
245            * DESCRIPTION: Update to version 0.1-2.
246    
247            * man/TextRepository.Rd: Fixed bug in documentation.
248    
249    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
250    
251            * DESCRIPTION: Update to version 0.1-1.
252    
253    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
254    
255            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
256            wordStem.
257    
258    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
259    
260            * R/: Changes due to Kurt's review.
261    
262    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
263    
264            * R/: Implemented improvements based upon comments by David
265            Meyer.
266    
267    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
268    
269            * inst/doc/: Rewrote vignette.
270    
271            * man/: Improved documentation.
272    
273    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
274    
275            * man/: Updated documentation.
276    
277            * DESCRIPTION: Changed package name to "tm". Updated version to
278            0.1 for first CRAN release.
279    
280            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
281            list archive example.
282    
283            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
284            archive example.
285    
286            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
287            from (several mails per box) mbox format to (single mail per file)
288            eml format.
289    
290    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
291    
292            * data/crude.rda: Rebuilt.
293    
294            * data/acq.rda: Rebuilt.
295    
296            * R/reader.R: Factored out reader and parser methods from
297            textdoccol.R.
298    
299            * R/source.R: Factored out Source methods from aobjects.R and
300            textdoccol.R.
301            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
302            feeds.
303    
304            * R/textdoccol.R (DirSource): Added support for recursive
305            traversal of directories.
306    
307    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
308    
309            * R/textdoccol.R ([[): Loads the document corpus automatically
310            into memory upon access.
311            (tm_transform, tm_filter): Removed several checks whether the
312            document is already loaded ([[ ensures this now).
313            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
314            mailing list archive.
315    
316    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
317    
318            * R/aobjects.R (TextDocument): Is now a virtual class.
319            (Source): Is now a virtual class.
320    
321    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
322    
323            * R/textdoccol.R (c): Support for an arbitrary number of document
324            collections.
325    
326    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
327    
328            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
329            append_meta and remove_meta.
330    
331            * R/textdoccol.R: Removed modify_metadata method.
332    
333            * R/textrepo.R: Removed modify_metadata method.
334    
335            * R/textdoccol.R (remove_meta): Supports removal of document
336            collection metadata and document (= in data frame) metadata.
337    
338    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
339    
340            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
341    
342            * data/crude.rda: Rebuilt.
343    
344            * data/acq.rda: Rebuilt.
345    
346            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
347    
348            * R/textdoccol.R ([): Bug fix for subsetting a document
349            collection's data frame.
350    
351    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
352    
353            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
354            to s_filter.
355    
356            * R/textdoccol.R: Local text documents' metadata can now be copied
357            to a document collection's data frame with prescind_meta.
358    
359    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
360    
361            * R/: Text documents' slot metadata is now accessible in s_filter.
362    
363            * R/: Rewrote s_filter function (still has some restrictions).
364    
365    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
366    
367            * R/: Various fixes in handling metadata.
368    
369            * R/: Added update mechanism for text document collections.
370    
371    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
372    
373            * R/: Merging of document collections now creates a binary tree
374            for reconstructing merged document collections.
375    
376            * R/: Redesign of metadata for document collections.
377    
378    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
379    
380            * R/: Messages now use \code{ngettext}.
381    
382    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
383    
384            * R/: Added functions for modifying and removing metadata.
385    
386    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
387    
388            * man/: Updated some documentation.
389    
390            * R/: Corrected some connection issues.
391    
392            * inst/doc: Worked on the vignette.
393    
394    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
395    
396            * inst/: Added texts and started vignette.
397    
398            * R/: Final changes based upon David's comments.
399    
400    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
401    
402            * NAMESPACE: Corrected exports (generic methods need exportMethods
403            directives!).
404    
405    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
406    
407            * R/: Modified the TextDocCol constructor and various parsers. It
408            is now modular and supports various file formats via plugins (see
409            the new "Source" class).
410    
411    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
412    
413            * man/: Revised documentation after previous code changes.
414    
415    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
416    
417            * R/: Remaining changes as discussed with David.
418    
419    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
420    
421            * R/: Some changes as suggested by David. The rest will follow
422            within the next few days.
423    
424    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
425    
426            * man/: Finished documentation.
427    
428    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
429    
430            * man/: Wrote some documentation.
431    
432    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
433    
434            * R/: Further syntactic sugar in form of additional assignment and
435            accessor methods.
436    
437    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
438    
439            * R/: Syntactic sugar in form of "length", "show" and "summary"
440            operators.
441    
442    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
443    
444            * R/: Diverse updates. Mainly on default operators ("[" or "c")
445            and dissimilarities.
446    
447    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
448    
449            * R/: Added similarity functions.
450    
451            * data/: Added English stopwords.
452    
453    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * data/: Examples compiled for new features.
456    
457            * R/: Changes due to new structure.
458    
459            * NAMESPACE: Corrected namespace to reflect new structure.
460    
461            * R/termdocmatrix.R: Adapted for new naming scheme.
462    
463    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/textdoccol.R: Adapted code for new class structure. Wrote
466            several transform and filter functions operating on text document
467            collections (alias text document databases).
468    
469            * R/aobjects.R: Adapted class structure with inheritance,
470            repositories and additional metadata. Loading files on demand is
471            now possible.
472    
473    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/: Some cosmetic cleanups.
476    
477            * inst/: Removed vignette on clustering. That and much more is now
478            described in the JSS paper on text mining. Based upon that
479            article, a more elaborate vignette will be incorporated in the future.
480    
481    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * R/: Updated generic S4 methods to comply with signature changes
484            in newer versions of R (> 2.3).
485    
486    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
487    
488            * ext/R/importRIS.R: Automatic RIS import is now possible.
489    
490    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
491    
492            * R/textdoccol.R: Added RIS HTML input format.
493    
494    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * R/textdoccol.R: Removed bug that caused invalid text document
497            collections when handling many input files.
498    
499    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
500    
501            * R/textdoccol.R: Restructured and extended file import
502            mechanism.
503    
504            * inst/doc/clustering.Rnw: Adapted vignette for use with
505            ReutNews.rda.
506    
507            * man/ReutNews.Rd: Documentation for ReutNews.rda.
508    
509            * data/ReutNews.rda: A tiny Reuters21578 example data set.
510    
511    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
512    
513            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
514            clustering facilities of this package.
515    
516    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/aobjects.R: Changed package document structure to avoid class
519            dependency problems.
520    
521    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
524            data set.
525    
526            * Finished documentation and reordered directory structure. Now "R
527            CMD check textmin" works without errors.
528    
529  2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
530    
531          * src/: Various splits can now be easily created for the
