Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 20, Tue Nov 8 16:40:52 2005 UTC
trunk/tm/ChangeLog revision 788, Sun Oct 14 12:16:26 2007 UTC
1    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/weight.R: Weighting functions for TermDocMatrix.
4    
5    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
6    
7            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
8            functions for accessing dimension, column, and row names.
9    
10            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
11    
12    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
13    
14            * man/removePunctuation.Rd: Added documentation. The function is now also exported in NAMESPACE.
15    
16    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
17    
18            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
19    
20    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
21    
22            * R/reader.R (readPDF): Removed manual checks for pdftotext and
23            pdfinfo. The system call gives a warning anyway.
24    
25    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
26    
27            * R/textdoccol.R (asPlain): Conversion from
28            StructuredTextDocuments to PlainTextDocuments.
29    
30    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
31    
32            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
33            for accessing term-document matrices.
34    
35            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
36            are installed.
37    
38    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
39    
40            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
41            Christian Buchta.
42    
43    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
44    
45            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
46    
47    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
48    
49            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
50    
51            * R/reader.R (readPDF): Added PDF reader.
52    
53    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
54    
55            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
56    
57            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
58    
59            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
60    
61            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
62    
63    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
64    
65            * R/distmeasure.R (dissimilarity): Replaced dists call from
66            package cba by new dist call from package proxy.
67    
68    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
69    
70            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
71    
72    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
73    
74            * R/termdocmatrix.R: require() uses the quietly option to suppress
75            loading messages.
76    
77    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
78    
79            * R/dictionary.R: Added dictionary support.
80    
81    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
82    
83            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
84            documents. This simplifies some functions, e.g., asPlain.
85    
86    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
87    
88            * inst/doc/tm.Rnw: Fixed some typos in vignette.
89    
90    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
91    
92            * R/textdoccol.R (replaceWords): Added method to replace a set of
93            words by a single word. Useful for synonyms.
94    
95    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
96    
97            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
98    
99    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
100    
101            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
102            vectors. Thanks to Ariel Maguyon for his error report.
103            (removeSparseTerms): New function to remove those columns of a
104            term-document matrix whose sparsity exceeds a given sparse factor.
105    
106    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
107    
108            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
109    
110    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
111    
112            * man/sFilter.Rd: Corrected documentation on statement format (use
113            '==' instead of '=').
114    
115    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
116    
117            * R/aobjects.R (StructuredTextDocument): Inherits from
118            TextDocument.
119    
120    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
121    
122            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
123            on sparse matrices as proposed by Martin Maechler.
124    
125    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
126    
127            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
128            \pkg{filehash} version deprecates them.
129    
130    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
131    
132            * R/termdocmatrix.R (textvector): Stemming is now performed before
133            erasing stopwords.
134            (weightMatrix): Adapted to handle sparse matrices.
135            (TermDocMatrix): Sparse matrix is now efficiently built by
136            direct stepwise insertion of row values into it.
137    
138    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
139    
140            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
141            due to ongoing problems. For our purposes the latter is as useful
142            as the replaced package.
143    
144    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
145    
146            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
147    
148            * man/TermDocMatrix.Rd: Removed deprecated \code{language} argument.
149    
150    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
151    
152            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
153            languages with available stopwords.
154    
155    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
156    
157            * inst/doc/tm.Rnw: Minor corrections in the vignette.
158    
159    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
160    
161            * DESCRIPTION: Update to version 0.2, since a lot of new features
162            have been integrated.
163    
164            * inst/stopwords: Updated existing stopwords and added stopwords
165            for various other languages.
166    
167    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
168    
169            * man/: Updated documentation.
170    
171            * Work/testDb.R: Script to test database stuff.
172    
173            * R/: Fixed various database-related bugs. Database support seems
174            fairly usable now, i.e., consider it alpha status for now.
175    
176    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
177    
178            * R/: Fixed some bugs related to database support.
179    
180    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
181    
182            * man/: Added a lot of examples to the manuals.
183    
184    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
185    
186            * man/: Updated parts of the documentation.
187    
188            * R/textdoccol.R (asPlain): Added conversion from newsgroup
189            documents to plain text documents.
190    
191    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
192    
193            * R/textdoccol.R: Finished experimental database support. Not yet
194            intensively tested.
195    
196            * R/source.R: Now each source has a default reader.
197    
198            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
199            class anymore.
200    
201            * R/plaintextdoc.R: Custom show method for plain text documents.
202    
203            * R/aobjects.R: Added a class for structured text documents.
204    
205            * R/reader.R: Replaced remaining \code{parser} occurrences with
206            \code{reader}.
207    
208            * R/textdoccol.R (summary): Indent tags.
209    
210            * R/textdoccol.R (removePunctuation): Transform method to remove
211            punctuation marks.
212    
213    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
214    
215            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
216            using prescindMeta().
217    
218    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
219    
220            * R/textdoccol.R: Improved database support.
221    
222    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
223    
224            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
225    
226            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
227            language code.
228    
229            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
230            into parserControl argument.
231    
232            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
233    
234    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
235    
236            * Work/tmDataSetup.R: The datasets acq and crude can now be
237            created on the fly.
238    
239            * R/stopwords.R: Introduced a function returning the stopwords for
240            a given language (English, German and French at the moment).
241    
242            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
243            otherwise falls back to Snowball package.
244    
245    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
246    
247            * man/dissimilarity-methods.Rd: Make clear that any method offered
248            by "dists" from package "cba" can be used.
249    
250    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
251    
252            * inst/doc/tm.Rnw: Fixed the quotes-appearing-as-boxes bug according
253            to Kurt's LaTeX suggestion. Removed points and underscores in
254            variable names for consistent naming.
255    
256            * DESCRIPTION: Update to version 0.1-2.
257    
258            * man/TextRepository.Rd: Fixed bug in documentation.
259    
260    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
261    
262            * DESCRIPTION: Update to version 0.1-1.
263    
264    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
265    
266            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
267            wordStem.
268    
269    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
270    
271            * R/: Changes due to Kurt's review.
272    
273    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
274    
275            * R/: Implemented improvements based upon comments by David
276            Meyer.
277    
278    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
279    
280            * inst/doc/: Rewrote vignette.
281    
282            * man/: Improved documentation.
283    
284    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
285    
286            * man/: Updated documentation.
287    
288            * DESCRIPTION: Changed package name to "tm". Updated version to
289            0.1 for first CRAN release.
290    
291            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
292            list archive example.
293    
294            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
295            archive example.
296    
297            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
298            from (several mails per box) mbox format to (single mail per file)
299            eml format.
300    
301    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
302    
303            * data/crude.rda: Rebuilt.
304    
305            * data/acq.rda: Rebuilt.
306    
307            * R/reader.R: Factored out reader and parser methods from
308            textdoccol.R.
309    
310            * R/source.R: Factored out Source methods from aobjects.R and
311            textdoccol.R.
312            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
313            feeds.
314    
315            * R/textdoccol.R (DirSource): Added support for recursive
316            traversal of directories.
317    
318    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
319    
320            * R/textdoccol.R ([[): Loads the document corpus automatically
321            into memory upon access.
322            (tm_transform, tm_filter): Removed several checks whether the
323            document is already loaded ([[ ensures this now).
324            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
325            mailing list archive.
326    
327    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
328    
329            * R/aobjects.R (TextDocument): Is now a virtual class.
330            (Source): Is now a virtual class.
331    
332    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
333    
334            * R/textdoccol.R (c): Support for an arbitrary number of document
335            collections.
336    
337    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
338    
339            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
340            append_meta and remove_meta.
341    
342            * R/textdoccol.R: Removed modify_metadata method.
343    
344            * R/textrepo.R: Removed modify_metadata method.
345    
346            * R/textdoccol.R (remove_meta): Supports removal of document
347            collection metadata and document (= in data frame) metadata.
348    
349    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
350    
351            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
352    
353            * data/crude.rda: Rebuilt.
354    
355            * data/acq.rda: Rebuilt.
356    
357            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
358    
359            * R/textdoccol.R ([): Bug fix for subsetting a document
360            collection's data frame.
361    
362    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
363    
364            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
365            to s_filter.
366    
367            * R/textdoccol.R: Local text documents' metadata can now be copied
368            to a document collection's data frame with prescind_meta.
369    
370    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
371    
372            * R/: Text documents' slot metadata is now accessible in s_filter.
373    
374            * R/: Rewrote s_filter function (still has some restrictions).
375    
376    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
377    
378            * R/: Various fixes in handling metadata.
379    
380            * R/: Added update mechanism for text document collections.
381    
382    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
383    
384            * R/: Merging of document collections now creates a binary tree
385            for reconstructing merged document collections.
386    
387            * R/: Redesign of metadata for document collections.
388    
389    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
390    
391            * R/: Messages now use \code{ngettext}.
392    
393    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
394    
395            * R/: Added functions for modifying and removing metadata.
396    
397    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
398    
399            * man/: Updated some documentation.
400    
401            * R/: Corrected some connection issues.
402    
403            * inst/doc: Worked on the vignette.
404    
405    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
406    
407            * inst/: Added texts and started vignette.
408    
409            * R/: Final changes based upon David's comments.
410    
411    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
412    
413            * NAMESPACE: Corrected exports (generic methods need exportMethods
414            directives!).
415    
416    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
417    
418            * R/: Modified the TextDocCol constructor and various parsers. It
419            is now modular and supports various file formats via plugins (see
420            the new "Source" class).
421    
422    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
423    
424            * man/: Revised documentation after previous code changes.
425    
426    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
427    
428            * R/: Remaining changes as discussed with David.
429    
430    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
431    
432            * R/: Some changes as suggested by David. The rest will follow
433            within the next few days.
434    
435    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
436    
437            * man/: Finished documentation.
438    
439    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
440    
441            * man/: Wrote some documentation.
442    
443    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
444    
445            * R/: Further syntactic sugar in form of additional assignment and
446            accessor methods.
447    
448    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
449    
450            * R/: Syntactic sugar in form of "length", "show" and "summary"
451            operators.
452    
453    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/: Various updates. Mainly on default operators ("[" or "c")
456            and dissimilarities.
457    
458    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/: Added similarity functions.
461    
462            * data/: Added English stopwords.
463    
464    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
465    
466            * data/: Examples compiled for new features.
467    
468            * R/: Changes due to new structure.
469    
470            * NAMESPACE: Corrected namespace to reflect new structure.
471    
472            * R/termdocmatrix.R: Adapted for new naming scheme.
473    
474    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
475    
476            * R/textdoccol.R: Adapted code for new class structure. Wrote
477            several transform and filter functions operating on text document
478            collections (alias text document databases).
479    
480            * R/aobjects.R: Adapted class structure with inheritance,
481            repositories and additional meta data. Loading files on demand is
482            now possible.
483    
484    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
485    
486            * R/: Some cosmetic cleanups.
487    
488            * inst/: Removed vignette on clustering. That and much more is now
489            described in the JSS paper on text mining. Based upon that
490            article, a more elaborate vignette will be incorporated in the future.
491    
492    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
493    
494            * R/: Updated generic S4 methods to comply with signature changes
495            in newer versions of R (> 2.3).
496    
497    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
498    
499            * ext/R/importRIS.R: Automatic RIS import is now possible.
500    
501    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
502    
503            * R/textdoccol.R: Added RIS HTML input format.
504    
505    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
506    
507            * R/textdoccol.R: Removed bug that caused invalid text document
508            collections when handling many input files.
509    
510    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
511    
512            * R/textdoccol.R: Restructured and extended file import
513            mechanism.
514    
515            * inst/doc/clustering.Rnw: Adapted vignette for use with
516            ReutNews.rda.
517    
518            * man/ReutNews.Rd: Documentation for ReutNews.rda.
519    
520            * data/ReutNews.rda: A tiny Reuters21578 example data set.
521    
522    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
523    
524            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
525            clustering facilities of this package.
526    
527    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
528    
529            * R/aobjects.R: Changed package document structure to avoid class
530            dependency problems.
531    
532    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
533    
534            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
535            data set.
536    
537            * Finished documentation and reordered directory structure. Now "R
538            CMD check textmin" works without errors.
539    
540    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
541    
542            * src/: Various splits can now be easily created for the
543            Reuters21578 data set.
544    
545    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
546    
547            * Updated documentation.
548    
549    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
550    
551            * Wrote R documentation for some classes and methods.
552    
553    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
554    
555            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
556            files. See the questionnaire data/Umfrage.csv for such an example.
557            We are now able to import files in Reuters-21578 XML format.
558    
559            * Changed class interfaces in various files. Weighting of the text
560            matrix is now possible.
561    
562    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
563    
564            * R/textdoccol.R: One can build term-document matrices if
