[tm] Diff of /pkg/ChangeLog

Old: trunk/R/trunk/ChangeLog revision 37, Wed Jan 11 17:49:17 2006 UTC
New: trunk/tm/ChangeLog revision 800, Thu Nov 29 13:36:31 2007 UTC
1    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/textdoccol.R (TextDocCol): Better handling of empty
4            arguments.
5    
6            * NAMESPACE: Exported readDOC.
7    
8            * man/completeStems.Rd: Added an example.
9    
10    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
11    
12            * R/stopwords.R (stopwords): Look up .dat files at every
13            call. Allows users to modify stopword .dat files interactively.
14    
15    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
16    
17            * R/termdocmatrix.R (termFreq): Correct processing of empty
18            documents.
19    
20    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
21    
22            * man/: Updated documentation.
23    
24    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
25    
26            * R/complete.R (completeStems): Completes (heuristically) word
27            stems.
28    
29            * R/termdocmatrix.R (TermDocMatrix2): New modular
30            constructor.
31    
32            * NAMESPACE: Exported termFreq.
33    
34    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
35    
36            * R/reader.R (readDOC): Added MS Word reader (using antiword).
37    
38    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
39    
40            * R/weight.R: Weighting functions for TermDocMatrix.
41    
42    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
43    
44            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
45            functions for accessing dimension, column, and row names.
46    
47            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
48    
49    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
50    
51            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
52    
53    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
54    
55            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
56    
57    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
58    
59            * R/reader.R (readPDF): Removed manual checks for pdftotext and
60            pdfinfo. The system call gives a warning anyway.
61    
62    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
63    
64            * R/textdoccol.R (asPlain): Conversion from
65            StructuredTextDocuments to PlainTextDocuments.
66    
67    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
68    
69            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
70            for accessing term-document matrices.
71    
72            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
73            are installed.
74    
75    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
76    
77            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
78            Christian Buchta.
79    
80    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
81    
82            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
83    
84    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
85    
86            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
87    
88            * R/reader.R (readPDF): Added PDF reader.
89    
90    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
91    
92            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
93    
94            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
95    
96            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
97    
98            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
99    
100    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
101    
102            * R/distmeasure.R (dissimilarity): Replaced dists call from
103            package cba by new dist call from package proxy.
104    
105    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
106    
107            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
108    
109    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
110    
111            * R/termdocmatrix.R: require() uses the quietly option to suppress
112            loading messages.
113    
114    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
115    
116            * R/dictionary.R: Added dictionary support.
117    
118    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
119    
120            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
121            documents. This simplifies some functions, e.g., asPlain.
122    
123    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
124    
125            * inst/doc/tm.Rnw: Fixed some typos in vignette.
126    
127    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
128    
129            * R/textdoccol.R (replaceWords): Added method to replace a set of
130            words by a single word. Useful for synonyms.
131    
132    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
133    
134            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
135    
136    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
137    
138            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
139            vectors. Thanks to Ariel Maguyon for his error report.
140            (removeSparseTerms): New function to remove columns from a
141            term-document matrix exceeding a sparse factor.
142    
143    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
144    
145            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
146    
147    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
148    
149            * man/sFilter.Rd: Corrected documentation on statement format (use
150            '==' instead of '=').
151    
152    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
153    
154            * R/aobjects.R (StructuredTextDocument): Inherits from
155            TextDocument.
156    
157    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
158    
159            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
160            on sparse matrices as proposed by Martin Maechler.
161    
162    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
163    
164            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
165            \pkg{filehash} version deprecates them.
166    
167    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
168    
169            * R/termdocmatrix.R (textvector): Stemming is now performed before
170            erasing stopwords.
171            (weightMatrix): Adapted to handle sparse matrices.
172            (TermDocMatrix): Sparse matrix is now efficiently built by
173            direct stepwise insertion of row values into it.
174    
175    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
176    
177            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
178            due to ongoing problems. For our purposes the latter is as useful
179            as the replaced package.
180    
181    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
182    
183            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
184    
185            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
186    
187    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
188    
189            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
190            languages with available stopwords.
191    
192    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
193    
194            * inst/doc/tm.Rnw: Minor corrections in the vignette.
195    
196    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
197    
198            * DESCRIPTION: Update to version 0.2, since a lot of new features
199            have been integrated.
200    
201            * inst/stopwords: Updated existing stopwords and added stopwords
202            for various other languages.
203    
204    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
205    
206            * man/: Updated documentation.
207    
208            * Work/testDb.R: Script to test database stuff.
209    
210            * R/: Fixed various database related bugs. Seems to be rather
211            usable now, i.e., consider it alpha status for now.
212    
213    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
214    
215            * R/: Fixed some bugs related to database support.
216    
217    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
218    
219            * man/: Added a lot of examples to the manuals.
220    
221    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
222    
223            * man/: Updated parts of the documentation.
224    
225            * R/textdoccol.R (asPlain): Added conversion from newsgroup
226            documents to plain text documents.
227    
228    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
229    
230            * R/textdoccol.R: Finished experimental database support. Not yet
231            intensively tested.
232    
233            * R/source.R: Now each source has a default reader.
234    
235            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
236            class anymore.
237    
238            * R/plaintextdoc.R: Custom show method for plain text documents.
239    
240            * R/aobjects.R: Added a class for structured text documents.
241    
242            * R/reader.R: Replaced remaining \code{parser} occurrences with
243            \code{reader}.
244    
245            * R/textdoccol.R (summary): Indent tags.
246    
247            * R/textdoccol.R (removePunctuation): Transform method to remove
248            punctuation marks.
249    
250    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
251    
252            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
253            using prescindMeta().
254    
255    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
256    
257            * R/textdoccol.R: Improved database support.
258    
259    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
260    
261            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
262    
263            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
264            language code.
265    
266            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
267            into parserControl argument.
268    
269            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
270    
271    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
272    
273            * Work/tmDataSetup.R: The datasets acq and crude can now be
274            created on the fly.
275    
276            * R/stopwords.R: Introduced a function returning the stopwords for
277            a given language (English, German and French at the moment).
278    
279            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
280            otherwise falls back to Snowball package.
281    
282    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
283    
284            * man/dissimilarity-methods.Rd: Make clear that any method offered
285            by "dists" from package "cba" can be used.
286    
287    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
288    
289            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
290            to Kurt's LaTeX suggestion. Removed points and underscores in
291            variable names for consistent naming.
292    
293            * DESCRIPTION: Update to version 0.1-2.
294    
295            * man/TextRepository.Rd: Fixed bug in documentation.
296    
297    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
298    
299            * DESCRIPTION: Update to version 0.1-1.
300    
301    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
302    
303            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
304            wordStem.
305    
306    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
307    
308            * R/: Changes due to Kurt's review.
309    
310    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
311    
312            * R/: Implemented improvements based upon comments by David
313            Meyer.
314    
315    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
316    
317            * inst/doc/: Rewrote vignette.
318    
319            * man/: Improved documentation.
320    
321    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
322    
323            * man/: Updated documentation.
324    
325            * DESCRIPTION: Changed package name to "tm". Updated version to
326            0.1 for first CRAN release.
327    
328            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
329            list archive example.
330    
331            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
332            archive example.
333    
334            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
335            from (several mails per box) mbox format to (single mail per file)
336            eml format.
337    
338    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
339    
340            * data/crude.rda: Rebuilt.
341    
342            * data/acq.rda: Rebuilt.
343    
344            * R/reader.R: Factored out reader and parser methods from
345            textdoccol.R.
346    
347            * R/source.R: Factored out Source methods from aobjects.R and
348            textdoccol.R.
349            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
350            feeds.
351    
352            * R/textdoccol.R (DirSource): Added support for recursive
353            traversal of directories.
354    
355    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
356    
357            * R/textdoccol.R ([[): Loads the document corpus automatically
358            into memory upon access.
359            (tm_transform, tm_filter): Removed several checks whether the
360            document is already loaded ([[ ensures this now).
361            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
362            mailing list archive.
363    
364    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
365    
366            * R/aobjects.R (TextDocument): Is now a virtual class.
367            (Source): Is now a virtual class.
368    
369    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
370    
371            * R/textdoccol.R (c): Support for an arbitrary number of document
372            collections.
373    
374    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
375    
376            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
377            append_meta and remove_meta.
378    
379            * R/textdoccol.R: Removed modify_metadata method.
380    
381            * R/textrepo.R: Removed modify_metadata method.
382    
383            * R/textdoccol.R (remove_meta): Supports removal of document
384            collection metadata and document (= in data frame) metadata.
385    
386    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
387    
388            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
389    
390            * data/crude.rda: Rebuilt.
391    
392            * data/acq.rda: Rebuilt.
393    
394            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
395    
396            * R/textdoccol.R ([): Bug fix for subsetting a document
397            collection's data frame.
398    
399    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
400    
401            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
402            to s_filter.
403    
404            * R/textdoccol.R: Local text documents' metadata can now be copied
405            to a document collection's data frame with prescind_meta.
406    
407    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
408    
409            * R/: Text documents' slot metadata is now accessible in s_filter.
410    
411            * R/: Rewrote s_filter function (still has some restrictions).
412    
413    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
414    
415            * R/: Various fixes in handling metadata.
416    
417            * R/: Added update mechanism for text document collections.
418    
419    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
420    
421            * R/: Merging of document collections now creates a binary tree
422            for reconstructing merged document collections.
423    
424            * R/: Redesign of metadata for document collections.
425    
426    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
427    
428            * R/: Messages now use \code{ngettext}.
429    
430    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
431    
432            * R/: Added functions for modifying and removing metadata.
433    
434    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
435    
436            * man/: Updated some documentation.
437    
438            * R/: Corrected some connection issues.
439    
440            * inst/doc: Worked on the vignette.
441    
442    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
443    
444            * inst/: Added texts and started vignette.
445    
446            * R/: Final changes based upon David's comments.
447    
448    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
449    
450            * NAMESPACE: Corrected exports (generic methods need exportMethods
451            directives!).
452    
453    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/: Modified the TextDocCol constructor and various parsers. It
456            is now modular and supports various file formats via plugins (see
457            the new "Source" class).
458    
459    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
460    
461            * man/: Revised documentation after previous code changes.
462    
463    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/: Remaining changes as discussed with David.
466    
467    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
468    
469            * R/: Some changes as suggested by David. The rest will follow
470            within the next few days.
471    
472    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
473    
474            * man/: Finished documentation.
475    
476    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
477    
478            * man/: Wrote some documentation.
479    
480    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
481    
482            * R/: Further syntactic sugar in form of additional assignment and
483            accessor methods.
484    
485    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
486    
487            * R/: Syntactic sugar in form of "length", "show" and "summary"
488            operators.
489    
490    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
491    
492            * R/: Diverse updates. Mainly on default operators ("[" or "c")
493            and dissimilarities.
494    
495    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
496    
497            * R/: Added similarity functions.
498    
499            * data/: Added English stopwords.
500    
501    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
502    
503            * data/: Examples compiled for new features.
504    
505            * R/: Changes due to new structure.
506    
507            * NAMESPACE: Corrected namespace to reflect new structure.
508    
509            * R/termdocmatrix.R: Adapted for new naming scheme.
510    
511    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
512    
513            * R/textdoccol.R: Adapted code for new class structure. Wrote
514            several transform and filter functions operating on text document
515            collections (alias text document databases).
516    
517            * R/aobjects.R: Adapted class structure with inheritance,
518            repositories and additional meta data. Loading files on demand is
519            now possible.
520    
521    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * R/: Some cosmetic cleanups.
524    
525            * inst/: Removed vignette on clustering. That and much more is now
526            described in the JSS paper on text mining. Based upon that
527            article a more elaborate vignette will be incorporated in the future.
528    
529    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
530    
531            * R/: Updated generic S4 methods to comply with signature changes
532            in newer versions of R (> 2.3).
533    
534    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
535    
536            * ext/R/importRIS.R: Automatic RIS import is now possible.
537    
538    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
539    
540            * R/textdoccol.R: Added RIS HTML input format.
541    
542    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
543    
544            * R/textdoccol.R: Removed bug that caused invalid text document
545            collections when handling many input files.
546    
547    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
548    
549            * R/textdoccol.R: Restructured and extended file import
