Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 36, Wed Jan 11 15:42:56 2006 UTC
trunk/tm/ChangeLog revision 813, Tue Jan 22 18:46:13 2008 UTC
1    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/meta.R (meta): New function for consistent access to meta data
4            of document collections, repositories, and texts.
5    
6    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
7    
8            * R/: Better support for encodings.
9    
10    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
11    
12            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
13            selection when no reader argument is given.
14    
15    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
16    
17            * R/source.R (CSVSource): Now uses read.csv instead of scan
18            internally.
19    
20    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
21    
22            * R/reader.R (getReaders): Returns available reader functions.
23    
24            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
25            as default.
26    
27    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
28    
29            * R/stopwords.R (stopwords): Shortened code, removed codetools
30            variable warnings.
31    
32            * man/: Documentation for showMeta, added an example for tmMap.
33    
34            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
35            some minor typos fixed.
36    
37    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
38    
39            * R/aobjects.R (showMeta): Added method for pretty printing a
40            text document's meta data.
41    
42    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
43    
44            * R/textdoccol.R (TextDocCol): Better handling of empty
45            arguments.
46    
47            * NAMESPACE: Exported readDOC.
48    
49            * man/completeStems.Rd: Added an example.
50    
51    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
52    
53            * R/stopwords.R (stopwords): Look up .dat files at every
54            call. Allows users to modify stopword .dat files interactively.
55    
56    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
57    
58            * R/termdocmatrix.R (termFreq): Correct processing of empty
59            documents.
60    
61    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
62    
63            * man/: Updated documentation.
64    
65    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
66    
67            * R/complete.R (completeStems): Completes (heuristically) word
68            stems.
69    
70            * R/termdocmatrix.R (TermDocMatrix2): New modular
71            constructor.
72    
73            * NAMESPACE: Exported termFreq.
74    
75    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
76    
77            * R/reader.R (readDOC): Added MS Word reader (using antiword).
78    
79    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
80    
81            * R/weight.R: Weighting functions for TermDocMatrix.
82    
83    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
84    
85            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
86            functions for accessing dimension, column, and row names.
87    
88            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
89    
90    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
91    
92            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
93    
94    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
95    
96            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
97    
98    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
99    
100            * R/reader.R (readPDF): Removed manual checks for pdftotext and
101            pdfinfo. The system call gives a warning anyway.
102    
103    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
104    
105            * R/textdoccol.R (asPlain): Conversion from
106            StructuredTextDocuments to PlainTextDocuments.
107    
108    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
109    
110            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
111            for accessing term-document matrices.
112    
113            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
114            are installed.
115    
116    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
117    
118            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
119            Christian Buchta.
120    
121    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
122    
123            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
124    
125    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
126    
127            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
128    
129            * R/reader.R (readPDF): Added PDF reader.
130    
131    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
132    
133            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
134    
135            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
136    
137            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
138    
139            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
140    
141    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
142    
143            * R/distmeasure.R (dissimilarity): Replaced dists call from
144            package cba by new dist call from package proxy.
145    
146    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
147    
148            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
149    
150    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
151    
152            * R/termdocmatrix.R: require() uses the quietly option to suppress
153            loading messages.
154    
155    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
156    
157            * R/dictionary.R: Added dictionary support.
158    
159    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
160    
161            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
162            documents. This simplifies some functions, e.g., asPlain.
163    
164    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
165    
166            * inst/doc/tm.Rnw: Fixed some typos in vignette.
167    
168    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
169    
170            * R/textdoccol.R (replaceWords): Added method to replace a set of
171            words by a single word. Useful for synonyms.
172    
173    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
174    
175            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
176    
177    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
178    
179            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
180            vectors. Thanks to Ariel Maguyon for his error report.
181            (removeSparseTerms): New function to remove columns from a
182            term-document matrix exceeding a sparse factor.
183    
184    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
185    
186            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
187    
188    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
189    
190            * man/sFilter.Rd: Corrected documentation on statement format (use
191            '==' instead of '=').
192    
193    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
194    
195            * R/aobjects.R (StructuredTextDocument): Inherits from
196            TextDocument.
197    
198    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
199    
200            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
201            on sparse matrices as proposed by Martin Maechler.
202    
203    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
204    
205            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
206            \pkg{filehash} version deprecates them.
207    
208    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
209    
210            * R/termdocmatrix.R (textvector): Stemming is now performed before
211            erasing stopwords.
212            (weightMatrix): Adapted to handle sparse matrices.
213            (TermDocMatrix): Sparse matrix is now efficiently built by
214            direct stepwise insertion of row values into it.
215    
216    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
217    
218            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
219            due to ongoing problems. For our purposes the latter is as useful
220            as the replaced package.
221    
222    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
223    
224            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
225    
226            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
227    
228    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
229    
230            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
231            languages with available stopwords.
232    
233    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
234    
235            * inst/doc/tm.Rnw: Minor corrections in the vignette.
236    
237    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
238    
239            * DESCRIPTION: Update to version 0.2, since a lot of new features
240            have been integrated.
241    
242            * inst/stopwords: Updated existing stopwords and added stopwords
243            for various other languages.
244    
245    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
246    
247            * man/: Updated documentation.
248    
249            * Work/testDb.R: Script to test database stuff.
250    
251            * R/: Fixed various database related bugs. Seems to be rather
252            useable now, i.e., consider as alpha status for now.
253    
254    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
255    
256            * R/: Fixed some bugs related to database support.
257    
258    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
259    
260            * man/: Added a lot of examples to the manuals.
261    
262    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
263    
264            * man/: Updated parts of the documentation.
265    
266            * R/textdoccol.R (asPlain): Added conversion from newsgroup
267            documents to plain text documents.
268    
269    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
270    
271            * R/textdoccol.R: Finished experimental database support. Not yet
272            intensively tested.
273    
274            * R/source.R: Now each source has a default reader.
275    
276            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
277            class anymore.
278    
279            * R/plaintextdoc.R: Custom show method for plain text documents.
280    
281            * R/aobjects.R: Added a class for structured text documents.
282    
283            * R/reader.R: Replaced remaining \code{parser} occurrences with
284            \code{reader}.
285    
286            * R/textdoccol.R (summary): Indent tags.
287    
288            * R/textdoccol.R (removePunctuation): Transform method to remove
289            punctuation marks.
290    
291    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
292    
293            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
294            using prescindMeta().
295    
296    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
297    
298            * R/textdoccol.R: Improved database support.
299    
300    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
301    
302            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
303    
304            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
305            language code.
306    
307            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
308            into parserControl argument.
309    
310            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
311    
312    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
313    
314            * Work/tmDataSetup.R: The datasets acq and crude can now be
315            created on the fly.
316    
317            * R/stopwords.R: Introduced a function returning the stopwords for
318            a given language (English, German and French at the moment).
319    
320            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
321            otherwise falls back to Snowball package.
322    
323    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
324    
325            * man/dissimilarity-methods.Rd: Make clear that any method offered
326            by "dists" from package "cba" can be used.
327    
328    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
329    
330            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
331            to Kurt's latex suggestion. Removed points and underscores in
332            variable names for consistent naming.
333    
334            * DESCRIPTION: Update to version 0.1-2.
335    
336            * man/TextRepository.Rd: Fixed bug in documentation.
337    
338    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
339    
340            * DESCRIPTION: Update to version 0.1-1.
341    
342    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
343    
344            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
345            wordStem.
346    
347    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
348    
349            * R/: Changes due to Kurt's review.
350    
351    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
352    
353            * R/: Implemented improvements based upon comments by David
354            Meyer.
355    
356    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
357    
358            * inst/doc/: Rewrote vignette.
359    
360            * man/: Improved documentation.
361    
362    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
363    
364            * man/: Updated documentation.
365    
366            * DESCRIPTION: Changed package name to "tm". Updated version to
367            0.1 for first CRAN release.
368    
369            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
370            list archive example.
371    
372            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
373            archive example.
374    
375            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
376            from (several mails per box) mbox format to (single mail per file)
377            eml format.
378    
379    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
380    
381            * data/crude.rda: Rebuilt.
382    
383            * data/acq.rda: Rebuilt.
384    
385            * R/reader.R: Factored out reader and parser methods from
386            textdoccol.R.
387    
388            * R/source.R: Factored out Source methods from aobjects.R and
389            textdoccol.R.
390            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
391            feeds.
392    
393            * R/textdoccol.R (DirSource): Added support for recursive
394            traversal of directories.
395    
396    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
397    
398            * R/textdoccol.R ([[): Loads the document corpus automatically
399            into memory upon access.
400            (tm_transform, tm_filter): Removed several checks whether the
401            document is already loaded ([[ ensures this now).
402            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
403            mailing list archive.
404    
405    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
406    
407            * R/aobjects.R (TextDocument): Is now a virtual class.
408            (Source): Is now a virtual class.
409    
410    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
411    
412            * R/textdoccol.R (c): Support for an arbitrary number of document
413            collections.
414    
415    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
416    
417            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
418            append_meta and remove_meta.
419    
420            * R/textdoccol.R: Removed modify_metadata method.
421    
422            * R/textrepo.R: Removed modify_metadata method.
423    
424            * R/textdoccol.R (remove_meta): Supports removal of document
425            collection metadata and document (= in data frame) metadata.
426    
427    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
428    
429            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
430    
431            * data/crude.rda: Rebuilt.
432    
433            * data/acq.rda: Rebuilt.
434    
435            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
436    
437            * R/textdoccol.R ([): Bug fix for subsetting a document
438            collection's data frame.
439    
440    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
441    
442            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
443            to s_filter.
444    
445            * R/textdoccol.R: Local text documents' metadata can now be copied
446            to a document collection's data frame with prescind_meta.
447    
448    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
449    
450            * R/: Text documents' slot metadata is now accessible in s_filter.
451    
452            * R/: Rewrote s_filter function (still has some restrictions).
453    
454    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * R/: Various fixes in handling metadata.
457    
458            * R/: Added update mechanism for text document collections.
459    
460    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
461    
462            * R/: Merging of document collections now creates a binary tree
463            for reconstructing merged document collections.
464    
465            * R/: Redesign of metadata for document collections.
466    
467    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
468    
469            * R/: Messages now use \code{ngettext}.
470    
471    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
472    
473            * R/: Added functions for modifying and removing metadata.
474    
475    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
476    
477            * man/: Updated some documentation.
478    
479            * R/: Corrected some connection issues.
480    
481            * inst/doc: Worked on the vignette.
482    
483    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
484    
485            * inst/: Added texts and started vignette.
486    
487            * R/: Final changes based upon David's comments.
488    
489    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
490    
491            * NAMESPACE: Corrected exports (generic methods need exportMethods
492            directives!).
493    
494    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * R/: Modified the TextDocCol constructor and various parsers. It
497            is now modular and supports various file formats via plugins (see
498            the new "Source" class).
499    
500    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
501    
502            * man/: Revised documentation after previous code changes.
503    
504    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
505    
506            * R/: Remaining changes as discussed with David.
507    
508    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * R/: Some changes as suggested by David. The rest will follow
511            within the next few days.
512    
513    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
514    
515            * man/: Finished documentation.
516    
517    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * man/: Wrote some documentation.
520    
521    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * R/: Further syntactic sugar in form of additional assignment and
524            accessor methods.
525    
526    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * R/: Syntactic sugar in form of "length", "show" and "summary"
529            operators.
530    
531    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
532    
533            * R/: Diverse updates. Mainly on default operators ("[" or "c")
534            and dissimilarities.
535    
536    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
537    
538            * R/: Added similarity functions.
539    
540            * data/: Added English stopwords.
541    
542    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
543    
544            * data/: Examples compiled for new features.
545    
546            * R/: Changes due to new structure.
547    
548            * NAMESPACE: Corrected namespace to reflect new structure.
549    
550            * R/termdocmatrix.R: Adapted for new naming scheme.
551    
552    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
553    
554            * R/textdoccol.R: Adapted code for new class structure. Wrote
555            several transform and filter functions operating on text document
556            collections (alias text document databases).
557    
558            * R/aobjects.R: Adapted class structure with inheritance,
559            repositories and additional meta data. Loading files on demand is
560            now possible.
561    
562    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
563    
564            * R/: Some cosmetic cleanups.
565    
566            * inst/: Removed vignette on clustering. That and much more is now
567            described in the JSS paper on text mining. Based upon that
568            article an elaborated vignette will be incorporated in the future.
569    
570    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
571    
572            * R/: Updated generic S4 methods to comply with signature changes
573            in newer versions of R (> 2.3).
574    
575    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
576    
577            * ext/R/importRIS.R: Automatic RIS import is now possible.
578    
579    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
580    
581            * R/textdoccol.R: Added RIS HTML input format.
582    
583    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
584    
585            * R/textdoccol.R: Removed bug that caused invalid text document
586            collections when handling many input files.
587    
588    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * R/textdoccol.R: Restructured and extended file import
591            mechanism.
592    
593            * inst/doc/clustering.Rnw: Adapted vignette for use with
594            ReutNews.rda
595    
