[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/tm/ChangeLog revision 45, Wed Jul 5 17:27:29 2006 UTC
trunk/tm/ChangeLog revision 816, Thu Jan 24 14:36:41 2008 UTC
1    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
4    
5            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
6    
7    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
8    
9            * R/meta.R (meta): New function for consistent access to meta data
10            of document collections, repositories, and texts.
11    
12    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
13    
14            * R/: Better support for encodings.
15    
16    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
17    
18            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
19            selection when no reader argument is given.
20    
21    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
22    
23            * R/source.R (CSVSource): Now uses read.csv instead of scan
24            internally.
25    
26    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
27    
28            * R/reader.R (getReaders): Returns available reader functions.
29    
30            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
31            as default.
32    
33    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
34    
35            * R/stopwords.R (stopwords): Shortened code, removed codetools
36            variable warnings.
37    
38            * man/: Documentation for showMeta, added an example for tmMap.
39    
40            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
41            some minor typos fixed.
42    
43    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
44    
45            * R/aobjects.R (showMeta): Added method for pretty printing a
46            text document's meta data.
47    
48    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
49    
50            * R/textdoccol.R (TextDocCol): Better handling of empty
51            arguments.
52    
53            * NAMESPACE: Exported readDOC.
54    
55            * man/completeStems.Rd: Added an example.
56    
57    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
58    
59            * R/stopwords.R (stopwords): Look up .dat files at every
60            call. Allows users to modify stopword .dat files interactively.
61    
62    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
63    
64            * R/termdocmatrix.R (termFreq): Correct processing of empty
65            documents.
66    
67    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
68    
69            * man/: Updated documentation.
70    
71    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
72    
73            * R/complete.R (completeStems): Completes (heuristically) word
74            stems.
75    
76            * R/termdocmatrix.R (TermDocMatrix2): New modular
77            constructor.
78    
79            * NAMESPACE: Exported termFreq.
80    
81    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
82    
83            * R/reader.R (readDOC): Added MS Word reader (using antiword).
84    
85    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
86    
87            * R/weight.R: Weighting functions for TermDocMatrix.
88    
89    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
90    
91            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
92            functions for accessing dimension, column, and row names.
93    
94            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
95    
96    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
97    
98            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
99    
100    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
101    
102            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
103    
104    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
105    
106            * R/reader.R (readPDF): Removed manual checks for pdftotext and
107            pdfinfo. The system call gives a warning anyway.
108    
109    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
110    
111            * R/textdoccol.R (asPlain): Conversion from
112            StructuredTextDocuments to PlainTextDocuments.
113    
114    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
115    
116            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
117            for accessing term-document matrices.
118    
119            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
120            are installed.
121    
122    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
123    
124            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
125            Christian Buchta.
126    
127    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
128    
129            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
130    
131    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
132    
133            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
134    
135            * R/reader.R (readPDF): Added PDF reader.
136    
137    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
138    
139            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
140    
141            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
142    
143            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
144    
145            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
146    
147    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
148    
149            * R/distmeasure.R (dissimilarity): Replaced dists call from
150            package cba by new dist call from package proxy.
151    
152    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
153    
154            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
155    
156    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
157    
158            * R/termdocmatrix.R: require() uses the quietly option to suppress
159            loading messages.
160    
161    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
162    
163            * R/dictionary.R: Added dictionary support.
164    
165    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
166    
167            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
168            documents. This simplifies some functions, e.g., asPlain.
169    
170    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
171    
172            * inst/doc/tm.Rnw: Fixed some typos in vignette.
173    
174    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
175    
176            * R/textdoccol.R (replaceWords): Added method to replace a set of
177            words by a single word. Useful for synonyms.
178    
179    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
180    
181            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
182    
183    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
184    
185            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
186            vectors. Thanks to Ariel Maguyon for his error report.
187            (removeSparseTerms): New function to remove columns from a
188            term-document matrix exceeding a sparse factor.
189    
190    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
191    
192            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
193    
194    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
195    
196            * man/sFilter.Rd: Corrected documentation on statement format (use
197            '==' instead of '=').
198    
199    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
200    
201            * R/aobjects.R (StructuredTextDocument): Inherits from
202            TextDocument.
203    
204    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
205    
206            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
207            on sparse matrices as proposed by Martin Maechler.
208    
209    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
210    
211            * R/textdoccol.R: Removed \code{dbDisconnect} calls since last
212            \pkg{filehash} version makes them deprecated.
213    
214    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
215    
216            * R/termdocmatrix.R (textvector): Stemming is now performed before
217            erasing stopwords.
218            (weightMatrix): Adapted to handle sparse matrices.
219            (TermDocMatrix): Sparse matrix is now efficiently built by
220            direct stepwise insertion of row values into it.
221    
222    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
223    
224            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
225            due to ongoing problems. For our purposes the latter is as useful
226            as the replaced package.
227    
228    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
229    
230            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
231    
232            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
233    
234    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
235    
236            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
237            languages with available stopwords.
238    
239    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
240    
241            * inst/doc/tm.Rnw: Minor corrections in the vignette.
242    
243    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
244    
245            * DESCRIPTION: Update to version 0.2, since a lot of new features
246            have been integrated.
247    
248            * inst/stopwords: Updated existing stopwords and added stopwords
249            for various other languages.
250    
251    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
252    
253            * man/: Updated documentation.
254    
255            * Work/testDb.R: Script to test database stuff.
256    
257            * R/: Fixed various database related bugs. Seems to be rather
258            useable now, i.e., consider as alpha status for now.
259    
260    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
261    
262            * R/: Fixed some bugs related to database support.
263    
264    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
265    
266            * man/: Added a lot of examples to the manuals.
267    
268    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
269    
270            * man/: Updated parts of the documentation.
271    
272            * R/textdoccol.R (asPlain): Added conversion from newsgroup
273            documents to plain text documents.
274    
275    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
276    
277            * R/textdoccol.R: Finished experimental database support. Not yet
278            intensively tested.
279    
280            * R/source.R: Now each source has a default reader.
281    
282            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
283            class anymore.
284    
285            * R/plaintextdoc.R: Custom show method for plain text documents.
286    
287            * R/aobjects.R: Added a class for structured text documents.
288    
289            * R/reader.R: Replaced remaining \code{parser} occurrences with
290            \code{reader}.
291    
292            * R/textdoccol.R (summary): Indent tags.
293    
294            * R/textdoccol.R (removePunctuation): Transform method to remove
295            punctuation marks.
296    
297    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
298    
299            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
300            using prescindMeta().
301    
302    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
303    
304            * R/textdoccol.R: Improved database support.
305    
306    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
307    
308            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
309    
310            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
311            language code.
312    
313            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
314            into parserControl argument.
315    
316            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
317    
318    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
319    
320            * Work/tmDataSetup.R: The datasets acq and crude can now be
321            created on the fly.
322    
323            * R/stopwords.R: Introduced a function returning the stopwords for
324            a given language (English, German and French at the moment).
325    
326            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
327            otherwise falls back to Snowball package.
328    
329    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
330    
331            * man/dissimilarity-methods.Rd: Make clear that any method offered
332            by "dists" from package "cba" can be used.
333    
334    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
335    
336            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
337            to Kurt's latex suggestion. Removed points and underscores in
338            variable names for consistent naming.
339    
340            * DESCRIPTION: Update to version 0.1-2.
341    
342            * man/TextRepository.Rd: Fixed bug in documentation.
343    
344    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
345    
346            * DESCRIPTION: Update to version 0.1-1.
347    
348    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
349    
350            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
351            wordStem.
352    
353    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
354    
355            * R/: Changes due to Kurt's review.
356    
357    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
358    
359            * R/: Implemented improvements based upon comments by David
360            Meyer.
361    
362    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
363    
364            * inst/doc/: Rewrote vignette.
365    
366            * man/: Improved documentation.
367    
368    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
369    
370            * man/: Updated documentation.
371    
372            * DESCRIPTION: Changed package name to "tm". Updated version to
373            0.1 for first CRAN release.
374    
375            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
376            list archive example.
377    
378            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
379            archive example.
380    
381            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
382            from (several mails per box) mbox format to (single mail per file)
383            eml format.
384    
385    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
386    
387            * data/crude.rda: Rebuilt.
388    
389            * data/acq.rda: Rebuilt.
390    
391            * R/reader.R: Factored out reader and parser methods from
392            textdoccol.R.
393    
394            * R/source.R: Factored out Source methods from aobjects.R and
395            textdoccol.R.
396            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
397            feeds.
398    
399            * R/textdoccol.R (DirSource): Added support for recursive
400            traversal of directories.
401    
402    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
403    
404            * R/textdoccol.R ([[): Loads the document corpus automatically
405            into memory upon access.
406            (tm_transform, tm_filter): Removed several checks whether the
407            document is already loaded ([[ ensures this now).
408            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
409            mailing list archive.
410    
411    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
412    
413            * R/aobjects.R (TextDocument): Is now a virtual class.
414            (Source): Is now a virtual class.
415    
416    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
417    
418            * R/textdoccol.R (c): Support for an arbitrary number of document
419            collections.
420    
421    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
422    
423            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
424            append_meta and remove_meta.
425    
426            * R/textdoccol.R: Removed modify_metadata method.
427    
428            * R/textrepo.R: Removed modify_metadata method.
429    
430            * R/textdoccol.R (remove_meta): Supports removal of document
431            collection metadata and document (= in data frame) metadata.
432    
433    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
434    
435            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
436    
437            * data/crude.rda: Rebuilt.
438    
439            * data/acq.rda: Rebuilt.
440    
441            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
442    
443            * R/textdoccol.R ([): Bug fix for subsetting a document
444            collection's data frame.
445    
446    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
447    
448            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
449            to s_filter.
450    
451            * R/textdoccol.R: Local text documents' metadata can now be copied
452            to a document collection's data frame with prescind_meta.
453    
454    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * R/: Text documents' slot metadata is now accessible in s_filter.
457    
458            * R/: Rewrote s_filter function (still has some restrictions).
459    
460    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
461    
462            * R/: Various fixes in handling metadata.
463    
464            * R/: Added update mechanism for text document collections.
465    
466    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
467    
468            * R/: Merging of document collections now creates a binary tree
469            for reconstructing merged document collections.
470    
471            * R/: Redesign of metadata for document collections.
472    
473    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/: Messages now use \code{ngettext}.
476    
477    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
478    
479            * R/: Added functions for modifying and removing metadata.
480    
481    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * man/: Updated some documentation.
484    
485            * R/: Corrected some connection issues.
486    
487            * inst/doc: Worked on the vignette.
488    
489    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
490    
491            * inst/: Added texts and started vignette.
492    
493            * R/: Final changes based upon David's comments.
494    
495    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
496    
497            * NAMESPACE: Corrected exports (generic methods need exportMethods
498            directives!).
499    
500    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
501    
502            * R/: Modified the TextDocCol constructor and various parsers. It
503            is now modular and supports various file formats via plugins (see
504            the new "Source" class).
505    
506    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * man/: Revised documentation after previous code changes.
509    
510    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
511    
512            * R/: Remaining changes as discussed with David.
513    
514    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
515    
516            * R/: Some changes as suggested by David. The rest will follow
517            within the next days.
518    
519    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
520    
521            * man/: Finished documentation.
522    
523    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
524    
525            * man/: Wrote some documentation.
526    
527    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
528    
529            * R/: Further syntactic sugar in form of additional assignment and
530            accessor methods.
531    
532    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
533    
534            * R/: Syntactic sugar in form of "length", "show" and "summary"
535            operators.
536    
537    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
538    
539            * R/: Diverse updates. Mainly on default operators ("[" or "c")
540            and dissimilarities.
541    
542    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
543    
544            * R/: Added similarity functions.
545    
546            * data/: Added English stopwords.
547    
548    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
549    
550            * data/: Examples compiled for new features.
551    
552            * R/: Changes due to new structure.
553    
554            * NAMESPACE: Corrected namespace to reflect new structure.
555    
556            * R/termdocmatrix.R: Adapted for new naming scheme.
557    
558    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
559    
560            * R/textdoccol.R: Adapted code for new class structure. Wrote
561            several transform and filter functions operating on text document
562            collections (alias text document databases).
563    
564            * R/aobjects.R: Adapted class structure with inheritance,
565            repositories and additional meta data. Loading files on demand is
566            now possible.
567    
568    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * R/: Some cosmetic cleanups.
571    
572            * inst/: Removed vignette on clustering. That and much more is now
573            described in the JSS paper on text mining. Based upon that
574            article an elaborated vignette will be incorporated in the future.
575    
576    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
577    
578            * R/: Updated generic S4 methods to comply with signature changes
