Diff of /pkg/ChangeLog


trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC
trunk/tm/ChangeLog revision 776, Sun Jul 29 15:27:41 2007 UTC
1    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/reader.R (readPDF): Removed manual checks for pdftotext and
4            pdfinfo. The system call gives a warning anyway.
5    
6    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
7    
8            * R/textdoccol.R (asPlain): Conversion from
9            StructuredTextDocuments to PlainTextDocuments.
10    
11    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
12    
13            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
14            for accessing term-document matrices.
15    
16            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
17            are installed.
18    
19    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
20    
21            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
22            Christian Buchta.
23    
24    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
25    
26            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
27    
28    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
29    
30            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
31    
32            * R/reader.R (readPDF): Added PDF reader.
33    
34    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
35    
36            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
37    
38            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
39    
40            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
41    
42            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
43    
44    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
45    
46            * R/distmeasure.R (dissimilarity): Replaced dists call from
47            package cba by new dist call from package proxy.
48    
49    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
50    
51            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
52    
53    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
54    
55            * R/termdocmatrix.R: require() uses the quietly option to suppress
56            loading messages.
57    
58    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
59    
60            * R/dictionary.R: Added dictionary support.
61    
62    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
63    
64            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
65            documents. This simplifies some functions, e.g., asPlain.
66    
67    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
68    
69            * inst/doc/tm.Rnw: Fixed some typos in vignette.
70    
71    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
72    
73            * R/textdoccol.R (replaceWords): Added method to replace a set of
74            words by a single word. Useful for synonyms.
75    
76    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
77    
78            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
79    
80    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
81    
82            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
83            vectors. Thanks to Ariel Maguyon for his error report.
84            (removeSparseTerms): New function to remove columns from a
85            term-document matrix exceeding a sparse factor.
86    
87    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
88    
89            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
90    
91    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
92    
93            * man/sFilter.Rd: Corrected documentation on statement format (use
94            '==' instead of '=').
95    
96    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
97    
98            * R/aobjects.R (StructuredTextDocument): Inherits from
99            TextDocument.
100    
101    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
102    
103            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
104            on sparse matrices as proposed by Martin Maechler.
105    
106    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
107    
108            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
109            \pkg{filehash} version deprecates them.
110    
111    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
112    
113            * R/termdocmatrix.R (textvector): Stemming is now performed before
114            erasing stopwords.
115            (weightMatrix): Adapted to handle sparse matrices.
116            (TermDocMatrix): Sparse matrix is now efficiently built by
117            direct stepwise insertion of row values into it.
118    
119    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
120    
121            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
122            due to ongoing problems. For our purposes the latter is as useful
123            as the replaced package.
124    
125    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
126    
127            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
128    
129            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
130    
131    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
132    
133            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
134            languages with available stopwords.
135    
136    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
137    
138            * inst/doc/tm.Rnw: Minor corrections in the vignette.
139    
140    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
141    
142            * DESCRIPTION: Update to version 0.2, since a lot of new features
143            have been integrated.
144    
145            * inst/stopwords: Updated existing stopwords and added stopwords
146            for various other languages.
147    
148    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
149    
150            * man/: Updated documentation.
151    
152            * Work/testDb.R: Script to test database stuff.
153    
154            * R/: Fixed various database related bugs. Seems to be rather
155            useable now, i.e., consider as alpha status for now.
156    
157    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
158    
159            * R/: Fixed some bugs related to database support.
160    
161    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
162    
163            * man/: Added a lot of examples to the manuals.
164    
165    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
166    
167            * man/: Updated parts of the documentation.
168    
169            * R/textdoccol.R (asPlain): Added conversion from newsgroup
170            documents to plain text documents.
171    
172    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
173    
174            * R/textdoccol.R: Finished experimental database support. Not yet
175            intensively tested.
176    
177            * R/source.R: Now each source has a default reader.
178    
179            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
180            class anymore.
181    
182            * R/plaintextdoc.R: Custom show method for plain text documents.
183    
184            * R/aobjects.R: Added a class for structured text documents.
185    
186            * R/reader.R: Replaced remaining \code{parser} occurrences with
187            \code{reader}.
188    
189            * R/textdoccol.R (summary): Indent tags.
190    
191            * R/textdoccol.R (removePunctuation): Transform method to remove
192            punctuation marks.
193    
194    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
195    
196            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
197            using prescindMeta().
198    
199    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
200    
201            * R/textdoccol.R: Improved database support.
202    
203    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
204    
205            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
206    
207            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
208            language code.
209    
210            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
211            into parserControl argument.
212    
213            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
214    
215    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
216    
217            * Work/tmDataSetup.R: The datasets acq and crude can now be
218            created on the fly.
219    
220            * R/stopwords.R: Introduced a function returning the stopwords for
221            a given language (English, German and French at the moment)
222    
223            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
224            otherwise falls back to Snowball package.
225    
226    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
227    
228            * man/dissimilarity-methods.Rd: Make clear that any method offered
229            by "dists" from package "cba" can be used.
230    
231    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
232    
233            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
234            to Kurt's latex suggestion. Removed points and underscores in
235            variable names for consistent naming.
236    
237            * DESCRIPTION: Update to version 0.1-2.
238    
239            * man/TextRepository.Rd: Fixed bug in documentation.
240    
241    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
242    
243            * DESCRIPTION: Update to version 0.1-1.
244    
245    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
246    
247            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
248            wordStem.
249    
250    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
251    
252            * R/: Changes due to Kurt's review.
253    
254    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
255    
256            * R/: Implemented improvements based upon comments by David
257            Meyer.
258    
259    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
260    
261            * inst/doc/: Rewrote vignette.
262    
263            * man/: Improved documentation.
264    
265    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
266    
267            * man/: Updated documentation.
268    
269            * DESCRIPTION: Changed package name to "tm". Updated version to
270            0.1 for first CRAN release.
271    
272            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
273            list archive example.
274    
275            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
276            archive example.
277    
278            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
279            from (several mails per box) mbox format to (single mail per file)
280            eml format.
281    
282    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
283    
284            * data/crude.rda: Rebuilt.
285    
286            * data/acq.rda: Rebuilt.
287    
288            * R/reader.R: Factored out reader and parser methods from
289            textdoccol.R.
290    
291            * R/source.R: Factored out Source methods from aobjects.R and
292            textdoccol.R.
293            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
294            feeds.
295    
296            * R/textdoccol.R (DirSource): Added support for recursive
297            traversal of directories.
298    
299    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
300    
301            * R/textdoccol.R ([[): Loads the document corpus automatically
302            into memory upon access.
303            (tm_transform, tm_filter): Removed several checks whether the
304            document is already loaded ([[ ensures this now).
305            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
306            mailing list archive.
307    
308    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
309    
310            * R/aobjects.R (TextDocument): Is now a virtual class.
311            (Source): Is now a virtual class.
312    
313    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
314    
315            * R/textdoccol.R (c): Support for an arbitrary number of document
316            collections.
317    
318    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
319    
320            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
321            append_meta and remove_meta.
322    
323            * R/textdoccol.R: Removed modify_metadata method.
324    
325            * R/textrepo.R: Removed modify_metadata method.
326    
327            * R/textdoccol.R (remove_meta): Supports removal of document
328            collection metadata and document (= in data frame) metadata.
329    
330    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
331    
332            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
333    
334            * data/crude.rda: Rebuilt.
335    
336            * data/acq.rda: Rebuilt.
337    
338            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
339    
340            * R/textdoccol.R ([): Bug fix for subsetting a document
341            collection's data frame.
342    
343    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
344    
345            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
346            to s_filter.
347    
348            * R/textdoccol.R: Local text documents' metadata can now be copied
349            to a document collection's data frame with prescind_meta.
350    
351    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
352    
353            * R/: Text documents' slot metadata is now accessible in s_filter.
354    
355            * R/: Rewrote s_filter function (still has some restrictions).
356    
357    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
358    
359            * R/: Various fixes in handling metadata.
360    
361            * R/: Added update mechanism for text document collections.
362    
363    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
364    
365            * R/: Merging of document collections now creates a binary tree
366            for reconstructing merged document collections.
367    
368            * R/: Redesign of metadata for document collections.
369    
370    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
371    
372            * R/: Messages now use \code{ngettext}.
373    
374    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
375    
376            * R/: Added functions for modifying and removing metadata.
377    
378    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
379    
380            * man/: Updated some documentation.
381    
382            * R/: Corrected some connection issues.
383    
384            * inst/doc: Worked on the vignette.
385    
386    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
387    
388            * inst/: Added texts and started vignette.
389    
390            * R/: Final changes based upon David's comments.
391    
392    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
393    
394            * NAMESPACE: Corrected exports (generic methods need exportMethods
395            directives!).
396    
397    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
398    
399            * R/: Modified the TextDocCol constructor and various parsers. It
400            is now modular and supports various file formats via plugins (see
401            the new "Source" class).
402    
403    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
404    
405            * man/: Revised documentation after previous code changes.
406    
407    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
408    
409            * R/: Remaining changes as discussed with David.
410    
411    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
412    
413            * R/: Some changes as suggested by David. The rest will follow
414            within the next few days.
415    
416    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
417    
418            * man/: Finished documentation.
419    
420    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
421    
422            * man/: Wrote some documentation.
423    
424    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
425    
426            * R/: Further syntactic sugar in form of additional assignment and
427            accessor methods.
428    
429    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
430    
431            * R/: Syntactic sugar in form of "length", "show" and "summary"
432            operators.
433    
434    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
435    
436            * R/: Diverse updates. Mainly on default operators ("[" or "c")
437            and dissimilarities.
438    
439    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
440    
441            * R/: Added similarity functions.
442    
443            * data/: Added English stopwords.
444    
445    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
446    
447            * data/: Examples compiled for new features
448    
449            * R/: Changes due to new structure.
450    
451            * NAMESPACE: Corrected namespace to reflect new structure.
452    
453            * R/termdocmatrix.R: Adapted for new naming scheme.
454    
455    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
456    
457            * R/textdoccol.R: Adapted code for new class structure. Wrote
458            several transform and filter functions operating on text document
459            collections (alias text document databases).
460    
461            * R/aobjects.R: Adapted class structure with inheritance,
462            repositories and additional meta data. Loading files on demand is
463            now possible.
464    
465    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
466    
467            * R/: Some cosmetic cleanups.
468    
469            * inst/: Removed vignette on clustering. That and much more is now
470            described in the JSS paper on text mining. Based upon that
471            article a more elaborate vignette will be incorporated in the future.
472    
473    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/: Updated generic S4 methods to comply with signature changes
476            in newer versions of R (> 2.3)
477    
478    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
479    
480            * ext/R/importRIS.R: Automatic RIS import is now possible.
481    
482    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
483    
484            * R/textdoccol.R: Added RIS HTML input format.
485    
486    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
487    
488            * R/textdoccol.R: Removed bug that caused invalid text document
489            collections when handling many input files.
490    
491    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
492    
493            * R/textdoccol.R: Restructured and extended file import
494            mechanism.
495    
496            * inst/doc/clustering.Rnw: Adapted vignette for use with
497            ReutNews.rda
498    
499            * man/ReutNews.Rd: Documentation for ReutNews.rda
500    
501            * data/ReutNews.rda: A tiny Reuters21578 example data set.
502    
503    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
504    
505            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
506            clustering facilities of this package.
507    
508    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * R/aobjects.R: Changed package document structure to avoid class
511            dependency problems.
512    
513    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
514    
515            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
516            data set.
517    
518            * Finished documentation and reordered directory structure. Now "R
519            CMD check textmin" works without errors.
520    
521    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * src/: Various splits can now be easily created for the
524            Reuters21578 data set.
525    
526    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * Updated documentation
529    
530    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
531    
532            * Wrote R documentation for some classes and methods.
533    
534    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
535    
536            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
537            files. See the questionnaire data/Umfrage.csv for such an example.
538            We are now able to import files in Reuters-21578 XML format.
539    
540            * Changed class interfaces in various files. Weighting of the text
541            matrix is now possible.
542    
543    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
544    
545            * R/textdoccol.R: One can build term-document matrices if
546            necessary (with buildTDM(...)) and fill the field tdm from a text
547            document collection with it.
548    
549            * R/textmatrix.R: Wrote S4 class for term-document matrices.
550    
551    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
552    
553            * R/textdoccol.R: We now can read in a whole XML file with several
554            news items.
555    
556    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
557    
558            * R/textdoccol.R: Set up an S4 class for a collection of text
