[tm] Diff of /trunk/tm/ChangeLog

Old: trunk/R/trunk/ChangeLog, revision 28 (Tue Dec 6 13:46:33 2005 UTC)
New: trunk/tm/ChangeLog, revision 773 (Sat Jul 21 12:05:08 2007 UTC)

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
        are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
        Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.

        * R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.

        * inst/stopwords/english.dat: Added the term "yes" to stopwords.

        * R/termdocmatrix.R (dim): dim function for TermDocMatrix.

        * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/distmeasure.R (dissimilarity): Replaced dists call from
        package cba by new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: require() uses the quietly option to suppress
        loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
        documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed some typos in vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (replaceWords): Added method to replace a set of
        words by a single word. Useful for synonyms.
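
A minimal sketch of what replacing a set of words by a single word can look like. It is written in Python rather than R and is not taken from the tm sources. The function name `replace_words` and the whole-word regular-expression approach are assumptions for illustration only.

```python
import re
from typing import Iterable


def replace_words(text: str, words: Iterable[str], by: str) -> str:
    """Replace each whole-word occurrence of any term in `words` with `by`.

    Assumes every term starts and ends with a word character, so the
    word-boundary anchors in the pattern behave as expected.
    """
    # Longest terms first, so longer synonyms are tried before their prefixes.
    terms = sorted(set(words), key=len, reverse=True)
    if not terms:
        return text  # an empty alternation would match at every word boundary
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")
    # A callable replacement keeps backslashes in `by` from being read as escapes.
    return pattern.sub(lambda _match: by, text)


print(replace_words("The car passed the automobile.", ["automobile", "car"], "vehicle"))
# prints: The vehicle passed the vehicle.
```

Mapping several synonyms onto one canonical word this way means they are counted as a single term in any term-document matrix built afterwards.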

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TermDocMatrix.Rd: Fixed documentation on Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Small fix for dealing with empty
        vectors. Thanks to Ariel Maguyon for his error report.
        (removeSparseTerms): New function to remove columns from a
        term-document matrix exceeding a sparse factor.
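
A rough Python illustration of the idea behind removeSparseTerms, not the tm code. It makes two assumptions. First, a term's sparsity is the share of documents in which the term does not occur. Second, "exceeding" means strictly greater than the threshold. The name `remove_sparse_terms` and the dict-of-columns layout (one column of counts per term, matching the column-wise wording of this entry) are hypothetical.

```python
from typing import Dict, List


def remove_sparse_terms(columns: Dict[str, List[int]], sparse: float) -> Dict[str, List[int]]:
    """Keep only terms whose sparsity does not exceed `sparse`.

    `columns` maps each term to its counts across all documents, i.e. one
    column of a document-by-term matrix. Sparsity is the fraction of
    documents in which the term has a zero count.
    """
    if not 0.0 <= sparse <= 1.0:
        raise ValueError("sparse must lie in [0, 1]")
    kept = {}
    for term, counts in columns.items():
        zeros = sum(1 for c in counts if c == 0)
        sparsity = zeros / len(counts) if counts else 1.0
        if sparsity <= sparse:
            kept[term] = counts
    return kept


matrix = {"oil": [2, 1, 0, 3], "crude": [1, 0, 0, 0], "opec": [1, 1, 1, 1]}
print(sorted(remove_sparse_terms(matrix, 0.5)))  # prints: ['oil', 'opec']
```

Here "crude" is missing from three of four documents (sparsity 0.75), so it is dropped at a threshold of 0.5.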

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/sFilter.Rd: Corrected documentation on statement format (use
        '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (StructuredTextDocument): Inherits from
        TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (findFreqTerms): Performs efficient computation
        on sparse matrices as proposed by Martin Maechler.
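
The gain from working on sparse matrices is that only non-zero entries are visited. Below is a Python sketch of that idea. It assumes a dict-of-dicts layout (document to {term: count}) that stores non-zero counts only. The name `find_freq_terms` and its `lowfreq`/`highfreq` bounds are illustrative assumptions. The sketch filters on each term's total frequency across all documents; the exact criterion tm used at this revision is not stated here.

```python
import math
from collections import defaultdict
from typing import Dict, List


def find_freq_terms(doc_term: Dict[str, Dict[str, int]],
                    lowfreq: float = 0, highfreq: float = math.inf) -> List[str]:
    """Return the sorted terms whose total count lies in [lowfreq, highfreq]."""
    totals: Dict[str, int] = defaultdict(int)
    for counts in doc_term.values():
        # Only stored (non-zero) entries are visited; zeros cost nothing.
        for term, count in counts.items():
            totals[term] += count
    return sorted(t for t, f in totals.items() if lowfreq <= f <= highfreq)


docs = {"d1": {"oil": 3, "price": 1}, "d2": {"oil": 2, "opec": 1}}
print(find_freq_terms(docs, lowfreq=2))  # prints: ['oil']
```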

2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
        \pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Stemming is now performed before
        erasing stopwords.
        (weightMatrix): Adapted to handle sparse matrices.
        (TermDocMatrix): Sparse matrix is now efficiently built by
        direct stepwise insertion of row values into it.
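
To illustrate building a sparse matrix by inserting row values step by step, here is a Python sketch using triplet (row, column, value) storage. It is not the R code used in tm, which relied on R sparse matrix classes. The function name `build_triplets` and the first-seen column numbering are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_triplets(docs: List[List[str]]) -> Tuple[List[int], List[int], List[int], Dict[str, int]]:
    """Build a sparse document-by-term matrix in triplet (COO) form.

    Rows are appended one document at a time and only non-zero counts are
    stored. Returns row indices, column indices, values and the
    term -> column mapping.
    """
    rows: List[int] = []
    cols: List[int] = []
    vals: List[int] = []
    vocab: Dict[str, int] = {}
    for i, tokens in enumerate(docs):
        for term, count in Counter(tokens).items():
            j = vocab.setdefault(term, len(vocab))  # unseen terms get the next column
            rows.append(i)
            cols.append(j)
            vals.append(count)
    return rows, cols, vals, vocab


r, c, v, vocab = build_triplets([["oil", "price", "oil"], ["opec", "oil"]])
print(list(zip(r, c, v)), vocab)
# prints: [(0, 0, 2), (0, 1, 1), (1, 2, 1), (1, 0, 1)] {'oil': 0, 'price': 1, 'opec': 2}
```

Appending triplets avoids allocating a dense documents-by-terms array, which is mostly zeros for real corpora.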

2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
        due to ongoing problems. For our purposes the latter is as useful
        as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.

        * man/TermDocMatrix.Rd: Removed deprecated \code{language} argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
        languages with available stopwords.

2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.2, since a lot of new features
        have been integrated.

        * inst/stopwords: Updated existing stopwords and added stopwords
        for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * Work/testDb.R: Script to test database support.

        * R/: Fixed various database-related bugs. Database support now
        seems fairly usable; consider it alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Added a lot of examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated parts of the documentation.

        * R/textdoccol.R (asPlain): Added conversion from newsgroup
        documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Finished experimental database support. Not yet
        thoroughly tested.

        * R/source.R: Now each source has a default reader.

        * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
        class anymore.

        * R/plaintextdoc.R: Custom show method for plain text documents.

        * R/aobjects.R: Added a class for structured text documents.

        * R/reader.R: Replaced remaining \code{parser} occurrences with
        \code{reader}.

        * R/textdoccol.R (summary): Indent tags.

        * R/textdoccol.R (removePunctuation): Transform method to remove
        punctuation marks.

2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (sFilter): Simplified sFilter significantly by
        using prescindMeta().

2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

        * R/resolve.R (resolveISOcode): Extracts the language from an ISO
        language code.

        * R/textdoccol.R (TextDocCol): Refactored several parser arguments
        into a single parserControl argument.

        * R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Work/tmDataSetup.R: The datasets acq and crude can now be
        created on the fly.

        * R/stopwords.R: Introduced a function returning the stopwords for
        a given language (English, German and French at the moment).

        * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
        otherwise falls back to the Snowball package.

2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/dissimilarity-methods.Rd: Make clear that any method offered
        by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed bug where quotes appeared as boxes,
        following Kurt's LaTeX suggestion. Removed points and underscores
        in variable names for consistent naming.

        * DESCRIPTION: Update to version 0.1-2.

        * man/TextRepository.Rd: Fixed bug in documentation.

2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.1-1.

2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
        wordStem.

2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Changes due to Kurt's review.

2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Implemented improvements based upon comments by David
        Meyer.

2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/: Rewrote vignette.

        * man/: Improved documentation.

2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * DESCRIPTION: Changed package name to "tm". Updated version to
        0.1 for the first CRAN release.

        * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
        list archive example.

        * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
        archive example.

        * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
        from (several mails per box) mbox format to (single mail per file)
        eml format.
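
A Python sketch of the mbox-to-eml splitting described in this entry, not the tm R function. It also accepts gzip-compressed input, in the spirit of the later convertMboxEml change. It assumes the classic mbox convention: each message begins with an envelope line starting with "From ", and body lines starting with "From " have been escaped. The function name and the output file names (`1.eml`, `2.eml`, ...) are hypothetical.

```python
import gzip
import os
from typing import List


def convert_mbox_eml(mbox_path: str, out_dir: str) -> List[str]:
    """Split an mbox file (optionally gzipped) into one .eml file per message.

    Assumes each message starts with an envelope line beginning with b"From "
    and that body lines starting with "From " were escaped (e.g. as ">From ").
    """
    opener = gzip.open if mbox_path.endswith(".gz") else open
    with opener(mbox_path, "rb") as fh:
        lines = fh.read().splitlines(keepends=True)

    messages: List[List[bytes]] = []
    for line in lines:
        if line.startswith(b"From "):
            messages.append([])  # envelope line: start a new message, drop the line
        elif messages:
            messages[-1].append(line)  # text before the first envelope is ignored

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for number, body in enumerate(messages, start=1):
        path = os.path.join(out_dir, f"{number}.eml")
        with open(path, "wb") as out:
            out.writelines(body)
        paths.append(path)
    return paths
```

For example, `convert_mbox_eml("archive.mbox.gz", "eml")` would write `eml/1.eml`, `eml/2.eml`, and so on, one file per message.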

2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * R/reader.R: Factored out reader and parser methods from
        textdoccol.R.

        * R/source.R: Factored out Source methods from aobjects.R and
        textdoccol.R.
        (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
        feeds.

        * R/textdoccol.R (DirSource): Added support for recursive
        traversal of directories.
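
A short Python sketch of listing a directory source's files with optional recursive traversal. It shows the general idea only and is not the DirSource implementation. The function name `list_source_files` and its `recursive` flag are hypothetical.

```python
import os
from typing import List


def list_source_files(directory: str, recursive: bool = False) -> List[str]:
    """Return the sorted paths of regular files below `directory`.

    With recursive=False only the top level is listed; with recursive=True
    all subdirectories are traversed as well.
    """
    if not recursive:
        return sorted(
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))
        )
    found = []
    for root, _dirs, files in os.walk(directory):
        found.extend(os.path.join(root, name) for name in files)
    return sorted(found)
```

Sorting the paths gives a stable document order, which keeps document positions in a collection reproducible across runs.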

2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R ([[): Loads the document corpus automatically
        into memory upon access.
        (tm_transform, tm_filter): Removed several checks whether the
        document is already loaded ([[ ensures this now).
        (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
        mailing list archive.

2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (TextDocument): Is now a virtual class.
        (Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (c): Support for an arbitrary number of document
        collections.

2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textrepo.R: Updated TextRepository (constructor), append_elem,
        append_meta and remove_meta.

        * R/textdoccol.R: Removed modify_metadata method.

        * R/textrepo.R: Removed modify_metadata method.

        * R/textdoccol.R (remove_meta): Supports removal of document
        collection metadata and document (= in data frame) metadata.

2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

        * R/textdoccol.R ([): Bug fix for subsetting a document
        collection's data frame.

2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Bug fixes in s_filter. Added full query support
        to s_filter.

        * R/textdoccol.R: Local text documents' metadata can now be copied
        to a document collection's data frame with prescind_meta.

2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Text documents' slot metadata is now accessible in s_filter.

        * R/: Rewrote s_filter function (still has some restrictions).

2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various fixes in handling metadata.

        * R/: Added update mechanism for text document collections.

2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Merging of document collections now creates a binary tree
        for reconstructing merged document collections.

        * R/: Redesign of metadata for document collections.

2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Messages now use \code{ngettext}.

2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added functions for modifying and removing metadata.

2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated some documentation.

        * R/: Corrected some connection issues.

        * inst/doc: Worked on the vignette.

2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/: Added texts and started vignette.

        * R/: Final changes based upon David's comments.

2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * NAMESPACE: Corrected exports (generic methods need exportMethods
        directives!).

2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Modified the TextDocCol constructor and various parsers. It
        is now modular and supports various file formats via plugins (see
        the new "Source" class).

2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Revised documentation after previous code changes.

2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Remaining changes as discussed with David.

2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some changes as suggested by David. The rest will follow
        in the next few days.

2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Finished documentation.

2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Wrote some documentation.

2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Further syntactic sugar in the form of additional assignment
        and accessor methods.

2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Syntactic sugar in the form of "length", "show" and "summary"
        operators.

2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various updates, mainly to default operators ("[" or "c")
        and dissimilarities.

2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added similarity functions.

        * data/: Added English stopwords.

2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/: Examples compiled for new features.

        * R/: Changes due to new structure.

        * NAMESPACE: Corrected namespace to reflect new structure.

        * R/termdocmatrix.R: Adapted for new naming scheme.

2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Adapted code for new class structure. Wrote
        several transform and filter functions operating on text document
        collections (alias text document databases).

        * R/aobjects.R: Adapted class structure with inheritance,
        repositories and additional metadata. Loading files on demand is
        now possible.

2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some cosmetic cleanups.

        * inst/: Removed vignette on clustering. That and much more is now
        described in the JSS paper on text mining. Based on that article,
        a more elaborate vignette will be incorporated in the future.

2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Updated generic S4 methods to comply with signature changes
        in newer versions of R (> 2.3).

2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * ext/R/importRIS.R: Automatic RIS import is now possible.

2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Added RIS HTML input format.

2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Fixed bug that caused invalid text document
        collections when handling many input files.

2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Restructured and extended file import
        mechanism.

        * inst/doc/clustering.Rnw: Adapted vignette for use with
        ReutNews.rda.

        * man/ReutNews.Rd: Documentation for ReutNews.rda.

        * data/ReutNews.rda: A tiny Reuters21578 example data set.

2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/clustering.Rnw: Wrote a small vignette to present the
        clustering facilities of this package.

2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Changed package document structure to avoid class
        dependency problems.

2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Wrote a script for the ModLewis Split for the Reuters-21578 XML
        data set.

        * Finished documentation and reordered directory structure. Now "R
        CMD check textmin" works without errors.
