[tm] Diff of /pkg/ChangeLog
trunk/R/trunk/ChangeLog revision 34, Thu Dec 22 15:18:10 2005 UTC
trunk/tm/ChangeLog revision 774, Sat Jul 21 16:25:54 2007 UTC
1    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
4            for accessing term-document matrices.
5    
6            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
7            are installed.
8    
9    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
10    
11            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
12            Christian Buchta.
13    
14    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
15    
16            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
17    
18    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
19    
20            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
21    
22            * R/reader.R (readPDF): Added PDF reader.
23    
24    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
25    
26            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
27    
28            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
29    
30            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
31    
32            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
33    
34    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
35    
36            * R/distmeasure.R (dissimilarity): Replaced dists call from
37            package cba by new dist call from package proxy.
38    
39    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
40    
41            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
42    
43    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
44    
45            * R/termdocmatrix.R: require() uses the quietly option to suppress
46            loading messages.
47    
48    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
49    
50            * R/dictionary.R: Added dictionary support.
51    
52    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
53    
54            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
55            documents. This simplifies some functions, e.g., asPlain.
56    
57    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
58    
59            * inst/doc/tm.Rnw: Fixed some typos in vignette.
60    
61    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
62    
63            * R/textdoccol.R (replaceWords): Added method to replace a set of
64            words by a single word. Useful for synonyms.
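
The synonym replacement described in the entry above can be illustrated with a short, language-neutral sketch. It is written in Python rather than R and is not the code in R/textdoccol.R; the function name `replace_words` and its parameters are assumptions made for this example only.

```python
import re


def replace_words(text, words, by):
    """Replace every whole-word occurrence of any entry in `words` with `by`."""
    if not words:
        # Nothing to replace; avoid building an empty pattern.
        return text
    # \b anchors keep "car" from matching inside "carpet".
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    # A function replacement keeps `by` literal, even if it contains backslashes.
    return re.sub(pattern, lambda match: by, text)


merged = replace_words("the car and the automobile", ["car", "automobile"], "vehicle")
print(merged)  # the vehicle and the vehicle
```

Mapping several synonyms onto one word before counting terms means they end up in the same column of a term-document matrix.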
65    
66    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
67    
68            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
69    
70    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
71    
72            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
73            vectors. Thanks to Ariel Maguyon for his error report.
74            (removeSparseTerms): New function to remove columns from a
75            term-document matrix exceeding a sparse factor.
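
As a rough illustration of removing sparse columns, the sketch below drops every column whose share of zero entries exceeds a threshold. It is a Python stand-in, not the removeSparseTerms code from R/termdocmatrix.R. The name `remove_sparse_columns`, the plain list-of-rows matrix, and the strict "greater than" comparison are all assumptions.

```python
def remove_sparse_columns(matrix, sparse):
    """Return a copy of `matrix` without columns whose zero share exceeds `sparse`.

    `matrix` is a list of equally long rows (documents) whose columns are terms.
    """
    if not matrix:
        return []
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    keep = [
        j
        for j in range(n_cols)
        # Share of documents in which term j does not occur.
        if sum(1 for row in matrix if row[j] == 0) / n_rows <= sparse
    ]
    return [[row[j] for j in keep] for row in matrix]


docs_by_terms = [
    [1, 0, 2],
    [3, 0, 0],
    [1, 1, 0],
    [2, 0, 0],
]
dense = remove_sparse_columns(docs_by_terms, 0.5)
print(dense)  # [[1], [3], [1], [2]]
```

Here the second and third columns are zero in three of four documents (0.75), so a threshold of 0.5 removes both.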
76    
77    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
78    
79            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
80    
81    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
82    
83            * man/sFilter.Rd: Corrected documentation on statement format (use
84            '==' instead of '=').
85    
86    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
87    
88            * R/aobjects.R (StructuredTextDocument): Inherits from
89            TextDocument.
90    
91    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
92    
93            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
94            on sparse matrices as proposed by Martin Maechler.
95    
96    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
97    
98            * R/textdoccol.R: Removed \code{dbDisconnect} calls since last
99            \pkg{filehash} version makes them deprecated.
100    
101    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
102    
103            * R/termdocmatrix.R (textvector): Stemming is now performed before
104            erasing stopwords.
105            (weightMatrix): Adapted to handle sparse matrices.
106            (TermDocMatrix): Sparse matrix is now efficiently built by
107            direct stepwise insertion of row values into it.
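
The entry above does not say why stemming now runs before stopword removal. One plausible effect, shown in the Python sketch below, is that inflected forms whose stem is a stopword are also removed. The suffix-stripping `toy_stem` function and the tiny stopword set are hypothetical stand-ins, not the stemmer or stopword list used by the package.

```python
STOPWORDS = {"the", "do", "be"}


def toy_stem(word):
    """Hypothetical stemmer: strip one common suffix, keeping at least 2 letters."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def stem_then_filter(tokens):
    """Stem first, then drop stopwords (the order described in the entry)."""
    stemmed = [toy_stem(t) for t in tokens]
    return [t for t in stemmed if t not in STOPWORDS]


def filter_then_stem(tokens):
    """Drop stopwords first, then stem (the opposite order)."""
    kept = [t for t in tokens if t not in STOPWORDS]
    return [toy_stem(t) for t in kept]


tokens = ["the", "does", "doing", "reports"]
print(stem_then_filter(tokens))  # ['report']
print(filter_then_stem(tokens))  # ['do', 'do', 'report']
```

With the stem-first order, "does" and "doing" reduce to "do" and are filtered out; with the opposite order they survive as "do".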
108    
109    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
110    
111            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
112            due to ongoing problems. For our purposes the latter is as useful
113            as the replaced package.
114    
115    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
116    
117            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
118    
119            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
120    
121    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
122    
123            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
124            languages with available stopwords.
125    
126    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
127    
128            * inst/doc/tm.Rnw: Minor corrections in the vignette.
129    
130    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
131    
132            * DESCRIPTION: Update to version 0.2, since a lot of new features
133            have been integrated.
134    
135            * inst/stopwords: Updated existing stopwords and added stopwords
136            for various other languages.
137    
138    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
139    
140            * man/: Updated documentation.
141    
142            * Work/testDb.R: Script to test database stuff.
143    
144            * R/: Fixed various database related bugs. Seems to be rather
145            useable now, i.e., consider as alpha status for now.
146    
147    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
148    
149            * R/: Fixed some bugs related to database support.
150    
151    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
152    
153            * man/: Added a lot of examples to the manuals.
154    
155    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
156    
157            * man/: Updated parts of the documentation.
158    
159            * R/textdoccol.R (asPlain): Added conversion from newsgroup
160            documents to plain text documents.
161    
162    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
163    
164            * R/textdoccol.R: Finished experimental database support. Not yet
165            intensively tested.
166    
167            * R/source.R: Now each source has a default reader.
168    
169            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
170            class anymore.
171    
172            * R/plaintextdoc.R: Custom show method for plain text documents.
173    
174            * R/aobjects.R: Added a class for structured text documents.
175    
176            * R/reader.R: Replaced remaining \code{parser} occurrences with
177            \code{reader}.
178    
179            * R/textdoccol.R (summary): Indent tags.
180    
181            * R/textdoccol.R (removePunctuation): Transform method to remove
182            punctuation marks.
183    
184    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
185    
186            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
187            using prescindMeta().
188    
189    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
190    
191            * R/textdoccol.R: Improved database support.
192    
193    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
194    
195            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
196    
197            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
198            language code.
199    
200            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
201            into parserControl argument.
202    
203            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
204    
205    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
206    
207            * Work/tmDataSetup.R: The datasets acq and crude can now be
208            created on the fly.
209    
210            * R/stopwords.R: Introduced a function returning the stopwords for
211            a given language (English, German and French at the moment).
212    
213            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
214            otherwise falls back to Snowball package.
215    
216    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
217    
218            * man/dissimilarity-methods.Rd: Make clear that any method offered
219            by "dists" from package "cba" can be used.
220    
221    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
222    
223            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
224            to Kurt's LaTeX suggestion. Removed points and underscores in
225            variable names for consistent naming.
226    
227            * DESCRIPTION: Update to version 0.1-2.
228    
229            * man/TextRepository.Rd: Fixed bug in documentation.
230    
231    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
232    
233            * DESCRIPTION: Update to version 0.1-1.
234    
235    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
236    
237            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
238            wordStem.
239    
240    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
241    
242            * R/: Changes due to Kurt's review.
243    
244    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
245    
246            * R/: Implemented improvements based upon comments by David
247            Meyer.
248    
249    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
250    
251            * inst/doc/: Rewrote vignette.
252    
253            * man/: Improved documentation.
254    
255    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
256    
257            * man/: Updated documentation.
258    
259            * DESCRIPTION: Changed package name to "tm". Updated version to
260            0.1 for first CRAN release.
261    
262            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
263            list archive example.
264    
265            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
266            archive example.
267    
268            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
269            from (several mails per box) mbox format to (single mail per file)
270            eml format.
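
The mbox-to-eml conversion described above amounts to splitting one file at its "From " separator lines. The Python sketch below shows that idea on an in-memory string; it is not the convert_mbox_eml code, and the function name `split_mbox` is an assumption. It also assumes body lines starting with "From " are already escaped (for example as ">From "), as mbox writers commonly do, and it leaves writing each message to its own .eml file to the caller.

```python
def split_mbox(mbox_text):
    """Split mbox content into a list of messages, one string per mail.

    The "From " separator line that starts each message is dropped,
    since it is not part of the message itself.
    """
    messages = []
    current = None  # None until the first separator line has been seen
    for line in mbox_text.splitlines(keepends=True):
        if line.startswith("From "):
            if current is not None:
                messages.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


sample = (
    "From alice@example.org Sat Dec 16 10:00:00 2006\n"
    "Subject: first\n"
    "\n"
    "Hello\n"
    "\n"
    "From bob@example.org Sat Dec 16 11:00:00 2006\n"
    "Subject: second\n"
    "\n"
    "World\n"
)
mails = split_mbox(sample)
print(len(mails))  # 2
```

Each element of `mails` could then be written to a separate file, giving the single-mail-per-file layout the entry mentions.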
271    
272    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
273    
274            * data/crude.rda: Rebuilt.
275    
276            * data/acq.rda: Rebuilt.
277    
278            * R/reader.R: Factored out reader and parser methods from
279            textdoccol.R.
280    
281            * R/source.R: Factored out Source methods from aobjects.R and
282            textdoccol.R.
283            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
284            feeds.
285    
286            * R/textdoccol.R (DirSource): Added support for recursive
287            traversal of directories.
288    
289    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
290    
291            * R/textdoccol.R ([[): Loads the document corpus automatically
292            into memory upon access.
293            (tm_transform, tm_filter): Removed several checks whether the
294            document is already loaded ([[ ensures this now).
295            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
296            mailing list archive.
297    
298    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
299    
300            * R/aobjects.R (TextDocument): Is now a virtual class.
301            (Source): Is now a virtual class.
302    
303    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
304    
305            * R/textdoccol.R (c): Support for an arbitrary number of document
306            collections.
307    
308    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
309    
310            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
311            append_meta and remove_meta.
312    
313            * R/textdoccol.R: Removed modify_metadata method.
314    
315            * R/textrepo.R: Removed modify_metadata method.
316    
317            * R/textdoccol.R (remove_meta): Supports removal of document
318            collection metadata and document (= in data frame) metadata.
319    
320    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
321    
322            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
323    
324            * data/crude.rda: Rebuilt.
325    
326            * data/acq.rda: Rebuilt.
327    
328            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
329    
330            * R/textdoccol.R ([): Bug fix for subsetting a document
331            collection's data frame.
332    
333    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
334    
335            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
336            to s_filter.
337    
338            * R/textdoccol.R: Local text documents' metadata can now be copied
339            to a document collection's data frame with prescind_meta.
340    
341    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
342    
343            * R/: Text documents' slot metadata is now accessible in s_filter.
344    
345            * R/: Rewrote s_filter function (has still some restrictions).
346    
347    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
348    
349            * R/: Various fixes in handling metadata.
350    
351            * R/: Added update mechanism for text document collections.
352    
353    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
354    
355            * R/: Merging of document collections now creates a binary tree
356            for reconstructing merged document collections.
357    
358            * R/: Redesign of metadata for document collections.
359    
360    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
361    
362            * R/: Messages now use \code{ngettext}.
363    
364    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
365    
366            * R/: Added functions for modifying and removing metadata.
367    
368    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
369    
370            * man/: Updated some documentation.
371    
372            * R/: Corrected some connection issues.
373    
374            * inst/doc: Worked on the vignette.
375    
376    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
377    
378            * inst/: Added texts and started vignette.
379    
380            * R/: Final changes based upon David's comments.
381    
382    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
383    
384            * NAMESPACE: Corrected exports (generic methods need exportMethods
385            directives!).
386    
387    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
388    
389            * R/: Modified the TextDocCol constructor and various parsers. It
390            is now modular and supports various file formats via plugins (see
391            the new "Source" class).
392    
393    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
394    
395            * man/: Revised documentation after previous code changes.
396    
397    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
398    
399            * R/: Remaining changes as discussed with David.
400    
401    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
402    
403            * R/: Some changes as suggested by David. The rest will follow
404            within the next days.
405    
406    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
407    
408            * man/: Finished documentation.
409    
410    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
411    
412            * man/: Wrote some documentation.
413    
414    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
415    
416            * R/: Further syntactic sugar in form of additional assignment and
417            accessor methods.
418    
419    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
420    
421            * R/: Syntactic sugar in form of "length", "show" and "summary"
422            operators.
423    
424    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
425    
426            * R/: Diverse updates. Mainly on default operators ("[" or "c")
427            and dissimilarities.
428    
429    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
430    
431            * R/: Added similarity functions.
432    
433            * data/: Added English stopwords.
434    
435    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
436    
437            * data/: Examples compiled for new features.
438    
439            * R/: Changes due to new structure.
440    
441            * NAMESPACE: Corrected namespace to reflect new structure.
442    
443            * R/termdocmatrix.R: Adapted for new naming scheme.
444    
445    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
446    
447            * R/textdoccol.R: Adapted code for new class structure. Wrote
448            several transform and filter functions operating on text document
449            collections (alias text document databases).
450    
451            * R/aobjects.R: Adapted class structure with inheritance,
452            repositories and additional meta data. Loading files on demand is
453            now possible.
454    
455    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
456    
457            * R/: Some cosmetic cleanups.
458    
459            * inst/: Removed vignette on clustering. That and much more is now
460            described in the JSS paper on text mining. Based upon that
461            article an elaborated vignette will be incorporated in the future.
462    
463    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/: Updated generic S4 methods to comply with signature changes
466            in newer versions of R (> 2.3).
467    
468    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * ext/R/importRIS.R: Automatic RIS import is now possible.
471    
472    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
473    
474            * R/textdoccol.R: Added RIS HTML input format.
475    
476    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
477    
478            * R/textdoccol.R: Removed bug that caused invalid text document
479            collections when handling many input files.
480    
481    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * R/textdoccol.R: Restructured and extended file import
484            mechanism.
485    
486            * inst/doc/clustering.Rnw: Adapted vignette for use with
487            ReutNews.rda
488    
489            * man/ReutNews.Rd: Documentation for ReutNews.rda
490    
491            * data/ReutNews.rda: A tiny Reuters21578 example data set.
492    
493    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
494    
495            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
