[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 37, Wed Jan 11 17:49:17 2006 UTC
trunk/tm/ChangeLog revision 777, Tue Aug 28 07:19:12 2007 UTC
# Line 1  Line 1 
1    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
4    
5    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
6    
7            * R/reader.R (readPDF): Removed manual checks for pdftotext and
8            pdfinfo. The system call gives a warning anyway.
9    
10    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
11    
12            * R/textdoccol.R (asPlain): Conversion from
13            StructuredTextDocuments to PlainTextDocuments.
14    
15    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
16    
17            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
18            for accessing term-document matrices.
19    
20            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
21            are installed.
22    
23    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
24    
25            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
26            Christian Buchta.
27    
28    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
29    
30            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
31    
32    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
33    
34            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
35    
36            * R/reader.R (readPDF): Added PDF reader.
37    
38    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
39    
40            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
41    
42            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
43    
44            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
45    
46            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
47    
48    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
49    
50            * R/distmeasure.R (dissimilarity): Replaced dists call from
51            package cba by new dist call from package proxy.
52    
53    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
54    
55            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
56    
57    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
58    
59            * R/termdocmatrix.R: require() uses the quietly option to suppress
60            loading messages.
61    
62    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
63    
64            * R/dictionary.R: Added dictionary support.
65    
66    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
67    
68            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
69            documents. This simplifies some functions, e.g., asPlain.
70    
71    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
72    
73            * inst/doc/tm.Rnw: Fixed some typos in vignette.
74    
75    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
76    
77            * R/textdoccol.R (replaceWords): Added method to replace a set of
78            words by a single word. Useful for synonyms.
79    
80    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
81    
82            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
83    
84    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
85    
86            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
87            vectors. Thanks to Ariel Maguyon for his error report.
88            (removeSparseTerms): New function to remove columns exceeding a
89            sparse factor from a term-document matrix.
90    
91    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
92    
93            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
94    
95    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
96    
97            * man/sFilter.Rd: Corrected documentation on statement format (use
98            '==' instead of '=').
99    
100    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
101    
102            * R/aobjects.R (StructuredTextDocument): Inherits from
103            TextDocument.
104    
105    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
106    
107            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
108            on sparse matrices as proposed by Martin Maechler.
109    
110    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
111    
112            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
113            \pkg{filehash} version deprecates them.
114    
115    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
116    
117            * R/termdocmatrix.R (textvector): Stemming is now performed before
118            erasing stopwords.
119            (weightMatrix): Adapted to handle sparse matrices.
120            (TermDocMatrix): Sparse matrix is now efficiently built by
121            direct stepwise insertion of row values into it.
122    
123    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
124    
125            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
126            due to ongoing problems. For our purposes the latter is as useful
127            as the replaced package.
128    
129    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
130    
131            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
132    
133            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
134    
135    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
136    
137            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
138            languages with available stopwords.
139    
140    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
141    
142            * inst/doc/tm.Rnw: Minor corrections in the vignette.
143    
144    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
145    
146            * DESCRIPTION: Update to version 0.2, since a lot of new features
147            have been integrated.
148    
149            * inst/stopwords: Updated existing stopwords and added stopwords
150            for various other languages.
151    
152    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
153    
154            * man/: Updated documentation.
155    
156            * Work/testDb.R: Script to test database stuff.
157    
158            * R/: Fixed various database-related bugs. Seems to be rather
159            usable now, i.e., consider it alpha status for now.
160    
161    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
162    
163            * R/: Fixed some bugs related to database support.
164    
165    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
166    
167            * man/: Added a lot of examples to the manuals.
168    
169    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
170    
171            * man/: Updated parts of the documentation.
172    
173            * R/textdoccol.R (asPlain): Added conversion from newsgroup
174            documents to plain text documents.
175    
176    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
177    
178            * R/textdoccol.R: Finished experimental database support. Not yet
179            intensively tested.
180    
181            * R/source.R: Now each source has a default reader.
182    
183            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
184            class anymore.
185    
186            * R/plaintextdoc.R: Custom show method for plain text documents.
187    
188            * R/aobjects.R: Added a class for structured text documents.
189    
190            * R/reader.R: Replaced remaining \code{parser} occurrences with
191            \code{reader}.
192    
193            * R/textdoccol.R (summary): Indent tags.
194    
195            * R/textdoccol.R (removePunctuation): Transform method to remove
196            punctuation marks.
197    
198    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
199    
200            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
201            using prescindMeta().
202    
203    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
204    
205            * R/textdoccol.R: Improved database support.
206    
207    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
208    
209            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
210    
211            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
212            language code.
213    
214            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
215            into parserControl argument.
216    
217            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
218    
219    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
220    
221            * Work/tmDataSetup.R: The datasets acq and crude can now be
222            created on the fly.
223    
224            * R/stopwords.R: Introduced a function returning the stopwords for
225            a given language (English, German and French at the moment).
226    
227            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
228            otherwise falls back to Snowball package.
229    
230    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
231    
232            * man/dissimilarity-methods.Rd: Make clear that any method offered
233            by "dists" from package "cba" can be used.
234    
235    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
236    
237            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
238            to Kurt's LaTeX suggestion. Removed points and underscores in
239            variable names for consistent naming.
240    
241            * DESCRIPTION: Update to version 0.1-2.
242    
243            * man/TextRepository.Rd: Fixed bug in documentation.
244    
245    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
246    
247            * DESCRIPTION: Update to version 0.1-1.
248    
249    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
250    
251            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
252            wordStem.
253    
254    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
255    
256            * R/: Changes due to Kurt's review.
257    
258    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
259    
260            * R/: Implemented improvements based upon comments by David
261            Meyer.
262    
263    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
264    
265            * inst/doc/: Rewrote vignette.
266    
267            * man/: Improved documentation.
268    
269    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
270    
271            * man/: Updated documentation.
272    
273            * DESCRIPTION: Changed package name to "tm". Updated version to
274            0.1 for first CRAN release.
275    
276            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
277            list archive example.
278    
279            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
280            archive example.
281    
282            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
283            from (several mails per box) mbox format to (single mail per file)
284            eml format.
285    
286    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
287    
288            * data/crude.rda: Rebuilt.
289    
290            * data/acq.rda: Rebuilt.
291    
292            * R/reader.R: Factored out reader and parser methods from
293            textdoccol.R.
294    
295            * R/source.R: Factored out Source methods from aobjects.R and
296            textdoccol.R.
297            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
298            feeds.
299    
300            * R/textdoccol.R (DirSource): Added support for recursive
301            traversal of directories.
302    
303    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
304    
305            * R/textdoccol.R ([[): Loads the document corpus automatically
306            into memory upon access.
307            (tm_transform, tm_filter): Removed several checks whether the
308            document is already loaded ([[ ensures this now).
309            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
310            mailing list archive.
311    
312    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
313    
314            * R/aobjects.R (TextDocument): Is now a virtual class.
315            (Source): Is now a virtual class.
316    
317    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
318    
319            * R/textdoccol.R (c): Support for an arbitrary number of document
320            collections.
321    
322    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
323    
324            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
325            append_meta and remove_meta.
326    
327            * R/textdoccol.R: Removed modify_metadata method.
328    
329            * R/textrepo.R: Removed modify_metadata method.
330    
331            * R/textdoccol.R (remove_meta): Supports removal of document
332            collection metadata and document (= in data frame) metadata.
333    
334    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
335    
336            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
337    
338            * data/crude.rda: Rebuilt.
339    
340            * data/acq.rda: Rebuilt.
341    
342            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
343    
344            * R/textdoccol.R ([): Bug fix for subsetting a document
345            collection's data frame.
346    
347    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
348    
349            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
350            to s_filter.
351    
352            * R/textdoccol.R: Local text documents' metadata can now be copied
353            to a document collection's data frame with prescind_meta.
354    
355    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
356    
357            * R/: Text documents' slot metadata is now accessible in s_filter.
358    
359            * R/: Rewrote s_filter function (still has some restrictions).
360    
361    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
362    
363            * R/: Various fixes in handling metadata.
364    
365            * R/: Added update mechanism for text document collections.
366    
367    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
368    
369            * R/: Merging of document collections now creates a binary tree
370            for reconstructing merged document collections.
371    
372            * R/: Redesign of metadata for document collections.
373    
374    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
375    
376            * R/: Messages now use \code{ngettext}.
377    
378    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
379    
380            * R/: Added functions for modifying and removing metadata.
381    
382    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
383    
384            * man/: Updated some documentation.
385    
386            * R/: Corrected some connection issues.
387    
388            * inst/doc: Worked on the vignette.
389    
390    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
391    
392            * inst/: Added texts and started vignette.
393    
394            * R/: Final changes based upon David's comments.
395    
396    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
397    
398            * NAMESPACE: Corrected exports (generic methods need exportMethods
399            directives!).
400    
401    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
402    
403            * R/: Modified the TextDocCol constructor and various parsers. It
404            is now modular and supports various file formats via plugins (see
405            the new "Source" class).
406    
407    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
408    
409            * man/: Revised documentation after previous code changes.
410    
411    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
412    
413            * R/: Remaining changes as discussed with David.
414    
415    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
416    
417            * R/: Some changes as suggested by David. The rest will follow
418            within the next few days.
419    
420    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
421    
422            * man/: Finished documentation.
423    
424    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
425    
426            * man/: Wrote some documentation.
427    
428    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
429    
430            * R/: Further syntactic sugar in form of additional assignment and
431            accessor methods.
432    
433    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
434    
435            * R/: Syntactic sugar in form of "length", "show" and "summary"
436            operators.
437    
438    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
439    
440            * R/: Various updates. Mainly on default operators ("[" or "c")
441            and dissimilarities.
442    
443    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
444    
445            * R/: Added similarity functions.
446    
447            * data/: Added English stopwords.
448    
449    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
450    
451            * data/: Examples compiled for new features.
452    
453            * R/: Changes due to new structure.
454    
455            * NAMESPACE: Corrected namespace to reflect new structure.
456    
457            * R/termdocmatrix.R: Adapted for new naming scheme.
458    
459    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
460    
461            * R/textdoccol.R: Adapted code for new class structure. Wrote
462            several transform and filter functions operating on text document
463            collections (alias text document databases).
464    
465            * R/aobjects.R: Adapted class structure with inheritance,
466            repositories and additional meta data. Loading files on demand is
467            now possible.
468    
469    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
470    
471            * R/: Some cosmetic cleanups.
472    
473            * inst/: Removed vignette on clustering. That and much more is now
474            described in the JSS paper on text mining. Based upon that
475            article, a more elaborate vignette will be incorporated in the future.
476    
477    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
478    
479            * R/: Updated generic S4 methods to comply with signature changes
480            in newer versions of R (> 2.3).
481    
482    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
483    
484            * ext/R/importRIS.R: Automatic RIS import is now possible.
485    
486    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
487    
488            * R/textdoccol.R: Added RIS HTML input format.
489    
490    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
491    
492            * R/textdoccol.R: Removed bug that caused invalid text document
493            collections when handling many input files.
494    
495    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
496    
497            * R/textdoccol.R: Restructured and extended file import
