[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 37, Wed Jan 11 17:49:17 2006 UTC
vs. trunk/tm/ChangeLog revision 789, Tue Oct 16 11:26:19 2007 UTC

2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readDOC): Added MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/weight.R: Weighting functions for TermDocMatrix.

2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
        functions for accessing dimension, column, and row names.

        * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/removePunctuation.Rd: Added documentation. The function is
        now also exported in NAMESPACE.

2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/fungen.R: Use an S4 class for function generators instead of S3 attributes.

2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readPDF): Removed manual checks for pdftotext and
        pdfinfo. The system call gives a warning anyway.

2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (asPlain): Conversion from
        StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
        for accessing term-document matrices.

        * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
        are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
        Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Updated vignette (readPDF, readHTML, preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readHTML): Added a very simple HTML reader to obtain StructuredTextDocuments.

        * R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.

        * inst/stopwords/english.dat: Added the term "yes" to stopwords.

        * R/termdocmatrix.R (dim): dim function for TermDocMatrix.

        * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/distmeasure.R (dissimilarity): Replaced the dists call from
        package cba with the new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: require() uses the quietly option to suppress
        loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
        documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed some typos in the vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (replaceWords): Added method to replace a set of
        words with a single word. Useful for synonyms.

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TermDocMatrix.Rd: Fixed documentation on the Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Small fix for dealing with empty
        vectors. Thanks to Ariel Maguyon for his error report.
        (removeSparseTerms): New function to remove columns (terms) from a
        term-document matrix whose sparsity exceeds a given sparse factor.

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/tmUpdate.Rd: Corrected documentation on the readerControl parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/sFilter.Rd: Corrected documentation on statement format (use
        '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (StructuredTextDocument): Inherits from
        TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (findFreqTerms): Performs efficient computation
        on sparse matrices as proposed by Martin Maechler.

2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
        \pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Stemming is now performed before
        erasing stopwords.
        (weightMatrix): Adapted to handle sparse matrices.
        (TermDocMatrix): The sparse matrix is now built efficiently by
        inserting row values directly, step by step.

2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
        due to ongoing problems. For our purposes the latter is as useful
        as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.

        * man/TermDocMatrix.Rd: Removed deprecated \code{language} argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
        languages with available stopwords.

2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Updated to version 0.2, since many new features
        have been integrated.

        * inst/stopwords: Updated existing stopwords and added stopwords
        for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * Work/testDb.R: Script to test database support.

        * R/: Fixed various database-related bugs. Database support now
        seems fairly usable; consider it alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Added many examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated parts of the documentation.

        * R/textdoccol.R (asPlain): Added conversion from newsgroup
        documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Finished experimental database support. Not yet
        thoroughly tested.

        * R/source.R: Now each source has a default reader.

        * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
        class anymore.

        * R/plaintextdoc.R: Custom show method for plain text documents.

        * R/aobjects.R: Added a class for structured text documents.

        * R/reader.R: Replaced remaining \code{parser} occurrences with
        \code{reader}.

        * R/textdoccol.R (summary): Indents tags.

        * R/textdoccol.R (removePunctuation): Transform method to remove
        punctuation marks.

2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (sFilter): Simplified sFilter significantly by
        using prescindMeta().

2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

        * R/resolve.R (resolveISOcode): Extracts the language from an ISO
        language code.

        * R/textdoccol.R (TextDocCol): Refactored several parser arguments
        into a single parserControl argument.

        * R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Work/tmDataSetup.R: The datasets acq and crude can now be
        created on the fly.

        * R/stopwords.R: Introduced a function returning the stopwords for
        a given language (English, German and French at the moment).

        * R/textdoccol.R (stemDoc): Stemming uses Rstem if available;
        otherwise it falls back to the Snowball package.

2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/dissimilarity-methods.Rd: Made clear that any method offered
        by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed the bug of quotes appearing as boxes,
        following Kurt's LaTeX suggestion. Removed dots and underscores in
        variable names for consistent naming.

        * DESCRIPTION: Updated to version 0.1-2.

        * man/TextRepository.Rd: Fixed bug in documentation.

2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Updated to version 0.1-1.

2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
        wordStem.

2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Changes due to Kurt's review.

2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Implemented improvements based upon comments by David
        Meyer.

2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/: Rewrote vignette.

        * man/: Improved documentation.

2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * DESCRIPTION: Changed package name to "tm". Updated version to
        0.1 for the first CRAN release.

        * inst/texts/gmane.comp.lang.r.general.mbox: Example mbox archive
        of the Gmane R mailing list.

        * inst/texts/gmane.comp.lang.r.gr.rdf: Example RSS feed of the
        Gmane R mailing list archive.

        * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
        from (several mails per box) mbox format to (single mail per file)
        eml format.

2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * R/reader.R: Factored out reader and parser methods from
        textdoccol.R.

        * R/source.R: Factored out Source methods from aobjects.R and
        textdoccol.R.
        (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
        feeds.

        * R/textdoccol.R (DirSource): Added support for recursive
        traversal of directories.

2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R ([[): Loads the document corpus automatically
        into memory upon access.
        (tm_transform, tm_filter): Removed several checks whether the
        document is already loaded ([[ ensures this now).
        (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
        mailing list archive.

2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (TextDocument): Is now a virtual class.
        (Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (c): Support for an arbitrary number of document
        collections.

2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textrepo.R: Updated TextRepository (constructor), append_elem,
        append_meta and remove_meta.

        * R/textdoccol.R: Removed modify_metadata method.

        * R/textrepo.R: Removed modify_metadata method.

        * R/textdoccol.R (remove_meta): Supports removal of document
        collection metadata and document (= in data frame) metadata.

2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

        * R/textdoccol.R ([): Bug fix for subsetting a document
        collection's data frame.

2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Bug fixes in s_filter. Added full query support
        to s_filter.

        * R/textdoccol.R: Local text documents' metadata can now be copied
        to a document collection's data frame with prescind_meta.

2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Text documents' slot metadata is now accessible in s_filter.

        * R/: Rewrote s_filter function (still has some restrictions).

2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various fixes in handling metadata.

        * R/: Added update mechanism for text document collections.

2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Merging of document collections now creates a binary tree
        for reconstructing merged document collections.

        * R/: Redesign of metadata for document collections.

2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Messages now use \code{ngettext}.

2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added functions for modifying and removing metadata.

2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated some documentation.

        * R/: Corrected some connection issues.

        * inst/doc: Worked on the vignette.

2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/: Added texts and started the vignette.

        * R/: Final changes based upon David's comments.

2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * NAMESPACE: Corrected exports (generic methods need exportMethods
        directives!).

2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Modified the TextDocCol constructor and various parsers. The
        constructor is now modular and supports various file formats via
        plugins (see the new "Source" class).

2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Revised documentation after previous code changes.

2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Remaining changes as discussed with David.

2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some changes as suggested by David. The rest will follow
        in the next few days.

2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Finished documentation.

2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Wrote some documentation.

2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Further syntactic sugar in the form of additional assignment
        and accessor methods.

2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Syntactic sugar in the form of "length", "show" and "summary"
        operators.

2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various updates, mainly to default operators ("[" or "c")
        and dissimilarities.

2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added similarity functions.

        * data/: Added English stopwords.

2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/: Examples compiled for new features.

        * R/: Changes due to new structure.

        * NAMESPACE: Corrected namespace to reflect new structure.

        * R/termdocmatrix.R: Adapted for new naming scheme.

2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Adapted code for new class structure. Wrote
        several transform and filter functions operating on text document
        collections (also known as text document databases).

        * R/aobjects.R: Adapted class structure with inheritance,
        repositories and additional metadata. Loading files on demand is
        now possible.

2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some cosmetic cleanups.

        * inst/: Removed vignette on clustering. That and much more is now
        described in the JSS paper on text mining. Based upon that
        article, a more elaborate vignette will be incorporated in the future.

2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Updated generic S4 methods to comply with signature changes
        in newer versions of R (> 2.3).

2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * ext/R/importRIS.R: Automatic RIS import is now possible.

2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Added RIS HTML input format.

2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Fixed a bug that caused invalid text document
        collections when handling many input files.

2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Restructured and extended file import
