[tm] Diff of /pkg/ChangeLog

Comparing trunk/R/trunk/ChangeLog revision 37 (Wed Jan 11 17:49:17 2006 UTC) with trunk/tm/ChangeLog revision 793 (Sun Oct 21 13:18:38 2007 UTC).
2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/complete.R (completeStems): Completes (heuristically) word
	stems.

	* R/termdocmatrix.R (TermDocMatrix2): New modular
	constructor.

	* NAMESPACE: Exported termFreq.

2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/reader.R (readDOC): Added MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/weight.R: Weighting functions for TermDocMatrix.

2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
	functions for accessing dimension, column, and row names.

	* R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/removePunctuation.Rd: Added documentation. The function is
	now also exported in NAMESPACE.

2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/fungen.R: Uses an S4 class for function generators instead of
	S3 attributes.

2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/reader.R (readPDF): Removed manual checks for pdftotext and
	pdfinfo. The system call gives a warning anyway.

2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (asPlain): Conversion from
	StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
	for accessing term-document matrices.

	* inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
	are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
	Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/doc/tm.Rnw: Updated the vignette (readPDF, readHTML,
	preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/reader.R (readHTML): Added a very simple HTML reader to obtain
	StructuredTextDocuments.

	* R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* DESCRIPTION: Moved proxy from Depends to Imports to avoid name
	clashes.

	* inst/stopwords/english.dat: Added the term "yes" to the stopwords.

	* R/termdocmatrix.R (dim): dim function for TermDocMatrix.

	* R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/distmeasure.R (dissimilarity): Replaced the dists call from
	package cba with the new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R: require() uses the quietly option to suppress
	loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/aobjects.R: Added classes for Reuters21578 XML and RCV1
	documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/doc/tm.Rnw: Fixed some typos in the vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (replaceWords): Added a method to replace a set of
	words with a single word. Useful for synonyms.

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/TermDocMatrix.Rd: Fixed documentation on the Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (textvector): Small fix for dealing with empty
	vectors. Thanks to Ariel Maguyon for his error report.
	(removeSparseTerms): New function to remove columns from a
	term-document matrix exceeding a sparse factor.

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/tmUpdate.Rd: Corrected documentation on the readerControl
	parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/sFilter.Rd: Corrected documentation on the statement format
	(use '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/aobjects.R (StructuredTextDocument): Inherits from
	TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (findFreqTerms): Performs efficient computation
	on sparse matrices as proposed by Martin Maechler.

2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
	\pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (textvector): Stemming is now performed before
	erasing stopwords.
	(weightMatrix): Adapted to handle sparse matrices.
	(TermDocMatrix): The sparse matrix is now built efficiently by
	direct stepwise insertion of row values.

2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
	due to ongoing problems. For our purposes the latter is as useful
	as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/TextDocCol.Rd: Replaced \code{readPlain} with
	\code{object@DefaultReader}.

	* man/TermDocMatrix.Rd: Removed the deprecated \code{language}
	argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
	languages with available stopwords.

2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* DESCRIPTION: Updated to version 0.2, since many new features
	have been integrated.

	* inst/stopwords: Updated existing stopwords and added stopwords
	for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Updated documentation.

	* Work/testDb.R: Script to test database functionality.

	* R/: Fixed various database-related bugs. Database support now
	seems fairly usable, but consider it alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Added a lot of examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Updated parts of the documentation.

	* R/textdoccol.R (asPlain): Added conversion from newsgroup
	documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R: Finished experimental database support. Not yet
	thoroughly tested.

	* R/source.R: Each source now has a default reader.

	* R/reader.R: \code{FunctionGenerator} is now an attribute, not a
	class anymore.

	* R/plaintextdoc.R: Custom show method for plain text documents.

	* R/aobjects.R: Added a class for structured text documents.

	* R/reader.R: Replaced remaining \code{parser} occurrences with
	\code{reader}.

	* R/textdoccol.R (summary): Indents tags.

	* R/textdoccol.R (removePunctuation): Transform method to remove
	punctuation marks.

2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (sFilter): Simplified sFilter significantly by
	using prescindMeta().

2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

	* R/resolve.R (resolveISOcode): Extracts the language from an ISO
	language code.

	* R/textdoccol.R (TextDocCol): Refactored several parser arguments
	into a single parserControl argument.

	* R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* Work/tmDataSetup.R: The datasets acq and crude can now be
	created on the fly.

	* R/stopwords.R: Introduced a function returning the stopwords for
	a given language (English, German, and French at the moment).

	* R/textdoccol.R (stemDoc): Stemming uses Rstem if available and
	otherwise falls back to the Snowball package.

2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/dissimilarity-methods.Rd: Made clear that any method offered
	by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/doc/tm.Rnw: Fixed the quotes-appearing-as-boxes bug following
	Kurt's LaTeX suggestion. Removed dots and underscores in
	variable names for consistent naming.

	* DESCRIPTION: Updated to version 0.1-2.

	* man/TextRepository.Rd: Fixed a bug in the documentation.

2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* DESCRIPTION: Updated to version 0.1-1.

2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
	wordStem.

2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Changes due to Kurt's review.

2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Implemented improvements based upon comments by David
	Meyer.

2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/doc/: Rewrote vignette.

	* man/: Improved documentation.

2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Updated documentation.

	* DESCRIPTION: Changed package name to "tm". Updated version to
	0.1 for the first CRAN release.

	* inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
	list archive example.

	* inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
	archive example.

	* R/preprocess.R (convert_mbox_eml): A simple e-mail converter
	from mbox format (several mails per file) to eml format (a single
	mail per file).

2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* data/crude.rda: Rebuilt.

	* data/acq.rda: Rebuilt.

	* R/reader.R: Factored out reader and parser methods from
	textdoccol.R.

	* R/source.R: Factored out Source methods from aobjects.R and
	textdoccol.R.
	(GmaneRSource): Encapsulates Gmane R mailing list archive RSS
	feeds.

	* R/textdoccol.R (DirSource): Added support for recursive
	traversal of directories.

2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R ([[): Loads the document corpus automatically
	into memory upon access.
	(tm_transform, tm_filter): Removed several checks whether the
	document is already loaded ([[ ensures this now).
	(gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
	mailing list archive.

2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/aobjects.R (TextDocument): Is now a virtual class.
	(Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (c): Support for an arbitrary number of document
	collections.

2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textrepo.R: Updated TextRepository (constructor), append_elem,
	append_meta and remove_meta.

	* R/textdoccol.R: Removed modify_metadata method.

	* R/textrepo.R: Removed modify_metadata method.

	* R/textdoccol.R (remove_meta): Supports removal of document
	collection metadata and document (= in data frame) metadata.

2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

	* data/crude.rda: Rebuilt.

	* data/acq.rda: Rebuilt.

	* inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

	* R/textdoccol.R ([): Bug fix for subsetting a document
	collection's data frame.

2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R: Bug fixes in s_filter. Added full query support
	to s_filter.

	* R/textdoccol.R: Local text documents' metadata can now be copied
	to a document collection's data frame with prescind_meta.

2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Text documents' slot metadata is now accessible in s_filter.

	* R/: Rewrote s_filter function (it still has some restrictions).

2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Various fixes in handling metadata.

	* R/: Added update mechanism for text document collections.

2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Merging of document collections now creates a binary tree
	for reconstructing merged document collections.

	* R/: Redesign of metadata for document collections.

2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Messages now use \code{ngettext}.

2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Added functions for modifying and removing metadata.

2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Updated some documentation.

	* R/: Corrected some connection issues.

	* inst/doc: Worked on the vignette.

2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/: Added texts and started vignette.

	* R/: Final changes based upon David's comments.

2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* NAMESPACE: Corrected exports (generic methods need exportMethods
	directives!).

2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Modified the TextDocCol constructor and various parsers. It
	is now modular and supports various file formats via plugins (see
	the new "Source" class).

2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Revised documentation after previous code changes.

2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Remaining changes as discussed with David.

2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Some changes as suggested by David. The rest will follow
	in the next few days.

2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Finished documentation.

2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Wrote some documentation.

2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Further syntactic sugar in the form of additional assignment
	and accessor methods.

2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Syntactic sugar in the form of "length", "show" and "summary"
	operators.

2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Various updates, mainly to default operators ("[" or "c")
	and dissimilarities.

2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Added similarity functions.

	* data/: Added English stopwords.

2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* data/: Examples compiled for new features.

	* R/: Changes due to new structure.

	* NAMESPACE: Corrected namespace to reflect new structure.

	* R/termdocmatrix.R: Adapted for new naming scheme.

2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R: Adapted code for new class structure. Wrote
	several transform and filter functions operating on text document
	collections (a.k.a. text document databases).

	* R/aobjects.R: Adapted class structure with inheritance,
	repositories and additional metadata. Loading files on demand is
	now possible.

2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Some cosmetic cleanups.

	* inst/: Removed vignette on clustering. That and much more is now
	described in the JSS paper on text mining. Based upon that
	article, a more elaborate vignette will be incorporated in the
	future.

2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Updated generic S4 methods to comply with signature changes
	in newer versions of R (> 2.3).

2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* ext/R/importRIS.R: Automatic RIS import is now possible.

2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R: Added RIS HTML input format.

2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R: Fixed a bug that caused invalid text document
	collections when handling many input files.

2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R: Restructured and extended file import.
