[tm] Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 21, Sat Nov 19 10:23:19 2005 UTC
trunk/tm/ChangeLog revision 766, Sat Jul 14 08:46:23 2007 UTC

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.

        * inst/stopwords/english.dat: Added the term "yes" to stopwords.

        * R/termdocmatrix.R (dim): dim function for TermDocMatrix.

        * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/distmeasure.R (dissimilarity): Replaced dists call from
        package cba by new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: require() uses the quietly option to suppress
        loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
        documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed some typos in vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (replaceWords): Added method to replace a set of
        words with a single word. Useful for synonyms.

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TermDocMatrix.Rd: Fixed documentation on Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Small fix for dealing with empty
        vectors. Thanks to Ariel Maguyon for his error report.
        (removeSparseTerms): New function to remove columns from a
        term-document matrix exceeding a sparse factor.

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/sFilter.Rd: Corrected documentation on statement format (use
        '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (StructuredTextDocument): Inherits from
        TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
        on sparse matrices as proposed by Martin Maechler.

2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
        \pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Stemming is now performed before
        erasing stopwords.
        (weightMatrix): Adapted to handle sparse matrices.
        (TermDocMatrix): Sparse matrix is now efficiently built by
        direct stepwise insertion of row values into it.

2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
        due to ongoing problems. For our purposes the latter is as useful
        as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.

        * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
        languages with available stopwords.

2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.2, since many new features
        have been integrated.

        * inst/stopwords: Updated existing stopwords and added stopwords
        for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * Work/testDb.R: Script to test database functionality.

        * R/: Fixed various database-related bugs. Database support now
        seems fairly usable, but should be considered alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Added a lot of examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated parts of the documentation.

        * R/textdoccol.R (asPlain): Added conversion from newsgroup
        documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Finished experimental database support. Not yet
        thoroughly tested.

        * R/source.R: Now each source has a default reader.

        * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
        class anymore.

        * R/plaintextdoc.R: Custom show method for plain text documents.

        * R/aobjects.R: Added a class for structured text documents.

        * R/reader.R: Replaced remaining \code{parser} occurrences with
        \code{reader}.

        * R/textdoccol.R (summary): Indent tags.

        * R/textdoccol.R (removePunctuation): Transform method to remove
        punctuation marks.

2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (sFilter): Simplified sFilter significantly by
        using prescindMeta().

2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

        * R/resolve.R (resolveISOcode): Extracts the language from an ISO
        language code.

        * R/textdoccol.R (TextDocCol): Refactored several parser arguments
        into a parserControl argument.

        * R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Work/tmDataSetup.R: The datasets acq and crude can now be
        created on the fly.

        * R/stopwords.R: Introduced a function returning the stopwords for
        a given language (English, German and French at the moment).

        * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
        otherwise falls back to the Snowball package.

2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/dissimilarity-methods.Rd: Make clear that any method offered
        by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed the quotes-appearing-as-boxes bug according
        to Kurt's LaTeX suggestion. Removed points and underscores in
        variable names for consistent naming.

        * DESCRIPTION: Update to version 0.1-2.

        * man/TextRepository.Rd: Fixed bug in documentation.

2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.1-1.

2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
        wordStem.

2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Changes due to Kurt's review.

2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Implemented improvements based upon comments by David
        Meyer.

2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/: Rewrote vignette.

        * man/: Improved documentation.

2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * DESCRIPTION: Changed package name to "tm". Updated version to
        0.1 for first CRAN release.

        * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
        list archive example.

        * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
        archive example.

        * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
        from (several mails per box) mbox format to (single mail per file)
        eml format.

2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * R/reader.R: Factored out reader and parser methods from
        textdoccol.R.

        * R/source.R: Factored out Source methods from aobjects.R and
        textdoccol.R.
        (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
        feeds.

        * R/textdoccol.R (DirSource): Added support for recursive
        traversal of directories.

2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R ([[): Loads the document corpus automatically
        into memory upon access.
        (tm_transform, tm_filter): Removed several checks for whether the
        document is already loaded ([[ now ensures this).
        (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
        mailing list archive.

2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (TextDocument): Is now a virtual class.
        (Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (c): Support for an arbitrary number of document
        collections.

2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textrepo.R: Updated TextRepository (constructor), append_elem,
        append_meta and remove_meta.

        * R/textdoccol.R: Removed modify_metadata method.

        * R/textrepo.R: Removed modify_metadata method.

        * R/textdoccol.R (remove_meta): Supports removal of document
        collection metadata and document (= in data frame) metadata.

2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

        * R/textdoccol.R ([): Bug fix for subsetting a document
        collection's data frame.

2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Bug fixes in s_filter. Added full query support
        to s_filter.

        * R/textdoccol.R: Local text documents' metadata can now be copied
        to a document collection's data frame with prescind_meta.

2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Text documents' slot metadata is now accessible in s_filter.

        * R/: Rewrote s_filter function (still has some restrictions).

2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various fixes in handling metadata.

        * R/: Added update mechanism for text document collections.

2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Merging of document collections now creates a binary tree
        for reconstructing merged document collections.

        * R/: Redesign of metadata for document collections.

2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Messages now use \code{ngettext}.

2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added functions for modifying and removing metadata.

2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated some documentation.

        * R/: Corrected some connection issues.

        * inst/doc: Worked on the vignette.

2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/: Added texts and started vignette.

        * R/: Final changes based upon David's comments.

2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * NAMESPACE: Corrected exports (generic methods need exportMethods
        directives!).

2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Modified the TextDocCol constructor and various parsers. It
        is now modular and supports various file formats via plugins (see
        the new "Source" class).

2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Revised documentation after previous code changes.

2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Remaining changes as discussed with David.

2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some changes as suggested by David. The rest will follow
        within the next few days.

2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Finished documentation.

2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Wrote some documentation.

2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Further syntactic sugar in the form of additional assignment
        and accessor methods.

2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Syntactic sugar in the form of "length", "show" and "summary"
        operators.

2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various updates, mainly to default operators ("[" or "c")
        and dissimilarities.

2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added similarity functions.

        * data/: Added English stopwords.

2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/: Examples compiled for new features.

        * R/: Changes due to new structure.

        * NAMESPACE: Corrected namespace to reflect new structure.

        * R/termdocmatrix.R: Adapted for new naming scheme.

2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Adapted code for new class structure. Wrote
        several transform and filter functions operating on text document
        collections (also known as text document databases).

        * R/aobjects.R: Adapted class structure with inheritance,
        repositories and additional metadata. Loading files on demand is
        now possible.

2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some cosmetic cleanups.

        * inst/: Removed vignette on clustering. That and much more is now
        described in the JSS paper on text mining. Based upon that
        article, a more elaborate vignette will be incorporated in the future.

2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Updated generic S4 methods to comply with signature changes
        in newer versions of R (> 2.3).

2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * ext/R/importRIS.R: Automatic RIS import is now possible.

2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Added RIS HTML input format.

2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Fixed a bug that caused invalid text document
        collections when handling many input files.

2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Restructured and extended file import
        mechanism.

        * inst/doc/clustering.Rnw: Adapted vignette for use with
        ReutNews.rda.

        * man/ReutNews.Rd: Documentation for ReutNews.rda.

        * data/ReutNews.rda: A tiny Reuters21578 example data set.

2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/clustering.Rnw: Wrote a small vignette to present the
        clustering facilities of this package.

2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Changed package document structure to avoid class
        dependency problems.

2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Wrote a script for the ModLewis Split for the Reuters-21578 XML
        data set.

        * Finished documentation and reordered directory structure. Now
        "R CMD check textmin" works without errors.

2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * src/: Various splits can now be easily created for the
        Reuters21578 data set.

2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Updated documentation.

2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Wrote R documentation for some classes and methods.

2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Constructor of textdoccol allows import of CSV
        files. See the questionnaire data/Umfrage.csv for such an example.
        We are now able to import files in Reuters-21578 XML format.

        * Changed class interfaces in various files. Weighting of the text
        matrix is now possible.

