Diff of /trunk/tm/ChangeLog

trunk/R/trunk/ChangeLog revision 36, Wed Jan 11 15:42:56 2006 UTC
trunk/tm/ChangeLog revision 775, Sat Jul 28 13:57:02 2007 UTC
# Line 1  Line 1 
1    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
2    
3            * R/textdoccol.R (asPlain): Conversion from
4            StructuredTextDocuments to PlainTextDocuments.
5    
6    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
7    
8            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
9            for accessing term-document matrices.
10    
11            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
12            are installed.
13    
14    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
15    
16            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
17            Christian Buchta.
18    
19    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
20    
21            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
22    
23    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
24    
25            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
26    
27            * R/reader.R (readPDF): Added PDF reader.
28    
29    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
30    
31            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
32    
33            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
34    
35            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
36    
37            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
38    
39    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
40    
41            * R/distmeasure.R (dissimilarity): Replaced dists call from
42            package cba by new dist call from package proxy.
43    
44    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
45    
46            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
47    
48    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
49    
50            * R/termdocmatrix.R: require() uses the quietly option to suppress
51            loading messages.
52    
53    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
54    
55            * R/dictionary.R: Added dictionary support.
56    
57    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
58    
59            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
60            documents. This simplifies some functions, e.g., asPlain.
61    
62    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
63    
64            * inst/doc/tm.Rnw: Fixed some typos in vignette.
65    
66    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
67    
68            * R/textdoccol.R (replaceWords): Added method to replace a set of
69            words by a single word. Useful for synonyms.
70    
71    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
72    
73            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
74    
75    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
76    
77            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
78            vectors. Thanks to Ariel Maguyon for his error report.
79            (removeSparseTerms): New function to remove columns whose
80            sparsity exceeds a given sparse factor from a term-document matrix.
81    
82    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
83    
84            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
85    
86    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
87    
88            * man/sFilter.Rd: Corrected documentation on statement format (use
89            '==' instead of '=').
90    
91    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
92    
93            * R/aobjects.R (StructuredTextDocument): Inherits from
94            TextDocument.
95    
96    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
97    
98            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
99            on sparse matrices as proposed by Martin Maechler.
100    
101    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
102    
103            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
104            \pkg{filehash} version deprecates them.
105    
106    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
107    
108            * R/termdocmatrix.R (textvector): Stemming is now performed before
109            erasing stopwords.
110            (weightMatrix): Adapted to handle sparse matrices.
111            (TermDocMatrix): Sparse matrix is now efficiently built by
112            direct stepwise insertion of row values into it.
113    
114    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
115    
116            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
117            due to ongoing problems. For our purposes the latter is as useful
118            as the replaced package.
119    
120    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
121    
122            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
123    
124            * man/TermDocMatrix.Rd: Removed deprecated \code{language} argument.
125    
126    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
127    
128            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
129            languages with available stopwords.
130    
131    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
132    
133            * inst/doc/tm.Rnw: Minor corrections in the vignette.
134    
135    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
136    
137            * DESCRIPTION: Update to version 0.2, since a lot of new features
138            have been integrated.
139    
140            * inst/stopwords: Updated existing stopwords and added stopwords
141            for various other languages.
142    
143    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
144    
145            * man/: Updated documentation.
146    
147            * Work/testDb.R: Script to test database stuff.
148    
149            * R/: Fixed various database related bugs. Seems to be rather
150            useable now, i.e., consider as alpha status for now.
151    
152    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
153    
154            * R/: Fixed some bugs related to database support.
155    
156    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
157    
158            * man/: Added a lot of examples to the manuals.
159    
160    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
161    
162            * man/: Updated parts of the documentation.
163    
164            * R/textdoccol.R (asPlain): Added conversion from newsgroup
165            documents to plain text documents.
166    
167    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
168    
169            * R/textdoccol.R: Finished experimental database support. Not yet
170            intensively tested.
171    
172            * R/source.R: Now each source has a default reader.
173    
174            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
175            class anymore.
176    
177            * R/plaintextdoc.R: Custom show method for plain text documents.
178    
179            * R/aobjects.R: Added a class for structured text documents.
180    
181            * R/reader.R: Replaced remaining \code{parser} occurrences with
182            \code{reader}.
183    
184            * R/textdoccol.R (summary): Indent tags.
185    
186            * R/textdoccol.R (removePunctuation): Transform method to remove
187            punctuation marks.
188    
189    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
190    
191            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
192            using prescindMeta().
193    
194    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
195    
196            * R/textdoccol.R: Improved database support.
197    
198    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
199    
200            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
201    
202            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
203            language code.
204    
205            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
206            into parserControl argument.
207    
208            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
209    
210    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
211    
212            * Work/tmDataSetup.R: The datasets acq and crude can now be
213            created on the fly.
214    
215            * R/stopwords.R: Introduced a function returning the stopwords for
216            a given language (English, German and French at the moment).
217    
218            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
219            otherwise falls back to Snowball package.
220    
221    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
222    
223            * man/dissimilarity-methods.Rd: Make clear that any method offered
224            by "dists" from package "cba" can be used.
225    
226    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
227    
228            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
229            to Kurt's LaTeX suggestion. Removed points and underscores in
230            variable names for consistent naming.
231    
232            * DESCRIPTION: Update to version 0.1-2.
233    
234            * man/TextRepository.Rd: Fixed bug in documentation.
235    
236    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
237    
238            * DESCRIPTION: Update to version 0.1-1.
239    
240    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
241    
242            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
243            wordStem.
244    
245    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
246    
247            * R/: Changes due to Kurt's review.
248    
249    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
250    
251            * R/: Implemented improvements based upon comments by David
252            Meyer.
253    
254    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
255    
256            * inst/doc/: Rewrote vignette.
257    
258            * man/: Improved documentation.
259    
260    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
261    
262            * man/: Updated documentation.
263    
264            * DESCRIPTION: Changed package name to "tm". Updated version to
265            0.1 for first CRAN release.
266    
267            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
268            list archive example.
269    
270            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
271            archive example.
272    
273            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
274            from (several mails per box) mbox format to (single mail per file)
275            eml format.
276    
277    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
278    
279            * data/crude.rda: Rebuilt.
280    
281            * data/acq.rda: Rebuilt.
282    
283            * R/reader.R: Factored out reader and parser methods from
284            textdoccol.R.
285    
286            * R/source.R: Factored out Source methods from aobjects.R and
287            textdoccol.R.
288            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
289            feeds.
290    
291            * R/textdoccol.R (DirSource): Added support for recursive
292            traversal of directories.
293    
294    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
295    
296            * R/textdoccol.R ([[): Loads the document corpus automatically
297            into memory upon access.
298            (tm_transform, tm_filter): Removed several checks whether the
299            document is already loaded ([[ ensures this now).
300            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
301            mailing list archive.
302    
303    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
304    
305            * R/aobjects.R (TextDocument): Is now a virtual class.
306            (Source): Is now a virtual class.
307    
308    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
309    
310            * R/textdoccol.R (c): Support for an arbitrary number of document
311            collections.
312    
313    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
314    
315            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
316            append_meta and remove_meta.
317    
318            * R/textdoccol.R: Removed modify_metadata method.
319    
320            * R/textrepo.R: Removed modify_metadata method.
321    
322            * R/textdoccol.R (remove_meta): Supports removal of document
323            collection metadata and document (= in data frame) metadata.
324    
325    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
326    
327            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
328    
329            * data/crude.rda: Rebuilt.
330    
331            * data/acq.rda: Rebuilt.
332    
333            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
334    
335            * R/textdoccol.R ([): Bug fix for subsetting a document
336            collection's data frame.
337    
338    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
339    
340            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
341            to s_filter.
342    
343            * R/textdoccol.R: Local text documents' metadata can now be copied
344            to a document collection's data frame with prescind_meta.
345    
346    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
347    
348            * R/: Text documents' slot metadata is now accessible in s_filter.
349    
350            * R/: Rewrote s_filter function (still has some restrictions).
351    
352    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
353    
354            * R/: Various fixes in handling metadata.
355    
356            * R/: Added update mechanism for text document collections.
357    
358    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
359    
360            * R/: Merging of document collections now creates a binary tree
361            for reconstructing merged document collections.
362    
363            * R/: Redesign of metadata for document collections.
364    
365    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
366    
367            * R/: Messages now use \code{ngettext}.
368    
369    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
370    
371            * R/: Added functions for modifying and removing metadata.
372    
373    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
374    
375            * man/: Updated some documentation.
376    
377            * R/: Corrected some connection issues.
378    
379            * inst/doc: Worked on the vignette.
380    
381    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
382    
383            * inst/: Added texts and started vignette.
384    
385            * R/: Final changes based upon David's comments.
386    
387    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
388    
389            * NAMESPACE: Corrected exports (generic methods need exportMethods
390            directives!).
391    
392    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
393    
394            * R/: Modified the TextDocCol constructor and various parsers. It
395            is now modular and supports various file formats via plugins (see
396            the new "Source" class).
397    
398    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
399    
400            * man/: Revised documentation after previous code changes.
401    
402    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
403    
404            * R/: Remaining changes as discussed with David.
405    
406    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
407    
408            * R/: Some changes as suggested by David. The rest will follow
409            within the next few days.
410    
411    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
412    
413            * man/: Finished documentation.
414    
415    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
416    
417            * man/: Wrote some documentation.
418    
419    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
420    
421            * R/: Further syntactic sugar in the form of additional assignment and
422            accessor methods.
423    
424    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
425    
426            * R/: Syntactic sugar in the form of "length", "show" and "summary"
427            operators.
428    
429    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
430    
431            * R/: Diverse updates. Mainly on default operators ("[" or "c")
432            and dissimilarities.
433    
434    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
435    
436            * R/: Added similarity functions.
437    
438            * data/: Added English stopwords.
439    
440    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
441    
442            * data/: Examples compiled for new features.
443    
444            * R/: Changes due to new structure.
445    
446            * NAMESPACE: Corrected namespace to reflect new structure.
447    
448            * R/termdocmatrix.R: Adapted for new naming scheme.
449    
450    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
451    
452            * R/textdoccol.R: Adapted code for new class structure. Wrote
453            several transform and filter functions operating on text document
454            collections (alias text document databases).
455    
456            * R/aobjects.R: Adapted class structure with inheritance,
457            repositories and additional meta data. Loading files on demand is
458            now possible.
459    
460    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
461    
462            * R/: Some cosmetic cleanups.
463    
464            * inst/: Removed vignette on clustering. That and much more is now
465            described in the JSS paper on text mining. Based upon that
466            article an elaborate vignette will be incorporated in the future.
467    
468    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * R/: Updated generic S4 methods to comply with signature changes
471            in newer versions of R (> 2.3).
472    
473    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * ext/R/importRIS.R: Automatic RIS import is now possible.
476    
477    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
478    
479            * R/textdoccol.R: Added RIS HTML input format.
480    
481    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * R/textdoccol.R: Removed bug that caused invalid text document
484            collections when handling many input files.
485    
486    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
487    
488            * R/textdoccol.R: Restructured and extended file import
489            mechanism.
490    
491            * inst/doc/clustering.Rnw: Adapted vignette for use with
492            ReutNews.rda
493    
