[tm] Diff of /pkg/ChangeLog
trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC
pkg/ChangeLog revision 1038, Fri Jan 15 12:12:41 2010 UTC
1    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
4            data.
5    
6    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/doc.R (`Content<-`): Be careful with names attribute.
9    
10    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
11    
12            * R/source.R (DirSource): Improved implementation especially when
13            handling many (>1M) files.
14    
15    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
16    
17            * R/source.R (getElem.URISource): Use encoding argument.
18    
19    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
20    
21            * R/doc.R (setOldClass): Register S3 document classes to be
22            recognized by S4 methods.
23    
24    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
25    
26            * R/matrix.R (termFreq): Add option to remove punctuation
27            characters.
28    
29    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
30    
31            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
32            merging multiple term-document matrices.
33    
34    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
35    
36            * R/corpus.R (setOldClass): Register S3 corpus classes to be
37            recognized by S4 methods.
38    
39            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
40            that CRAN Mac OS X builds do not fail any longer.
41    
42    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
43    
44            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
45            of RWeka::AlphabeticTokenizer() as default.
46    
47    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
48    
49            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
50            caused words at the beginning or the end of a line not to be removed. Do
51            not delete whitespace anymore.
52    
53    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
54    
55            * R/source.R (DirSource): Default to working directory if no path
56            is specified.
57    
58    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
59    
60            * R/source.R (DirSource): Stop on empty directories.
61    
62    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
63    
64            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
65            named documents.
66    
67    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
68    
69            * R/transform.R (removeWords): Improve regular expressions.
70    
71    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
72    
73            * R/meta.R (DublinCore): Allow lower case tags.
74    
75    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
76    
77            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
78            instead of x$children.
79    
80    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
81    
82            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
83    
84    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
85    
86            * R/: Use S3 instead of S4 class system.
87    
88    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/reader.R (readMail): Moved to tm.plugin.mail package.
91    
92    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
93    
94            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
95            postings are basically e-mails with some extra headers.
96    
97    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/transform.R: Move convertMboxEml, removeCitation,
100            removeMultipart, and removeSignature to the tm.plugin.mail package
101            since they are mainly utility functions (for handling e-mails) and
102            not very framework specific.
103    
104    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
105    
106            * man/: Fix documentation.
107    
108    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
109    
110            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
111            plain text document instead of an XML document for texts of the
112            Reuters-21578 dataset.
113    
114            * R/sparse.R: Removed since the slam package is now available on
115            CRAN.
116    
117            * DESCRIPTION (Depends): Add slam package.
118    
119    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
120    
121            * R/transform.R (stemDoc): Fix character(0) handling.
122    
123    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
124    
125            * R/doc.R (show): Pretty print.
126    
127    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
128    
129            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
130            gracefully.
131    
132    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
133    
134            * R/corpus.R: Make corpus virtual. Implement corpus with standard
135            and permanent storage semantics.
136    
137            * DESCRIPTION: New major release. A *lot* of improvements.
138    
139    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
140    
141            * NAMESPACE: Export some simple_triplet_matrix functions.
142    
143    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
144    
145            * R/weight.R: Adapt tf-idf to new matrix format.
146    
147    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
148    
149            * R/matrix.R: Create two distinct classes for term-document and
150            document-term matrices.
151    
152    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
153    
154            * R/termdocmatrix.R: No longer use Matrix package. This reduces
155            package start-up time significantly.
156    
157    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
158    
159            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
160    
161    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
162    
163            * R/transform.R (tmReduce): Combine multiple maps into one
164            transformation.
165    
166    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
167    
168            * R/weight.R: Remove weightLogical since it does not return a
169            dgCMatrix.
170    
171            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
172            or TermDocumentMatrix instead.
173    
174    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
175    
176            * inst/doc/extensions.Rnw: Finished vignette.
177    
178    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
179    
180            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
181            DocumentTermMatrix representations.
182    
183    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
184    
185            * R/reader.R (readXML): New reader for arbitrary XML files.
186    
187    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
188    
189            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
190            (XMLSource): New XMLSource class for arbitrary XML files.
191            (Source): New slot Vectorized.
192    
193    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
194    
195            * R/reader.R (readTabular): Experimental reader for tabular data
196            structures which can be customized via user-defined mappings.
197    
198            * R/reader.R: Always use UTC time zone.
199    
200            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
201    
202    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
203    
204            * R/reader.R (readDOC): Options can be passed over to antiword.
205    
206            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
207            pdftotext.
208    
209    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
210    
211            * R/source.R (DirSource): Add pattern and ignore.case arguments
212            which are internally passed over to list.files().
213    
214    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
215    
216            * inst/doc/tm.Rnw: Suppress pointless loading message.
217    
218    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
219    
220            * DESCRIPTION: Speed up package loading (via moving packages not
221            strictly necessary for normal operation to Suggests instead of
222            Depends).
223    
224    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
225    
226            * R/reader.R (readNewsgroup): The date format is now configurable.
227    
228    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
229    
230            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
231    
232    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
233    
234            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
235    
236    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
237    
238            * R/source.R (DataframeSource): New source class for data frames.
239    
240            * R/source.R: Fixed non-standard call evaluation.
241    
242    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
243    
244            * R/source.R (URISource): New source class for a single document.
245    
246    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
247    
248            * R/source.R: Refactoring.
249    
250    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
251    
252            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
253            Rmpi installations more gracefully.
254    
255    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
256    
257            * R/source.R (Source): Add Length slot.
258    
259    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
260    
261            * R/AAA.R: Unify duplicated .onLoad function.
262    
263    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
264    
265            * DESCRIPTION (Suggests): Added Rmpi.
266    
267    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
268    
269            * R/source.R (getElem): Fix 'no visible binding' warning.
270    
271            * man/WeightFunction.Rd: Fix signature.
272    
273    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
274    
275            * R/weight.R: Introduce name abbreviations for weighting functions.
276    
277    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
278    
279            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
280    
281            * R/cluster.R: Provide convenience functions for using an MPI
282            cluster.
283    
284            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
285            available.
286    
287            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
288            available.
289    
290    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
291    
292            * R/textdoccol.R (lapply): Removed debug print out.
293    
294    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
295    
296            * R/reader.R (readRCV1): Improved meta data extraction from
297            Reuters Corpus Volume 1 documents.
298    
299    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
300    
301            * R/transform.R: Ensure that all mappings preserve multiline
302            structures.
303    
304    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
305    
306            * R/filter.R: Every filter now has an attribute indicating whether
307            it should be applied at document level (doclevel).
308    
309            * R/textdoccol.R (tmFilter): Set searchFullText as new default
310            filter.
311    
312    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
313    
314            * R/transform.R (replacePatterns): Replaced removeWords by
315            replacePatterns. Suggested by Christian Buchta.
316    
317            * R/textdoccol.R (inspect): Improved formatting.
318    
319    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
320    
321            * inst/CITATION: Updated JSS article information.
322    
323            * R/textdoccol.R (setAs): Added coerce method from list to
324            corpus.
325    
326            * R/meta.R (meta): Improved meta data handling.
327    
328    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
329    
330            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
331            Christian Buchta.
332    
333            * inst/CITATION: Added template to include JSS article reference.
334    
335    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
336    
337            * R/textdoccol.R (tmMap): Introduced lazy mapping.
338    
339            * R/source.R: Added VectorSource.
340    
341    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
342    
343            * man/: Language codes should be in ISO 639-1 format.
344    
345            * R/textdoccol.R (asPlain): Preserve local meta data.
346    
347    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
348    
349            * R/textdoccol.R (writeCorpus): Function for writing a corpus
350            containing plain text documents to disk.
351    
352    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
353    
354            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
355            always set correctly.
356    
357            * R/textdoccol.R: Set load = TRUE as default for load on demand
358            since in most cases this is the wanted behaviour.
359    
360    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
361    
362            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
363    
364            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
365    
366    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
367    
368            * R/meta.R (meta): New function for consistent access to meta data
369            of document collections, repositories, and texts.
370    
371    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
372    
373            * R/: Better support for encodings.
374    
375    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
376    
377            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
378            selection when no reader argument is given.
379    
380    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
381    
382            * R/source.R (CSVSource): Now uses read.csv instead of scan
383            internally.
384    
385    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
386    
387            * R/reader.R (getReaders): Returns available reader functions.
388    
389            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
390            as default.
391    
392    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
393    
394            * R/stopwords.R (stopwords): Shortened code, removed codetools
395            variable warnings.
396    
397            * man/: Documentation for showMeta, added an example for tmMap.
398    
399            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
400            some minor typos fixed.
401    
402    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
403    
404            * R/aobjects.R (showMeta): Added method for pretty printing a
405            text document's meta data.
406    
407    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
408    
409            * R/textdoccol.R (TextDocCol): Better handling of empty
410            arguments.
411    
412            * NAMESPACE: Exported readDOC.
413    
414            * man/completeStems.Rd: Added an example.
415    
416    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
417    
418            * R/stopwords.R (stopwords): Look up .dat files at every
419            call. Allows users to modify stopword .dat files interactively.
420    
421    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
422    
423            * R/termdocmatrix.R (termFreq): Correct processing of empty
424            documents.
425    
426    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
427    
428            * man/: Updated documentation.
429    
430    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
431    
432            * R/complete.R (completeStems): Completes (heuristically) word
433            stems.
434    
435            * R/termdocmatrix.R (TermDocMatrix2): New modular
436            constructor.
437    
438            * NAMESPACE: Exported termFreq.
439    
440    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
441    
442            * R/reader.R (readDOC): Added MS Word reader (using antiword).
443    
444    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
445    
446            * R/weight.R: Weighting functions for TermDocMatrix.
447    
448    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
449    
450            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
451            functions for accessing dimension, column, and row names.
452    
453            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
454    
455    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
456    
457            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
458    
459    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
460    
461            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
462    
463    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/reader.R (readPDF): Removed manual checks for pdftotext and
466            pdfinfo. The system call gives a warning anyway.
467    
468    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * R/textdoccol.R (asPlain): Conversion from
471            StructuredTextDocuments to PlainTextDocuments.
472    
473    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
476            for accessing term-document matrices.
477    
478            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
479            are installed.
480    
481    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
484            Christian Buchta.
485    
486    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
487    
488            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
489    
490    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
491    
492            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
493    
494            * R/reader.R (readPDF): Added PDF reader.
495    
496    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
497    
498            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
499    
500            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
501    
502            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
503    
504            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
505    
506    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * R/distmeasure.R (dissimilarity): Replaced dists call from
509            package cba by new dist call from package proxy.
510    
511    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
512    
513            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
514    
515    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
516    
517            * R/termdocmatrix.R: require() uses the quietly option to suppress
518            loading messages.
519    
520    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
521    
522            * R/dictionary.R: Added dictionary support.
523    
524    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
525    
526            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
527            documents. This simplifies some functions, e.g., asPlain.
528    
529    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
530    
531            * inst/doc/tm.Rnw: Fixed some typos in vignette.
532    
533    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
534    
535            * R/textdoccol.R (replaceWords): Added method to replace a set of
536            words by a single word. Useful for synonyms.
537    
538    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
539    
540            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
541    
542    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
543    
544            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
545            vectors. Thanks to Ariel Maguyon for his error report.
546            (removeSparseTerms): New function to remove columns from a
547            term-document matrix exceeding a sparse factor.
548    
549    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
550    
551            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
552    
553    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
554    
555            * man/sFilter.Rd: Corrected documentation on statement format (use
556            '==' instead of '=').
557    
558    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
559    
560            * R/aobjects.R (StructuredTextDocument): Inherits from
561            TextDocument.
562    
563    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
564    
565            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
566            on sparse matrices as proposed by Martin Maechler.
567    
568    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
571            \pkg{filehash} version deprecates them.
572    
573    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * R/termdocmatrix.R (textvector): Stemming is now performed before
576            erasing stopwords.
577            (weightMatrix): Adapted to handle sparse matrices.
578            (TermDocMatrix): Sparse matrix is now efficiently built by
579            direct stepwise insertion of row values into it.
580    
581    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
582    
583            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
584            due to ongoing problems. For our purposes the latter is as useful
585            as the replaced package.
586    
587    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
588    
589            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
590    
591            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
592    
593    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
594    
595            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
596            languages with available stopwords.
597    
598    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
599    
600            * inst/doc/tm.Rnw: Minor corrections in the vignette.
601    
602    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
603    
604            * DESCRIPTION: Update to version 0.2, since a lot of new features
605            have been integrated.
606    
607            * inst/stopwords: Updated existing stopwords and added stopwords
608            for various other languages.
609    
610    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
611    
612            * man/: Updated documentation.
613    
614            * Work/testDb.R: Script to test database stuff.
615    
616            * R/: Fixed various database related bugs. Seems to be rather
617            usable now, i.e., consider it alpha status for now.
618    
619    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
620    
621            * R/: Fixed some bugs related to database support.
622    
623    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
624    
625            * man/: Added a lot of examples to the manuals.
626    
627    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
628    
629            * man/: Updated parts of the documentation.
630    
631            * R/textdoccol.R (asPlain): Added conversion from newsgroup
632            documents to plain text documents.
633    
634    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
635    
636            * R/textdoccol.R: Finished experimental database support. Not yet
637            intensively tested.
638    
639            * R/source.R: Now each source has a default reader.
640    
641            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
642            class anymore.
643    
644            * R/plaintextdoc.R: Custom show method for plain text documents.
645    
646            * R/aobjects.R: Added a class for structured text documents.
647    
648            * R/reader.R: Replaced remaining \code{parser} occurrences with
649            \code{reader}.
650    
651            * R/textdoccol.R (summary): Indent tags.
652    
653            * R/textdoccol.R (removePunctuation): Transform method to remove
654            punctuation marks.
655    
656    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
657    
658            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
659            using prescindMeta().
660    
661    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
662    
663            * R/textdoccol.R: Improved database support.
664    
665    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
666    
667            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
668    
669            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
670            language code.
671    
672            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
673            into parserControl argument.
674    
675            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
676    
677    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
678    
679            * Work/tmDataSetup.R: The datasets acq and crude can now be
680            created on the fly.
681    
682            * R/stopwords.R: Introduced a function returning the stopwords for
683            a given language (English, German and French at the moment).
684    
685            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
686            otherwise falls back to Snowball package.
687    
688    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
689    
690            * man/dissimilarity-methods.Rd: Make clear that any method offered
691            by "dists" from package "cba" can be used.
692    
693    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
694    
695            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes bug according
696            to Kurt's LaTeX suggestion. Removed points and underscores in
697            variable names for consistent naming.
698    
699            * DESCRIPTION: Update to version 0.1-2.
700    
701            * man/TextRepository.Rd: Fixed bug in documentation.
702    
703    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
704    
705            * DESCRIPTION: Update to version 0.1-1.
706    
707    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
708    
709            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
710            wordStem.
711    
712    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
713    
714            * R/: Changes due to Kurt's review.
715    
716    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
717    
718            * R/: Implemented improvements based upon comments by David
719            Meyer.
720    
721    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
722    
723            * inst/doc/: Rewrote vignette.
724    
725            * man/: Improved documentation.
726    
727    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
728    
729            * man/: Updated documentation.
730    
731            * DESCRIPTION: Changed package name to "tm". Updated version to
732            0.1 for first CRAN release.
733    
734            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
735            list archive example.
736    
737            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
738            archive example.
739    
740            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
741            from (several mails per box) mbox format to (single mail per file)
742            eml format.
743    
744    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
745    
746            * data/crude.rda: Rebuilt.
747    
748            * data/acq.rda: Rebuilt.
749    
750            * R/reader.R: Factored out reader and parser methods from
751            textdoccol.R.
752    
753            * R/source.R: Factored out Source methods from aobjects.R and
754            textdoccol.R.
755            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
756            feeds.
757    
758            * R/textdoccol.R (DirSource): Added support for recursive
759            traversal of directories.
760    
761    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * R/textdoccol.R ([[): Loads the document corpus automatically
764            into memory upon access.
765            (tm_transform, tm_filter): Removed several checks whether the
766            document is already loaded ([[ ensures this now).
767            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
768            mailing list archive.
769    
770    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
771    
772            * R/aobjects.R (TextDocument): Is now a virtual class.
773            (Source): Is now a virtual class.
774    
775    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
776    
777            * R/textdoccol.R (c): Support for an arbitrary number of document
778            collections.
779    
780    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
781    
782            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
783            append_meta and remove_meta.
784    
785            * R/textdoccol.R: Removed modify_metadata method.
786    
787            * R/textrepo.R: Removed modify_metadata method.
788    
789            * R/textdoccol.R (remove_meta): Supports removal of document
790            collection metadata and document (= in data frame) metadata.
791    
792    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
793    
794            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
795    
796            * data/crude.rda: Rebuilt.
797    
798            * data/acq.rda: Rebuilt.
799    
800            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
801    
802            * R/textdoccol.R ([): Bug fix for subsetting a document
803            collection's data frame.
804    
805    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
806    
807            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
808            to s_filter.
809    
810            * R/textdoccol.R: Local text documents' metadata can now be copied
811            to a document collection's data frame with prescind_meta.
812    
813    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
814    
815            * R/: Text documents' slot metadata is now accessible in s_filter.
816    
817            * R/: Rewrote s_filter function (has still some restrictions).
818    
819    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
820    
821            * R/: Various fixes in handling metadata.
822    
823            * R/: Added update mechanism for text document collections.
824    
825    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
826    
827            * R/: Merging of document collections now creates a binary tree
828            for reconstructing merged document collections.
829    
830            * R/: Redesign of metadata for document collections.
831    
832    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
833    
834            * R/: Messages now use \code{ngettext}.
835    
836    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
837    
838            * R/: Added functions for modifying and removing metadata.
839    
840    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
841    
842            * man/: Updated some documentation.
843    
844            * R/: Corrected some connection issues.
845    
846            * inst/doc: Worked on the vignette.
847    
848    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
849    
850            * inst/: Added texts and started vignette.
851    
852            * R/: Final changes based upon David's comments.
853    
854    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
855    
856            * NAMESPACE: Corrected exports (generic methods need exportMethods
857            directives!).
858    
859    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
860    
861            * R/: Modified the TextDocCol constructor and various parsers. It
862            is now modular and supports various file formats via plugins (see
863            the new "Source" class).
864    
865    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
866    
867            * man/: Revised documentation after previous code changes.
868    
869    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
870    
871            * R/: Remaining changes as discussed with David.
872    
873    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
874    
875            * R/: Some changes as suggested by David. The rest will follow
876            within the next few days.
877    
878    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
879    
880            * man/: Finished documentation.
881    
882    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
883    
884            * man/: Wrote some documentation.
885    
886    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
887    
888            * R/: Further syntactic sugar in form of additional assignment and
889            accessor methods.
890    
891    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
892    
893            * R/: Syntactic sugar in form of "length", "show" and "summary"
894            operators.
895    
896    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
897    
898            * R/: Various updates, mainly to default operators ("[" or "c")
899            and dissimilarities.
900    
901    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
902    
903            * R/: Added similarity functions.
904    
905            * data/: Added English stopwords.
906    
907    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
908    
909            * data/: Examples compiled for new features.
910    
911            * R/: Changes due to new structure.
912    
913            * NAMESPACE: Corrected namespace to reflect new structure.
914    
915            * R/termdocmatrix.R: Adapted for new naming scheme.
916    
917    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
918    
919            * R/textdoccol.R: Adapted code for new class structure. Wrote
920            several transform and filter functions operating on text document
921            collections (alias text document databases).
922    
923            * R/aobjects.R: Adapted class structure with inheritance,
924            repositories and additional meta data. Loading files on demand is
925            now possible.
926    
927    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * R/: Some cosmetic cleanups.
930    
931            * inst/: Removed vignette on clustering. That and much more is now
932            described in the JSS paper on text mining. A more elaborate
933            vignette based upon that article will be incorporated in the future.
934    
935    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
936    
937            * R/: Updated generic S4 methods to comply with signature changes
938            in newer versions of R (> 2.3).
939    
940    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
941    
942            * ext/R/importRIS.R: Automatic RIS import is now possible.
943    
944    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
945    
946            * R/textdoccol.R: Added RIS HTML input format.
947    
948    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
949    
950            * R/textdoccol.R: Removed bug that caused invalid text document
951            collections when handling many input files.
952    
953    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
954    
955            * R/textdoccol.R: Restructured and extended file import
956            mechanism.
957    
958            * inst/doc/clustering.Rnw: Adapted vignette for use with
959            ReutNews.rda
960    
961            * man/ReutNews.Rd: Documentation for ReutNews.rda
962    
963            * data/ReutNews.rda: A tiny Reuters21578 example data set.
964    
965    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
966    
967            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
968            clustering facilities of this package.
969    
970    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
971    
972            * R/aobjects.R: Changed package document structure to avoid class
973            dependency problems.
974    
975    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
978            data set.
979    
980            *  Finished documentation and reordered directory structure. Now "R
981            CMD check textmin" works without errors.
982    
983    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * src/: Various splits can now be easily created for the
986            Reuters21578 data set.
987    
988    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            *  Updated documentation.
991    
992    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
993    
994            *  Wrote R documentation for some classes and methods.
995    
996    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
997    
998            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
999            files. See the questionnaire data/Umfrage.csv for such an example.
1000            We are now able to import files in Reuters-21578 XML format.
1001    
1002            *  Changed class interfaces in various files. Weighting of the text
1003            matrix is now possible.
1004    
1005    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1006    
1007            * R/textdoccol.R: One can build term-document matrices if
1008            necessary (with buildTDM(...)) and fill the field tdm from a text
1009            document collection with it.
1010    
1011            * R/textmatrix.R: Wrote S4 class for term-document matrices.
1012    
1013    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1014    
1015            * R/textdoccol.R: We can now read in a whole XML file with several
1016            news items.
1017    
1018    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1019    
1020            * R/textdoccol.R: Set up an S4 class for a collection of text
