[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC vs. pkg/ChangeLog revision 1039, Fri Jan 22 13:01:33 2010 UTC

2010-01-22  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (stemDocument.character): In case the input is a
        simple character, just delegate to the default Snowball stemmer.

2010-01-15  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XML, readRCV1): Extract more meta
        data.

2010-01-12  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (`Content<-`): Be careful with names attribute.

2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>

        * R/source.R (DirSource): Improved implementation especially when
        handling many (>1M) files.

2009-12-22  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (getElem.URISource): Use encoding argument.

2009-12-11  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (setOldClass): Register S3 document classes to be
        recognized by S4 methods.

2009-11-25  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (termFreq): Add option to remove punctuation
        characters.

2009-11-19  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (c.TermDocumentMatrix): Added combine method for
        merging multiple term-document matrices.

2009-11-17  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (setOldClass): Register S3 corpus classes to be
        recognized by S4 methods.

        * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
        that CRAN Mac OS X builds do not fail any longer.

2009-11-15  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (tokenize): Use scan(..., what = "character") instead
        of RWeka::AlphabeticTokenizer() as default.

2009-11-14  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removeWords.PlainTextDocument): Fix bug which
        caused words at the beginning or the end of a line not to be
        removed. Do not delete whitespace anymore.

2009-11-12  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Default to working directory if no path
        is specified.

2009-11-11  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Stop on empty directories.

2009-11-07  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
        named documents.

2009-10-21  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removeWords): Improve regular expressions.

2009-10-19  Ingo Feinerer  <feinerer@logic.at>

        * R/meta.R (DublinCore): Allow lower case tags.

2009-10-09  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
        instead of x$children.

2009-09-15  Ingo Feinerer  <feinerer@logic.at>

        * R/preprocess.R (preprocessReut21578XML): Fix generated file names.

2009-09-06  Ingo Feinerer  <feinerer@logic.at>

        * R/: Use S3 instead of S4 class system.

2009-08-11  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readMail): Moved to tm.plugin.mail package.

2009-07-04  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
        postings are basically e-mails with some extra headers.

2009-07-03  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R: Move convertMboxEml, removeCitation,
        removeMultipart, and removeSignature to the tm.plugin.mail package
        since they are mainly utility functions (for handling e-mails) and
        not very framework specific.

2009-06-28  Ingo Feinerer  <feinerer@logic.at>

        * man/: Fix documentation.

2009-06-26  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XMLasPlain): New reader which returns a
        plain text document instead of an XML document for texts of the
        Reuters-21578 dataset.

        * R/sparse.R: Removed since the slam package is now available on
        CRAN.

        * DESCRIPTION (Depends): Add slam package.

2009-06-17  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (stemDoc): Fix character(0) handling.

2009-06-12  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (show): Pretty print.

2009-05-27  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
        gracefully.

2009-05-13  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R: Make corpus virtual. Implement corpus with standard
        and permanent storage semantics.

        * DESCRIPTION: New major release. A *lot* of improvements.

2009-05-04  Ingo Feinerer  <feinerer@logic.at>

        * NAMESPACE: Export some simple_triplet_matrix functions.

2009-04-28  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Adapt tf-idf to new matrix format.

2009-04-27  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R: Create two distinct classes for term-document and
        document-term matrices.

2009-04-26  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R: No longer use Matrix package. This reduces
        package start-up time significantly.

2009-04-11  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/tm.Rnw: Fix code/documentation mismatch.

2009-04-04  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (tmReduce): Combine multiple maps into one
        transformation.

2009-04-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Remove weightLogical since it does not return a
        dgCMatrix.

        * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
        or TermDocumentMatrix instead.

2009-03-28  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/extensions.Rnw: Finished vignette.

2009-03-27  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
        DocumentTermMatrix representations.

2009-03-23  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readXML): New reader for arbitrary XML files.

2009-03-22  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (CSVSource): Defunct (use DataframeSource instead).
        (XMLSource): New XMLSource class for arbitrary XML files.
        (Source): New slot Vectorized.

2009-03-21  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readTabular): Experimental reader for tabular data
        structures which can be customized via user-defined mappings.

        * R/reader.R: Always use UTC time zone.

        * R/AAA.R (.onLoad): No longer try to start an MPI cluster.

2009-03-20  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readDOC): Options can be passed on to antiword.

        * R/reader.R (readPDF): Options can be passed on to pdfinfo and
        pdftotext.

2009-03-10  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Add pattern and ignore.case arguments
        which are internally passed on to list.files().

2009-03-02  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/tm.Rnw: Suppress pointless loading message.

2009-01-29  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION: Speed up package loading by moving packages not
        strictly necessary for normal operation from Depends to Suggests.

2009-01-08  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readNewsgroup): The date format is now configurable.

2008-12-20  Ingo Feinerer  <feinerer@logic.at>

        * R/preprocess.R (convertMboxEml): Fix off-by-one error.

2008-12-16  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R (TermDocMatrix): Sort row indices.

2008-12-06  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DataframeSource): New source class for data frames.

        * R/source.R: Fixed non-standard call evaluation.

2008-11-29  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (URISource): New source class for a single document.

2008-11-27  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R: Refactoring.

2008-11-25  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
        Rmpi installations more gracefully.

2008-11-08  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (Source): Add Length slot.

2008-11-06  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R: Unify duplicated .onLoad function.

2008-11-03  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION (Suggests): Added Rmpi.

2008-11-02  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (getElem): Fix 'no visible binding' warning.

        * man/WeightFunction.Rd: Fix signature.

2008-08-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Introduce name abbreviations for weighting functions.

2008-07-24  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.

        * R/cluster.R: Provide convenience functions for using an MPI
        cluster.

        * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
        available.

        * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
        available.

2008-07-17  Ingo Feinerer  <feinerer@logic.at>

        * R/textdoccol.R (lapply): Removed debug printout.

2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readRCV1): Improved meta data extraction from
        Reuters Corpus Volume 1 documents.

2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/transform.R: Ensure that all mappings preserve multiline
        structures.

2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/filter.R: Every filter now has an attribute (doclevel)
        indicating whether it should be applied at the document level.

        * R/textdoccol.R (tmFilter): Set searchFullText as new default
        filter.

2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/transform.R (replacePatterns): Replaced removeWords by
        replacePatterns. Suggested by Christian Buchta.

        * R/textdoccol.R (inspect): Improved formatting.

2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/CITATION: Updated JSS article information.

        * R/textdoccol.R (setAs): Added coerce method from list to
        corpus.

        * R/meta.R (meta): Improved meta data handling.

2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (materialize, tmMap): Improvements suggested by
        Christian Buchta.

        * inst/CITATION: Added template to include JSS article reference.

2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (tmMap): Introduced lazy mapping.

        * R/source.R: Added VectorSource.

2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Language codes should be in ISO 639-1 format.

        * R/textdoccol.R (asPlain): Preserve local meta data.

2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (writeCorpus): Function for writing a corpus
        containing plain text documents to disk.

2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
        always set correctly.

        * R/textdoccol.R: Set load = TRUE as default for load on demand
        since in most cases this is the desired behaviour.

2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Renamed TextDocCol to Corpus, and Corpus to Content.

        * DESCRIPTION: Updated version to 0.3 due to core name changes.

2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/meta.R (meta): New function for consistent access to meta data
        of document collections, repositories, and texts.

2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Better support for encodings.

2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
        selection when no reader argument is given.

2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/source.R (CSVSource): Now uses read.csv instead of scan
        internally.

2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (getReaders): Returns available reader functions.

        * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
        as default.

2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Shortened code, removed codetools
        variable warnings.

        * man/: Documentation for showMeta, added an example for tmMap.

        * inst/doc/tm.Rnw: Updated vignette (comments on MS Word reader),
        fixed some minor typos.

2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (showMeta): Added method for pretty printing a
        text document's meta data.

2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Better handling of empty
        arguments.

        * NAMESPACE: Exported readDOC.

        * man/completeStems.Rd: Added an example.

2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Look up .dat files at every
        call. Allows users to modify stopword .dat files interactively.

2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (termFreq): Correct processing of empty
        documents.

2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/complete.R (completeStems): Completes (heuristically) word
        stems.

        * R/termdocmatrix.R (TermDocMatrix2): New modular
        constructor.

        * NAMESPACE: Exported termFreq.

2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readDOC): Added MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/weight.R: Weighting functions for TermDocMatrix.

2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
        functions for accessing dimension, column, and row names.

        * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/removePunctuation.Rd: Added documentation. Function also
        exported to NAMESPACE.

2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/fungen.R: Use S4 class for function generators instead of S3
        attributes.

2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readPDF): Removed manual checks for pdftotext and
        pdfinfo. The system call gives a warning anyway.

2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (asPlain): Conversion from
        StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
        for accessing term-document matrices.

        * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
        are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
        Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML,
        preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readHTML): Added very simple HTML reader to obtain
        StructuredTextDocuments.

        * R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Moved proxy from Depends to Imports to avoid name
        clashes.

        * inst/stopwords/english.dat: Added the term "yes" to stopwords.

        * R/termdocmatrix.R (dim): dim function for TermDocMatrix.

        * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/distmeasure.R (dissimilarity): Replaced dists call from
        package cba by new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: require() uses the quietly option to suppress
        loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
        documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed some typos in vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (replaceWords): Added method to replace a set of
        words by a single word. Useful for synonyms.

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TermDocMatrix.Rd: Fixed documentation on Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Small fix for dealing with empty
        vectors. Thanks to Ariel Maguyon for his error report.
        (removeSparseTerms): New function to remove columns from a
        term-document matrix exceeding a sparse factor.

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/tmUpdate.Rd: Corrected documentation on readerControl
        parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/sFilter.Rd: Corrected documentation on statement format (use
        '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (StructuredTextDocument): Inherits from
        TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
        on sparse matrices as proposed by Martin Maechler.

2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
        latest \pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Stemming is now performed before
        erasing stopwords.
        (weightMatrix): Adapted to handle sparse matrices.
        (TermDocMatrix): Sparse matrix is now efficiently built by
        direct stepwise insertion of row values into it.

2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
        due to ongoing problems. For our purposes the latter is as useful
        as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TextDocCol.Rd: Replaced \code{readPlain} with
        \code{object@DefaultReader}.

        * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
        languages with available stopwords.

2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.2, since a lot of new features
        have been integrated.

        * inst/stopwords: Updated existing stopwords and added stopwords
        for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * Work/testDb.R: Script to test database stuff.

        * R/: Fixed various database-related bugs. Seems to be rather
        usable now, i.e., consider it alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Added a lot of examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated parts of the documentation.

        * R/textdoccol.R (asPlain): Added conversion from newsgroup
        documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Finished experimental database support. Not yet
        intensively tested.

        * R/source.R: Now each source has a default reader.

        * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
        class anymore.

        * R/plaintextdoc.R: Custom show method for plain text documents.

        * R/aobjects.R: Added a class for structured text documents.

        * R/reader.R: Replaced remaining \code{parser} occurrences with
        \code{reader}.

        * R/textdoccol.R (summary): Indent tags.

        * R/textdoccol.R (removePunctuation): Transform method to remove
        punctuation marks.

2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (sFilter): Simplified sFilter significantly by
        using prescindMeta().

2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

        * R/resolve.R (resolveISOcode): Extracts the language from an ISO
        language code.

        * R/textdoccol.R (TextDocCol): Refactored several parser arguments
        into parserControl argument.

        * R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Work/tmDataSetup.R: The datasets acq and crude can now be
        created on the fly.

        * R/stopwords.R: Introduced a function returning the stopwords for
        a given language (English, German, and French at the moment).

        * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
        otherwise falls back to Snowball package.

2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/dissimilarity-methods.Rd: Make clear that any method offered
        by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes bug following
        Kurt's LaTeX suggestion. Removed points and underscores in
        variable names for consistent naming.

        * DESCRIPTION: Update to version 0.1-2.

        * man/TextRepository.Rd: Fixed bug in documentation.

2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.1-1.

2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
        wordStem.

2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Changes due to Kurt's review.

2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Implemented improvements based upon comments by David
        Meyer.

2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/: Rewrote vignette.

        * man/: Improved documentation.

2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * DESCRIPTION: Changed package name to "tm". Updated version to
        0.1 for first CRAN release.

        * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
        list archive example.

        * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
        archive example.

        * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
        from (several mails per box) mbox format to (single mail per file)
        eml format.

2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * R/reader.R: Factored out reader and parser methods from
        textdoccol.R.

        * R/source.R: Factored out Source methods from aobjects.R and
        textdoccol.R.
        (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
        feeds.

        * R/textdoccol.R (DirSource): Added support for recursive
        traversal of directories.

2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R ([[): Loads the document corpus automatically
        into memory upon access.
        (tm_transform, tm_filter): Removed several checks whether the
        document is already loaded ([[ ensures this now).
        (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
        mailing list archive.

2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (TextDocument): Is now a virtual class.
        (Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (c): Support for an arbitrary number of document
        collections.

2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textrepo.R: Updated TextRepository (constructor), append_elem,
        append_meta and remove_meta.

        * R/textdoccol.R: Removed modify_metadata method.

        * R/textrepo.R: Removed modify_metadata method.

        * R/textdoccol.R (remove_meta): Supports removal of document
        collection metadata and document (= in data frame) metadata.

2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

        * R/textdoccol.R ([): Bug fix for subsetting a document
        collection's data frame.

810    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
811    
812            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
813            to s_filter.
814    
815            * R/textdoccol.R: Local text documents' metadata can now be copied
816            to a document collection's data frame with prescind_meta.
817    
818    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
819    
820            * R/: Text documents' slot metadata is now accessible in s_filter.
821    
822            * R/: Rewrote the s_filter function (it still has some restrictions).
823    
824    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
825    
826            * R/: Various fixes in handling metadata.
827    
828            * R/: Added update mechanism for text document collections.
829    
830    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
831    
832            * R/: Merging of document collections now creates a binary tree
833            for reconstructing merged document collections.
834    
835            * R/: Redesign of metadata for document collections.
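
The 2006-11-19 item on merging describes recording merges as a binary tree so that merged document collections can be reconstructed. The sketch below illustrates that idea in Python rather than in the package's R code. The names Collection, merge, split and leaves are hypothetical and are not tm functions. Each merge stores its two inputs as children, so the original collections can be recovered by walking the tree.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Collection:
    """Hypothetical document collection; a merged one remembers its inputs."""
    docs: List[str]
    left: Optional["Collection"] = None   # first merged input, if any
    right: Optional["Collection"] = None  # second merged input, if any


def merge(a: Collection, b: Collection) -> Collection:
    """Concatenate two collections and record both inputs as tree children."""
    return Collection(docs=a.docs + b.docs, left=a, right=b)


def split(c: Collection) -> Tuple[Collection, Collection]:
    """Undo one merge step by returning the two recorded children."""
    if c.left is None or c.right is None:
        raise ValueError("collection was not produced by merge()")
    return c.left, c.right


def leaves(c: Collection) -> List[Collection]:
    """Return the original (unmerged) collections in merge order."""
    if c.left is None:
        return [c]
    return leaves(c.left) + leaves(c.right)
```

Take a merged with b, and that result merged with c. The final collection holds all documents in order, and its leaves are a, b and c again.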
836    
837    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
838    
839            * R/: Messages now use \code{ngettext}.
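
R's ngettext(n, msg1, msg2) selects the singular or plural form of a message, translated where a catalog is available, depending on n. The example below shows the same idea with Python's standard gettext.ngettext, purely for illustration. The helper describe is hypothetical and not part of tm.

```python
from gettext import ngettext


def describe(n_docs: int) -> str:
    # ngettext returns the singular template when n_docs == 1 and the plural
    # template otherwise (no translation catalog is installed here).
    return ngettext("%d document", "%d documents", n_docs) % n_docs
```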
840    
841    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
842    
843            * R/: Added functions for modifying and removing metadata.
844    
845    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
846    
847            * man/: Updated some documentation.
848    
849            * R/: Corrected some connection issues.
850    
851            * inst/doc: Worked on the vignette.
852    
853    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
854    
855            * inst/: Added texts and started vignette.
856    
857            * R/: Final changes based upon David's comments.
858    
859    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
860    
861            * NAMESPACE: Corrected exports (generic methods need exportMethods
862            directives!).
863    
864    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
865    
866            * R/: Modified the TextDocCol constructor and various parsers. The
867            constructor is now modular and supports various file formats via
868            plugins (see the new "Source" class).
869    
870    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
871    
872            * man/: Revised documentation after previous code changes.
873    
874    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
875    
876            * R/: Remaining changes as discussed with David.
877    
878    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
879    
880            * R/: Some changes as suggested by David. The rest will follow
881            in the next few days.
882    
883    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
884    
885            * man/: Finished documentation.
886    
887    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
888    
889            * man/: Wrote some documentation.
890    
891    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
892    
893            * R/: Further syntactic sugar in the form of additional assignment
894            and accessor methods.
895    
896    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
897    
898            * R/: Syntactic sugar in the form of "length", "show" and "summary"
899            methods.
900    
901    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
902    
903            * R/: Various updates, mainly to default operators ("[" and "c")
904            and dissimilarities.
905    
906    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
907    
908            * R/: Added similarity functions.
909    
910            * data/: Added English stopwords.
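
The entry does not say which similarity measures were added. As one common example, cosine similarity between term-frequency vectors with optional stopword removal is sketched below in Python. This is an assumption, not a description of tm's implementation. Whitespace tokenization and the names term_freqs and cosine_similarity are illustrative choices.

```python
import math
from collections import Counter
from typing import AbstractSet


def term_freqs(text: str, stopwords: AbstractSet[str] = frozenset()) -> Counter:
    """Lower-case, split on whitespace, drop stopwords, and count terms."""
    return Counter(t for t in text.lower().split() if t not in stopwords)


def cosine_similarity(a: str, b: str,
                      stopwords: AbstractSet[str] = frozenset()) -> float:
    """Cosine of the angle between the term-frequency vectors of two texts."""
    tf_a, tf_b = term_freqs(a, stopwords), term_freqs(b, stopwords)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat similarity with an empty document as 0
    return dot / (norm_a * norm_b)
```

Without stopwords, "the oil" and "the wheat" score about 0.5 because they share "the". With stopwords={"the"}, they score 0.0.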
911    
912    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
913    
914            * data/: Compiled examples for the new features.
915    
916            * R/: Changes due to new structure.
917    
918            * NAMESPACE: Corrected namespace to reflect new structure.
919    
920            * R/termdocmatrix.R: Adapted for new naming scheme.
921    
922    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
923    
924            * R/textdoccol.R: Adapted code for new class structure. Wrote
925            several transform and filter functions operating on text document
926            collections (also called text document databases).
927    
928            * R/aobjects.R: Adapted class structure with inheritance,
929            repositories and additional meta data. Loading files on demand is
930            now possible.
931    
932    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
933    
934            * R/: Some cosmetic cleanups.
935    
936            * inst/: Removed the vignette on clustering. That and much more is
937            now described in the JSS paper on text mining. A more elaborate
938            vignette based on that article will be incorporated in the future.
939    
940    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
941    
942            * R/: Updated generic S4 methods to comply with signature changes
943            in newer versions of R (> 2.3).
944    
945    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
946    
947            * ext/R/importRIS.R: Automatic RIS import is now possible.
948    
949    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
950    
951            * R/textdoccol.R: Added RIS HTML input format.
952    
953    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
954    
955            * R/textdoccol.R: Fixed a bug that caused invalid text document
956            collections when handling many input files.
957    
958    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
959    
960            * R/textdoccol.R: Restructured and extended file import
961            mechanism.
962    
963            * inst/doc/clustering.Rnw: Adapted vignette for use with
964            ReutNews.rda.
965
966            * man/ReutNews.Rd: Documentation for ReutNews.rda.
967    
968            * data/ReutNews.rda: A tiny Reuters21578 example data set.
969    
970    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
971    
972            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
973            clustering facilities of this package.
974    
975    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            * R/aobjects.R: Changed package document structure to avoid class
978            dependency problems.
979    
980    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
981    
982            * Wrote a script for the ModLewis split of the Reuters-21578 XML
983            data set.
984
985            * Finished documentation and reordered the directory structure.
986            "R CMD check textmin" now works without errors.
987    
988    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            * src/: Various splits can now be easily created for the
991            Reuters21578 data set.
992    
993    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
994    
995            * Updated documentation.
996    
997    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
998    
999            * Wrote R documentation for some classes and methods.
1000    
1001    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1002    
1003            * R/textdoccol.R: The textdoccol constructor now allows import of
1004            CSV files; see the questionnaire data/Umfrage.csv for an example.
1005            Files in Reuters-21578 XML format can now be imported as well.
1006    
1007            * Changed class interfaces in various files. Weighting of the text
1008            matrix is now possible.
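
The entry does not say which weighting schemes were supported. As an illustrative assumption, one widely used scheme is tf-idf. It multiplies each count by log2(N / df), where N is the number of documents and df is the number of documents containing the term. Below is a minimal Python sketch over a plain terms-by-documents count matrix. The function name tf_idf is hypothetical.

```python
import math
from typing import List


def tf_idf(tdm: List[List[float]]) -> List[List[float]]:
    """Weight a count matrix (rows = terms, columns = documents) by
    tf(t, d) * log2(N / df(t)), where df(t) counts documents containing t."""
    if not tdm:
        return []
    n_docs = len(tdm[0])
    weighted = []
    for row in tdm:
        df = sum(1 for count in row if count > 0)
        idf = math.log2(n_docs / df) if df else 0.0  # unused terms get 0
        weighted.append([count * idf for count in row])
    return weighted
```

A term that occurs in every document gets weight 0. It therefore no longer dominates comparisons between documents.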
1009    
1010    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1011    
1012            * R/textdoccol.R: Term-document matrices can now be built when
1013            necessary (with buildTDM(...)) and stored in the tdm field of a
1014            text document collection.
1015    
1016            * R/textmatrix.R: Wrote S4 class for term-document matrices.
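
buildTDM(...) and the term-document matrix class belong to the package's R code and are not reproduced here. What such a matrix holds can be sketched in Python: one row per term, one column per document, and a raw count in each cell. Lower-casing with naive whitespace tokenization is an assumption, and build_tdm is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def build_tdm(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (terms, matrix) where matrix[i][j] counts terms[i] in docs[j]."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))  # vocabulary across all documents
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix
```

For ["oil price oil", "wheat price"] this returns the terms ["oil", "price", "wheat"] and the matrix [[2, 0], [1, 1], [0, 1]].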
1017    
1018    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1019    
1020            * R/textdoccol.R: We can now read in a whole XML file containing
1021            several news items.
1022    
1023  2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1024    
1025          * R/textdoccol.R: Set up an S4 class for a collection of text

