Diff of /pkg/ChangeLog

Old: trunk/R/trunk/ChangeLog, revision 17, Sat Nov 5 14:47:12 2005 UTC
New: pkg/ChangeLog, revision 1040, Sat Feb 6 10:33:03 2010 UTC
1    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
2    
3            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
4            setOldClass(c(..., "list")) works.
5    
6    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/transform.R (stemDocument.character): In case input is a
9            simple character just delegate to the default Snowball stemmer.
10    
11    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
14            data.
15    
16    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
17    
18            * R/doc.R (`Content<-`): Be careful with names attribute.
19    
20    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
21    
22            * R/source.R (DirSource): Improved implementation especially when
23            handling many (>1M) files.
24    
25    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
26    
27            * R/source.R (getElem.URISource): Use encoding argument.
28    
29    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
30    
31            * R/doc.R (setOldClass): Register S3 document classes to be
32            recognized by S4 methods.
33    
34    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
35    
36            * R/matrix.R (termFreq): Add option to remove punctuation
37            characters.
38    
39    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
40    
41            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
42            merging multiple term-document matrices.
43    
44    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
45    
46            * R/corpus.R (setOldClass): Register S3 corpus classes to be
47            recognized by S4 methods.
48    
49            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
50            that CRAN Mac OS X builds do not fail any longer.
51    
52    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
53    
54            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
55            of RWeka::AlphabeticTokenizer() as default.
56    
57    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
58    
59            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
60            caused words at the beginning or the end of a line not to be removed. Do
61            not delete whitespace anymore.
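
The fix described above can be illustrated with a small sketch. It is written in Python rather than R and is not the tm implementation; the function name remove_words and the sample text are hypothetical. It assumes the approach is a word-boundary pattern, which also matches at the start and end of a line, and it leaves whitespace untouched.

```python
import re

def remove_words(text, words):
    """Delete whole-word occurrences of `words`, leaving whitespace untouched."""
    if not words:
        return text
    # \b matches at line starts and ends as well, so a word does not need
    # a space on either side to be found.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.sub(pattern, "", text)

# "the" appears at the start of both lines, "on" at the end of the first.
cleaned = remove_words("the cat sat on\nthe mat", ["the", "on"])
```

Because only the matched words are deleted, the spaces that surrounded them remain in the result.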
62    
63    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
64    
65            * R/source.R (DirSource): Default to working directory if no path
66            is specified.
67    
68    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
69    
70            * R/source.R (DirSource): Stop on empty directories.
71    
72    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
73    
74            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
75            named documents.
76    
77    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
78    
79            * R/transform.R (removeWords): Improve regular expressions.
80    
81    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
82    
83            * R/meta.R (DublinCore): Allow lower case tags.
84    
85    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
86    
87            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
88            instead of x$children.
89    
90    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
91    
92            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
93    
94    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
95    
96            * R/: Use S3 instead of S4 class system.
97    
98    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
99    
100            * R/reader.R (readMail): Moved to tm.plugin.mail package.
101    
102    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
105            postings are basically e-mails with some extra headers.
106    
107    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/transform.R: Move convertMboxEml, removeCitation,
110            removeMultipart, and removeSignature to the tm.plugin.mail package
111            since they are mainly utility functions (for handling e-mails) and
112            not very framework specific.
113    
114    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
115    
116            * man/: Fix documentation.
117    
118    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
119    
120            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
121            plain text document instead of an XML document for texts of the
122            Reuters-21578 dataset.
123    
124            * R/sparse.R: Removed since the slam package is now available on
125            CRAN.
126    
127            * DESCRIPTION (Depends): Add slam package.
128    
129    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
130    
131            * R/transform.R (stemDoc): Fix character(0) handling.
132    
133    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
134    
135            * R/doc.R (show): Pretty print.
136    
137    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
138    
139            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
140            gracefully.
141    
142    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
143    
144            * R/corpus.R: Make corpus virtual. Implement corpus with standard
145            and permanent storage semantics.
146    
147            * DESCRIPTION: New major release. A *lot* of improvements.
148    
149    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
150    
151            * NAMESPACE: Export some simple_triplet_matrix functions.
152    
153    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
154    
155            * R/weight.R: Adapt tf-idf to new matrix format.
156    
157    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
158    
159            * R/matrix.R: Create two distinct classes for term-document and
160            document-term matrices.
161    
162    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
163    
164            * R/termdocmatrix.R: No longer use Matrix package. This reduces
165            package start-up time significantly.
166    
167    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
168    
169            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
170    
171    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
172    
173            * R/transform.R (tmReduce): Combine multiple maps into one
174            transformation.
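
Combining several maps into one transformation amounts to function composition. The following Python sketch shows the idea only; compose_maps is a hypothetical name, and the left-to-right application order is an assumption, not something this entry states about tmReduce.

```python
from functools import reduce

def compose_maps(maps):
    """Return one function that applies each function in `maps` in order."""
    def combined(doc):
        # Feed the output of each map into the next one (assumed order:
        # first element of `maps` runs first).
        return reduce(lambda acc, f: f(acc), maps, doc)
    return combined

# Two simple string maps folded into a single transformation.
pipeline = compose_maps([str.lower, str.strip])
result = pipeline("  Hello World ")
```

The combined function can then be passed wherever a single map is expected, so a corpus needs only one pass.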
175    
176    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
177    
178            * R/weight.R: Remove weightLogical since it does not return a
179            dgCMatrix.
180    
181            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
182            or TermDocumentMatrix instead.
183    
184    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
185    
186            * inst/doc/extensions.Rnw: Finished vignette.
187    
188    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
189    
190            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
191            DocumentTermMatrix representations.
192    
193    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
194    
195            * R/reader.R (readXML): New reader for arbitrary XML files.
196    
197    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
198    
199            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
200            (XMLSource): New XMLSource class for arbitrary XML files.
201            (Source): New slot Vectorized.
202    
203    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
204    
205            * R/reader.R (readTabular): Experimental reader for tabular data
206            structures which can be customized via user-defined mappings.
207    
208            * R/reader.R: Always use UTC time zone.
209    
210            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
211    
212    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
213    
214            * R/reader.R (readDOC): Options can be passed over to antiword.
215    
216            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
217            pdftotext.
218    
219    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
220    
221            * R/source.R (DirSource): Add pattern and ignore.case arguments
222            which are internally passed over to list.files().
223    
224    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
225    
226            * inst/doc/tm.Rnw: Suppress pointless loading message.
227    
228    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
229    
230            * DESCRIPTION: Speed up package loading (via moving packages not
231            strictly necessary for normal operation to Suggests instead of
232            Depends).
233    
234    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
235    
236            * R/reader.R (readNewsgroup): The date format is now configurable.
237    
238    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
239    
240            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
241    
242    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
243    
244            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
245    
246    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
247    
248            * R/source.R (DataframeSource): New source class for data frames.
249    
250            * R/source.R: Fixed non-standard call evaluation.
251    
252    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
253    
254            * R/source.R (URISource): New source class for a single document.
255    
256    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
257    
258            * R/source.R: Refactoring.
259    
260    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
261    
262            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
263            Rmpi installations more gracefully.
264    
265    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
266    
267            * R/source.R (Source): Add Length slot.
268    
269    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
270    
271            * R/AAA.R: Unify duplicated .onLoad function.
272    
273    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
274    
275            * DESCRIPTION (Suggests): Added Rmpi.
276    
277    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
278    
279            * R/source.R (getElem): Fix 'no visible binding' warning.
280    
281            * man/WeightFunction.Rd: Fix signature.
282    
283    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
284    
285            * R/weight.R: Introduce name abbreviations for weighting functions.
286    
287    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
288    
289            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
290    
291            * R/cluster.R: Provide convenience functions for using an MPI
292            cluster.
293    
294            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
295            available.
296    
297            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
298            available.
299    
300    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
301    
302            * R/textdoccol.R (lapply): Removed debug print out.
303    
304    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
305    
306            * R/reader.R (readRCV1): Improved meta data extraction from
307            Reuters Corpus Volume 1 documents.
308    
309    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
310    
311            * R/transform.R: Ensure that all mappings preserve multiline
312            structures.
313    
314    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
315    
316            * R/filter.R: Every filter now has an attribute (doclevel)
317            indicating whether it should be applied at the document level.
318    
319            * R/textdoccol.R (tmFilter): Set searchFullText as new default
320            filter.
321    
322    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
323    
324            * R/transform.R (replacePatterns): Replaced removeWords by
325            replacePatterns. Suggested by Christian Buchta.
326    
327            * R/textdoccol.R (inspect): Improved formatting.
328    
329    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
330    
331            * inst/CITATION: Updated JSS article information.
332    
333            * R/textdoccol.R (setAs): Added coerce method from list to
334            corpus.
335    
336            * R/meta.R (meta): Improved meta data handling.
337    
338    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
339    
340            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
341            Christian Buchta.
342    
343            * inst/CITATION: Added template to include JSS article reference.
344    
345    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
346    
347            * R/textdoccol.R (tmMap): Introduced lazy mapping.
348    
349            * R/source.R: Added VectorSource.
350    
351    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
352    
353            * man/: Language codes should be in ISO 639-1 format.
354    
355            * R/textdoccol.R (asPlain): Preserve local meta data.
356    
357    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
358    
359            * R/textdoccol.R (writeCorpus): Function for writing a corpus
360            containing plain text documents to disk.
361    
362    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
363    
364            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
365            always set correctly.
366    
367            * R/textdoccol.R: Set load = TRUE as default for load on demand
368            since in most cases this is the wanted behaviour.
369    
370    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
371    
372            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
373    
374            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
375    
376    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
377    
378            * R/meta.R (meta): New function for consistent access to meta data
379            of document collections, repositories, and texts.
380    
381    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
382    
383            * R/: Better support for encodings.
384    
385    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
386    
387            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
388            selection when no reader argument is given.
389    
390    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
391    
392            * R/source.R (CSVSource): Now uses read.csv instead of scan
393            internally.
394    
395    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
396    
397            * R/reader.R (getReaders): Returns available reader functions.
398    
399            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
400            as default.
401    
402    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
403    
404            * R/stopwords.R (stopwords): Shortened code, removed codetools
405            variable warnings.
406    
407            * man/: Documentation for showMeta, added an example for tmMap.
408    
409            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
410            some minor typos fixed.
411    
412    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
413    
414            * R/aobjects.R (showMeta): Added method for pretty printing a
415            text document's meta data.
416    
417    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
418    
419            * R/textdoccol.R (TextDocCol): Better handling of empty
420            arguments.
421    
422            * NAMESPACE: Exported readDOC.
423    
424            * man/completeStems.Rd: Added an example.
425    
426    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
427    
428            * R/stopwords.R (stopwords): Look up .dat files at every
429            call. Allows users to modify stopword .dat files interactively.
430    
431    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
432    
433            * R/termdocmatrix.R (termFreq): Correct processing of empty
434            documents.
435    
436    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
437    
438            * man/: Updated documentation.
439    
440    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
441    
442            * R/complete.R (completeStems): Completes (heuristically) word
443            stems.
444    
445            * R/termdocmatrix.R (TermDocMatrix2): New modular
446            constructor.
447    
448            * NAMESPACE: Exported termFreq.
449    
450    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
451    
452            * R/reader.R (readDOC): Added MS Word reader (using antiword).
453    
454    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * R/weight.R: Weighting functions for TermDocMatrix.
457    
458    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
461            functions for accessing dimension, column, and row names.
462    
463            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
464    
465    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
466    
467            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
468    
469    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
470    
471            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
472    
473    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/reader.R (readPDF): Removed manual checks for pdftotext and
476            pdfinfo. The system call gives a warning anyway.
477    
478    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
479    
480            * R/textdoccol.R (asPlain): Conversion from
481            StructuredTextDocuments to PlainTextDocuments.
482    
483    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
484    
485            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
486            for accessing term-document matrices.
487    
488            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
489            are installed.
490    
491    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
492    
493            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
494            Christian Buchta.
495    
496    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
497    
498            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
499    
500    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
501    
502            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
503    
504            * R/reader.R (readPDF): Added PDF reader.
505    
506    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
509    
510            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
511    
512            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
513    
514            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
515    
516    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/distmeasure.R (dissimilarity): Replaced dists call from
519            package cba by new dist call from package proxy.
520    
521    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
524    
525    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
526    
527            * R/termdocmatrix.R: require() uses the quietly option to suppress
528            loading messages.
529    
530    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
531    
532            * R/dictionary.R: Added dictionary support.
533    
534    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
535    
536            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
537            documents. This simplifies some functions, e.g., asPlain.
538    
539    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
540    
541            * inst/doc/tm.Rnw: Fixed some typos in vignette.
542    
543    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
544    
545            * R/textdoccol.R (replaceWords): Added method to replace a set of
546            words by a single word. Useful for synonyms.
547    
548    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
549    
550            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
551    
552    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
553    
554            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
555            vectors. Thanks to Ariel Maguyon for his error report.
556            (removeSparseTerms): New function to remove columns from a
557            term-document matrix exceeding a sparse factor.
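
A minimal Python sketch of the idea behind removing sparse terms follows. It is not tm's code. It assumes the "sparse factor" is the maximum allowed share of zero entries in a column, and it uses a plain list of rows where tm uses a sparse matrix; remove_sparse_columns and the sample data are hypothetical.

```python
def remove_sparse_columns(matrix, sparse):
    """Drop columns whose share of zero entries exceeds `sparse` (0 < sparse < 1)."""
    if not matrix:
        return []
    n_rows = len(matrix)
    # Keep the indices of columns that are dense enough.
    keep = [
        j for j in range(len(matrix[0]))
        if sum(1 for row in matrix if row[j] == 0) / n_rows <= sparse
    ]
    return [[row[j] for j in keep] for row in matrix]

# Rows are documents, columns are terms (assumed orientation).
docs_by_terms = [
    [1, 0, 2],
    [3, 0, 0],
    [1, 1, 0],
]
dense = remove_sparse_columns(docs_by_terms, 0.5)
```

Here the second and third columns are zero in two of three rows, so only the first column survives a factor of 0.5.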
558    
559    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
560    
561            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
562    
563    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
564    
565            * man/sFilter.Rd: Corrected documentation on statement format (use
566            '==' instead of '=').
567    
568    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * R/aobjects.R (StructuredTextDocument): Inherits from
571            TextDocument.
572    
573    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
576            on sparse matrices as proposed by Martin Maechler.
577    
578    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
579    
580            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
581            latest \pkg{filehash} version deprecates them.
582    
583    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
584    
585            * R/termdocmatrix.R (textvector): Stemming is now performed before
586            erasing stopwords.
587            (weightMatrix): Adapted to handle sparse matrices.
588            (TermDocMatrix): Sparse matrix is now efficiently built by
589            direct stepwise insertion of row values into it.
590    
591    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
592    
593            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
594            due to ongoing problems. For our purposes the latter is as useful
595            as the replaced package.
596    
597    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
598    
599            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
600    
601            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
602    
603    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
604    
605            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
606            languages with available stopwords.
607    
608    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
609    
610            * inst/doc/tm.Rnw: Minor corrections in the vignette.
611    
612    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
613    
614            * DESCRIPTION: Update to version 0.2, since a lot of new features
615            have been integrated.
616    
617            * inst/stopwords: Updated existing stopwords and added stopwords
618            for various other languages.
619    
620    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
621    
622            * man/: Updated documentation.
623    
624            * Work/testDb.R: Script to test database stuff.
625    
626            * R/: Fixed various database related bugs. Database support seems
627            rather usable now, but consider it alpha status for now.
628    
629    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
630    
631            * R/: Fixed some bugs related to database support.
632    
633    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
634    
635            * man/: Added a lot of examples to the manuals.
636    
637    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
638    
639            * man/: Updated parts of the documentation.
640    
641            * R/textdoccol.R (asPlain): Added conversion from newsgroup
642            documents to plain text documents.
643    
644    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
645    
646            * R/textdoccol.R: Finished experimental database support. Not yet
647            intensively tested.
648    
649            * R/source.R: Now each source has a default reader.
650    
651            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
652            class anymore.
653    
654            * R/plaintextdoc.R: Custom show method for plain text documents.
655    
656            * R/aobjects.R: Added a class for structured text documents.
657    
658            * R/reader.R: Replaced remaining \code{parser} occurrences with
659            \code{reader}.
660    
661            * R/textdoccol.R (summary): Indent tags.
662    
663            * R/textdoccol.R (removePunctuation): Transform method to remove
664            punctuation marks.
665    
666    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
667    
668            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
669            using prescindMeta().
670    
671    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
672    
673            * R/textdoccol.R: Improved database support.
674    
675    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
676    
677            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
678    
679            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
680            language code.
681    
682            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
683            into parserControl argument.
684    
685            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
686    
687    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
688    
689            * Work/tmDataSetup.R: The datasets acq and crude can now be
690            created on the fly.
691    
692            * R/stopwords.R: Introduced a function returning the stopwords for
693            a given language (English, German and French at the moment).
694    
695            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
696            otherwise falls back to Snowball package.
697    
698    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
699    
700            * man/dissimilarity-methods.Rd: Make clear that any method offered
701            by "dists" from package "cba" can be used.
702    
703    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
704    
705            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes bug according
706            to Kurt's LaTeX suggestion. Removed points and underscores in
707            variable names for consistent naming.
708    
709            * DESCRIPTION: Update to version 0.1-2.
710    
711            * man/TextRepository.Rd: Fixed bug in documentation.
712    
713    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
714    
715            * DESCRIPTION: Update to version 0.1-1.
716    
717    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
718    
719            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
720            wordStem.
721    
722    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
723    
724            * R/: Changes due to Kurt's review.
725    
726    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
727    
728            * R/: Implemented improvements based upon comments by David
729            Meyer.
730    
731    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
732    
733            * inst/doc/: Rewrote vignette.
734    
735            * man/: Improved documentation.
736    
737    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
738    
739            * man/: Updated documentation.
740    
741            * DESCRIPTION: Changed package name to "tm". Updated version to
742            0.1 for first CRAN release.
743    
744            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
745            list archive example.
746    
747            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
748            archive example.
749    
750            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
751            from (several mails per box) mbox format to (single mail per file)
752            eml format.
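
The mbox-to-eml conversion can be sketched with Python's standard mailbox module. This is an illustration of the technique, not the R function from this entry; convert_mbox_to_eml and the numbered output file names are assumptions.

```python
import mailbox
from pathlib import Path

def convert_mbox_to_eml(mbox_path, out_dir):
    """Write every message of an mbox file to its own .eml file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # Number the output files from 1 in mailbox order.
        for i, message in enumerate(box, start=1):
            target = out / f"{i}.eml"
            target.write_bytes(message.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written
```

Each returned path points at one single-message file, which a directory-based source can then read one document at a time.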
753    
754    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
755    
756            * data/crude.rda: Rebuilt.
757    
758            * data/acq.rda: Rebuilt.
759    
760            * R/reader.R: Factored out reader and parser methods from
761            textdoccol.R.
762    
763            * R/source.R: Factored out Source methods from aobjects.R and
764            textdoccol.R.
765            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
766            feeds.
767    
768            * R/textdoccol.R (DirSource): Added support for recursive
769            traversal of directories.
770    
771    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
772    
773            * R/textdoccol.R ([[): Loads the document corpus automatically
774            into memory upon access.
775            (tm_transform, tm_filter): Removed several checks whether the
776            document is already loaded ([[ ensures this now).
777            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
778            mailing list archive.
779    
780    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
781    
782            * R/aobjects.R (TextDocument): Is now a virtual class.
783            (Source): Is now a virtual class.
784    
785    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
786    
787            * R/textdoccol.R (c): Support for an arbitrary number of document
788            collections.
789    
790    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
791    
792            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
793            append_meta and remove_meta.
794    
795            * R/textdoccol.R: Removed modify_metadata method.
796    
797            * R/textrepo.R: Removed modify_metadata method.
798    
799            * R/textdoccol.R (remove_meta): Supports removal of document
800            collection metadata and document (= in data frame) metadata.
801    
802    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
803    
804            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
805    
806            * data/crude.rda: Rebuilt.
807    
808            * data/acq.rda: Rebuilt.
809    
810            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
811    
812            * R/textdoccol.R ([): Bug fix for subsetting a document
813            collection's data frame.
814    
815    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
816    
817            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
818            to s_filter.
819    
820            * R/textdoccol.R: Local text documents' metadata can now be copied
821            to a document collection's data frame with prescind_meta.
822    
823    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
824    
825            * R/: Text documents' slot metadata is now accessible in s_filter.
826    
827            * R/: Rewrote s_filter function (still has some restrictions).
828    
829    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
830    
831            * R/: Various fixes in handling metadata.
832    
833            * R/: Added update mechanism for text document collections.
834    
835    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
836    
837            * R/: Merging of document collections now creates a binary tree
838            for reconstructing merged document collections.
839    
840            * R/: Redesign of metadata for document collections.
841    
842    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
843    
844            * R/: Messages now use \code{ngettext}.
845    
846    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
847    
848            * R/: Added functions for modifying and removing metadata.
849    
850    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
851    
852            * man/: Updated some documentation.
853    
854            * R/: Corrected some connection issues.
855    
856            * inst/doc: Worked on the vignette.
857    
858    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
859    
860            * inst/: Added texts and started vignette.
861    
862            * R/: Final changes based upon David's comments.
863    
864    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
865    
866            * NAMESPACE: Corrected exports (generic methods need exportMethods
867            directives!).
868    
869    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
870    
871            * R/: Modified the TextDocCol constructor and various parsers. It
872            is now modular and supports various file formats via plugins (see
873            the new "Source" class).
874    
875    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
876    
877            * man/: Revised documentation after previous code changes.
878    
879    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
880    
881            * R/: Remaining changes as discussed with David.
882    
883    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
884    
885            * R/: Some changes as suggested by David. The rest will follow
886            within the next few days.
887    
888    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
889    
890            * man/: Finished documentation.
891    
892    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
893    
894            * man/: Wrote some documentation.
895    
896    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
897    
898            * R/: Further syntactic sugar in form of additional assignment and
899            accessor methods.
900    
901    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
902    
903            * R/: Syntactic sugar in form of "length", "show" and "summary"
904            operators.
905    
906    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
907    
908            * R/: Various updates, mainly to default operators ("[" or "c")
909            and dissimilarities.
910    
911    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
912    
913            * R/: Added similarity functions.
914    
915            * data/: Added English stopwords.
916    
917    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
918    
919            * data/: Examples compiled for new features.
920    
921            * R/: Changes due to new structure.
922    
923            * NAMESPACE: Corrected namespace to reflect new structure.
924    
925            * R/termdocmatrix.R: Adapted for new naming scheme.
926    
927    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * R/textdoccol.R: Adapted code for new class structure. Wrote
930            several transform and filter functions operating on text document
931            collections (alias text document databases).
932    
933            * R/aobjects.R: Adapted class structure with inheritance,
934            repositories and additional meta data. Loading files on demand is
935            now possible.
936    
937    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
938    
939            * R/: Some cosmetic cleanups.
940    
941            * inst/: Removed vignette on clustering. That and much more is now
942            described in the JSS paper on text mining. Based upon that
943            article a more elaborate vignette will be incorporated in the future.
944    
945    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
946    
947            * R/: Updated generic S4 methods to comply with signature changes
948            in newer versions of R (> 2.3).
949    
950    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
951    
952            * ext/R/importRIS.R: Automatic RIS import is now possible.
953    
954    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
955    
956            * R/textdoccol.R: Added RIS HTML input format.
957    
958    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
959    
960            * R/textdoccol.R: Removed bug that caused invalid text document
961            collections when handling many input files.
962    
963    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
964    
965            * R/textdoccol.R: Restructured and extended file import
966            mechanism.
967    
968            * inst/doc/clustering.Rnw: Adapted vignette for use with
969            ReutNews.rda.
970    
971            * man/ReutNews.Rd: Documentation for ReutNews.rda.
972    
973            * data/ReutNews.rda: A tiny Reuters-21578 example data set.
974    
975    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
978            clustering facilities of this package.
979    
980    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
981    
982            * R/aobjects.R: Changed package document structure to avoid class
983            dependency problems.
984    
985    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
986    
987            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
988            data set.
989    
990            *  Finished documentation and reordered directory structure. Now "R
991            CMD check textmin" works without errors.
992    
993    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
994    
995            * src/: Various splits can now be easily created for the
996            Reuters-21578 data set.
997    
998    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
999    
1000            *  Updated documentation.
1001    
1002    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1003    
1004            *  Wrote R documentation for some classes and methods.
1005    
1006    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1007    
1008            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
1009            files. See the questionnaire data/Umfrage.csv for such an example.
1010            We are now able to import files in Reuters-21578 XML format.
1011    
1012            *  Changed class interfaces in various files. Weighting of the text
1013            matrix is now possible.
1014    
1015    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1016    
1017            * R/textdoccol.R: Term-document matrices can now be built if
1018            necessary (with buildTDM(...)) and stored in the tdm field of a
1019            text document collection.
1020    
1021            * R/textmatrix.R: Wrote S4 class for term-document matrices.
1022    
1023    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1024    
1025            * R/textdoccol.R: We can now read in a whole XML file with several
1026            news items.
1027    
1028    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1029    
1030            * R/textdoccol.R: Set up an S4 class for a collection of text
