[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 34, Thu Dec 22 15:18:10 2005 UTC pkg/ChangeLog revision 1193, Thu Oct 4 07:44:21 2012 UTC
1    2012-10-03  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/weight.R (weightTfIdf, weightSMART): Gracefully handle empty
4            columns and rows (avoids blow-up due to NaN values). Suggested by Jaap
5            Frölich.
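
Illustration only, not the package's R code: a minimal Python sketch of how a tf-idf weighting can stay free of NaN values when a document (column) or a term (row) is empty. The name `tf_idf`, the list-of-lists layout (rows are terms, columns are documents), and the base-2 logarithm are assumptions made for this example.

```python
import math


def tf_idf(counts):
    """Weight a term-by-document count matrix (list of rows) with tf-idf.

    Hypothetical helper for illustration; empty rows and columns yield 0.0
    instead of NaN or a division-by-zero error.
    """
    n_docs = len(counts[0]) if counts else 0
    # Column sums are the document lengths; an empty document has length 0.
    doc_len = [sum(row[j] for row in counts) for j in range(n_docs)]
    weighted = []
    for row in counts:
        # Document frequency: number of documents containing the term.
        df = sum(1 for c in row if c > 0)
        # A term that occurs nowhere gets idf 0 rather than log(n / 0).
        idf = math.log2(n_docs / df) if df > 0 else 0.0
        weighted.append([
            (c / doc_len[j]) * idf if doc_len[j] > 0 else 0.0
            for j, c in enumerate(row)
        ])
    return weighted
```

Without the two guards, an empty document would divide by a zero length and an unused term would divide by a zero document frequency, which is the kind of blow-up the entry describes.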
6    
7    2012-07-27  Ingo Feinerer  <feinerer@logic.at>
8    
9            * R/transform.R (removeWords): Allow longer stopword lists.
10    
11    2012-01-31  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/reader.R (readXML): Readers can now set the document language
14            themselves.
15    
16    2012-01-14  Ingo Feinerer  <feinerer@logic.at>
17    
18            * R/source.R (XMLSource, getElem.XMLSource): Simplifications as
19            proposed by Milan Bouchet-Valat.
20    
21    2012-01-11  Ingo Feinerer  <feinerer@logic.at>
22    
23            * R/matrix.R (termFreq): Fix processing of user provided
24            stopwords. Reported by Bettina Grün.
25    
26    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
27    
28            * R/matrix.R (termFreq): Fix invalid handling of
29            control$wordLengths[1]. Reported by Steven C. Bagley.
30    
31    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
32    
33            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
34    
35    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
38    
39    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
40    
41            * R/transform.R (removePunctuation): Add option to preserve
42            intra-word dashes.
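
Illustration only, not the package's R code: one way to strip punctuation while keeping dashes inside words is to protect those dashes with a placeholder first. The function name `remove_punctuation`, its keyword argument, and the choice of `\x01` as placeholder are assumptions for this sketch; it covers ASCII punctuation only.

```python
import re
import string

# One or more ASCII punctuation characters.
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]+")


def remove_punctuation(text, preserve_intra_word_dashes=False):
    """Remove ASCII punctuation, optionally keeping dashes inside words."""
    if not preserve_intra_word_dashes:
        return _PUNCT.sub("", text)
    placeholder = "\x01"  # assumed not to occur in the input text
    # Protect only dashes that have a word character on both sides.
    protected = re.sub(r"(?<=\w)-(?=\w)", placeholder, text)
    stripped = _PUNCT.sub("", protected)
    # Restore the protected dashes.
    return stripped.replace(placeholder, "-")
```

With the option enabled, "well-known" keeps its dash while a free-standing " -- " is still removed.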
43    
44    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
45    
46            * R/matrix.R (termFreq): Allow reordering of control option
47            processing.
48    
49    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
50    
51            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
52            pdfinfo tool.
53    
54            * inst/stopwords/SMART.dat: Add SMART information retrieval system
55            stopwords (which are also used by the MC toolkit).
56    
57            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
58            restrict how often a term may appear in each document (generalizes
59            \code{minDocFreq}). Similarly the local option \code{wordLengths}
60            for word length bounds (generalizes \code{minWordLength}).
61    
62            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
63            \code{bounds$global} for restricting how often a term is allowed
64            to appear in different documents.
65    
66            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
67            local options delegated internally to termFreq() and global
68            options which are processed by the term-document matrix
69            constructor itself.
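
Illustration only, not the package's R code: a small Python sketch of the split between local options (applied per document while counting terms) and global options (applied across documents by the matrix constructor). All names (`term_freq`, `term_document_matrix`, `local_bounds`, `word_lengths`, `global_bounds`) and the default values are assumptions for this example.

```python
from collections import Counter

INF = float("inf")


def term_freq(tokens, local_bounds=(1, INF), word_lengths=(3, INF)):
    """Count terms in one document, applying the local options."""
    # Keep only tokens whose length lies within the word length bounds.
    counts = Counter(
        t for t in tokens if word_lengths[0] <= len(t) <= word_lengths[1]
    )
    # Keep only terms whose in-document frequency lies within the local bounds.
    return {
        t: n for t, n in counts.items()
        if local_bounds[0] <= n <= local_bounds[1]
    }


def term_document_matrix(docs, global_bounds=(1, INF), **local_options):
    """Build {term: [count per document]}, applying the global option."""
    # Local options are delegated to term_freq() for each document.
    per_doc = [term_freq(tokens, **local_options) for tokens in docs]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tf in per_doc for term in tf)
    kept = sorted(
        t for t, df in doc_freq.items()
        if global_bounds[0] <= df <= global_bounds[1]
    )
    return {t: [tf.get(t, 0) for tf in per_doc] for t in kept}
```

The global bound can only be checked after every document has been counted, which is why it sits in the constructor rather than in the per-document function.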
70    
71    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
72    
73            * man/getTokenizers.Rd: Document getTokenizers().
74    
75            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
76    
77    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
78    
79            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
80    
81            * man/combine.Rd: Document c.term_frequency().
82    
83    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
84    
85            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
86            can be accessed via '[' and not '[['.
87    
88    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/stopwords.R (stopwords): Raise an error if no stopwords are
91            available for the requested language. Suggested by Derek M Jones.
92    
93    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
94    
95            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
96            normalization.
97    
98    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
99    
100            * R/transform.R (stemDocument.PlainTextDocument): Use language
101            argument.
102    
103    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
104    
105            * R/source.R: Store strings and connections instead of unevaluated
106            calls.
107    
108    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
109    
110            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
111    
112    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
115            (instead of a list element).
116    
117    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
118    
119            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
120            documents by names (fallback to IDs if names are not set).
121    
122    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
123    
124            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
125            \code{recursive} now determines whether existing corpus meta data
126            is used.
127    
128    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
129    
130            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
131    
132    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
133    
134            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
135            remove terms not occurring in the corpus anymore.
136    
137    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
138    
139            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
140            and Heaps' law.
141    
142    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
143    
144            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
145            provided by a source.
146    
147    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
148    
149            * R/source.R (.Source): Provide document names.
150    
151    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
152    
153            * R/meta.R (`content_or_meta`): Utility function.
154    
155    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
156    
157            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
158            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
159    
160    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
161    
162            * R/weight.R (weightTfIdf): Added normalization option.
163    
164            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
165            analysis.
166    
167    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
168    
169            * R/score.R (tm_tag_score): Compute a score from the number of
170            tags matching in a document.
171    
172    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
173    
174            * R/complete.R (stemCompletion): New completion heuristics.
175    
176    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
177    
178            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
179    
180    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
181    
182            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
183            setOldClass(c(..., "list")) works.
184    
185    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
186    
187            * R/transform.R (stemDocument.character): In case input is a
188            simple character just delegate to the default Snowball stemmer.
189    
190    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
191    
192            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
193            data.
194    
195    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
196    
197            * R/doc.R (`Content<-`): Be careful with names attribute.
198    
199    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
200    
201            * R/source.R (DirSource): Improved implementation especially when
202            handling many (> 1M) files.
203    
204    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
205    
206            * R/source.R (getElem.URISource): Use encoding argument.
207    
208    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
209    
210            * R/doc.R (setOldClass): Register S3 document classes to be
211            recognized by S4 methods.
212    
213    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
214    
215            * R/matrix.R (termFreq): Add option to remove punctuation
216            characters.
217    
218    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
219    
220            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
221            merging multiple term-document matrices.
222    
223    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
224    
225            * R/corpus.R (setOldClass): Register S3 corpus classes to be
226            recognized by S4 methods.
227    
228            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
229            that CRAN Mac OS X builds do not fail any longer.
230    
231    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
232    
233            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
234            of RWeka::AlphabeticTokenizer() as default.
235    
236    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
237    
238            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
239            caused words at the beginning or the end of a line not to be removed. Do
240            not delete whitespace anymore.
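
Illustration only, not the package's R code: a Python sketch of word removal that relies on word-boundary assertions instead of surrounding whitespace, so words at the beginning or end of a line are matched and whitespace is left untouched. The name `remove_words` is an assumption for this example.

```python
import re


def remove_words(text, words):
    """Delete whole-word occurrences of `words`, leaving whitespace as is."""
    if not words:
        return text
    # Escape each word so it is matched literally, then join as alternatives.
    alternatives = "|".join(re.escape(w) for w in words)
    # \b matches at line starts and ends too, so no whitespace is consumed.
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return pattern.sub("", text)
```

A pattern that requires a space on both sides of the word would miss a word that starts or ends a line, and would also swallow the spaces it matched.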
241    
242    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
243    
244            * R/source.R (DirSource): Default to working directory if no path
245            is specified.
246    
247    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
248    
249            * R/source.R (DirSource): Stop on empty directories.
250    
251    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
252    
253            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
254            named documents.
255    
256    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
257    
258            * R/transform.R (removeWords): Improve regular expressions.
259    
260    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
261    
262            * R/meta.R (DublinCore): Allow lower case tags.
263    
264    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
265    
266            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
267            instead of x$children.
268    
269    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
270    
271            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
272    
273    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
274    
275            * R/: Use S3 instead of S4 class system.
276    
277    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
278    
279            * R/reader.R (readMail): Moved to tm.plugin.mail package.
280    
281    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
282    
283            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
284            postings are basically e-mails with some extra headers.
285    
286    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
287    
288            * R/transform.R: Move convertMboxEml, removeCitation,
289            removeMultipart, and removeSignature to the tm.plugin.mail package
290            since they are mainly utility functions (for handling e-mails) and
291            not very framework specific.
292    
293    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
294    
295            * man/: Fix documentation.
296    
297    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
298    
299            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
300            plain text document instead of an XML document for texts of the
301            Reuters-21578 dataset.
302    
303            * R/sparse.R: Removed since the slam package is now available on
304            CRAN.
305    
306            * DESCRIPTION (Depends): Add slam package.
307    
308    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
309    
310            * R/transform.R (stemDoc): Fix character(0) handling.
311    
312    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
313    
314            * R/doc.R (show): Pretty print.
315    
316    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
317    
318            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
319            gracefully.
320    
321    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
322    
323            * R/corpus.R: Make corpus virtual. Implement corpus with standard
324            and permanent storage semantics.
325    
326            * DESCRIPTION: New major release. A *lot* of improvements.
327    
328    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
329    
330            * NAMESPACE: Export some simple_triplet_matrix functions.
331    
332    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
333    
334            * R/weight.R: Adapt tf-idf to new matrix format.
335    
336    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
337    
338            * R/matrix.R: Create two distinct classes for term-document and
339            document-term matrices.
340    
341    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
342    
343            * R/termdocmatrix.R: No longer use Matrix package. This reduces
344            package start-up time significantly.
345    
346    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
347    
348            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
349    
350    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
351    
352            * R/transform.R (tmReduce): Combine multiple maps into one
353            transformation.
354    
355    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
356    
357            * R/weight.R: Remove weightLogical since it does not return a
358            dgCMatrix.
359    
360            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
361            or TermDocumentMatrix instead.
362    
363    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
364    
365            * inst/doc/extensions.Rnw: Finished vignette.
366    
367    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
368    
369            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
370            DocumentTermMatrix representations.
371    
372    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
373    
374            * R/reader.R (readXML): New reader for arbitrary XML files.
375    
376    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
377    
378            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
379            (XMLSource): New XMLSource class for arbitrary XML files.
380            (Source): New slot Vectorized.
381    
382    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
383    
384            * R/reader.R (readTabular): Experimental reader for tabular data
385            structures which can be customized via user-defined mappings.
386    
387            * R/reader.R: Always use UTC time zone.
388    
389            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
390    
391    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
392    
393            * R/reader.R (readDOC): Options can be passed over to antiword.
394    
395            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
396            pdftotext.
397    
398    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
399    
400            * R/source.R (DirSource): Add pattern and ignore.case arguments
401            which are internally passed over to list.files().
402    
403    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
404    
405            * inst/doc/tm.Rnw: Suppress pointless loading message.
406    
407    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
408    
409            * DESCRIPTION: Speed up package loading (via moving packages not
410            strictly necessary for normal operation to Suggests instead of
411            Depends).
412    
413    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
414    
415            * R/reader.R (readNewsgroup): The date format is now configurable.
416    
417    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
418    
419            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
420    
421    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
422    
423            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
424    
425    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
426    
427            * R/source.R (DataframeSource): New source class for data frames.
428    
429            * R/source.R: Fixed non-standard call evaluation.
430    
431    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
432    
433            * R/source.R (URISource): New source class for a single document.
434    
435    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
436    
437            * R/source.R: Refactoring.
438    
439    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
440    
441            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
442            Rmpi installations more gracefully.
443    
444    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
445    
446            * R/source.R (Source): Add Length slot.
447    
448    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
449    
450            * R/AAA.R: Unify duplicated .onLoad function.
451    
452    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
453    
454            * DESCRIPTION (Suggests): Added Rmpi.
455    
456    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
457    
458            * R/source.R (getElem): Fix 'no visible binding' warning.
459    
460            * man/WeightFunction.Rd: Fix signature.
461    
462    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
463    
464            * R/weight.R: Introduce name abbreviations for weighting functions.
465    
466    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
467    
468            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
469    
470            * R/cluster.R: Provide convenience functions for using an MPI
471            cluster.
472    
473            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
474            available.
475    
476            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
477            available.
478    
479    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
480    
481            * R/textdoccol.R (lapply): Removed debug print out.
482    
483    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
484    
485            * R/reader.R (readRCV1): Improved meta data extraction from
486            Reuters Corpus Volume 1 documents.
487    
488    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
489    
490            * R/transform.R: Ensure that all mappings preserve multiline
491            structures.
492    
493    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
494    
495            * R/filter.R: Every filter now has an attribute indicating whether
496            it should be applied at document level (doclevel).
497    
498            * R/textdoccol.R (tmFilter): Set searchFullText as new default
499            filter.
500    
501    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
502    
503            * R/transform.R (replacePatterns): Replaced removeWords by
504            replacePatterns. Suggested by Christian Buchta.
505    
506            * R/textdoccol.R (inspect): Improved formatting.
507    
508    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * inst/CITATION: Updated JSS article information.
511    
512            * R/textdoccol.R (setAs): Added coerce method from list to
513            corpus.
514    
515            * R/meta.R (meta): Improved meta data handling.
516    
517    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
520            Christian Buchta.
521    
522            * inst/CITATION: Added template to include JSS article reference.
523    
524    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
525    
526            * R/textdoccol.R (tmMap): Introduced lazy mapping.
527    
528            * R/source.R: Added VectorSource.
529    
530    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
531    
532            * man/: Language codes should be in ISO 639-1 format.
533    
534            * R/textdoccol.R (asPlain): Preserve local meta data.
535    
536    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
537    
538            * R/textdoccol.R (writeCorpus): Function for writing a corpus
539            containing plain text documents to disk.
540    
541    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
542    
543            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
544            always set correctly.
545    
546            * R/textdoccol.R: Set load = TRUE as default for load on demand
547            since in most cases this is the wanted behaviour.
548    
549    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
550    
551            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
552    
553            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
554    
555    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
556    
557            * R/meta.R (meta): New function for consistent access to meta data
558            of document collections, repositories, and texts.
559    
560    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
561    
562            * R/: Better support for encodings.
563    
564    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
565    
566            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
567            selection when no reader argument is given.
568    
569    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
570    
571            * R/source.R (CSVSource): Now uses read.csv instead of scan
572            internally.
573    
574    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
575    
576            * R/reader.R (getReaders): Returns available reader functions.
577    
578            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
579            as default.
580    
581    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
582    
583            * R/stopwords.R (stopwords): Shortened code, removed codetools
584            variable warnings.
585    
586            * man/: Documentation for showMeta, added an example for tmMap.
587    
588            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
589            some minor typos fixed.
590    
591    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
592    
593            * R/aobjects.R (showMeta): Added method for pretty printing a
594            text document's meta data.
595    
596    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
597    
598            * R/textdoccol.R (TextDocCol): Better handling of empty
599            arguments.
600    
601            * NAMESPACE: Exported readDOC.
602    
603            * man/completeStems.Rd: Added an example.
604    
605    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
606    
607            * R/stopwords.R (stopwords): Look up .dat files at every
608            call. Allows users to modify stopword .dat files interactively.
609    
610    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
611    
612            * R/termdocmatrix.R (termFreq): Correct processing of empty
613            documents.
614    
615    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
616    
617            * man/: Updated documentation.
618    
619    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
620    
621            * R/complete.R (completeStems): Completes (heuristically) word
622            stems.
623    
624            * R/termdocmatrix.R (TermDocMatrix2): New modular
625            constructor.
626    
627            * NAMESPACE: Exported termFreq.
628    
629    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
630    
631            * R/reader.R (readDOC): Added MS Word reader (using antiword).
632    
633    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
634    
635            * R/weight.R: Weighting functions for TermDocMatrix.
636    
637    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
638    
639            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
640            functions for accessing dimension, column, and row names.
641    
642            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
643    
644    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
645    
646            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
647    
648    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
649    
650            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
651    
652    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
653    
654            * R/reader.R (readPDF): Removed manual checks for pdftotext and
655            pdfinfo. The system call gives a warning anyway.
656    
657    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
658    
659            * R/textdoccol.R (asPlain): Conversion from
660            StructuredTextDocuments to PlainTextDocuments.
661    
662    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
663    
664            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
665            for accessing term-document matrices.
666    
667            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
668            are installed.
669    
670    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
673            Christian Buchta.
674    
675    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
676    
677            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
678    
679    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
680    
681            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
682    
683            * R/reader.R (readPDF): Added PDF reader.
684    
685    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
686    
687            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
688    
689            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
690    
691            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
692    
693            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
694    
695    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
696    
697            * R/distmeasure.R (dissimilarity): Replaced dists call from
698            package cba by new dist call from package proxy.
699    
700    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
701    
702            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
703    
704    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
705    
706            * R/termdocmatrix.R: require() uses the quietly option to suppress
707            loading messages.
708    
709    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
710    
711            * R/dictionary.R: Added dictionary support.
712    
713    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
714    
715            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
716            documents. This simplifies some functions, e.g., asPlain.
717    
718    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
719    
720            * inst/doc/tm.Rnw: Fixed some typos in vignette.
721    
722    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
723    
724            * R/textdoccol.R (replaceWords): Added method to replace a set of
725            words by a single word. Useful for synonyms.
726    
727    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
728    
729            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
730    
731    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
732    
733            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
734            vectors. Thanks to Ariel Maguyon for his error report.
735            (removeSparseTerms): New function to remove columns from a
736            term-document matrix exceeding a sparse factor.
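
Illustration only, not the package's R code: a Python sketch of dropping terms whose share of zero entries exceeds a sparse factor. The name `remove_sparse_terms`, the `{term: [count per document]}` layout, and the exact comparison (drop when sparsity is strictly greater than the factor) are assumptions for this example.

```python
def remove_sparse_terms(matrix, sparse):
    """Keep only terms whose fraction of zero entries is at most `sparse`."""
    if not 0 < sparse < 1:
        raise ValueError("sparse must lie strictly between 0 and 1")
    kept = {}
    for term, row in matrix.items():
        if not row:
            continue  # a term without any documents carries no information
        # Sparsity: fraction of documents in which the term does not occur.
        sparsity = sum(1 for c in row if c == 0) / len(row)
        if sparsity <= sparse:
            kept[term] = row
    return kept
```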
737    
738    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
739    
740            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
741    
742    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
743    
744            * man/sFilter.Rd: Corrected documentation on statement format (use
745            '==' instead of '=').
746    
747    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
748    
749            * R/aobjects.R (StructuredTextDocument): Inherits from
750            TextDocument.
751    
752    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
753    
754            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
755            on sparse matrices as proposed by Martin Maechler.
756    
757    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
758    
759            * R/textdoccol.R: Removed \code{dbDisconnect} calls since last
760            \pkg{filehash} version makes them deprecated.
761    
762    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
763    
764            * R/termdocmatrix.R (textvector): Stemming is now performed before
765            erasing stopwords.
766            (weightMatrix): Adapted to handle sparse matrices.
767            (TermDocMatrix): Sparse matrix is now efficiently built by
768            direct stepwise insertion of row values into it.
769    
770    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
771    
772            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
773            due to ongoing problems. For our purposes the latter is as useful
774            as the replaced package.
775    
776    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
777    
778            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
779    
780            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
781    
782    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
783    
784            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
785            languages with available stopwords.
786    
787    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
788    
789            * inst/doc/tm.Rnw: Minor corrections in the vignette.
790    
791    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
792    
793            * DESCRIPTION: Update to version 0.2, since a lot of new features
794            have been integrated.
795    
796            * inst/stopwords: Updated existing stopwords and added stopwords
797            for various other languages.
798    
799    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
800    
801            * man/: Updated documentation.
802    
803            * Work/testDb.R: Script to test database stuff.
804    
805            * R/: Fixed various database related bugs. Seems to be rather
806            useable now, i.e., consider as alpha status for now.
807    
808    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
809    
810            * R/: Fixed some bugs related to database support.
811    
812    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
813    
814            * man/: Added a lot of examples to the manuals.
815    
816    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
817    
818            * man/: Updated parts of the documentation.
819    
820            * R/textdoccol.R (asPlain): Added conversion from newsgroup
821            documents to plain text documents.
822    
823    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
824    
825            * R/textdoccol.R: Finished experimental database support. Not yet
826            intensively tested.
827    
828            * R/source.R: Now each source has a default reader.
829    
830            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
831            class anymore.
832    
833            * R/plaintextdoc.R: Custom show method for plain text documents.
834    
835            * R/aobjects.R: Added a class for structured text documents.
836    
837            * R/reader.R: Replaced remaining \code{parser} occurrences with
838            \code{reader}.
839    
840            * R/textdoccol.R (summary): Indent tags.
841    
842            * R/textdoccol.R (removePunctuation): Transform method to remove
843            punctuation marks.
844    
845    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
846    
847            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
848            using prescindMeta().
849    
850    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
851    
852            * R/textdoccol.R: Improved database support.
853    
854    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
855    
856            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
857    
858            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
859            language code.
860    
861            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
862            into parserControl argument.
863    
864            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
865    
866    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
867    
868            * Work/tmDataSetup.R: The datasets acq and crude can now be
869            created on the fly.
870    
871            * R/stopwords.R: Introduced a function returning the stopwords for
872            a given language (English, German and French at the moment).
873    
874            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
875            otherwise falls back to the Snowball package.
876    
877    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
878    
879            * man/dissimilarity-methods.Rd: Make clear that any method offered
880            by "dists" from package "cba" can be used.
881    
882    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
883    
884            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
885            to Kurt's LaTeX suggestion. Removed points and underscores in
886            variable names for consistent naming.
887    
888            * DESCRIPTION: Update to version 0.1-2.
889    
890            * man/TextRepository.Rd: Fixed bug in documentation.
891    
892    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
893    
894            * DESCRIPTION: Update to version 0.1-1.
895    
896    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
897    
898            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
899            wordStem.
900    
901    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
902    
903            * R/: Changes due to Kurt's review.
904    
905    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
906    
907            * R/: Implemented improvements based upon comments by David
908            Meyer.
909    
910    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
911    
912            * inst/doc/: Rewrote vignette.
913    
914            * man/: Improved documentation.
915    
916    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
917    
918            * man/: Updated documentation.
919    
920            * DESCRIPTION: Changed package name to "tm". Updated version to
921            0.1 for first CRAN release.
922    
923            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
924            list archive example.
925    
926            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
927            archive example.
928    
929            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
930            from (several mails per box) mbox format to (single mail per file)
931            eml format.
932    
933    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
934    
935            * data/crude.rda: Rebuilt.
936    
937            * data/acq.rda: Rebuilt.
938    
939            * R/reader.R: Factored out reader and parser methods from
940            textdoccol.R.
941    
942            * R/source.R: Factored out Source methods from aobjects.R and
943            textdoccol.R.
944            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
945            feeds.
946    
947            * R/textdoccol.R (DirSource): Added support for recursive
948            traversal of directories.
949    
950    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
951    
952            * R/textdoccol.R ([[): Loads the document corpus automatically
953            into memory upon access.
954            (tm_transform, tm_filter): Removed several checks whether the
955            document is already loaded ([[ ensures this now).
956            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
957            mailing list archive.
958    
959    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
960    
961            * R/aobjects.R (TextDocument): Is now a virtual class.
962            (Source): Is now a virtual class.
963    
964    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
965    
966            * R/textdoccol.R (c): Support for an arbitrary number of document
967            collections.
968    
969    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
970    
971            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
972            append_meta and remove_meta.
973    
974            * R/textdoccol.R: Removed modify_metadata method.
975    
976            * R/textrepo.R: Removed modify_metadata method.
977    
978            * R/textdoccol.R (remove_meta): Supports removal of document
979            collection metadata and document (= in data frame) metadata.
980    
981    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
982    
983            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
984    
985            * data/crude.rda: Rebuilt.
986    
987            * data/acq.rda: Rebuilt.
988    
989            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
990    
991            * R/textdoccol.R ([): Bug fix for subsetting a document
992            collection's data frame.
993    
994    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
995    
996            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
997            to s_filter.
998    
999            * R/textdoccol.R: Local text documents' metadata can now be copied
1000            to a document collection's data frame with prescind_meta.
1001    
1002    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1003    
1004            * R/: Text documents' slot metadata is now accessible in s_filter.
1005    
1006            * R/: Rewrote s_filter function (still has some restrictions).
1007    
1008    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1009    
1010            * R/: Various fixes in handling metadata.
1011    
1012            * R/: Added update mechanism for text document collections.
1013    
1014    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1015    
1016            * R/: Merging of document collections now creates a binary tree
1017            for reconstructing merged document collections.
1018    
1019            * R/: Redesign of metadata for document collections.
1020    
1021    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1022    
1023            * R/: Messages now use \code{ngettext}.
1024    
1025    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1026    
1027            * R/: Added functions for modifying and removing metadata.
1028    
1029    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1030    
1031            * man/: Updated some documentation.
1032    
1033            * R/: Corrected some connection issues.
1034    
1035            * inst/doc: Worked on the vignette.
1036    
1037    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * inst/: Added texts and started vignette.
1040    
1041            * R/: Final changes based upon David's comments.
1042    
1043    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1044    
1045            * NAMESPACE: Corrected exports (generic methods need exportMethods
1046            directives!).
1047    
1048    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1049    
1050            * R/: Modified the TextDocCol constructor and various parsers. It
1051            is now modular and supports various file formats via plugins (see
1052            the new "Source" class).
1053    
1054    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1055    
1056            * man/: Revised documentation after previous code changes.
1057    
1058    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1059    
1060            * R/: Remaining changes as discussed with David.
1061    
1062    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1063    
1064            * R/: Some changes as suggested by David. The rest will follow
1065            within the next few days.
1066    
1067    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1068    
1069            * man/: Finished documentation.
1070    
1071    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1072    
1073            * man/: Wrote some documentation.
1074    
1075    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1076    
1077            * R/: Further syntactic sugar in the form of additional assignment and
1078            accessor methods.
1079    
1080    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1081    
1082            * R/: Syntactic sugar in the form of "length", "show" and "summary"
1083            operators.
1084    
1085    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1086    
1087            * R/: Various updates, mainly to default operators ("[" or "c")
1088            and dissimilarities.
1089    
1090    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1091    
1092            * R/: Added similarity functions.
1093    
1094            * data/: Added English stopwords.
1095    
1096    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1097    
1098            * data/: Examples compiled for new features.
1099    
1100            * R/: Changes due to new structure.
1101    
1102            * NAMESPACE: Corrected namespace to reflect new structure.
1103    
1104            * R/termdocmatrix.R: Adapted for new naming scheme.
1105    
1106    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1107    
1108            * R/textdoccol.R: Adapted code for new class structure. Wrote
1109            several transform and filter functions operating on text document
1110            collections (alias text document databases).
1111    
1112            * R/aobjects.R: Adapted class structure with inheritance,
1113            repositories and additional meta data. Loading files on demand is
1114            now possible.
1115    
1116    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1117    
1118            * R/: Some cosmetic cleanups.
1119    
1120            * inst/: Removed vignette on clustering. That and much more is now
1121            described in the JSS paper on text mining. A more elaborate vignette
1122            based on that article will be incorporated in the future.
1123    
1124    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1125    
1126            * R/: Updated generic S4 methods to comply with signature changes
1127            in newer versions of R (> 2.3).
1128    
1129    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1130    
1131            * ext/R/importRIS.R: Automatic RIS import is now possible.
1132    
1133    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1134    
1135            * R/textdoccol.R: Added RIS HTML input format.
1136    
1137    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1138    
1139            * R/textdoccol.R: Removed bug that caused invalid text document
1140            collections when handling many input files.
1141    
1142    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1143    
1144            * R/textdoccol.R: Restructured and extended file import
1145            mechanism.
1146    
1147            * inst/doc/clustering.Rnw: Adapted vignette for use with
1148            ReutNews.rda.
1149    
1150            * man/ReutNews.Rd: Documentation for ReutNews.rda.
1151    
1152            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1153    
1154  2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1155    
1156          * inst/doc/clustering.Rnw: Wrote a small vignette to present the
