Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 34, Thu Dec 22 15:18:10 2005 UTC pkg/ChangeLog revision 1168, Wed Jan 11 10:35:44 2012 UTC
1    2012-01-11  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/matrix.R (termFreq): Fix processing of user provided
4            stopwords. Reported by Bettina Grün.
5    
6    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/matrix.R (termFreq): Fix invalid handling of
9            control$wordLengths[1]. Reported by Steven C. Bagley.
10    
11    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
12    
13            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
14    
15    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
16    
17            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
18    
19    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
20    
21            * R/transform.R (removePunctuation): Add option to preserve
22            intra-word dashes.
23    
24    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
25    
26            * R/matrix.R (termFreq): Allow reordering of control option
27            processing.
28    
29    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
30    
31            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
32            pdfinfo tool.
33    
34            * inst/stopwords/SMART.dat: Add SMART information retrieval system
35            stopwords (which are also used by the MC toolkit).
36    
37            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
38            restrict how often a term may appear in each document (generalizes
39            \code{minDocFreq}). Similarly, the local option \code{wordLengths}
40            sets word length bounds (generalizes \code{minWordLength}).
41    
42            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
43            \code{bounds$global} for restricting how often a term is allowed
44            to appear in different documents.
45    
46            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
47            local options delegated internally to termFreq() and global
48            options which are processed by the term-document matrix
49            constructor itself.
50    
51    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
52    
53            * man/getTokenizers.Rd: Document getTokenizers().
54    
55            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
56    
57    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
58    
59            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
60    
61            * man/combine.Rd: Document c.term_frequency().
62    
63    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
64    
65            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
66            can be accessed via '[' and not '[['.
67    
68    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
69    
70            * R/stopwords.R (stopwords): Raise an error if no stopwords are
71            available for requested language. Suggested by Derek M Jones.
72    
73    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
74    
75            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
76            normalization.
77    
78    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
79    
80            * R/transform.R (stemDocument.PlainTextDocument): Use language
81            argument.
82    
83    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
84    
85            * R/source.R: Store strings and connections instead of unevaluated
86            calls.
87    
88    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
91    
92    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
93    
94            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
95            (instead of a list element).
96    
97    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
100            documents by names (fallback to IDs if names are not set).
101    
102    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
105            \code{recursive} now determines whether existing corpus meta data
106            is used.
107    
108    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
109    
110            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
111    
112    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
115            remove terms not occurring in the corpus anymore.
116    
117    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
118    
119            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
120            and Heaps' law.
121    
122    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
123    
124            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
125            provided by a source.
126    
127    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
128    
129            * R/source.R (.Source): Provide document names.
130    
131    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/meta.R (`content_or_meta`): Utility function.
134    
135    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
136    
137            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
138            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
139    
140    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
141    
142            * R/weight.R (weightTfIdf): Added normalization option.
143    
144            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
145            analysis.
146    
147    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
148    
149            * R/score.R (tm_tag_score): Compute a score from the number of
150            tags matching in a document.
151    
152    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
153    
154            * R/complete.R (stemCompletion): New completion heuristics.
155    
156    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
157    
158            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
159    
160    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
161    
162            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
163            setOldClass(c(..., "list")) works.
164    
165    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
166    
167            * R/transform.R (stemDocument.character): In case input is a
168            simple character just delegate to the default Snowball stemmer.
169    
170    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
173            data.
174    
175    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/doc.R (`Content<-`): Be careful with names attribute.
178    
179    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
180    
181            * R/source.R (DirSource): Improved implementation especially when
182            handling many (> 1M) files.
183    
184    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
185    
186            * R/source.R (getElem.URISource): Use encoding argument.
187    
188    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
189    
190            * R/doc.R (setOldClass): Register S3 document classes to be
191            recognized by S4 methods.
192    
193    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
194    
195            * R/matrix.R (termFreq): Add option to remove punctuation
196            characters.
197    
198    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
199    
200            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
201            merging multiple term-document matrices.
202    
203    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
204    
205            * R/corpus.R (setOldClass): Register S3 corpus classes to be
206            recognized by S4 methods.
207    
208            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
209            that CRAN Mac OS X builds do not fail any longer.
210    
211    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
212    
213            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
214            of RWeka::AlphabeticTokenizer() as default.
215    
216    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
217    
218            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
219            caused words at the beginning or the end of a line not to be removed. Do
220            not delete whitespace anymore.
221    
222    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
223    
224            * R/source.R (DirSource): Default to working directory if no path
225            is specified.
226    
227    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
228    
229            * R/source.R (DirSource): Stop on empty directories.
230    
231    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
232    
233            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
234            named documents.
235    
236    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
237    
238            * R/transform.R (removeWords): Improve regular expressions.
239    
240    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
241    
242            * R/meta.R (DublinCore): Allow lower case tags.
243    
244    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
245    
246            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
247            instead of x$children.
248    
249    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
250    
251            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
252    
253    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
254    
255            * R/: Use S3 instead of S4 class system.
256    
257    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
258    
259            * R/reader.R (readMail): Moved to tm.plugin.mail package.
260    
261    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
262    
263            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
264            postings are basically e-mails with some extra headers.
265    
266    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
267    
268            * R/transform.R: Move convertMboxEml, removeCitation,
269            removeMultipart, and removeSignature to the tm.plugin.mail package
270            since they are mainly utility functions (for handling e-mails) and
271            not very framework specific.
272    
273    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
274    
275            * man/: Fix documentation.
276    
277    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
278    
279            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
280            plain text document instead of an XML document for texts of the
281            Reuters-21578 dataset.
282    
283            * R/sparse.R: Removed since the slam package is now available on
284            CRAN.
285    
286            * DESCRIPTION (Depends): Add slam package.
287    
288    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
289    
290            * R/transform.R (stemDoc): Fix character(0) handling.
291    
292    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
293    
294            * R/doc.R (show): Pretty print.
295    
296    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
297    
298            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
299            gracefully.
300    
301    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
302    
303            * R/corpus.R: Make corpus virtual. Implement corpus with standard
304            and permanent storage semantics.
305    
306            * DESCRIPTION: New major release. A *lot* of improvements.
307    
308    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
309    
310            * NAMESPACE: Export some simple_triplet_matrix functions.
311    
312    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
313    
314            * R/weight.R: Adapt tf-idf to new matrix format.
315    
316    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
317    
318            * R/matrix.R: Create two distinct classes for term-document and
319            document-term matrices.
320    
321    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
322    
323            * R/termdocmatrix.R: No longer use Matrix package. This reduces
324            package start-up time significantly.
325    
326    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
327    
328            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
329    
330    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
331    
332            * R/transform.R (tmReduce): Combine multiple maps into one
333            transformation.
334    
335    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
336    
337            * R/weight.R: Remove weightLogical since it does not return a
338            dgCMatrix.
339    
340            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
341            or TermDocumentMatrix instead.
342    
343    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
344    
345            * inst/doc/extensions.Rnw: Finished vignette.
346    
347    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
348    
349            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
350            DocumentTermMatrix representations.
351    
352    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
353    
354            * R/reader.R (readXML): New reader for arbitrary XML files.
355    
356    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
357    
358            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
359            (XMLSource): New XMLSource class for arbitrary XML files.
360            (Source): New slot Vectorized.
361    
362    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
363    
364            * R/reader.R (readTabular): Experimental reader for tabular data
365            structures which can be customized via user-defined mappings.
366    
367            * R/reader.R: Always use UTC time zone.
368    
369            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
370    
371    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
372    
373            * R/reader.R (readDOC): Options can be passed over to antiword.
374    
375            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
376            pdftotext.
377    
378    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
379    
380            * R/source.R (DirSource): Add pattern and ignore.case arguments
381            which are internally passed over to list.files().
382    
383    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
384    
385            * inst/doc/tm.Rnw: Suppress pointless loading message.
386    
387    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
388    
389            * DESCRIPTION: Speed up package loading (via moving packages not
390            strictly necessary for normal operation to Suggests instead of
391            Depends).
392    
393    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
394    
395            * R/reader.R (readNewsgroup): The date format is now configurable.
396    
397    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
398    
399            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
400    
401    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
402    
403            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
404    
405    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
406    
407            * R/source.R (DataframeSource): New source class for data frames.
408    
409            * R/source.R: Fixed non-standard call evaluation.
410    
411    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
412    
413            * R/source.R (URISource): New source class for a single document.
414    
415    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
416    
417            * R/source.R: Refactoring.
418    
419    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
420    
421            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
422            Rmpi installations more gracefully.
423    
424    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
425    
426            * R/source.R (Source): Add Length slot.
427    
428    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
429    
430            * R/AAA.R: Unify duplicated .onLoad function.
431    
432    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
433    
434            * DESCRIPTION (Suggests): Added Rmpi.
435    
436    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
437    
438            * R/source.R (getElem): Fix 'no visible binding' warning.
439    
440            * man/WeightFunction.Rd: Fix signature.
441    
442    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
443    
444            * R/weight.R: Introduce name abbreviations for weighting functions.
445    
446    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
447    
448            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
449    
450            * R/cluster.R: Provide convenience functions for using a MPI
451            cluster.
452    
453            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
454            available.
455    
456            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
457            available.
458    
459    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
460    
461            * R/textdoccol.R (lapply): Removed debug print out.
462    
463    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/reader.R (readRCV1): Improved meta data extraction from
466            Reuters Corpus Volume 1 documents.
467    
468    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * R/transform.R: Ensure that all mappings preserve multiline
471            structures.
472    
473    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/filter.R: Every filter now has an attribute indicating whether
476            it should be applied at the document level (doclevel).
477    
478            * R/textdoccol.R (tmFilter): Set searchFullText as new default
479            filter.
480    
481    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * R/transform.R (replacePatterns): Replaced removeWords by
484            replacePatterns. Suggested by Christian Buchta.
485    
486            * R/textdoccol.R (inspect): Improved formatting.
487    
488    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
489    
490            * inst/CITATION: Updated JSS article information.
491    
492            * R/textdoccol.R (setAs): Added coerce method from list to
493            corpus.
494    
495            * R/meta.R (meta): Improved meta data handling.
496    
497    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
498    
499            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
500            Christian Buchta.
501    
502            * inst/CITATION: Added template to include JSS article reference.
503    
504    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
505    
506            * R/textdoccol.R (tmMap): Introduced lazy mapping.
507    
508            * R/source.R: Added VectorSource.
509    
510    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
511    
512            * man/: Language codes should be in ISO 639-1 format.
513    
514            * R/textdoccol.R (asPlain): Preserve local meta data.
515    
516    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/textdoccol.R (writeCorpus): Function for writing a corpus
519            containing plain text documents to disk.
520    
521    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
524            always set correctly.
525    
526            * R/textdoccol.R: Set load = TRUE as default for load on demand
527            since in most cases this is the wanted behaviour.
528    
529    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
530    
531            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
532    
533            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
534    
535    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
536    
537            * R/meta.R (meta): New function for consistent access to meta data
538            of document collections, repositories, and texts.
539    
540    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
541    
542            * R/: Better support for encodings.
543    
544    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
545    
546            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
547            selection when no reader argument is given.
548    
549    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
550    
551            * R/source.R (CSVSource): Now uses read.csv instead of scan
552            internally.
553    
554    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * R/reader.R (getReaders): Returns available reader functions.
557    
558            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
559            as default.
560    
561    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
562    
563            * R/stopwords.R (stopwords): Shortened code, removed codetools
564            variable warnings.
565    
566            * man/: Documentation for showMeta, added an example for tmMap.
567    
568            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
569            some minor typos fixed.
570    
571    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
572    
573            * R/aobjects.R (showMeta): Added method for pretty printing a
574            text document's meta data.
575    
576    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
577    
578            * R/textdoccol.R (TextDocCol): Better handling of empty
579            arguments.
580    
581            * NAMESPACE: Exported readDOC.
582    
583            * man/completeStems.Rd: Added an example.
584    
585    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
586    
587            * R/stopwords.R (stopwords): Look up .dat files at every
588            call. Allows users to modify stopword .dat files interactively.
589    
590    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
591    
592            * R/termdocmatrix.R (termFreq): Correct processing of empty
593            documents.
594    
595    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
596    
597            * man/: Updated documentation.
598    
599    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
600    
601            * R/complete.R (completeStems): Completes (heuristically) word
602            stems.
603    
604            * R/termdocmatrix.R (TermDocMatrix2): New modular
605            constructor.
606    
607            * NAMESPACE: Exported termFreq.
608    
609    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
610    
611            * R/reader.R (readDOC): Added MS Word reader (using antiword).
612    
613    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
614    
615            * R/weight.R: Weighting functions for TermDocMatrix.
616    
617    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
618    
619            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
620            functions for accessing dimension, column, and row names.
621    
622            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
623    
624    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
625    
626            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
627    
628    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
629    
630            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
631    
632    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
633    
634            * R/reader.R (readPDF): Removed manual checks for pdftotext and
635            pdfinfo. The system call gives a warning anyway.
636    
637    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
638    
639            * R/textdoccol.R (asPlain): Conversion from
640            StructuredTextDocuments to PlainTextDocuments.
641    
642    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
643    
644            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
645            for accessing term-document matrices.
646    
647            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
648            are installed.
649    
650    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
651    
652            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
653            Christian Buchta.
654    
655    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
658    
659    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
660    
661            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
662    
663            * R/reader.R (readPDF): Added PDF reader.
664    
665    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
666    
667            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
668    
669            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
670    
671            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
672    
673            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
674    
675    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
676    
677            * R/distmeasure.R (dissimilarity): Replaced dists call from
678            package cba by new dist call from package proxy.
679    
680    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
681    
682            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
683    
684    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
685    
686            * R/termdocmatrix.R: require() uses the quietly option to suppress
687            loading messages.
688    
689    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
690    
691            * R/dictionary.R: Added dictionary support.
692    
693    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
694    
695            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
696            documents. This simplifies some functions, e.g., asPlain.
697    
698    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
699    
700            * inst/doc/tm.Rnw: Fixed some typos in vignette.
701    
702    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
703    
704            * R/textdoccol.R (replaceWords): Added method to replace a set of
705            words by a single word. Useful for synonyms.
706    
707    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
708    
709            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
710    
711    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
712    
713            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
714            vectors. Thanks to Ariel Maguyon for his error report.
715            (removeSparseTerms): New function to remove columns from a
716            term-document matrix exceeding a sparse factor.
717    
718    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
719    
720            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
721    
722    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
723    
724            * man/sFilter.Rd: Corrected documentation on statement format (use
725            '==' instead of '=').
726    
727    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
728    
729            * R/aobjects.R (StructuredTextDocument): Inherits from
730            TextDocument.
731    
732    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
733    
734            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
735            on sparse matrices as proposed by Martin Maechler.
736    
737    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
738    
739            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
740            \pkg{filehash} version deprecates them.
741    
742    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
743    
744            * R/termdocmatrix.R (textvector): Stemming is now performed before
745            erasing stopwords.
746            (weightMatrix): Adapted to handle sparse matrices.
747            (TermDocMatrix): Sparse matrix is now efficiently built by
748            direct stepwise insertion of row values into it.
749    
750    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
751    
752            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
753            due to ongoing problems. For our purposes the latter is as useful
754            as the replaced package.
755    
756    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
757    
758            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
759    
760            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
761    
762    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
763    
764            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
765            languages with available stopwords.
766    
767    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
768    
769            * inst/doc/tm.Rnw: Minor corrections in the vignette.
770    
771    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
772    
773            * DESCRIPTION: Update to version 0.2, since a lot of new features
774            have been integrated.
775    
776            * inst/stopwords: Updated existing stopwords and added stopwords
777            for various other languages.
778    
779    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
780    
781            * man/: Updated documentation.
782    
783            * Work/testDb.R: Script to test database stuff.
784    
785            * R/: Fixed various database related bugs. Seems to be rather
786            useable now, i.e., consider as alpha status for now.
787    
788    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
789    
790            * R/: Fixed some bugs related to database support.
791    
792    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
793    
794            * man/: Added a lot of examples to the manuals.
795    
796    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
797    
798            * man/: Updated parts of the documentation.
799    
800            * R/textdoccol.R (asPlain): Added conversion from newsgroup
801            documents to plain text documents.
802    
803    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
804    
805            * R/textdoccol.R: Finished experimental database support. Not yet
806            intensively tested.
807    
808            * R/source.R: Now each source has a default reader.
809    
810            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
811            class anymore.
812    
813            * R/plaintextdoc.R: Custom show method for plain text documents.
814    
815            * R/aobjects.R: Added a class for structured text documents.
816    
817            * R/reader.R: Replaced remaining \code{parser} occurrences with
818            \code{reader}.
819    
820            * R/textdoccol.R (summary): Indent tags.
821    
822            * R/textdoccol.R (removePunctuation): Transform method to remove
823            punctuation marks.
824    
825    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
826    
827            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
828            using prescindMeta().
829    
830    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
831    
832            * R/textdoccol.R: Improved database support.
833    
834    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
835    
836            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
837    
838            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
839            language code.
840    
841            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
842            into parserControl argument.
843    
844            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
845    
846    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
847    
848            * Work/tmDataSetup.R: The datasets acq and crude can now be
849            created on the fly.
850    
851            * R/stopwords.R: Introduced a function returning the stopwords for
852            a given language (English, German and French at the moment).
853    
854            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
855            otherwise falls back to Snowball package.
856    
857    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
858    
859            * man/dissimilarity-methods.Rd: Make clear that any method offered
860            by "dists" from package "cba" can be used.
861    
862    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
863    
864            * inst/doc/tm.Rnw: Fixed the quotes-appearing-as-boxes bug according
865            to Kurt's LaTeX suggestion. Removed points and underscores in
866            variable names for consistent naming.
867    
868            * DESCRIPTION: Update to version 0.1-2.
869    
870            * man/TextRepository.Rd: Fixed bug in documentation.
871    
872    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
873    
874            * DESCRIPTION: Update to version 0.1-1.
875    
876    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
877    
878            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
879            wordStem.
880    
881    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
882    
883            * R/: Changes due to Kurt's review.
884    
885    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
886    
887            * R/: Implemented improvements based upon comments by David
888            Meyer.
889    
890    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
891    
892            * inst/doc/: Rewrote vignette.
893    
894            * man/: Improved documentation.
895    
896    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
897    
898            * man/: Updated documentation.
899    
900            * DESCRIPTION: Changed package name to "tm". Updated version to
901            0.1 for first CRAN release.
902    
903            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
904            list archive example.
905    
906            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
907            archive example.
908    
909            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
910            from (several mails per box) mbox format to (single mail per file)
911            eml format.
912    
913    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
914    
915            * data/crude.rda: Rebuilt.
916    
917            * data/acq.rda: Rebuilt.
918    
919            * R/reader.R: Factored out reader and parser methods from
920            textdoccol.R.
921    
922            * R/source.R: Factored out Source methods from aobjects.R and
923            textdoccol.R.
924            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
925            feeds.
926    
927            * R/textdoccol.R (DirSource): Added support for recursive
928            traversal of directories.
929    
930    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
931    
932            * R/textdoccol.R ([[): Loads the document corpus automatically
933            into memory upon access.
934            (tm_transform, tm_filter): Removed several checks whether the
935            document is already loaded ([[ ensures this now).
936            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
937            mailing list archive.
938    
939    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
940    
941            * R/aobjects.R (TextDocument): Is now a virtual class.
942            (Source): Is now a virtual class.
943    
944    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
945    
946            * R/textdoccol.R (c): Support for an arbitrary number of document
947            collections.
948    
949    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
950    
951            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
952            append_meta and remove_meta.
953    
954            * R/textdoccol.R: Removed modify_metadata method.
955    
956            * R/textrepo.R: Removed modify_metadata method.
957    
958            * R/textdoccol.R (remove_meta): Supports removal of document
959            collection metadata and document (= in data frame) metadata.
960    
961    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
962    
963            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
964    
965            * data/crude.rda: Rebuilt.
966    
967            * data/acq.rda: Rebuilt.
968    
969            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
970    
971            * R/textdoccol.R ([): Bug fix for subsetting a document
972            collection's data frame.
973    
974    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
975    
976            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
977            to s_filter.
978    
979            * R/textdoccol.R: Local text documents' metadata can now be copied
980            to a document collection's data frame with prescind_meta.
981    
982    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
983    
984            * R/: Text documents' slot metadata is now accessible in s_filter.
985    
986            * R/: Rewrote s_filter function (still has some restrictions).
987    
988    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            * R/: Various fixes in handling metadata.
991    
992            * R/: Added update mechanism for text document collections.
993    
994    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
995    
996            * R/: Merging of document collections now creates a binary tree
997            for reconstructing merged document collections.
998    
999            * R/: Redesign of metadata for document collections.
1000    
1001    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1002    
1003            * R/: Messages now use \code{ngettext}.
1004    
1005    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1006    
1007            * R/: Added functions for modifying and removing metadata.
1008    
1009    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1010    
1011            * man/: Updated some documentation.
1012    
1013            * R/: Corrected some connection issues.
1014    
1015            * inst/doc: Worked on the vignette.
1016    
1017    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1018    
1019            * inst/: Added texts and started vignette.
1020    
1021            * R/: Final changes based upon David's comments.
1022    
1023    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1024    
1025            * NAMESPACE: Corrected exports (generic methods need exportMethods
1026            directives!).
1027    
1028    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1029    
1030            * R/: Modified the TextDocCol constructor and various parsers. It
1031            is now modular and supports various file formats via plugins (see
1032            the new "Source" class).
1033    
1034    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1035    
1036            * man/: Revised documentation after previous code changes.
1037    
1038    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1039    
1040            * R/: Remaining changes as discussed with David.
1041    
1042    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * R/: Some changes as suggested by David. The rest will follow
1045            within the next few days.
1046    
1047    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1048    
1049            * man/: Finished documentation.
1050    
1051    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1052    
1053            * man/: Wrote some documentation.
1054    
1055    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1056    
1057            * R/: Further syntactic sugar in the form of additional assignment
1058            and accessor methods.
1059    
1060    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1061    
1062            * R/: Syntactic sugar in the form of "length", "show" and "summary"
1063            operators.
1064    
1065    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1066    
1067            * R/: Various updates, mainly to default operators ("[" or "c")
1068            and dissimilarities.
1069    
1070    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1071    
1072            * R/: Added similarity functions.
1073    
1074            * data/: Added English stopwords.
1075    
1076    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1077    
1078            * data/: Examples compiled for new features.
1079    
1080            * R/: Changes due to new structure.
1081    
1082            * NAMESPACE: Corrected namespace to reflect new structure.
1083    
1084            * R/termdocmatrix.R: Adapted for new naming scheme.
1085    
1086    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1087    
1088            * R/textdoccol.R: Adapted code for new class structure. Wrote
1089            several transform and filter functions operating on text document
1090            collections (alias text document databases).
1091    
1092            * R/aobjects.R: Adapted class structure with inheritance,
1093            repositories and additional meta data. Loading files on demand is
1094            now possible.
1095    
1096    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1097    
1098            * R/: Some cosmetic cleanups.
1099    
1100            * inst/: Removed vignette on clustering. That and much more is now
1101            described in the JSS paper on text mining. Based upon that
1102            article a more elaborate vignette will be incorporated in the future.
1103    
1104    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1105    
1106            * R/: Updated generic S4 methods to comply with signature changes
1107            in newer versions of R (> 2.3).
1108    
1109    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1110    
1111            * ext/R/importRIS.R: Automatic RIS import is now possible.
1112    
1113    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1114    
1115            * R/textdoccol.R: Added RIS HTML input format.
1116    
1117    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1118    
1119            * R/textdoccol.R: Removed bug that caused invalid text document
1120            collections when handling many input files.
1121    
1122    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1123    
1124            * R/textdoccol.R: Restructured and extended file import
1125            mechanism.
1126    
1127            * inst/doc/clustering.Rnw: Adapted vignette for use with
1128            ReutNews.rda
1129    
1130            * man/ReutNews.Rd: Documentation for ReutNews.rda
1131    
1132            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1133    
1134    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1135    
1136            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
