[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
pkg/ChangeLog revision 1188, Fri Jul 27 08:47:50 2012 UTC
1    2012-07-27  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/transform.R (removeWords): Allow longer stopword lists.
4    
5    2012-01-31  Ingo Feinerer  <feinerer@logic.at>
6    
7            * R/reader.R (readXML): Readers can now set the document language
8            themselves.
9    
10    2012-01-14  Ingo Feinerer  <feinerer@logic.at>
11    
12            * R/source.R (XMLSource, getElem.XMLSource): Simplifications as
13            proposed by Milan Bouchet-Valat.
14    
15    2012-01-11  Ingo Feinerer  <feinerer@logic.at>
16    
17            * R/matrix.R (termFreq): Fix processing of user provided
18            stopwords. Reported by Bettina Grün.
19    
20    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
21    
22            * R/matrix.R (termFreq): Fix invalid handling of
23            control$wordLengths[1]. Reported by Steven C. Bagley.
24    
25    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
26    
27            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
28    
29    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
30    
31            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
32    
33    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
34    
35            * R/transform.R (removePunctuation): Add option to preserve
36            intra-word dashes.
37    
38    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
39    
40            * R/matrix.R (termFreq): Allow reordering of control option
41            processing.
42    
43    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
44    
45            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
46            pdfinfo tool.
47    
48            * inst/stopwords/SMART.dat: Add SMART information retrieval system
49            stopwords (which are also used by the MC toolkit).
50    
51            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
52            restrict how often a term may appear in each document (generalizes
53            \code{minDocFreq}). Similarly, the local option \code{wordLengths}
54            sets word length bounds (generalizes \code{minWordLength}).
55    
56            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
57            \code{bounds$global} for restricting how often a term is allowed
58            to appear in different documents.
59    
60            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
61            local options delegated internally to termFreq() and global
62            options which are processed by the term-document matrix
63            constructor itself.
64    
65    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
66    
67            * man/getTokenizers.Rd: Document getTokenizers().
68    
69            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
70    
71    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
72    
73            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
74    
75            * man/combine.Rd: Document c.term_frequency().
76    
77    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
78    
79            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
80            can be accessed via '[' and not '[['.
81    
82    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
83    
84            * R/stopwords.R (stopwords): Raise an error if no stopwords are
85            available for requested language. Suggested by Derek M Jones.
86    
87    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
88    
89            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
90            normalization.
91    
92    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
93    
94            * R/transform.R (stemDocument.PlainTextDocument): Use language
95            argument.
96    
97    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/source.R: Store strings and connections instead of unevaluated
100            calls.
101    
102    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
105    
106    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
107    
108            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
109            (instead of a list element).
110    
111    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
112    
113            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
114            documents by names (fallback to IDs if names are not set).
115    
116    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
117    
118            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
119            \code{recursive} now determines whether existing corpus meta data
120            is used.
121    
122    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
123    
124            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
125    
126    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
127    
128            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
129            remove terms not occurring in the corpus anymore.
130    
131    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
134            and Heaps' law.
135    
136    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
137    
138            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
139            provided by a source.
140    
141    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
142    
143            * R/source.R (.Source): Provide document names.
144    
145    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
146    
147            * R/meta.R (`content_or_meta`): Utility function.
148    
149    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
150    
151            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
152            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
153    
154    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
155    
156            * R/weight.R (weightTfIdf): Added normalization option.
157    
158            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
159            analysis.
160    
161    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
162    
163            * R/score.R (tm_tag_score): Compute a score from the number of
164            tags matching in a document.
165    
166    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
167    
168            * R/complete.R (stemCompletion): New completion heuristics.
169    
170    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
173    
174    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
175    
176            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
177            setOldClass(c(..., "list")) works.
178    
179    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
180    
181            * R/transform.R (stemDocument.character): In case input is a
182            simple character just delegate to the default Snowball stemmer.
183    
184    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
185    
186            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
187            data.
188    
189    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
190    
191            * R/doc.R (`Content<-`): Be careful with names attribute.
192    
193    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
194    
195            * R/source.R (DirSource): Improved implementation especially when
196            handling many (> 1M) files.
197    
198    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
199    
200            * R/source.R (getElem.URISource): Use encoding argument.
201    
202    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
203    
204            * R/doc.R (setOldClass): Register S3 document classes to be
205            recognized by S4 methods.
206    
207    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
208    
209            * R/matrix.R (termFreq): Add option to remove punctuation
210            characters.
211    
212    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
213    
214            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
215            merging multiple term-document matrices.
216    
217    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
218    
219            * R/corpus.R (setOldClass): Register S3 corpus classes to be
220            recognized by S4 methods.
221    
222            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
223            that CRAN Mac OS X builds do not fail any longer.
224    
225    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
226    
227            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
228            of RWeka::AlphabeticTokenizer() as default.
229    
230    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
231    
232            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
233            caused words at the beginning or the end of a line not to be removed. Do
234            not delete whitespace anymore.
235    
236    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
237    
238            * R/source.R (DirSource): Default to working directory if no path
239            is specified.
240    
241    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
242    
243            * R/source.R (DirSource): Stop on empty directories.
244    
245    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
246    
247            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
248            named documents.
249    
250    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
251    
252            * R/transform.R (removeWords): Improve regular expressions.
253    
254    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
255    
256            * R/meta.R (DublinCore): Allow lower case tags.
257    
258    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
259    
260            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
261            instead of x$children.
262    
263    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
264    
265            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
266    
267    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
268    
269            * R/: Use S3 instead of S4 class system.
270    
271    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
272    
273            * R/reader.R (readMail): Moved to tm.plugin.mail package.
274    
275    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
276    
277            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
278            postings are basically e-mails with some extra headers.
279    
280    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
281    
282            * R/transform.R: Move convertMboxEml, removeCitation,
283            removeMultipart, and removeSignature to the tm.plugin.mail package
284            since they are mainly utility functions (for handling e-mails) and
285            not very framework specific.
286    
287    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
288    
289            * man/: Fix documentation.
290    
291    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
292    
293            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
294            plain text document instead of an XML document for texts of the
295            Reuters-21578 dataset.
296    
297            * R/sparse.R: Removed since the slam package is now available on
298            CRAN.
299    
300            * DESCRIPTION (Depends): Add slam package.
301    
302    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
303    
304            * R/transform.R (stemDoc): Fix character(0) handling.
305    
306    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
307    
308            * R/doc.R (show): Pretty print.
309    
310    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
311    
312            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
313            gracefully.
314    
315    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
316    
317            * R/corpus.R: Make corpus virtual. Implement corpus with standard
318            and permanent storage semantics.
319    
320            * DESCRIPTION: New major release. A *lot* of improvements.
321    
322    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
323    
324            * NAMESPACE: Export some simple_triplet_matrix functions.
325    
326    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
327    
328            * R/weight.R: Adapt tf-idf to new matrix format.
329    
330    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
331    
332            * R/matrix.R: Create two distinct classes for term-document and
333            document-term matrices.
334    
335    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
336    
337            * R/termdocmatrix.R: No longer use Matrix package. This reduces
338            package start-up time significantly.
339    
340    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
341    
342            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
343    
344    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
345    
346            * R/transform.R (tmReduce): Combine multiple maps into one
347            transformation.
348    
349    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
350    
351            * R/weight.R: Remove weightLogical since it does not return a
352            dgCMatrix.
353    
354            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
355            or TermDocumentMatrix instead.
356    
357    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
358    
359            * inst/doc/extensions.Rnw: Finished vignette.
360    
361    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
362    
363            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
364            DocumentTermMatrix representations.
365    
366    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
367    
368            * R/reader.R (readXML): New reader for arbitrary XML files.
369    
370    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
371    
372            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
373            (XMLSource): New XMLSource class for arbitrary XML files.
374            (Source): New slot Vectorized.
375    
376    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
377    
378            * R/reader.R (readTabular): Experimental reader for tabular data
379            structures which can be customized via user-defined mappings.
380    
381            * R/reader.R: Always use UTC time zone.
382    
383            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
384    
385    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
386    
387            * R/reader.R (readDOC): Options can be passed over to antiword.
388    
389            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
390            pdftotext.
391    
392    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
393    
394            * R/source.R (DirSource): Add pattern and ignore.case arguments
395            which are internally passed over to list.files().
396    
397    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
398    
399            * inst/doc/tm.Rnw: Suppress pointless loading message.
400    
401    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
402    
403            * DESCRIPTION: Speed up package loading (via moving packages not
404            strictly necessary for normal operation to Suggests instead of
405            Depends).
406    
407    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
408    
409            * R/reader.R (readNewsgroup): The date format is now configurable.
410    
411    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
412    
413            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
414    
415    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
416    
417            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
418    
419    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
420    
421            * R/source.R (DataframeSource): New source class for data frames.
422    
423            * R/source.R: Fixed non-standard call evaluation.
424    
425    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
426    
427            * R/source.R (URISource): New source class for a single document.
428    
429    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
430    
431            * R/source.R: Refactoring.
432    
433    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
434    
435            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
436            Rmpi installations more gracefully.
437    
438    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
439    
440            * R/source.R (Source): Add Length slot.
441    
442    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
443    
444            * R/AAA.R: Unify duplicated .onLoad function.
445    
446    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
447    
448            * DESCRIPTION (Suggests): Added Rmpi.
449    
450    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
451    
452            * R/source.R (getElem): Fix 'no visible binding' warning.
453    
454            * man/WeightFunction.Rd: Fix signature.
455    
456    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
457    
458            * R/weight.R: Introduce name abbreviations for weighting functions.
459    
460    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
461    
462            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
463    
464            * R/cluster.R: Provide convenience functions for using an MPI
465            cluster.
466    
467            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
468            available.
469    
470            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
471            available.
472    
473    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
474    
475            * R/textdoccol.R (lapply): Removed debug print out.
476    
477    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
478    
479            * R/reader.R (readRCV1): Improved meta data extraction from
480            Reuters Corpus Volume 1 documents.
481    
482    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
483    
484            * R/transform.R: Ensure that all mappings preserve multiline
485            structures.
486    
487    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
488    
489            * R/filter.R: Every filter now has an attribute indicating whether
490            it should be applied at the document level (doclevel).
491    
492            * R/textdoccol.R (tmFilter): Set searchFullText as new default
493            filter.
494    
495    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
496    
497            * R/transform.R (replacePatterns): Replaced removeWords by
498            replacePatterns. Suggested by Christian Buchta.
499    
500            * R/textdoccol.R (inspect): Improved formatting.
501    
502    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
503    
504            * inst/CITATION: Updated JSS article information.
505    
506            * R/textdoccol.R (setAs): Added coerce method from list to
507            corpus.
508    
509            * R/meta.R (meta): Improved meta data handling.
510    
511    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
512    
513            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
514            Christian Buchta.
515    
516            * inst/CITATION: Added template to include JSS article reference.
517    
518    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
519    
520            * R/textdoccol.R (tmMap): Introduced lazy mapping.
521    
522            * R/source.R: Added VectorSource.
523    
524    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
525    
526            * man/: Language codes should be in ISO 639-1 format.
527    
528            * R/textdoccol.R (asPlain): Preserve local meta data.
529    
530    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
531    
532            * R/textdoccol.R (writeCorpus): Function for writing a corpus
533            containing plain text documents to disk.
534    
535    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
536    
537            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
538            always set correctly.
539    
540            * R/textdoccol.R: Set load = TRUE as default for load on demand
541            since in most cases this is the wanted behaviour.
542    
543    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
544    
545            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
546    
547            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
548    
549    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
550    
551            * R/meta.R (meta): New function for consistent access to meta data
552            of document collections, repositories, and texts.
553    
554    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * R/: Better support for encodings.
557    
558    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
559    
560            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
561            selection when no reader argument is given.
562    
563    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
564    
565            * R/source.R (CSVSource): Now uses read.csv instead of scan
566            internally.
567    
568    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * R/reader.R (getReaders): Returns available reader functions.
571    
572            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
573            as default.
574    
575    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
576    
577            * R/stopwords.R (stopwords): Shortened code, removed codetools
578            variable warnings.
579    
580            * man/: Documentation for showMeta, added an example for tmMap.
581    
582            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
583            some minor typos fixed.
584    
585    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
586    
587            * R/aobjects.R (showMeta): Added method for pretty printing a
588            text document's meta data.
589    
590    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
591    
592            * R/textdoccol.R (TextDocCol): Better handling of empty
593            arguments.
594    
595            * NAMESPACE: Exported readDOC.
596    
597            * man/completeStems.Rd: Added an example.
598    
599    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
600    
601            * R/stopwords.R (stopwords): Look up .dat files at every
602            call. Allows users to modify stopword .dat files interactively.
603    
604    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
605    
606            * R/termdocmatrix.R (termFreq): Correct processing of empty
607            documents.
608    
609    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
610    
611            * man/: Updated documentation.
612    
613    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
614    
615            * R/complete.R (completeStems): Completes (heuristically) word
616            stems.
617    
618            * R/termdocmatrix.R (TermDocMatrix2): New modular
619            constructor.
620    
621            * NAMESPACE: Exported termFreq.
622    
623    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
624    
625            * R/reader.R (readDOC): Added MS Word reader (using antiword).
626    
627    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
628    
629            * R/weight.R: Weighting functions for TermDocMatrix.
630    
631    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
634            functions for accessing dimension, column, and row names.
635    
636            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
637    
638    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
639    
640            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
641    
642    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
643    
644            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
645    
646    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
647    
648            * R/reader.R (readPDF): Removed manual checks for pdftotext and
649            pdfinfo. The system call gives a warning anyway.
650    
651    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
652    
653            * R/textdoccol.R (asPlain): Conversion from
654            StructuredTextDocuments to PlainTextDocuments.
655    
656    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
657    
658            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
659            for accessing term-document matrices.
660    
661            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
662            are installed.
663    
664    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
665    
666            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
667            Christian Buchta.
668    
669    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
670    
671            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
672    
673    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
674    
675            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
676    
677            * R/reader.R (readPDF): Added PDF reader.
678    
679    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
680    
681            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
682    
683            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
684    
685            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
686    
687            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
688    
689    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
690    
691            * R/distmeasure.R (dissimilarity): Replaced dists call from
692            package cba by new dist call from package proxy.
693    
694    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
695    
696            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
697    
698    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
699    
700            * R/termdocmatrix.R: require() uses the quietly option to suppress
701            loading messages.
702    
703    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
704    
705            * R/dictionary.R: Added dictionary support.
706    
707    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
708    
709            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
710            documents. This simplifies some functions, e.g., asPlain.
711    
712    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
713    
714            * inst/doc/tm.Rnw: Fixed some typos in vignette.
715    
716    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
717    
718            * R/textdoccol.R (replaceWords): Added method to replace a set of
719            words by a single word. Useful for synonyms.
720    
721    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
722    
723            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
724    
725    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
726    
727            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
728            vectors. Thanks to Ariel Maguyon for his error report.
729            (removeSparseTerms): New function to remove columns from a
730            term-document matrix exceeding a sparse factor.
731    
732    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
733    
734            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
735    
736    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
737    
738            * man/sFilter.Rd: Corrected documentation on statement format (use
739            '==' instead of '=').
740    
741    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
742    
743            * R/aobjects.R (StructuredTextDocument): Inherits from
744            TextDocument.
745    
746    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
747    
748            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
749            on sparse matrices as proposed by Martin Maechler.
750    
751    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
752    
753            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
754            \pkg{filehash} version deprecates them.
755    
756    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
757    
758            * R/termdocmatrix.R (textvector): Stemming is now performed before
759            erasing stopwords.
760            (weightMatrix): Adapted to handle sparse matrices.
761            (TermDocMatrix): Sparse matrix is now efficiently built by
762            direct stepwise insertion of row values into it.
763    
764    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
765    
766            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
767            due to ongoing problems. For our purposes the latter is as useful
768            as the replaced package.
769    
770    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
771    
772            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
773    
774            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
775    
776    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
777    
778            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
779            languages with available stopwords.
780    
781    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
782    
783            * inst/doc/tm.Rnw: Minor corrections in the vignette.
784    
785    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
786    
787            * DESCRIPTION: Update to version 0.2, since a lot of new features
788            have been integrated.
789    
790            * inst/stopwords: Updated existing stopwords and added stopwords
791            for various other languages.
792    
793    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
794    
795            * man/: Updated documentation.
796    
797            * Work/testDb.R: Script to test database stuff.
798    
799            * R/: Fixed various database related bugs. Seems to be rather
800            useable now, i.e., consider as alpha status for now.
801    
802    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
803    
804            * R/: Fixed some bugs related to database support.
805    
806    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
807    
808            * man/: Added a lot of examples to the manuals.
809    
810    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
811    
812            * man/: Updated parts of the documentation.
813    
814            * R/textdoccol.R (asPlain): Added conversion from newsgroup
815            documents to plain text documents.
816    
817    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
818    
819            * R/textdoccol.R: Finished experimental database support. Not yet
820            intensively tested.
821    
822            * R/source.R: Now each source has a default reader.
823    
824            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
825            class anymore.
826    
827            * R/plaintextdoc.R: Custom show method for plain text documents.
828    
829            * R/aobjects.R: Added a class for structured text documents.
830    
831            * R/reader.R: Replaced remaining \code{parser} occurrences with
832            \code{reader}.
833    
834            * R/textdoccol.R (summary): Indent tags.
835    
836            * R/textdoccol.R (removePunctuation): Transform method to remove
837            punctuation marks.
838    
839    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
840    
841            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
842            using prescindMeta().
843    
844    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
845    
846            * R/textdoccol.R: Improved database support.
847    
848    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
849    
850            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
851    
852            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
853            language code.
854    
855            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
856            into parserControl argument.
857    
858            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
859    
860    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
861    
862            * Work/tmDataSetup.R: The datasets acq and crude can now be
863            created on the fly.
864    
865            * R/stopwords.R: Introduced a function returning the stopwords for
866            a given language (English, German and French at the moment).
867    
868            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
869            otherwise falls back to Snowball package.
870    
871    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
872    
873            * man/dissimilarity-methods.Rd: Make clear that any method offered
874            by "dists" from package "cba" can be used.
875    
876    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
877    
878            * inst/doc/tm.Rnw: Fixed the quotes-appearing-as-boxes bug following
879            Kurt's LaTeX suggestion. Removed points and underscores in
880            variable names for consistent naming.
881    
882            * DESCRIPTION: Update to version 0.1-2.
883    
884            * man/TextRepository.Rd: Fixed bug in documentation.
885    
886    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
887    
888            * DESCRIPTION: Update to version 0.1-1.
889    
890    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
891    
892            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
893            wordStem.
894    
895    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
896    
897            * R/: Changes due to Kurt's review.
898    
899    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
900    
901            * R/: Implemented improvements based upon comments by David
902            Meyer.
903    
904    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
905    
906            * inst/doc/: Rewrote vignette.
907    
908            * man/: Improved documentation.
909    
910    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
911    
912            * man/: Updated documentation.
913    
914            * DESCRIPTION: Changed package name to "tm". Updated version to
915            0.1 for first CRAN release.
916    
917            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
918            list archive example.
919    
920            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
921            archive example.
922    
923            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
924            from (several mails per box) mbox format to (single mail per file)
925            eml format.
926    
927    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * data/crude.rda: Rebuilt.
930    
931            * data/acq.rda: Rebuilt.
932    
933            * R/reader.R: Factored out reader and parser methods from
934            textdoccol.R.
935    
936            * R/source.R: Factored out Source methods from aobjects.R and
937            textdoccol.R.
938            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
939            feeds.
940    
941            * R/textdoccol.R (DirSource): Added support for recursive
942            traversal of directories.
943    
944    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
945    
946            * R/textdoccol.R ([[): Loads the document corpus automatically
947            into memory upon access.
948            (tm_transform, tm_filter): Removed several checks whether the
949            document is already loaded ([[ ensures this now).
950            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
951            mailing list archive.
952    
953    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
954    
955            * R/aobjects.R (TextDocument): Is now a virtual class.
956            (Source): Is now a virtual class.
957    
958    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
959    
960            * R/textdoccol.R (c): Support for an arbitrary number of document
961            collections.
962    
963    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
964    
965            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
966            append_meta and remove_meta.
967    
968            * R/textdoccol.R: Removed modify_metadata method.
969    
970            * R/textrepo.R: Removed modify_metadata method.
971    
972            * R/textdoccol.R (remove_meta): Supports removal of document
973            collection metadata and document (= in data frame) metadata.
974    
975    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
978    
979            * data/crude.rda: Rebuilt.
980    
981            * data/acq.rda: Rebuilt.
982    
983            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
984    
985            * R/textdoccol.R ([): Bug fix for subsetting a document
986            collection's data frame.
987    
988    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
991            to s_filter.
992    
993            * R/textdoccol.R: Local text documents' metadata can now be copied
994            to a document collection's data frame with prescind_meta.
995    
996    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
997    
998            * R/: Text documents' slot metadata is now accessible in s_filter.
999    
1000            * R/: Rewrote s_filter function (still has some restrictions).
1001    
1002    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1003    
1004            * R/: Various fixes in handling metadata.
1005    
1006            * R/: Added update mechanism for text document collections.
1007    
1008    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1009    
1010            * R/: Merging of document collections now creates a binary tree
1011            for reconstructing merged document collections.
1012    
1013            * R/: Redesign of metadata for document collections.
1014    
1015    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1016    
1017            * R/: Messages now use \code{ngettext}.
1018    
1019    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1020    
1021            * R/: Added functions for modifying and removing metadata.
1022    
1023    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1024    
1025            * man/: Updated some documentation.
1026    
1027            * R/: Corrected some connection issues.
1028    
1029            * inst/doc: Worked on the vignette.
1030    
1031    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1032    
1033            * inst/: Added texts and started vignette.
1034    
1035            * R/: Final changes based upon David's comments.
1036    
1037    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * NAMESPACE: Corrected exports (generic methods need exportMethods
1040            directives!).
1041    
1042    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * R/: Modified the TextDocCol constructor and various parsers. It
1045            is now modular and supports various file formats via plugins (see
1046            the new "Source" class).
1047    
1048    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1049    
1050            * man/: Revised documentation after previous code changes.
1051    
1052    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1053    
1054            * R/: Remaining changes as discussed with David.
1055    
1056    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1057    
1058            * R/: Some changes as suggested by David. The rest will follow
1059            within the next few days.
1060    
1061    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1062    
1063            * man/: Finished documentation.
1064    
1065    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1066    
1067            * man/: Wrote some documentation.
1068    
1069    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1070    
1071            * R/: Further syntactic sugar in form of additional assignment and
1072            accessor methods.
1073    
1074    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1075    
1076            * R/: Syntactic sugar in form of "length", "show" and "summary"
1077            operators.
1078    
1079    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1080    
1081            * R/: Various updates, mainly to default operators ("[" or "c")
1082            and dissimilarities.
1083    
1084    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1085    
1086            * R/: Added similarity functions.
1087    
1088            * data/: Added English stopwords.
1089    
1090    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1091    
1092            * data/: Examples compiled for new features.
1093    
1094            * R/: Changes due to new structure.
1095    
1096            * NAMESPACE: Corrected namespace to reflect new structure.
1097    
1098            * R/termdocmatrix.R: Adapted for new naming scheme.
1099    
1100    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1101    
1102            * R/textdoccol.R: Adapted code for new class structure. Wrote
1103            several transform and filter functions operating on text document
1104            collections (alias text document databases).
1105    
1106            * R/aobjects.R: Adapted class structure with inheritance,
1107            repositories and additional meta data. Loading files on demand is
1108            now possible.
1109    
1110    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1111    
1112            * R/: Some cosmetic cleanups.
1113    
1114            * inst/: Removed vignette on clustering. That and much more is now
1115            described in the JSS paper on text mining. Based upon that
1116            article a more elaborate vignette will be incorporated in the future.
1117    
1118    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1119    
1120            * R/: Updated generic S4 methods to comply with signature changes
1121            in newer versions of R (> 2.3).
1122    
1123    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1124    
1125            * ext/R/importRIS.R: Automatic RIS import is now possible.
1126    
1127    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1128    
1129            * R/textdoccol.R: Added RIS HTML input format.
1130    
1131    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1132    
1133            * R/textdoccol.R: Removed bug that caused invalid text document
1134            collections when handling many input files.
1135    
1136    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1137    
1138            * R/textdoccol.R: Restructured and extended file import
1139            mechanism.
1140    
1141            * inst/doc/clustering.Rnw: Adapted vignette for use with
1142            ReutNews.rda
1143    
1144            * man/ReutNews.Rd: Documentation for ReutNews.rda
1145    
1146            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1147    
1148    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1149    
1150            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1151            clustering facilities of this package.
1152    
1153    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1154    
1155            * R/aobjects.R: Changed package document structure to avoid class
1156            dependency problems.
1157    
1158    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1159    
1160            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
1161            data set.
1162    
1163            * Finished documentation and reordered directory structure. Now "R
1164            CMD check textmin" works without errors.
1165    
