[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC
pkg/ChangeLog revision 1191, Wed Oct 3 17:31:39 2012 UTC
1    2012-10-03  Ingo Feinerer  <feinerer@logic.at>
2            * R/weight.R (weightTfIdf, weightSMART): Gracefully handle empty
3            columns and rows (avoids blow-up due to NaN values). Suggested by Jaap
4            Frölich.
5    
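The empty-row and empty-column case in the entry above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the code in R/weight.R: the function name tf_idf, the nested-list matrix layout (terms as rows, documents as columns), and the base-2 logarithm are all assumptions made for the example.

```python
import math

def tf_idf(counts, normalize=True):
    """Hypothetical helper: counts is a list of term rows, one column per document."""
    n_terms = len(counts)
    n_docs = len(counts[0]) if counts else 0
    # Column sums are document lengths; an empty document has length 0.
    doc_len = [sum(counts[t][d] for t in range(n_terms)) for d in range(n_docs)]
    weights = []
    for row in counts:
        df = sum(1 for c in row if c > 0)  # number of documents containing the term
        # A term that occurs nowhere (empty row) gets idf 0 instead of log(n / 0).
        idf = math.log2(n_docs / df) if df > 0 else 0.0
        out = []
        for d, c in enumerate(row):
            tf = c
            if normalize:
                # Guard the division: 0 / 0 is the source of the NaN values
                # (in Python it would raise ZeroDivisionError instead).
                tf = c / doc_len[d] if doc_len[d] > 0 else 0.0
            out.append(tf * idf)
        weights.append(out)
    return weights

# Second document is empty, third term never occurs.
example = tf_idf([[1, 0, 0], [2, 0, 1], [0, 0, 0]])
```

Without the two guards, a single empty document or unused term would contaminate every later computation on the matrix, which is the blow-up the entry refers to.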
6    2012-07-27  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/transform.R (removeWords): Allow longer stopword lists.
9    
10    2012-01-31  Ingo Feinerer  <feinerer@logic.at>
11    
12            * R/reader.R (readXML): Readers can now set the document language
13            themselves.
14    
15    2012-01-14  Ingo Feinerer  <feinerer@logic.at>
16    
17            * R/source.R (XMLSource, getElem.XMLSource): Simplifications as
18            proposed by Milan Bouchet-Valat.
19    
20    2012-01-11  Ingo Feinerer  <feinerer@logic.at>
21    
22            * R/matrix.R (termFreq): Fix processing of user provided
23            stopwords. Reported by Bettina Grün.
24    
25    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
26    
27            * R/matrix.R (termFreq): Fix invalid handling of
28            control$wordLengths[1]. Reported by Steven C. Bagley.
29    
30    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
31    
32            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
33    
34    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
35    
36            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
37    
38    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
39    
40            * R/transform.R (removePunctuation): Add option to preserve
41            intra-word dashes.
42    
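A rough sketch of what preserving intra-word dashes means, written in Python for illustration only. The function name, the parameter name, and the regular expressions are assumptions and do not reproduce the implementation in R/transform.R: a dash is kept only when it has a letter or digit on both sides.

```python
import re

def remove_punctuation(text, preserve_intra_word_dashes=False):
    """Hypothetical helper: strip punctuation, optionally keeping dashes inside words."""
    if not preserve_intra_word_dashes:
        # Drop everything that is neither a word character nor whitespace,
        # plus the underscore (which \w would otherwise keep).
        return re.sub(r"[^\w\s]|_", "", text)
    # Remove a dash unless it has a letter or digit on both sides,
    # and remove every other punctuation character as before.
    pattern = r"(?<![^\W_])-|-(?![^\W_])|[^\w\s-]|_"
    return re.sub(pattern, "", text)

sample = "well-known -- yes, really!"
plain = remove_punctuation(sample)
kept = remove_punctuation(sample, preserve_intra_word_dashes=True)
```

With the option off, "well-known" collapses to "wellknown"; with it on, the hyphenated word survives while the free-standing "--" is still removed.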
43    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
44    
45            * R/matrix.R (termFreq): Allow reordering of control option
46            processing.
47    
48    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
49    
50            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
51            pdfinfo tool.
52    
53            * inst/stopwords/SMART.dat: Add SMART information retrieval system
54            stopwords (which are also used by the MC toolkit).
55    
56            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
57            restrict how often a term may appear in each document (generalizes
58            \code{minDocFreq}). Similarly the local option \code{wordLengths}
59            for word length bounds (generalizes \code{minWordLength}).
60    
61            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
62            \code{bounds$global} for restricting how often a term is allowed
63            to appear in different documents.
64    
65            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
66            local options delegated internally to termFreq() and global
67            options which are processed by the term-document matrix
68            constructor itself.
69    
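The split between local and global options described in this entry can be pictured with a small sketch. It is written in Python purely as an illustration; the function names, argument names, and default bounds are assumptions and are not the interface of termFreq() or TermDocumentMatrix(). Local bounds filter counts within one document, while global bounds filter on the number of documents a term occurs in.

```python
from collections import Counter

INF = float("inf")

def term_freq(tokens, local_bounds=(1, INF), word_lengths=(1, INF)):
    """Hypothetical per-document step: count tokens, then apply local options."""
    min_len, max_len = word_lengths
    counts = Counter(t for t in tokens if min_len <= len(t) <= max_len)
    lo, hi = local_bounds
    # Keep a term only if its count in this document lies within the bounds.
    return {t: n for t, n in counts.items() if lo <= n <= hi}

def term_document_matrix(docs, global_bounds=(1, INF), **local_options):
    """Hypothetical constructor: delegate local options, apply global ones itself."""
    per_doc = [term_freq(d, **local_options) for d in docs]
    # Document frequency: in how many documents does each term survive?
    doc_freq = Counter(t for tf in per_doc for t in tf)
    lo, hi = global_bounds
    terms = sorted(t for t, df in doc_freq.items() if lo <= df <= hi)
    # Rows are terms, columns are documents.
    return terms, [[tf.get(t, 0) for tf in per_doc] for t in terms]

docs = [["text", "mining", "text"], ["text", "data"], ["data", "data", "data"]]
terms, matrix = term_document_matrix(docs, global_bounds=(2, INF))
```

Here "mining" is dropped by the global bound because it appears in only one document, whereas a local bound such as local_bounds=(2, INF) would instead drop every term that occurs fewer than twice within a given document.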
70    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
71    
72            * man/getTokenizers.Rd: Document getTokenizers().
73    
74            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
75    
76    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
77    
78            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
79    
80            * man/combine.Rd: Document c.term_frequency().
81    
82    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
83    
84            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
85            can be accessed via '[' and not '[['.
86    
87    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
88    
89            * R/stopwords.R (stopwords): Raise an error if no stopwords are
90            available for requested language. Suggested by Derek M Jones.
91    
92    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
93    
94            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
95            normalization.
96    
97    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/transform.R (stemDocument.PlainTextDocument): Use language
100            argument.
101    
102    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/source.R: Store strings and connections instead of unevaluated
105            calls.
106    
107    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
110    
111    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
112    
113            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
114            (instead of a list element).
115    
116    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
117    
118            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
119            documents by names (fallback to IDs if names are not set).
120    
121    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
122    
123            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
124            \code{recursive} now determines whether existing corpus meta data
125            is used.
126    
127    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
128    
129            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
130    
131    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
134            remove terms not occurring in the corpus anymore.
135    
136    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
137    
138            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
139            and Heaps' law.
140    
141    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
142    
143            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
144            provided by a source.
145    
146    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
147    
148            * R/source.R (.Source): Provide document names.
149    
150    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
151    
152            * R/meta.R (`content_or_meta`): Utility function.
153    
154    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
155    
156            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
157            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
158    
159    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
160    
161            * R/weight.R (weightTfIdf): Added normalization option.
162    
163            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
164            analysis.
165    
166    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
167    
168            * R/score.R (tm_tag_score): Compute a score from the number of
169            tags matching in a document.
170    
171    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
172    
173            * R/complete.R (stemCompletion): New completion heuristics.
174    
175    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
178    
179    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
180    
181            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
182            setOldClass(c(..., "list")) works.
183    
184    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
185    
186            * R/transform.R (stemDocument.character): In case input is a
187            simple character just delegate to the default Snowball stemmer.
188    
189    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
190    
191            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
192            data.
193    
194    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
195    
196            * R/doc.R (`Content<-`): Be careful with names attribute.
197    
198    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
199    
200            * R/source.R (DirSource): Improved implementation especially when
201            handling many (> 1M) files.
202    
203    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
204    
205            * R/source.R (getElem.URISource): Use encoding argument.
206    
207    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
208    
209            * R/doc.R (setOldClass): Register S3 document classes to be
210            recognized by S4 methods.
211    
212    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
213    
214            * R/matrix.R (termFreq): Add option to remove punctuation
215            characters.
216    
217    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
218    
219            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
220            merging multiple term-document matrices.
221    
222    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
223    
224            * R/corpus.R (setOldClass): Register S3 corpus classes to be
225            recognized by S4 methods.
226    
227            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
228            that CRAN Mac OS X builds do not fail any longer.
229    
230    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
231    
232            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
233            of RWeka::AlphabeticTokenizer() as default.
234    
235    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
236    
237            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
238            caused words at the beginning or the end of a line not to be removed. Do
239            not delete whitespace anymore.
240    
241    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
242    
243            * R/source.R (DirSource): Default to working directory if no path
244            is specified.
245    
246    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
247    
248            * R/source.R (DirSource): Stop on empty directories.
249    
250    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
251    
252            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
253            named documents.
254    
255    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
256    
257            * R/transform.R (removeWords): Improve regular expressions.
258    
259    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
260    
261            * R/meta.R (DublinCore): Allow lower case tags.
262    
263    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
264    
265            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
266            instead of x$children.
267    
268    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
269    
270            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
271    
272    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
273    
274            * R/: Use S3 instead of S4 class system.
275    
276    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
277    
278            * R/reader.R (readMail): Moved to tm.plugin.mail package.
279    
280    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
281    
282            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
283            postings are basically e-mails with some extra headers.
284    
285    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
286    
287            * R/transform.R: Move convertMboxEml, removeCitation,
288            removeMultipart, and removeSignature to the tm.plugin.mail package
289            since they are mainly utility functions (for handling e-mails) and
290            not very framework specific.
291    
292    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
293    
294            * man/: Fix documentation.
295    
296    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
297    
298            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
299            plain text document instead of an XML document for texts of the
300            Reuters-21578 dataset.
301    
302            * R/sparse.R: Removed since the slam package is now available on
303            CRAN.
304    
305            * DESCRIPTION (Depends): Add slam package.
306    
307    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
308    
309            * R/transform.R (stemDoc): Fix character(0) handling.
310    
311    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
312    
313            * R/doc.R (show): Pretty print.
314    
315    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
316    
317            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
318            gracefully.
319    
320    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
321    
322            * R/corpus.R: Make corpus virtual. Implement corpus with standard
323            and permanent storage semantics.
324    
325            * DESCRIPTION: New major release. A *lot* of improvements.
326    
327    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
328    
329            * NAMESPACE: Export some simple_triplet_matrix functions.
330    
331    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
332    
333            * R/weight.R: Adapt tf-idf to new matrix format.
334    
335    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
336    
337            * R/matrix.R: Create two distinct classes for term-document and
338            document-term matrices.
339    
340    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
341    
342            * R/termdocmatrix.R: No longer use Matrix package. This reduces
343            package start-up time significantly.
344    
345    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
346    
347            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
348    
349    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
350    
351            * R/transform.R (tmReduce): Combine multiple maps into one
352            transformation.
353    
354    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
355    
356            * R/weight.R: Remove weightLogical since it does not return a
357            dgCMatrix.
358    
359            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
360            or TermDocumentMatrix instead.
361    
362    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
363    
364            * inst/doc/extensions.Rnw: Finished vignette.
365    
366    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
367    
368            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
369            DocumentTermMatrix representations.
370    
371    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
372    
373            * R/reader.R (readXML): New reader for arbitrary XML files.
374    
375    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
376    
377            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
378            (XMLSource): New XMLSource class for arbitrary XML files.
379            (Source): New slot Vectorized.
380    
381    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
382    
383            * R/reader.R (readTabular): Experimental reader for tabular data
384            structures which can be customized via user-defined mappings.
385    
386            * R/reader.R: Always use UTC time zone.
387    
388            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
389    
390    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
391    
392            * R/reader.R (readDOC): Options can be passed over to antiword.
393    
394            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
395            pdftotext.
396    
397    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
398    
399            * R/source.R (DirSource): Add pattern and ignore.case arguments
400            which are internally passed over to list.files().
401    
402    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
403    
404            * inst/doc/tm.Rnw: Suppress pointless loading message.
405    
406    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
407    
408            * DESCRIPTION: Speed up package loading (via moving packages not
409            strictly necessary for normal operation to Suggests instead of
410            Depends).
411    
412    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
413    
414            * R/reader.R (readNewsgroup): The date format is now configurable.
415    
416    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
417    
418            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
419    
420    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
421    
422            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
423    
424    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
425    
426            * R/source.R (DataframeSource): New source class for data frames.
427    
428            * R/source.R: Fixed non-standard call evaluation.
429    
430    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
431    
432            * R/source.R (URISource): New source class for a single document.
433    
434    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
435    
436            * R/source.R: Refactoring.
437    
438    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
439    
440            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
441            Rmpi installations more gracefully.
442    
443    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
444    
445            * R/source.R (Source): Add Length slot.
446    
447    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
448    
449            * R/AAA.R: Unify duplicated .onLoad function.
450    
451    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
452    
453            * DESCRIPTION (Suggests): Added Rmpi.
454    
455    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
456    
457            * R/source.R (getElem): Fix 'no visible binding' warning.
458    
459            * man/WeightFunction.Rd: Fix signature.
460    
461    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
462    
463            * R/weight.R: Introduce name abbreviations for weighting functions.
464    
465    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
466    
467            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
468    
469            * R/cluster.R: Provide convenience functions for using a MPI
470            cluster.
471    
472            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
473            available.
474    
475            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
476            available.
477    
478    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
479    
480            * R/textdoccol.R (lapply): Removed debug print out.
481    
482    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
483    
484            * R/reader.R (readRCV1): Improved meta data extraction from
485            Reuters Corpus Volume 1 documents.
486    
487    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
488    
489            * R/transform.R: Ensure that all mappings preserve multiline
490            structures.
491    
492    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
493    
494            * R/filter.R: Every filter now has an attribute indicating whether
495            it should be applied at document level (doclevel).
496    
497            * R/textdoccol.R (tmFilter): Set searchFullText as new default
498            filter.
499    
500    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
501    
502            * R/transform.R (replacePatterns): Replaced removeWords by
503            replacePatterns. Suggested by Christian Buchta.
504    
505            * R/textdoccol.R (inspect): Improved formatting.
506    
507    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
508    
509            * inst/CITATION: Updated JSS article information.
510    
511            * R/textdoccol.R (setAs): Added coerce method from list to
512            corpus.
513    
514            * R/meta.R (meta): Improved meta data handling.
515    
516    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
519            Christian Buchta.
520    
521            * inst/CITATION: Added template to include JSS article reference.
522    
523    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
524    
525            * R/textdoccol.R (tmMap): Introduced lazy mapping.
526    
527            * R/source.R: Added VectorSource.
528    
529    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
530    
531            * man/: Language codes should be in ISO 639-1 format.
532    
533            * R/textdoccol.R (asPlain): Preserve local meta data.
534    
535    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
536    
537            * R/textdoccol.R (writeCorpus): Function for writing a corpus
538            containing plain text documents to disk.
539    
540    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
541    
542            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
543            always set correctly.
544    
545            * R/textdoccol.R: Set load = TRUE as default for load on demand
546            since in most cases this is the wanted behaviour.
547    
548    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
549    
550            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
551    
552            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
553    
554    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * R/meta.R (meta): New function for consistent access to meta data
557            of document collections, repositories, and texts.
558    
559    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
560    
561            * R/: Better support for encodings.
562    
563    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
564    
565            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
566            selection when no reader argument is given.
567    
568    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * R/source.R (CSVSource): Now uses read.csv instead of scan
571            internally.
572    
573    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * R/reader.R (getReaders): Returns available reader functions.
576    
577            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
578            as default.
579    
580    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
581    
582            * R/stopwords.R (stopwords): Shortened code, removed codetools
583            variable warnings.
584    
585            * man/: Documentation for showMeta, added an example for tmMap.
586    
587            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
588            some minor typos fixed.
589    
590    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
591    
592            * R/aobjects.R (showMeta): Added method for pretty printing a
593            text document's meta data.
594    
595    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
596    
597            * R/textdoccol.R (TextDocCol): Better handling of empty
598            arguments.
599    
600            * NAMESPACE: Exported readDOC.
601    
602            * man/completeStems.Rd: Added an example.
603    
604    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
605    
606            * R/stopwords.R (stopwords): Look up .dat files at every
607            call. Allows users to modify stopword .dat files interactively.
608    
609    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
610    
611            * R/termdocmatrix.R (termFreq): Correct processing of empty
612            documents.
613    
614    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
615    
616            * man/: Updated documentation.
617    
618    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
619    
620            * R/complete.R (completeStems): Completes (heuristically) word
621            stems.
622    
623            * R/termdocmatrix.R (TermDocMatrix2): New modular
624            constructor.
625    
626            * NAMESPACE: Exported termFreq.
627    
628    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
629    
630            * R/reader.R (readDOC): Added MS Word reader (using antiword).
631    
632    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
633    
634            * R/weight.R: Weighting functions for TermDocMatrix.
635    
636    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
637    
638            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
639            functions for accessing dimension, column, and row names.
640    
641            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
642    
643    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
644    
645            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
646    
647    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
648    
649            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
650    
651    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
652    
653            * R/reader.R (readPDF): Removed manual checks for pdftotext and
654            pdfinfo. The system call gives a warning anyway.
655    
656    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
657    
658            * R/textdoccol.R (asPlain): Conversion from
659            StructuredTextDocuments to PlainTextDocuments.
660    
661    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
662    
663            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
664            for accessing term-document matrices.
665    
666            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
667            are installed.
668    
669    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
670    
671            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
672            Christian Buchta.
673    
674    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
675    
676            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
677    
678    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
679    
680            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
681    
682            * R/reader.R (readPDF): Added PDF reader.
683    
684    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
685    
686            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
687    
688            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
689    
690            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
691    
692            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
693    
694    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
695    
696            * R/distmeasure.R (dissimilarity): Replaced dists call from
697            package cba by new dist call from package proxy.
698    
699    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
700    
701            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
702    
703    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
704    
705            * R/termdocmatrix.R: require() uses the quietly option to suppress
706            loading messages.
707    
708    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
709    
710            * R/dictionary.R: Added dictionary support.
711    
712    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
713    
714            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
715            documents. This simplifies some functions, e.g., asPlain.
716    
717    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
718    
719            * inst/doc/tm.Rnw: Fixed some typos in vignette.
720    
721    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
722    
723            * R/textdoccol.R (replaceWords): Added method to replace a set of
724            words by a single word. Useful for synonyms.
725    
726    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
727    
728            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
729    
730    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
731    
732            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
733            vectors. Thanks to Ariel Maguyon for his error report.
734            (removeSparseTerms): New function to remove columns from a
735            term-document matrix exceeding a sparse factor.
736    
737    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
738    
739            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
740    
741    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
742    
743            * man/sFilter.Rd: Corrected documentation on statement format (use
744            '==' instead of '=').
745    
746    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
747    
748            * R/aobjects.R (StructuredTextDocument): Inherits from
749            TextDocument.
750    
751    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
752    
753            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
754            on sparse matrices as proposed by Martin Maechler.
755    
756    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
757    
758            * R/textdoccol.R: Removed \code{dbDisconnect} calls since last
759            \pkg{filehash} version makes them deprecated.
760    
761    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * R/termdocmatrix.R (textvector): Stemming is now performed before
764            erasing stopwords.
765            (weightMatrix): Adapted to handle sparse matrices.
766            (TermDocMatrix): Sparse matrix is now efficiently built by
767            direct stepwise insertion of row values into it.
768    
769    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
770    
771            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
772            due to ongoing problems. For our purposes the latter is as useful
773            as the replaced package.
774    
775    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
776    
777            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
778    
779            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
780    
781    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
782    
783            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
784            languages with available stopwords.
785    
786    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
787    
788            * inst/doc/tm.Rnw: Minor corrections in the vignette.
789    
790    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
791    
792            * DESCRIPTION: Update to version 0.2, since a lot of new features
793            have been integrated.
794    
795            * inst/stopwords: Updated existing stopwords and added stopwords
796            for various other languages.
797    
798    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
799    
800            * man/: Updated documentation.
801    
802            * Work/testDb.R: Script to test database stuff.
803    
804            * R/: Fixed various database related bugs. Seems to be rather
805            usable now, i.e., consider it alpha status for now.
806    
807    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
808    
809            * R/: Fixed some bugs related to database support.
810    
811    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
812    
813            * man/: Added a lot of examples to the manuals.
814    
815    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
816    
817            * man/: Updated parts of the documentation.
818    
819            * R/textdoccol.R (asPlain): Added conversion from newsgroup
820            documents to plain text documents.
821    
822    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
823    
824            * R/textdoccol.R: Finished experimental database support. Not yet
825            intensively tested.
826    
827            * R/source.R: Now each source has a default reader.
828    
829            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
830            class anymore.
831    
832            * R/plaintextdoc.R: Custom show method for plain text documents.
833    
834            * R/aobjects.R: Added a class for structured text documents.
835    
836            * R/reader.R: Replaced remaining \code{parser} occurrences with
837            \code{reader}.
838    
839            * R/textdoccol.R (summary): Indent tags.
840    
841            * R/textdoccol.R (removePunctuation): Transform method to remove
842            punctuation marks.
843    
844    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
845    
846            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
847            using prescindMeta().
848    
849    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
850    
851            * R/textdoccol.R: Improved database support.
852    
853    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
854    
855            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
856    
857            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
858            language code.
859    
860            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
861            into parserControl argument.
862    
863            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
864    
865    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
866    
867            * Work/tmDataSetup.R: The datasets acq and crude can now be
868            created on the fly.
869    
870            * R/stopwords.R: Introduced a function returning the stopwords for
871            a given language (English, German and French at the moment).
872    
873            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
874            otherwise falls back to Snowball package.
875    
876    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
877    
878            * man/dissimilarity-methods.Rd: Make clear that any method offered
879            by "dists" from package "cba" can be used.
880    
881    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
882    
883            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
884            to Kurt's latex suggestion. Removed points and underscores in
885            variable names for consistent naming.
886    
887            * DESCRIPTION: Update to version 0.1-2.
888    
889            * man/TextRepository.Rd: Fixed bug in documentation.
890    
891    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
892    
893            * DESCRIPTION: Update to version 0.1-1.
894    
895    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
896    
897            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
898            wordStem.
899    
900    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
901    
902            * R/: Changes due to Kurt's review.
903    
904    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
905    
906            * R/: Implemented improvements based upon comments by David
907            Meyer.
908    
909    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
910    
911            * inst/doc/: Rewrote vignette.
912    
913            * man/: Improved documentation.
914    
915    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
916    
917            * man/: Updated documentation.
918    
919            * DESCRIPTION: Changed package name to "tm". Updated version to
920            0.1 for first CRAN release.
921    
922            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
923            list archive example.
924    
925            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
926            archive example.
927    
928            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
929            from (several mails per box) mbox format to (single mail per file)
930            eml format.
931    
932    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
933    
934            * data/crude.rda: Rebuilt.
935    
936            * data/acq.rda: Rebuilt.
937    
938            * R/reader.R: Factored out reader and parser methods from
939            textdoccol.R.
940    
941            * R/source.R: Factored out Source methods from aobjects.R and
942            textdoccol.R.
943            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
944            feeds.
945    
946            * R/textdoccol.R (DirSource): Added support for recursive
947            traversal of directories.
948    
949    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
950    
951            * R/textdoccol.R ([[): Loads the document corpus automatically
952            into memory upon access.
953            (tm_transform, tm_filter): Removed several checks whether the
954            document is already loaded ([[ ensures this now).
955            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
956            mailing list archive.
957    
958    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
959    
960            * R/aobjects.R (TextDocument): Is now a virtual class.
961            (Source): Is now a virtual class.
962    
963    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
964    
965            * R/textdoccol.R (c): Support for an arbitrary number of document
966            collections.
967    
968    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
969    
970            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
971            append_meta and remove_meta.
972    
973            * R/textdoccol.R: Removed modify_metadata method.
974    
975            * R/textrepo.R: Removed modify_metadata method.
976    
977            * R/textdoccol.R (remove_meta): Supports removal of document
978            collection metadata and document (= in data frame) metadata.
979    
980    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
981    
982            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
983    
984            * data/crude.rda: Rebuilt.
985    
986            * data/acq.rda: Rebuilt.
987    
988            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
989    
990            * R/textdoccol.R ([): Bug fix for subsetting a document
991            collection's data frame.
992    
993    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
994    
995            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
996            to s_filter.
997    
998            * R/textdoccol.R: Local text documents' metadata can now be copied
999            to a document collection's data frame with prescind_meta.
1000    
1001    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1002    
1003            * R/: Text documents' slot metadata is now accessible in s_filter.
1004    
1005            * R/: Rewrote s_filter function (still has some restrictions).
1006    
1007    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1008    
1009            * R/: Various fixes in handling metadata.
1010    
1011            * R/: Added update mechanism for text document collections.
1012    
1013    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1014    
1015            * R/: Merging of document collections now creates a binary tree
1016            for reconstructing merged document collections.
1017    
1018            * R/: Redesign of metadata for document collections.
1019    
1020    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1021    
1022            * R/: Messages now use \code{ngettext}.
1023    
1024    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1025    
1026            * R/: Added functions for modifying and removing metadata.
1027    
1028    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1029    
1030            * man/: Updated some documentation.
1031    
1032            * R/: Corrected some connection issues.
1033    
1034            * inst/doc: Worked on the vignette.
1035    
1036    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1037    
1038            * inst/: Added texts and started vignette.
1039    
1040            * R/: Final changes based upon David's comments.
1041    
1042    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * NAMESPACE: Corrected exports (generic methods need exportMethods
1045            directives!).
1046    
1047    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1048    
1049            * R/: Modified the TextDocCol constructor and various parsers. It
1050            is now modular and supports various file formats via plugins (see
1051            the new "Source" class).
1052    
1053    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1054    
1055            * man/: Revised documentation after previous code changes.
1056    
1057    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1058    
1059            * R/: Remaining changes as discussed with David.
1060    
1061    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1062    
1063            * R/: Some changes as suggested by David. The rest will follow
1064            within the next days.
1065    
1066    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1067    
1068            * man/: Finished documentation.
1069    
1070    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1071    
1072            * man/: Wrote some documentation.
1073    
1074    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1075    
1076            * R/: Further syntactic sugar in form of additional assignment and
1077            accessor methods.
1078    
1079    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1080    
1081            * R/: Syntactic sugar in form of "length", "show" and "summary"
1082            operators.
1083    
1084    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1085    
1086            * R/: Diverse updates. Mainly on default operators ("[" or "c")
1087            and dissimilarities.
1088    
1089    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1090    
1091            * R/: Added similarity functions.
1092    
1093            * data/: Added English stopwords.
1094    
1095    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1096    
1097            * data/: Examples compiled for new features.
1098    
1099            * R/: Changes due to new structure.
1100    
1101            * NAMESPACE: Corrected namespace to reflect new structure.
1102    
1103            * R/termdocmatrix.R: Adapted for new naming scheme.
1104    
1105    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1106    
1107            * R/textdoccol.R: Adapted code for new class structure. Wrote
1108            several transform and filter functions operating on text document
1109            collections (alias text document databases).
1110    
1111            * R/aobjects.R: Adapted class structure with inheritance,
1112            repositories and additional meta data. Loading files on demand is
1113            now possible.
1114    
1115    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1116    
1117            * R/: Some cosmetic cleanups.
1118    
1119            * inst/: Removed vignette on clustering. That and much more is now
1120            described in the JSS paper on text mining. Based upon that
1121            article an elaborated vignette will be incorporated in the future.
1122    
1123    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1124    
1125            * R/: Updated generic S4 methods to comply with signature changes
1126            in newer versions of R (> 2.3)
1127    
1128    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1129    
1130            * ext/R/importRIS.R: Automatic RIS import is now possible.
1131    
1132    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1133    
1134            * R/textdoccol.R: Added RIS HTML input format.
1135    
1136    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1137    
1138            * R/textdoccol.R: Removed bug that caused invalid text document
1139            collections when handling many input files.
1140    
1141    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1142    
1143            * R/textdoccol.R: Restructured and extended file import
1144            mechanism.
1145    
1146            * inst/doc/clustering.Rnw: Adapted vignette for use with
1147            ReutNews.rda
1148    
1149            * man/ReutNews.Rd: Documentation for ReutNews.rda
1150    
1151            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1152    
1153    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1154    
1155            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1156            clustering facilities of this package.
1157    
1158    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1159    
1160            * R/aobjects.R: Changed package document structure to avoid class
1161            dependency problems.
1162    
1163    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1164    
1165            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
1166            data set.
1167    
1168            *  Finished documentation and reordered directory structure. Now "R
1169            CMD check textmin" works without errors.
1170    
1171    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1172    
1173            * src/: Various splits can now be easily created for the
1174            Reuters21578 data set.
1175    
1176    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1177    
1178            *  Updated documentation
1179    
1180    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1181    
1182            *  Wrote R documentation for some classes and methods.
1183    
1184    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1185    
1186            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
1187            files. See the questionnaire data/Umfrage.csv for such an example.
1188            We are now able to import files in Reuters-21578 XML format.
1189    
1190            *  Changed class interfaces in various files. Weighting of the text
1191            matrix is now possible.
1192    
1193    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1194    
1195            * R/textdoccol.R: One can build term-document matrices if
1196            necessary (with buildTDM(...)) and fill the field tdm from a text
1197            document collection with it.
1198    
1199            * R/textmatrix.R: Wrote S4 class for term-document matrices.
1200    
1201    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1202    
1203            * R/textdoccol.R: We now can read in a whole XML file with several
1204            news items.
1205    
1206    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1207    
1208            * R/textdoccol.R: Set up an S4 class for a collection of text
