[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 34, Thu Dec 22 15:18:10 2005 UTC
pkg/ChangeLog revision 1199, Mon Dec 10 14:37:54 2012 UTC
1    2012-12-10  Ingo Feinerer  <feinerer@logic.at>
2    
3            * man/tm_reduce.Rd: Document right to left folding order. Adapt
4            example as well. Suggested by Mark Rosenstein.
5    
6    2012-12-04  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/filter.R (sFilter): Avoid attach() and simplify.
9    
10    2012-11-02  Ingo Feinerer  <feinerer@logic.at>
11    
12            * R/doc.R (.TextDocument): Use casts to ensure data types and to avoid
13            removal of attributes.
14    
15    2012-10-03  Ingo Feinerer  <feinerer@logic.at>
16    
17            * R/weight.R (weightTfIdf, weightSMART): Gracefully handle empty
18            columns and rows (avoids blow-up due to NaN values). Suggested by Jaap
19            Frölich.
20    
21    2012-07-27  Ingo Feinerer  <feinerer@logic.at>
22    
23            * R/transform.R (removeWords): Allow longer stopword lists.
24    
25    2012-01-31  Ingo Feinerer  <feinerer@logic.at>
26    
27            * R/reader.R (readXML): Readers can now set the document language
28            themselves.
29    
30    2012-01-14  Ingo Feinerer  <feinerer@logic.at>
31    
32            * R/source.R (XMLSource, getElem.XMLSource): Simplifications as
33            proposed by Milan Bouchet-Valat.
34    
35    2012-01-11  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/matrix.R (termFreq): Fix processing of user provided
38            stopwords. Reported by Bettina Grün.
39    
40    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/matrix.R (termFreq): Fix invalid handling of
43            control$wordLengths[1]. Reported by Steven C. Bagley.
44    
45    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
46    
47            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
48    
49    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
50    
51            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
52    
53    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
54    
55            * R/transform.R (removePunctuation): Add option to preserve
56            intra-word dashes.
57    
58    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
59    
60            * R/matrix.R (termFreq): Allow reordering of control option
61            processing.
62    
63    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
64    
65            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
66            pdfinfo tool.
67    
68            * inst/stopwords/SMART.dat: Add SMART information retrieval system
69            stopwords (which are also used by the MC toolkit).
70    
71            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
72            restrict how often a term may appear in each document (generalizes
73            \code{minDocFreq}). Similarly, allow the local option \code{wordLengths}
74            for word length bounds (generalizes \code{minWordLength}).
75    
76            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
77            \code{bounds$global} for restricting how often a term is allowed
78            to appear in different documents.
79    
80            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
81            local options delegated internally to termFreq() and global
82            options which are processed by the term-document matrix
83            constructor itself.
84    
85    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
86    
87            * man/getTokenizers.Rd: Document getTokenizers().
88    
89            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
90    
91    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
92    
93            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
94    
95            * man/combine.Rd: Document c.term_frequency().
96    
97    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
100            can be accessed via '[' and not '[['.
101    
102    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/stopwords.R (stopwords): Raise an error if no stopwords are
105            available for requested language. Suggested by Derek M Jones.
106    
107    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/weight.R (weightSMART): Implement cosine and pivoted unique
110            normalization.
111    
112    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/transform.R (stemDocument.PlainTextDocument): Use language
115            argument.
116    
117    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
118    
119            * R/source.R: Store strings and connections instead of unevaluated
120            calls.
121    
122    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
123    
124            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
125    
126    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
127    
128            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
129            (instead of a list element).
130    
131    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
134            documents by names (fallback to IDs if names are not set).
135    
136    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
137    
138            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
139            \code{recursive} now determines whether existing corpus meta data
140            is used.
141    
142    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
143    
144            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
145    
146    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
147    
148            * R/matrix.R (TermDocumentMatrix): If a dictionary is given, no
149            longer remove terms that do not occur in the corpus.
150    
151    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
152    
153            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
154            and Heaps' law.
155    
156    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
157    
158            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
159            provided by a source.
160    
161    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
162    
163            * R/source.R (.Source): Provide document names.
164    
165    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
166    
167            * R/meta.R (`content_or_meta`): Utility function.
168    
169    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
170    
171            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
172            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
173    
174    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
175    
176            * R/weight.R (weightTfIdf): Added normalization option.
177    
178            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
179            analysis.
180    
181    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
182    
183            * R/score.R (tm_tag_score): Compute a score from the number of
184            tags matching in a document.
185    
186    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
187    
188            * R/complete.R (stemCompletion): New completion heuristics.
189    
190    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
191    
192            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
193    
194    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
195    
196            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
197            setOldClass(c(..., "list")) works.
198    
199    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
200    
201            * R/transform.R (stemDocument.character): In case input is a
202            simple character just delegate to the default Snowball stemmer.
203    
204    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
205    
206            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
207            data.
208    
209    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
210    
211            * R/doc.R (`Content<-`): Be careful with names attribute.
212    
213    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
214    
215            * R/source.R (DirSource): Improved implementation especially when
216            handling many (> 1M) files.
217    
218    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
219    
220            * R/source.R (getElem.URISource): Use encoding argument.
221    
222    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
223    
224            * R/doc.R (setOldClass): Register S3 document classes to be
225            recognized by S4 methods.
226    
227    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
228    
229            * R/matrix.R (termFreq): Add option to remove punctuation
230            characters.
231    
232    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
233    
234            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
235            merging multiple term-document matrices.
236    
237    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
238    
239            * R/corpus.R (setOldClass): Register S3 corpus classes to be
240            recognized by S4 methods.
241    
242            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
243            that CRAN Mac OS X builds do not fail any longer.
244    
245    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
246    
247            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
248            of RWeka::AlphabeticTokenizer() as default.
249    
250    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
251    
252            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
253            caused words at the beginning or the end of a line not to be removed. Do
254            not delete whitespace anymore.
255    
256    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
257    
258            * R/source.R (DirSource): Default to working directory if no path
259            is specified.
260    
261    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
262    
263            * R/source.R (DirSource): Stop on empty directories.
264    
265    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
266    
267            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
268            named documents.
269    
270    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
271    
272            * R/transform.R (removeWords): Improve regular expressions.
273    
274    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
275    
276            * R/meta.R (DublinCore): Allow lower case tags.
277    
278    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
279    
280            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
281            instead of x$children.
282    
283    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
284    
285            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
286    
287    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
288    
289            * R/: Use S3 instead of S4 class system.
290    
291    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
292    
293            * R/reader.R (readMail): Moved to tm.plugin.mail package.
294    
295    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
296    
297            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
298            postings are basically e-mails with some extra headers.
299    
300    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
301    
302            * R/transform.R: Move convertMboxEml, removeCitation,
303            removeMultipart, and removeSignature to the tm.plugin.mail package
304            since they are mainly utility functions (for handling e-mails) and
305            not very framework specific.
306    
307    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
308    
309            * man/: Fix documentation.
310    
311    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
312    
313            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
314            plain text document instead of an XML document for texts of the
315            Reuters-21578 dataset.
316    
317            * R/sparse.R: Removed since the slam package is now available on
318            CRAN.
319    
320            * DESCRIPTION (Depends): Add slam package.
321    
322    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
323    
324            * R/transform.R (stemDoc): Fix character(0) handling.
325    
326    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
327    
328            * R/doc.R (show): Pretty print.
329    
330    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
331    
332            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
333            gracefully.
334    
335    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
336    
337            * R/corpus.R: Make corpus virtual. Implement corpus with standard
338            and permanent storage semantics.
339    
340            * DESCRIPTION: New major release. A *lot* of improvements.
341    
342    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
343    
344            * NAMESPACE: Export some simple_triplet_matrix functions.
345    
346    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
347    
348            * R/weight.R: Adapt tf-idf to new matrix format.
349    
350    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
351    
352            * R/matrix.R: Create two distinct classes for term-document and
353            document-term matrices.
354    
355    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
356    
357            * R/termdocmatrix.R: No longer use Matrix package. This reduces
358            package start-up time significantly.
359    
360    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
361    
362            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
363    
364    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
365    
366            * R/transform.R (tmReduce): Combine multiple maps into one
367            transformation.
368    
369    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
370    
371            * R/weight.R: Remove weightLogical since it does not return a
372            dgCMatrix.
373    
374            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
375            or TermDocumentMatrix instead.
376    
377    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
378    
379            * inst/doc/extensions.Rnw: Finished vignette.
380    
381    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
382    
383            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
384            DocumentTermMatrix representations.
385    
386    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
387    
388            * R/reader.R (readXML): New reader for arbitrary XML files.
389    
390    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
391    
392            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
393            (XMLSource): New XMLSource class for arbitrary XML files.
394            (Source): New slot Vectorized.
395    
396    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
397    
398            * R/reader.R (readTabular): Experimental reader for tabular data
399            structures which can be customized via user-defined mappings.
400    
401            * R/reader.R: Always use UTC time zone.
402    
403            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
404    
405    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
406    
407            * R/reader.R (readDOC): Options can be passed over to antiword.
408    
409            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
410            pdftotext.
411    
412    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
413    
414            * R/source.R (DirSource): Add pattern and ignore.case arguments
415            which are internally passed over to list.files().
416    
417    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
418    
419            * inst/doc/tm.Rnw: Suppress pointless loading message.
420    
421    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
422    
423            * DESCRIPTION: Speed up package loading (via moving packages not
424            strictly necessary for normal operation to Suggests instead of
425            Depends).
426    
427    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
428    
429            * R/reader.R (readNewsgroup): The date format is now configurable.
430    
431    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
432    
433            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
434    
435    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
436    
437            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
438    
439    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
440    
441            * R/source.R (DataframeSource): New source class for data frames.
442    
443            * R/source.R: Fixed non-standard call evaluation.
444    
445    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
446    
447            * R/source.R (URISource): New source class for a single document.
448    
449    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
450    
451            * R/source.R: Refactoring.
452    
453    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
454    
455            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
456            Rmpi installations more gracefully.
457    
458    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
459    
460            * R/source.R (Source): Add Length slot.
461    
462    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
463    
464            * R/AAA.R: Unify duplicated .onLoad function.
465    
466    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
467    
468            * DESCRIPTION (Suggests): Added Rmpi.
469    
470    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
471    
472            * R/source.R (getElem): Fix 'no visible binding' warning.
473    
474            * man/WeightFunction.Rd: Fix signature.
475    
476    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
477    
478            * R/weight.R: Introduce name abbreviations for weighting functions.
479    
480    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
481    
482            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
483    
484            * R/cluster.R: Provide convenience functions for using an MPI
485            cluster.
486    
487            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
488            available.
489    
490            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
491            available.
492    
493    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
494    
495            * R/textdoccol.R (lapply): Removed debug print out.
496    
497    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
498    
499            * R/reader.R (readRCV1): Improved meta data extraction from
500            Reuters Corpus Volume 1 documents.
501    
502    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
503    
504            * R/transform.R: Ensure that all mappings preserve multiline
505            structures.
506    
507    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
508    
509            * R/filter.R: Every filter now has an attribute indicating whether
510            it should be applied at document level (doclevel).
511    
512            * R/textdoccol.R (tmFilter): Set searchFullText as new default
513            filter.
514    
515    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
516    
517            * R/transform.R (replacePatterns): Replaced removeWords by
518            replacePatterns. Suggested by Christian Buchta.
519    
520            * R/textdoccol.R (inspect): Improved formatting.
521    
522    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
523    
524            * inst/CITATION: Updated JSS article information.
525    
526            * R/textdoccol.R (setAs): Added coerce method from list to
527            corpus.
528    
529            * R/meta.R (meta): Improved meta data handling.
530    
531    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
532    
533            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
534            Christian Buchta.
535    
536            * inst/CITATION: Added template to include JSS article reference.
537    
538    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
539    
540            * R/textdoccol.R (tmMap): Introduced lazy mapping.
541    
542            * R/source.R: Added VectorSource.
543    
544    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
545    
546            * man/: Language codes should be in ISO 639-1 format.
547    
548            * R/textdoccol.R (asPlain): Preserve local meta data.
549    
550    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
551    
552            * R/textdoccol.R (writeCorpus): Function for writing a corpus
553            containing plain text documents to disk.
554    
555    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
556    
557            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
558            always set correctly.
559    
560            * R/textdoccol.R: Set load = TRUE as default for load on demand
561            since in most cases this is the desired behaviour.
562    
563    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
564    
565            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
566    
567            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
568    
569    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
570    
571            * R/meta.R (meta): New function for consistent access to meta data
572            of document collections, repositories, and texts.
573    
574    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
575    
576            * R/: Better support for encodings.
577    
578    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
579    
580            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
581            selection when no reader argument is given.
582    
583    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
584    
585            * R/source.R (CSVSource): Now uses read.csv instead of scan
586            internally.
587    
588    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * R/reader.R (getReaders): Returns available reader functions.
591    
592            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
593            as default.
594    
595    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
596    
597            * R/stopwords.R (stopwords): Shortened code, removed codetools
598            variable warnings.
599    
600            * man/: Documentation for showMeta, added an example for tmMap.
601    
602            * inst/doc/tm.Rnw: Updated vignette, comments on MS Word reader,
603            some minor typos fixed.
604    
605    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
606    
607            * R/aobjects.R (showMeta): Added method for pretty printing a
608            text document's meta data.
609    
610    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
611    
612            * R/textdoccol.R (TextDocCol): Better handling of empty
613            arguments.
614    
615            * NAMESPACE: Exported readDOC.
616    
617            * man/completeStems.Rd: Added an example.
618    
619    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
620    
621            * R/stopwords.R (stopwords): Look up .dat files at every
622            call. Allows users to modify stopword .dat files interactively.
623    
624    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
625    
626            * R/termdocmatrix.R (termFreq): Correct processing of empty
627            documents.
628    
629    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
630    
631            * man/: Updated documentation.
632    
633    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
634    
635            * R/complete.R (completeStems): Completes (heuristically) word
636            stems.
637    
638            * R/termdocmatrix.R (TermDocMatrix2): New modular
639            constructor.
640    
641            * NAMESPACE: Exported termFreq.
642    
643    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
644    
645            * R/reader.R (readDOC): Added MS Word reader (using antiword).
646    
647    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
648    
649            * R/weight.R: Weighting functions for TermDocMatrix.
650    
651    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
652    
653            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
654            functions for accessing dimension, column, and row names.
655    
656            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
657    
658    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
659    
660            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
661    
662    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
663    
664            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
665    
666    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
667    
668            * R/reader.R (readPDF): Removed manual checks for pdftotext and
669            pdfinfo. The system call gives a warning anyway.
670    
671    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
672    
673            * R/textdoccol.R (asPlain): Conversion from
674            StructuredTextDocuments to PlainTextDocuments.
675    
676    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
677    
678            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
679            for accessing term-document matrices.
680    
681            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
682            are installed.
683    
684    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
685    
686            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
687            Christian Buchta.
688    
689    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
690    
691            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
692    
693    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
694    
695            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
696    
697            * R/reader.R (readPDF): Added PDF reader.
698    
699    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
700    
701            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
702    
703            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
704    
705            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
706    
707            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
708    
709    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
710    
711            * R/distmeasure.R (dissimilarity): Replaced dists call from
712            package cba by new dist call from package proxy.
713    
714    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
715    
716            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
717    
718    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
719    
720            * R/termdocmatrix.R: require() uses the quietly option to suppress
721            loading messages.
722    
723    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
724    
725            * R/dictionary.R: Added dictionary support.
726    
727    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
728    
729            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
730            documents. This simplifies some functions, e.g., asPlain.
731    
732    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
733    
734            * inst/doc/tm.Rnw: Fixed some typos in vignette.
735    
736    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
737    
738            * R/textdoccol.R (replaceWords): Added method to replace a set of
739            words by a single word. Useful for synonyms.
740    
741    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
742    
743            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
744    
745    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
746    
747            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
748            vectors. Thanks to Ariel Maguyon for his error report.
749            (removeSparseTerms): New function to remove columns from a
750            term-document matrix exceeding a sparse factor.
751    
752    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
753    
754            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
755    
756    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
757    
758            * man/sFilter.Rd: Corrected documentation on statement format (use
759            '==' instead of '=').
760    
761    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * R/aobjects.R (StructuredTextDocument): Inherits from
764            TextDocument.
765    
766    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
767    
768            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
769            on sparse matrices as proposed by Martin Maechler.
770    
771    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
772    
773            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
774            \pkg{filehash} version deprecates them.
775    
776    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
777    
778            * R/termdocmatrix.R (textvector): Stemming is now performed before
779            erasing stopwords.
780            (weightMatrix): Adapted to handle sparse matrices.
781            (TermDocMatrix): Sparse matrix is now efficiently built by
782            direct stepwise insertion of row values into it.
783    
784    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
785    
786            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
787            due to ongoing problems. For our purposes the latter is as useful
788            as the replaced package.
789    
790    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
791    
792            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
793    
794            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
795    
796    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
797    
798            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
799            languages with available stopwords.
800    
801    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
802    
803            * inst/doc/tm.Rnw: Minor corrections in the vignette.
804    
805    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
806    
807            * DESCRIPTION: Update to version 0.2, since a lot of new features
808            have been integrated.
809    
810            * inst/stopwords: Updated existing stopwords and added stopwords
811            for various other languages.
812    
813    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
814    
815            * man/: Updated documentation.
816    
817            * Work/testDb.R: Script to test database stuff.
818    
819            * R/: Fixed various database-related bugs. Seems to be rather
820            usable now, i.e., consider it alpha status.
821    
822    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
823    
824            * R/: Fixed some bugs related to database support.
825    
826    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
827    
828            * man/: Added a lot of examples to the manuals.
829    
830    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
831    
832            * man/: Updated parts of the documentation.
833    
834            * R/textdoccol.R (asPlain): Added conversion from newsgroup
835            documents to plain text documents.
836    
837    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
838    
839            * R/textdoccol.R: Finished experimental database support. Not yet
840            intensively tested.
841    
842            * R/source.R: Now each source has a default reader.
843    
844            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
845            class anymore.
846    
847            * R/plaintextdoc.R: Custom show method for plain text documents.
848    
849            * R/aobjects.R: Added a class for structured text documents.
850    
851            * R/reader.R: Replaced remaining \code{parser} occurrences with
852            \code{reader}.
853    
854            * R/textdoccol.R (summary): Indent tags.
855    
856            * R/textdoccol.R (removePunctuation): Transform method to remove
857            punctuation marks.
858    
859    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
860    
861            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
862            using prescindMeta().
863    
864    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
865    
866            * R/textdoccol.R: Improved database support.
867    
868    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
869    
870            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
871    
872            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
873            language code.
874    
875            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
876            into parserControl argument.
877    
878            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
879    
880    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
881    
882            * Work/tmDataSetup.R: The datasets acq and crude can now be
883            created on the fly.
884    
885            * R/stopwords.R: Introduced a function returning the stopwords for
886            a given language (English, German and French at the moment).
887    
888            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
889            otherwise falls back to Snowball package.
890    
891    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
892    
893            * man/dissimilarity-methods.Rd: Make clear that any method offered
894            by "dists" from package "cba" can be used.
895    
896    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
897    
898            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
899            to Kurt's LaTeX suggestion. Removed points and underscores in
900            variable names for consistent naming.
901    
902            * DESCRIPTION: Update to version 0.1-2.
903    
904            * man/TextRepository.Rd: Fixed bug in documentation.
905    
906    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
907    
908            * DESCRIPTION: Update to version 0.1-1.
909    
910    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
911    
912            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
913            wordStem.
914    
915    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
916    
917            * R/: Changes due to Kurt's review.
918    
919    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
920    
921            * R/: Implemented improvements based upon comments by David
922            Meyer.
923    
924    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
925    
926            * inst/doc/: Rewrote vignette.
927    
928            * man/: Improved documentation.
929    
930    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
931    
932            * man/: Updated documentation.
933    
934            * DESCRIPTION: Changed package name to "tm". Updated version to
935            0.1 for first CRAN release.
936    
937            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
938            list archive example.
939    
940            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
941            archive example.
942    
943            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
944            from (several mails per box) mbox format to (single mail per file)
945            eml format.
946    
947    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
948    
949            * data/crude.rda: Rebuilt.
950    
951            * data/acq.rda: Rebuilt.
952    
953            * R/reader.R: Factored out reader and parser methods from
954            textdoccol.R.
955    
956            * R/source.R: Factored out Source methods from aobjects.R and
957            textdoccol.R.
958            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
959            feeds.
960    
961            * R/textdoccol.R (DirSource): Added support for recursive
962            traversal of directories.
963    
964    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
965    
966            * R/textdoccol.R ([[): Loads the document corpus automatically
967            into memory upon access.
968            (tm_transform, tm_filter): Removed several checks whether the
969            document is already loaded ([[ ensures this now).
970            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
971            mailing list archive.
972    
973    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
974    
975            * R/aobjects.R (TextDocument): Is now a virtual class.
976            (Source): Is now a virtual class.
977    
978    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            * R/textdoccol.R (c): Support for an arbitrary number of document
981            collections.
982    
983    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
986            append_meta and remove_meta.
987    
988            * R/textdoccol.R: Removed modify_metadata method.
989    
990            * R/textrepo.R: Removed modify_metadata method.
991    
992            * R/textdoccol.R (remove_meta): Supports removal of document
993            collection metadata and document (= in data frame) metadata.
994    
995    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
996    
997            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
998    
999            * data/crude.rda: Rebuilt.
1000    
1001            * data/acq.rda: Rebuilt.
1002    
1003            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
1004    
1005            * R/textdoccol.R ([): Bug fix for subsetting a document
1006            collection's data frame.
1007    
1008    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1009    
1010            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
1011            to s_filter.
1012    
1013            * R/textdoccol.R: Local text documents' metadata can now be copied
1014            to a document collection's data frame with prescind_meta.
1015    
1016    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1017    
1018            * R/: Text documents' slot metadata is now accessible in s_filter.
1019    
1020            * R/: Rewrote s_filter function (still has some restrictions).
1021    
1022    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1023    
1024            * R/: Various fixes in handling metadata.
1025    
1026            * R/: Added update mechanism for text document collections.
1027    
1028    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1029    
1030            * R/: Merging of document collections now creates a binary tree
1031            for reconstructing merged document collections.
1032    
1033            * R/: Redesign of metadata for document collections.
1034    
1035    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1036    
1037            * R/: Messages now use \code{ngettext}.
1038    
1039    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1040    
1041            * R/: Added functions for modifying and removing metadata.
1042    
1043    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1044    
1045            * man/: Updated some documentation.
1046    
1047            * R/: Corrected some connection issues.
1048    
1049            * inst/doc: Worked on the vignette.
1050    
1051    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1052    
1053            * inst/: Added texts and started vignette.
1054    
1055            * R/: Final changes based upon David's comments.
1056    
1057    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1058    
1059            * NAMESPACE: Corrected exports (generic methods need exportMethods
1060            directives!).
1061    
1062    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1063    
1064            * R/: Modified the TextDocCol constructor and various parsers. It
1065            is now modular and supports various file formats via plugins (see
1066            the new "Source" class).
1067    
1068    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1069    
1070            * man/: Revised documentation after previous code changes.
1071    
1072    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1073    
1074            * R/: Remaining changes as discussed with David.
1075    
1076    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1077    
1078            * R/: Some changes as suggested by David. The rest will follow
1079            within the next few days.
1080    
1081    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1082    
1083            * man/: Finished documentation.
1084    
1085    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1086    
1087            * man/: Wrote some documentation.
1088    
1089    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1090    
1091            * R/: Further syntactic sugar in form of additional assignment and
1092            accessor methods.
1093    
1094    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1095    
1096            * R/: Syntactic sugar in form of "length", "show" and "summary"
1097            operators.
1098    
1099    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1100    
1101            * R/: Various updates, mainly to default operators ("[" or "c")
1102            and dissimilarities.
1103    
1104    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1105    
1106            * R/: Added similarity functions.
1107    
1108            * data/: Added English stopwords.
1109    
1110    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1111    
1112            * data/: Examples compiled for new features.
1113    
1114            * R/: Changes due to new structure.
1115    
1116            * NAMESPACE: Corrected namespace to reflect new structure.
1117    
1118            * R/termdocmatrix.R: Adapted for new naming scheme.
1119    
1120    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1121    
1122            * R/textdoccol.R: Adapted code for new class structure. Wrote
1123            several transform and filter functions operating on text document
1124            collections (alias text document databases).
1125    
1126            * R/aobjects.R: Adapted class structure with inheritance,
1127            repositories and additional meta data. Loading files on demand is
1128            now possible.
1129    
1130    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1131    
1132            * R/: Some cosmetic cleanups.
1133    
1134            * inst/: Removed vignette on clustering. That and much more is now
1135            described in the JSS paper on text mining. A more elaborate
1136            vignette based upon that article will be incorporated in the future.
1137    
1138    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1139    
1140            * R/: Updated generic S4 methods to comply with signature changes
1141            in newer versions of R (> 2.3).
1142    
1143    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1144    
1145            * ext/R/importRIS.R: Automatic RIS import is now possible.
1146    
1147    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1148    
1149            * R/textdoccol.R: Added RIS HTML input format.
1150    
1151    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1152    
1153            * R/textdoccol.R: Removed bug that caused invalid text document
1154            collections when handling many input files.
1155    
1156    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1157    
1158            * R/textdoccol.R: Restructured and extended file import
1159            mechanism.
1160    
1161            * inst/doc/clustering.Rnw: Adapted vignette for use with
1162            ReutNews.rda
1163    
1164            * man/ReutNews.Rd: Documentation for ReutNews.rda
1165    
1166            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1167    
1168    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1169    
1170            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
