Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
pkg/ChangeLog revision 1167, Fri Dec 23 09:44:33 2011 UTC
1    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/matrix.R (termFreq): Fix invalid handling of
4            control$wordLengths[1]. Reported by Steven C. Bagley.
5    
6    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
7    
8            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
9    
10    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
11    
12            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
13    
14    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
15    
16            * R/transform.R (removePunctuation): Add option to preserve
17            intra-word dashes.
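
The dash-preserving option can be illustrated with a minimal Python sketch. This is not tm's R implementation; the function name `remove_punctuation`, its parameter, and the rule that a dash counts as intra-word only when it has an ASCII letter or digit on both sides are all assumptions made for illustration.

```python
import re
import string


def remove_punctuation(text, preserve_intra_word_dashes=False):
    """Strip ASCII punctuation; optionally keep dashes that join two word parts."""
    if preserve_intra_word_dashes:
        # Drop every dash that lacks a letter or digit on either side
        # (assumed definition of "not intra-word").
        text = re.sub(r"(?<![A-Za-z0-9])-|-(?![A-Za-z0-9])", "", text)
        # Remove all remaining punctuation except the dash itself.
        others = re.escape(string.punctuation.replace("-", ""))
        return re.sub(f"[{others}]", "", text)
    return re.sub(f"[{re.escape(string.punctuation)}]", "", text)
```

With the option switched on, "state-of-the-art" survives intact while a free-standing "--" is still removed.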
18    
19    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
20    
21            * R/matrix.R (termFreq): Allow reordering of control option
22            processing.
23    
24    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
25    
26            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
27            pdfinfo tool.
28    
29            * inst/stopwords/SMART.dat: Add SMART information retrieval system
30            stopwords (which are also used by the MC toolkit).
31    
32            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
33            restrict how often a term may appear in each document (generalizes
34            \code{minDocFreq}). Similarly, allow the local option
35            \code{wordLengths} for word length bounds (generalizes
35            \code{minWordLength}).
36    
37            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
38            \code{bounds$global} for restricting how often a term is allowed
39            to appear in different documents.
40    
41            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
42            local options delegated internally to termFreq() and global
43            options which are processed by the term-document matrix
44            constructor itself.
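
The split between local and global options described in the entries above can be sketched as follows. This is a hypothetical Python illustration, not tm's code: the names `term_freq`, `term_document_matrix`, `word_lengths`, `local_bounds`, and `global_bounds`, the dictionary-based matrix layout, and the non-filtering defaults are all assumptions.

```python
from collections import Counter
from math import inf


def term_freq(tokens, word_lengths=(1, inf), local_bounds=(1, inf)):
    """Count terms in one document, applying only the local options."""
    min_len, max_len = word_lengths
    lo, hi = local_bounds
    # Word length bounds are checked per token, before counting.
    counts = Counter(t for t in tokens if min_len <= len(t) <= max_len)
    # Local bounds restrict how often a term may occur within this document.
    return {t: n for t, n in counts.items() if lo <= n <= hi}


def term_document_matrix(docs, global_bounds=(1, inf), **local_options):
    """Build {term: {doc_id: count}}; local options are delegated to term_freq()."""
    matrix = {}
    for doc_id, tokens in docs.items():
        for term, n in term_freq(tokens, **local_options).items():
            matrix.setdefault(term, {})[doc_id] = n
    # Global bounds restrict in how many different documents a term appears,
    # so they can only be applied once every document has been processed.
    lo, hi = global_bounds
    return {t: row for t, row in matrix.items() if lo <= len(row) <= hi}
```

The design point is that a per-document counter cannot know document frequencies, which is why the global filter belongs in the matrix constructor.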
45    
46    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
47    
48            * man/getTokenizers.Rd: Document getTokenizers().
49    
50            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
51    
52    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
53    
54            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
55    
56            * man/combine.Rd: Document c.term_frequency().
57    
58    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
59    
60            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
61            can be accessed via '[' and not '[['.
62    
63    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
64    
65            * R/stopwords.R (stopwords): Raise an error if no stopwords are
66            available for requested language. Suggested by Derek M Jones.
67    
68    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
69    
70            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
71            normalization.
72    
73    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
74    
75            * R/transform.R (stemDocument.PlainTextDocument): Use language
76            argument.
77    
78    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
79    
80            * R/source.R: Store strings and connections instead of unevaluated
81            calls.
82    
83    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
84    
85            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
86    
87    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
88    
89            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
90            (instead of a list element).
91    
92    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
93    
94            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
95            documents by names (fallback to IDs if names are not set).
96    
97    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
100            \code{recursive} now determines whether existing corpus meta data
101            is used.
102    
103    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
104    
105            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
106    
107    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/matrix.R (TermDocumentMatrix): If a dictionary is given, no
110            longer remove terms that do not occur in the corpus.
111    
112    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
115            and Heaps' law.
116    
117    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
118    
119            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
120            provided by a source.
121    
122    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
123    
124            * R/source.R (.Source): Provide document names.
125    
126    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
127    
128            * R/meta.R (`content_or_meta`): Utility function.
129    
130    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
131    
132            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
133            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
134    
135    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
136    
137            * R/weight.R (weightTfIdf): Added normalization option.
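
The entry does not say which normalization is meant. As one plausible reading, the Python sketch below divides each term count by the document's total term count before applying the inverse document frequency. The function name `weight_tf_idf`, the `normalize` flag, the base-2 logarithm, and the dictionary layout are assumptions for illustration, not a statement of tm's formula.

```python
from math import log2


def weight_tf_idf(tdm, n_docs, normalize=True):
    """tdm maps term -> {doc_id: count}; returns the same shape with tf-idf weights."""
    # Document length = sum of all term counts in that document.
    doc_len = {}
    for row in tdm.values():
        for doc_id, n in row.items():
            doc_len[doc_id] = doc_len.get(doc_id, 0) + n
    weighted = {}
    for term, row in tdm.items():
        idf = log2(n_docs / len(row))  # len(row) is the term's document frequency
        weighted[term] = {
            doc_id: (n / doc_len[doc_id] if normalize else n) * idf
            for doc_id, n in row.items()
        }
    return weighted
```

Normalizing by document length keeps long documents from dominating purely because they contain more tokens.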
138    
139            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
140            analysis.
141    
142    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
143    
144            * R/score.R (tm_tag_score): Compute a score from the number of
145            tags matching in a document.
146    
147    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
148    
149            * R/complete.R (stemCompletion): New completion heuristics.
150    
151    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
152    
153            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
154    
155    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
156    
157            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
158            setOldClass(c(..., "list")) works.
159    
160    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
161    
162            * R/transform.R (stemDocument.character): In case input is a
163            simple character just delegate to the default Snowball stemmer.
164    
165    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
166    
167            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
168            data.
169    
170    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/doc.R (`Content<-`): Be careful with names attribute.
173    
174    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
175    
176            * R/source.R (DirSource): Improved implementation especially when
177            handling many (> 1M) files.
178    
179    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
180    
181            * R/source.R (getElem.URISource): Use encoding argument.
182    
183    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
184    
185            * R/doc.R (setOldClass): Register S3 document classes to be
186            recognized by S4 methods.
187    
188    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
189    
190            * R/matrix.R (termFreq): Add option to remove punctuation
191            characters.
192    
193    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
194    
195            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
196            merging multiple term-document matrices.
197    
198    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
199    
200            * R/corpus.R (setOldClass): Register S3 corpus classes to be
201            recognized by S4 methods.
202    
203            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
204            that CRAN Mac OS X builds do not fail any longer.
205    
206    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
207    
208            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
209            of RWeka::AlphabeticTokenizer() as default.
210    
211    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
212    
213            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
214            caused words at the beginning or the end of a line not to be removed. Do
215            not delete whitespace anymore.
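
A common way to get this behaviour is to anchor the pattern on word boundaries instead of on surrounding whitespace. The Python sketch below illustrates that idea under this assumption; it is not the regular expression tm uses, and `remove_words` is a hypothetical name.

```python
import re


def remove_words(text, words):
    """Delete whole-word matches while leaving all whitespace untouched."""
    # \b matches at line starts and ends too, so no surrounding space is required.
    pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
    return re.sub(pattern, "", text)
```

Because only the word itself is matched, the spaces and newlines around it remain in the output.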
216    
217    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
218    
219            * R/source.R (DirSource): Default to working directory if no path
220            is specified.
221    
222    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
223    
224            * R/source.R (DirSource): Stop on empty directories.
225    
226    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
227    
228            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
229            named documents.
230    
231    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
232    
233            * R/transform.R (removeWords): Improve regular expressions.
234    
235    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
236    
237            * R/meta.R (DublinCore): Allow lower case tags.
238    
239    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
240    
241            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
242            instead of x$children.
243    
244    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
245    
246            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
247    
248    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
249    
250            * R/: Use S3 instead of S4 class system.
251    
252    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
253    
254            * R/reader.R (readMail): Moved to tm.plugin.mail package.
255    
256    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
257    
258            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
259            postings are basically e-mails with some extra headers.
260    
261    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
262    
263            * R/transform.R: Move convertMboxEml, removeCitation,
264            removeMultipart, and removeSignature to the tm.plugin.mail package
265            since they are mainly utility functions (for handling e-mails) and
266            not very framework specific.
267    
268    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
269    
270            * man/: Fix documentation.
271    
272    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
273    
274            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
275            plain text document instead of an XML document for texts of the
276            Reuters-21578 dataset.
277    
278            * R/sparse.R: Removed since the slam package is now available on
279            CRAN.
280    
281            * DESCRIPTION (Depends): Add slam package.
282    
283    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
284    
285            * R/transform.R (stemDoc): Fix character(0) handling.
286    
287    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
288    
289            * R/doc.R (show): Pretty print.
290    
291    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
292    
293            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
294            gracefully.
295    
296    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
297    
298            * R/corpus.R: Make corpus virtual. Implement corpus with standard
299            and permanent storage semantics.
300    
301            * DESCRIPTION: New major release. A *lot* of improvements.
302    
303    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
304    
305            * NAMESPACE: Export some simple_triplet_matrix functions.
306    
307    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
308    
309            * R/weight.R: Adapt tf-idf to new matrix format.
310    
311    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
312    
313            * R/matrix.R: Create two distinct classes for term-document and
314            document-term matrices.
315    
316    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
317    
318            * R/termdocmatrix.R: No longer use Matrix package. This reduces
319            package start-up time significantly.
320    
321    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
322    
323            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
324    
325    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
326    
327            * R/transform.R (tmReduce): Combine multiple maps into one
328            transformation.
329    
330    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
331    
332            * R/weight.R: Remove weightLogical since it does not return a
333            dgCMatrix.
334    
335            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
336            or TermDocumentMatrix instead.
337    
338    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
339    
340            * inst/doc/extensions.Rnw: Finished vignette.
341    
342    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
343    
344            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
345            DocumentTermMatrix representations.
346    
347    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
348    
349            * R/reader.R (readXML): New reader for arbitrary XML files.
350    
351    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
352    
353            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
354            (XMLSource): New XMLSource class for arbitrary XML files.
355            (Source): New slot Vectorized.
356    
357    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
358    
359            * R/reader.R (readTabular): Experimental reader for tabular data
360            structures which can be customized via user-defined mappings.
361    
362            * R/reader.R: Always use UTC time zone.
363    
364            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
365    
366    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
367    
368            * R/reader.R (readDOC): Options can be passed over to antiword.
369    
370            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
371            pdftotext.
372    
373    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
374    
375            * R/source.R (DirSource): Add pattern and ignore.case arguments
376            which are internally passed over to list.files().
377    
378    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
379    
380            * inst/doc/tm.Rnw: Suppress pointless loading message.
381    
382    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
383    
384            * DESCRIPTION: Speed up package loading (via moving packages not
385            strictly necessary for normal operation to Suggests instead of
386            Depends).
387    
388    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
389    
390            * R/reader.R (readNewsgroup): The date format is now configurable.
391    
392    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
393    
394            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
395    
396    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
397    
398            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
399    
400    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
401    
402            * R/source.R (DataframeSource): New source class for data frames.
403    
404            * R/source.R: Fixed non-standard call evaluation.
405    
406    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
407    
408            * R/source.R (URISource): New source class for a single document.
409    
410    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
411    
412            * R/source.R: Refactoring.
413    
414    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
415    
416            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
417            Rmpi installations more gracefully.
418    
419    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
420    
421            * R/source.R (Source): Add Length slot.
422    
423    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
424    
425            * R/AAA.R: Unify duplicated .onLoad function.
426    
427    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
428    
429            * DESCRIPTION (Suggests): Added Rmpi.
430    
431    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
432    
433            * R/source.R (getElem): Fix 'no visible binding' warning.
434    
435            * man/WeightFunction.Rd: Fix signature.
436    
437    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
438    
439            * R/weight.R: Introduce name abbreviations for weighting functions.
440    
441    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
442    
443            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
444    
445            * R/cluster.R: Provide convenience functions for using a MPI
446            cluster.
447    
448            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
449            available.
450    
451            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
452            available.
453    
454    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
455    
456            * R/textdoccol.R (lapply): Removed debug print out.
457    
458    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/reader.R (readRCV1): Improved meta data extraction from
461            Reuters Corpus Volume 1 documents.
462    
463    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/transform.R: Ensure that all mappings preserve multiline
466            structures.
467    
468    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * R/filter.R: Every filter now has an attribute indicating whether
471            it should be applied at document level (doclevel).
472    
473            * R/textdoccol.R (tmFilter): Set searchFullText as new default
474            filter.
475    
476    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
477    
478            * R/transform.R (replacePatterns): Replaced removeWords by
479            replacePatterns. Suggested by Christian Buchta.
480    
481            * R/textdoccol.R (inspect): Improved formatting.
482    
483    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
484    
485            * inst/CITATION: Updated JSS article information.
486    
487            * R/textdoccol.R (setAs): Added coerce method from list to
488            corpus.
489    
490            * R/meta.R (meta): Improved meta data handling.
491    
492    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
493    
494            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
495            Christian Buchta.
496    
497            * inst/CITATION: Added template to include JSS article reference.
498    
499    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
500    
501            * R/textdoccol.R (tmMap): Introduced lazy mapping.
502    
503            * R/source.R: Added VectorSource.
504    
505    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
506    
507            * man/: Language codes should be in ISO 639-1 format.
508    
509            * R/textdoccol.R (asPlain): Preserve local meta data.
510    
511    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
512    
513            * R/textdoccol.R (writeCorpus): Function for writing a corpus
514            containing plain text documents to disk.
515    
516    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
519            always set correctly.
520    
521            * R/textdoccol.R: Set load = TRUE as default for load on demand
522            since in most cases this is the wanted behaviour.
523    
524    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
525    
526            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
527    
528            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
529    
530    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
531    
532            * R/meta.R (meta): New function for consistent access to meta data
533            of document collections, repositories, and texts.
534    
535    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
536    
537            * R/: Better support for encodings.
538    
539    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
540    
541            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
542            selection when no reader argument is given.
543    
544    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
545    
546            * R/source.R (CSVSource): Now uses read.csv instead of scan
547            internally.
548    
549    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
550    
551            * R/reader.R (getReaders): Returns available reader functions.
552    
553            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
554            as default.
555    
556    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
557    
558            * R/stopwords.R (stopwords): Shortened code, removed codetools
559            variable warnings.
560    
561            * man/: Documentation for showMeta, added an example for tmMap.
562    
563            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
564            some minor typos fixed.
565    
566    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
567    
568            * R/aobjects.R (showMeta): Added method for pretty printing a
569            text document's meta data.
570    
571    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
572    
573            * R/textdoccol.R (TextDocCol): Better handling of empty
574            arguments.
575    
576            * NAMESPACE: Exported readDOC.
577    
578            * man/completeStems.Rd: Added an example.
579    
580    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
581    
582            * R/stopwords.R (stopwords): Look up .dat files at every
583            call. Allows users to modify stopword .dat files interactively.
584    
585    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
586    
587            * R/termdocmatrix.R (termFreq): Correct processing of empty
588            documents.
589    
590    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
591    
592            * man/: Updated documentation.
593    
594    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
595    
596            * R/complete.R (completeStems): Completes (heuristically) word
597            stems.
598    
599            * R/termdocmatrix.R (TermDocMatrix2): New modular
600            constructor.
601    
602            * NAMESPACE: Exported termFreq.
603    
604    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
605    
606            * R/reader.R (readDOC): Added MS Word reader (using antiword).
607    
608    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
609    
610            * R/weight.R: Weighting functions for TermDocMatrix.
611    
612    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
613    
614            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
615            functions for accessing dimension, column, and row names.
616    
617            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
618    
619    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
620    
621            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
622    
623    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
624    
625            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
626    
627    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
628    
629            * R/reader.R (readPDF): Removed manual checks for pdftotext and
630            pdfinfo. The system call gives a warning anyway.
631    
632    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
633    
634            * R/textdoccol.R (asPlain): Conversion from
635            StructuredTextDocuments to PlainTextDocuments.
636    
637    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
638    
639            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
640            for accessing term-document matrices.
641    
642            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
643            are installed.
644    
645    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
646    
647            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
648            Christian Buchta.
649    
650    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
651    
652            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
653    
654    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
655    
656            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
657    
658            * R/reader.R (readPDF): Added PDF reader.
659    
660    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
663    
664            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
665    
666            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
667    
668            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
669    
670    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * R/distmeasure.R (dissimilarity): Replaced dists call from
673            package cba by new dist call from package proxy.
674    
675    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
676    
677            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
678    
679    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
680    
681            * R/termdocmatrix.R: require() uses the quietly option to suppress
682            loading messages.
683    
684    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
685    
686            * R/dictionary.R: Added dictionary support.
687    
688    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
689    
690            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
691            documents. This simplifies some functions, e.g., asPlain.
692    
693    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
694    
695            * inst/doc/tm.Rnw: Fixed some typos in vignette.
696    
697    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
698    
699            * R/textdoccol.R (replaceWords): Added method to replace a set of
700            words by a single word. Useful for synonyms.
701    
702    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
703    
704            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
705    
706    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
707    
708            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
709            vectors. Thanks to Ariel Maguyon for his error report.
710            (removeSparseTerms): New function to remove columns from a
711            term-document matrix exceeding a sparse factor.
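
The idea of a sparse factor can be sketched in Python as below. This is an illustration only: the name `remove_sparse_terms`, the dictionary layout (term mapped to the documents that contain it), and the choice to keep a term whose sparsity exactly equals the threshold are assumptions, not tm's behaviour.

```python
def remove_sparse_terms(tdm, n_docs, sparse):
    """Drop terms whose share of empty document cells exceeds `sparse`."""
    if not 0 < sparse < 1:
        raise ValueError("sparse must lie strictly between 0 and 1")
    # A term's sparsity is the fraction of documents in which it does not occur.
    return {
        term: row
        for term, row in tdm.items()
        if 1 - len(row) / n_docs <= sparse
    }
```

A smaller factor is stricter: with `sparse=0.6` and four documents, a term must occur in at least two of them to survive.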
712    
713    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
714    
715            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
716    
717    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
718    
719            * man/sFilter.Rd: Corrected documentation on statement format (use
720            '==' instead of '=').
721    
722    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
723    
724            * R/aobjects.R (StructuredTextDocument): Inherits from
725            TextDocument.
726    
727    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
728    
729            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
730            on sparse matrices as proposed by Martin Maechler.
731    
732    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
733    
734            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
735            latest \pkg{filehash} version deprecates them.
736    
737    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
738    
739            * R/termdocmatrix.R (textvector): Stemming is now performed before
740            erasing stopwords.
741            (weightMatrix): Adapted to handle sparse matrices.
742            (TermDocMatrix): Sparse matrix is now efficiently built by
743            direct stepwise insertion of row values into it.
744    
745    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
746    
747            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
748            due to ongoing problems. For our purposes the latter is as useful
749            as the replaced package.
750    
751    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
752    
753            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
754    
755            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
756    
757    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
758    
759            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
760            languages with available stopwords.
761    
762    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
763    
764            * inst/doc/tm.Rnw: Minor corrections in the vignette.
765    
766    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
767    
768            * DESCRIPTION: Update to version 0.2, since a lot of new features
769            have been integrated.
770    
771            * inst/stopwords: Updated existing stopwords and added stopwords
772            for various other languages.
773    
774    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
775    
776            * man/: Updated documentation.
777    
778            * Work/testDb.R: Script to test database stuff.
779    
780            * R/: Fixed various database related bugs. Seems to be rather
781            useable now, i.e., consider as alpha status for now.
782    
783    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
784    
785            * R/: Fixed some bugs related to database support.
786    
787    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
788    
789            * man/: Added a lot of examples to the manuals.
790    
791    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
792    
793            * man/: Updated parts of the documentation.
794    
795            * R/textdoccol.R (asPlain): Added conversion from newsgroup
796            documents to plain text documents.
797    
798    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
799    
800            * R/textdoccol.R: Finished experimental database support. Not yet
801            intensively tested.
802    
803            * R/source.R: Now each source has a default reader.
804    
805            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
806            class anymore.
807    
808            * R/plaintextdoc.R: Custom show method for plain text documents.
809    
810            * R/aobjects.R: Added a class for structured text documents.
811    
812            * R/reader.R: Replaced remaining \code{parser} occurrences with
813            \code{reader}.
814    
815            * R/textdoccol.R (summary): Indent tags.
816    
817            * R/textdoccol.R (removePunctuation): Transform method to remove
818            punctuation marks.
819    
820    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
821    
822            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
823            using prescindMeta().
824    
825    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
826    
827            * R/textdoccol.R: Improved database support.
828    
829    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
830    
831            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
832    
833            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
834            language code.
835    
836            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
837            into parserControl argument.
838    
839            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
840    
841    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
842    
843            * Work/tmDataSetup.R: The datasets acq and crude can now be
844            created on the fly.
845    
846            * R/stopwords.R: Introduced a function returning the stopwords for
847            a given language (English, German and French at the moment).
848    
849            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
850            otherwise falls back to Snowball package.
851    
852    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
853    
854            * man/dissimilarity-methods.Rd: Make clear that any method offered
855            by "dists" from package "cba" can be used.
856    
857    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
858    
859            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes bug according
860            to Kurt's LaTeX suggestion. Removed points and underscores in
861            variable names for consistent naming.
862    
863            * DESCRIPTION: Update to version 0.1-2.
864    
865            * man/TextRepository.Rd: Fixed bug in documentation.
866    
867    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * DESCRIPTION: Update to version 0.1-1.
870    
871    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
872    
873            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
874            wordStem.
875    
876    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
877    
878            * R/: Changes due to Kurt's review.
879    
880    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
881    
882            * R/: Implemented improvements based upon comments by David
883            Meyer.
884    
885    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
886    
887            * inst/doc/: Rewrote vignette.
888    
889            * man/: Improved documentation.
890    
891    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
892    
893            * man/: Updated documentation.
894    
895            * DESCRIPTION: Changed package name to "tm". Updated version to
896            0.1 for first CRAN release.
897    
898            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
899            list archive example.
900    
901            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
902            archive example.
903    
904            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
905            from (several mails per box) mbox format to (single mail per file)
906            eml format.
907    
908    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
909    
910            * data/crude.rda: Rebuilt.
911    
912            * data/acq.rda: Rebuilt.
913    
914            * R/reader.R: Factored out reader and parser methods from
915            textdoccol.R.
916    
917            * R/source.R: Factored out Source methods from aobjects.R and
918            textdoccol.R.
919            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
920            feeds.
921    
922            * R/textdoccol.R (DirSource): Added support for recursive
923            traversal of directories.
924    
925    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
926    
927            * R/textdoccol.R ([[): Loads the document corpus automatically
928            into memory upon access.
929            (tm_transform, tm_filter): Removed several checks whether the
930            document is already loaded ([[ ensures this now).
931            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
932            mailing list archive.
933    
934    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
935    
936            * R/aobjects.R (TextDocument): Is now a virtual class.
937            (Source): Is now a virtual class.
938    
939    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
940    
941            * R/textdoccol.R (c): Support for an arbitrary number of document
942            collections.
943    
944    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
945    
946            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
947            append_meta and remove_meta.
948    
949            * R/textdoccol.R: Removed modify_metadata method.
950    
951            * R/textrepo.R: Removed modify_metadata method.
952    
953            * R/textdoccol.R (remove_meta): Supports removal of document
954            collection metadata and document (= in data frame) metadata.
955    
956    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
957    
958            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
959    
960            * data/crude.rda: Rebuilt.
961    
962            * data/acq.rda: Rebuilt.
963    
964            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
965    
966            * R/textdoccol.R ([): Bug fix for subsetting a document
967            collection's data frame.
968    
969    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
970    
971            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
972            to s_filter.
973    
974            * R/textdoccol.R: Local text documents' metadata can now be copied
975            to a document collection's data frame with prescind_meta.
976    
977    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
978    
979            * R/: Text documents' slot metadata is now accessible in s_filter.
980    
981            * R/: Rewrote s_filter function (still has some restrictions).
982    
983    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * R/: Various fixes in handling metadata.
986    
987            * R/: Added update mechanism for text document collections.
988    
989    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
990    
991            * R/: Merging of document collections now creates a binary tree
992            for reconstructing merged document collections.
993    
994            * R/: Redesign of metadata for document collections.
995    
996    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
997    
998            * R/: Messages now use \code{ngettext}.
999    
1000    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1001    
1002            * R/: Added functions for modifying and removing metadata.
1003    
1004    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1005    
1006            * man/: Updated some documentation.
1007    
1008            * R/: Corrected some connection issues.
1009    
1010            * inst/doc: Worked on the vignette.
1011    
1012    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1013    
1014            * inst/: Added texts and started vignette.
1015    
1016            * R/: Final changes based upon David's comments.
1017    
1018    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1019    
1020            * NAMESPACE: Corrected exports (generic methods need exportMethods
1021            directives!).
1022    
1023    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1024    
1025            * R/: Modified the TextDocCol constructor and various parsers. It
1026            is now modular and supports various file formats via plugins (see
1027            the new "Source" class).
1028    
1029    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1030    
1031            * man/: Revised documentation after previous code changes.
1032    
1033    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1034    
1035            * R/: Remaining changes as discussed with David.
1036    
1037    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * R/: Some changes as suggested by David. The rest will follow
1040            within the next days.
1041    
1042    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * man/: Finished documentation.
1045    
1046    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1047    
1048            * man/: Wrote some documentation.
1049    
1050    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1051    
1052            * R/: Further syntactic sugar in form of additional assignment and
1053            accessor methods.
1054    
1055    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1056    
1057            * R/: Syntactic sugar in form of "length", "show" and "summary"
1058            operators.
1059    
1060    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1061    
1062            * R/: Diverse updates. Mainly on default operators ("[" or "c")
1063            and dissimilarities.
1064    
1065    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1066    
1067            * R/: Added similarity functions.
1068    
1069            * data/: Added English stopwords.
1070    
1071    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1072    
1073            * data/: Examples compiled for new features.
1074    
1075            * R/: Changes due to new structure.
1076    
1077            * NAMESPACE: Corrected namespace to reflect new structure.
1078    
1079            * R/termdocmatrix.R: Adapted for new naming scheme.
1080    
1081    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1082    
1083            * R/textdoccol.R: Adapted code for new class structure. Wrote
1084            several transform and filter functions operating on text document
1085            collections (alias text document databases).
1086    
1087            * R/aobjects.R: Adapted class structure with inheritance,
1088            repositories and additional meta data. Loading files on demand is
1089            now possible.
1090    
1091    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1092    
1093            * R/: Some cosmetic cleanups.
1094    
1095            * inst/: Removed vignette on clustering. That and much more is now
1096            described in the JSS paper on text mining. Based upon that
1097            article an elaborated vignette will be incorporated in the future.
1098    
1099    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1100    
1101            * R/: Updated generic S4 methods to comply with signature changes
1102            in newer versions of R (> 2.3).
1103    
1104    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1105    
1106            * ext/R/importRIS.R: Automatic RIS import is now possible.
1107    
1108    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1109    
1110            * R/textdoccol.R: Added RIS HTML input format.
1111    
1112    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1113    
1114            * R/textdoccol.R: Removed bug that caused invalid text document
1115            collections when handling many input files.
1116    
1117    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1118    
1119            * R/textdoccol.R: Restructured and extended file import
1120            mechanism.
1121    
1122            * inst/doc/clustering.Rnw: Adapted vignette for use with
1123            ReutNews.rda
1124    
1125            * man/ReutNews.Rd: Documentation for ReutNews.rda
1126    
1127            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1128    
1129    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1130    
1131            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1132            clustering facilities of this package.
1133    
1134    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1135    
1136            * R/aobjects.R: Changed package document structure to avoid class
1137            dependency problems.
1138    
1139    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1140    
1141            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
1142            data set.
1143    
1144            * Finished documentation and reordered directory structure. Now "R
1145            CMD check textmin" works without errors.
1146    
