[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 25, Wed Nov 30 18:53:50 2005 UTC
pkg/ChangeLog revision 1159, Tue Dec 6 15:11:45 2011 UTC
1    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/matrix.R (termFreq): Allow reordering of control option
4            processing.
5    
6    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
9            pdfinfo tool.
10    
11            * inst/stopwords/SMART.dat: Add SMART information retrieval system
12            stopwords (which are also used by the MC toolkit).
13    
14            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
15            restrict how often a term may appear in each document (generalizes
16            \code{minDocFreq}). Similarly the local option \code{wordLengths}
17            for word length bounds (generalizes \code{minWordLength}).
18    
19            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
20            \code{bounds$global} for restricting how often a term is allowed
21            to appear in different documents.
22    
23            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
24            local options delegated internally to termFreq() and global
25            options which are processed by the term-document matrix
26            constructor itself.
27    
28    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
29    
30            * man/getTokenizers.Rd: Document getTokenizers().
31    
32            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
33    
34    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
35    
36            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
37    
38            * man/combine.Rd: Document c.term_frequency().
39    
40    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
43            can be accessed via '[' and not '[['.
44    
45    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
46    
47            * R/stopwords.R (stopwords): Raise an error if no stopwords are
48            available for requested language. Suggested by Derek M Jones.
49    
50    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
51    
52            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
53            normalization.
54    
55    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
56    
57            * R/transform.R (stemDocument.PlainTextDocument): Use language
58            argument.
59    
60    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
61    
62            * R/source.R: Store strings and connections instead of unevaluated
63            calls.
64    
65    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
66    
67            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
68    
69    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
70    
71            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
72            (instead of a list element).
73    
74    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
75    
76            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
77            documents by names (fallback to IDs if names are not set).
78    
79    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
80    
81            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
82            \code{recursive} now determines whether existing corpus meta data
83            is used.
84    
85    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
86    
87            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
88    
89    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
90    
91            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
92            remove terms not occurring in the corpus anymore.
93    
94    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
95    
96            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
97            and Heaps' law.
98    
99    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
100    
101            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
102            provided by a source.
103    
104    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
105    
106            * R/source.R (.Source): Provide document names.
107    
108    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
109    
110            * R/meta.R (`content_or_meta`): Utility function.
111    
112    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
115            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
116    
117    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
118    
119            * R/weight.R (weightTfIdf): Added normalization option.
120    
121            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
122            analysis.
123    
124    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
125    
126            * R/score.R (tm_tag_score): Compute a score from the number of
127            tags matching in a document.
128    
129    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
130    
131            * R/complete.R (stemCompletion): New completion heuristics.
132    
133    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
134    
135            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
136    
137    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
138    
139            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
140            setOldClass(c(..., "list")) works.
141    
142    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
143    
144            * R/transform.R (stemDocument.character): In case input is a
145            simple character just delegate to the default Snowball stemmer.
146    
147    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
148    
149            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
150            data.
151    
152    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
153    
154            * R/doc.R (`Content<-`): Be careful with names attribute.
155    
156    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
157    
158            * R/source.R (DirSource): Improved implementation especially when
159            handling many (> 1M) files.
160    
161    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
162    
163            * R/source.R (getElem.URISource): Use encoding argument.
164    
165    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
166    
167            * R/doc.R (setOldClass): Register S3 document classes to be
168            recognized by S4 methods.
169    
170    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/matrix.R (termFreq): Add option to remove punctuation
173            characters.
174    
175    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
178            merging multiple term-document matrices.
179    
180    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
181    
182            * R/corpus.R (setOldClass): Register S3 corpus classes to be
183            recognized by S4 methods.
184    
185            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
186            that CRAN Mac OS X builds do not fail any longer.
187    
188    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
189    
190            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
191            of RWeka::AlphabeticTokenizer() as default.
192    
193    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
194    
195            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
196            caused words at the beginning or the end of a line not to be removed. Do
197            not delete whitespace anymore.
198    
199    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
200    
201            * R/source.R (DirSource): Default to working directory if no path
202            is specified.
203    
204    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
205    
206            * R/source.R (DirSource): Stop on empty directories.
207    
208    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
209    
210            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
211            named documents.
212    
213    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
214    
215            * R/transform.R (removeWords): Improve regular expressions.
216    
217    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
218    
219            * R/meta.R (DublinCore): Allow lower case tags.
220    
221    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
222    
223            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
224            instead of x$children.
225    
226    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
227    
228            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
229    
230    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
231    
232            * R/: Use S3 instead of S4 class system.
233    
234    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
235    
236            * R/reader.R (readMail): Moved to tm.plugin.mail package.
237    
238    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
239    
240            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
241            postings are basically e-mails with some extra headers.
242    
243    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
244    
245            * R/transform.R: Move convertMboxEml, removeCitation,
246            removeMultipart, and removeSignature to the tm.plugin.mail package
247            since they are mainly utility functions (for handling e-mails) and
248            not very framework specific.
249    
250    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
251    
252            * man/: Fix documentation.
253    
254    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
255    
256            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
257            plain text document instead of an XML document for texts of the
258            Reuters-21578 dataset.
259    
260            * R/sparse.R: Removed since the slam package is now available on
261            CRAN.
262    
263            * DESCRIPTION (Depends): Add slam package.
264    
265    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
266    
267            * R/transform.R (stemDoc): Fix character(0) handling.
268    
269    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
270    
271            * R/doc.R (show): Pretty print.
272    
273    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
274    
275            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
276            gracefully.
277    
278    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
279    
280            * R/corpus.R: Make corpus virtual. Implement corpus with standard
281            and permanent storage semantics.
282    
283            * DESCRIPTION: New major release. A *lot* of improvements.
284    
285    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
286    
287            * NAMESPACE: Export some simple_triplet_matrix functions.
288    
289    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
290    
291            * R/weight.R: Adapt tf-idf to new matrix format.
292    
293    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
294    
295            * R/matrix.R: Create two distinct classes for term-document and
296            document-term matrices.
297    
298    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
299    
300            * R/termdocmatrix.R: No longer use Matrix package. This reduces
301            package start-up time significantly.
302    
303    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
304    
305            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
306    
307    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
308    
309            * R/transform.R (tmReduce): Combine multiple maps into one
310            transformation.
311    
312    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
313    
314            * R/weight.R: Remove weightLogical since it does not return a
315            dgCMatrix.
316    
317            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
318            or TermDocumentMatrix instead.
319    
320    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
321    
322            * inst/doc/extensions.Rnw: Finished vignette.
323    
324    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
325    
326            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
327            DocumentTermMatrix representations.
328    
329    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
330    
331            * R/reader.R (readXML): New reader for arbitrary XML files.
332    
333    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
334    
335            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
336            (XMLSource): New XMLSource class for arbitrary XML files.
337            (Source): New slot Vectorized.
338    
339    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
340    
341            * R/reader.R (readTabular): Experimental reader for tabular data
342            structures which can be customized via user-defined mappings.
343    
344            * R/reader.R: Always use UTC time zone.
345    
346            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
347    
348    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
349    
350            * R/reader.R (readDOC): Options can be passed over to antiword.
351    
352            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
353            pdftotext.
354    
355    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
356    
357            * R/source.R (DirSource): Add pattern and ignore.case arguments
358            which are internally passed over to list.files().
359    
360    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
361    
362            * inst/doc/tm.Rnw: Suppress pointless loading message.
363    
364    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
365    
366            * DESCRIPTION: Speed up package loading (via moving packages not
367            strictly necessary for normal operation to Suggests instead of
368            Depends).
369    
370    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
371    
372            * R/reader.R (readNewsgroup): The date format is now configurable.
373    
374    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
375    
376            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
377    
378    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
379    
380            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
381    
382    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
383    
384            * R/source.R (DataframeSource): New source class for data frames.
385    
386            * R/source.R: Fixed non-standard call evaluation.
387    
388    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
389    
390            * R/source.R (URISource): New source class for a single document.
391    
392    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
393    
394            * R/source.R: Refactoring.
395    
396    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
397    
398            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
399            Rmpi installations more gracefully.
400    
401    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
402    
403            * R/source.R (Source): Add Length slot.
404    
405    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
406    
407            * R/AAA.R: Unify duplicated .onLoad function.
408    
409    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
410    
411            * DESCRIPTION (Suggests): Added Rmpi.
412    
413    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
414    
415            * R/source.R (getElem): Fix 'no visible binding' warning.
416    
417            * man/WeightFunction.Rd: Fix signature.
418    
419    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
420    
421            * R/weight.R: Introduce name abbreviations for weighting functions.
422    
423    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
424    
425            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
426    
427            * R/cluster.R: Provide convenience functions for using an MPI
428            cluster.
429    
430            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
431            available.
432    
433            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
434            available.
435    
436    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
437    
438            * R/textdoccol.R (lapply): Removed debug print out.
439    
440    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
441    
442            * R/reader.R (readRCV1): Improved meta data extraction from
443            Reuters Corpus Volume 1 documents.
444    
445    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
446    
447            * R/transform.R: Ensure that all mappings preserve multiline
448            structures.
449    
450    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
451    
452            * R/filter.R: Every filter now has an attribute indicating whether
453            it should be applied to document level (doclevel).
454    
455            * R/textdoccol.R (tmFilter): Set searchFullText as new default
456            filter.
457    
458    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/transform.R (replacePatterns): Replaced removeWords by
461            replacePatterns. Suggested by Christian Buchta.
462    
463            * R/textdoccol.R (inspect): Improved formatting.
464    
465    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
466    
467            * inst/CITATION: Updated JSS article information.
468    
469            * R/textdoccol.R (setAs): Added coerce method from list to
470            corpus.
471    
472            * R/meta.R (meta): Improved meta data handling.
473    
474    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
475    
476            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
477            Christian Buchta.
478    
479            * inst/CITATION: Added template to include JSS article reference.
480    
481    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
482    
483            * R/textdoccol.R (tmMap): Introduced lazy mapping.
484    
485            * R/source.R: Added VectorSource.
486    
487    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
488    
489            * man/: Language codes should be in ISO 639-1 format.
490    
491            * R/textdoccol.R (asPlain): Preserve local meta data.
492    
493    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
494    
495            * R/textdoccol.R (writeCorpus): Function for writing a corpus
496            containing plain text documents to disk.
497    
498    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
499    
500            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
501            always set correctly.
502    
503            * R/textdoccol.R: Set load = TRUE as default for load on demand
504            since in most cases this is the wanted behaviour.
505    
506    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
509    
510            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
511    
512    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
513    
514            * R/meta.R (meta): New function for consistent access to meta data
515            of document collections, repositories, and texts.
516    
517    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * R/: Better support for encodings.
520    
521    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
524            selection when no reader argument is given.
525    
526    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * R/source.R (CSVSource): Now uses read.csv instead of scan
529            internally.
530    
531    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
532    
533            * R/reader.R (getReaders): Returns available reader functions.
534    
535            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
536            as default.
537    
538    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
539    
540            * R/stopwords.R (stopwords): Shortened code, removed codetools
541            variable warnings.
542    
543            * man/: Documentation for showMeta, added an example for tmMap.
544    
545            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
546            some minor typos fixed.
547    
548    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
549    
550            * R/aobjects.R (showMeta): Added method for pretty printing a
551            text document's meta data.
552    
553    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
554    
555            * R/textdoccol.R (TextDocCol): Better handling of empty
556            arguments.
557    
558            * NAMESPACE: Exported readDOC.
559    
560            * man/completeStems.Rd: Added an example.
561    
562    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
563    
564            * R/stopwords.R (stopwords): Look up .dat files at every
565            call. Allows users to modify stopword .dat files interactively.
566    
567    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
568    
569            * R/termdocmatrix.R (termFreq): Correct processing of empty
570            documents.
571    
572    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
573    
574            * man/: Updated documentation.
575    
576    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
577    
578            * R/complete.R (completeStems): Completes (heuristically) word
579            stems.
580    
581            * R/termdocmatrix.R (TermDocMatrix2): New modular
582            constructor.
583    
584            * NAMESPACE: Exported termFreq.
585    
586    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
587    
588            * R/reader.R (readDOC): Added MS Word reader (using antiword).
589    
590    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
591    
592            * R/weight.R: Weighting functions for TermDocMatrix.
593    
594    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
595    
596            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
597            functions for accessing dimension, column, and row names.
598    
599            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
600    
601    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
602    
603            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
604    
605    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
606    
607            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
608    
609    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
610    
611            * R/reader.R (readPDF): Removed manual checks for pdftotext and
612            pdfinfo. The system call gives a warning anyway.
613    
614    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
615    
616            * R/textdoccol.R (asPlain): Conversion from
617            StructuredTextDocuments to PlainTextDocuments.
618    
619    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
620    
621            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
622            for accessing term-document matrices.
623    
624            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
625            are installed.
626    
627    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
628    
629            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
630            Christian Buchta.
631    
632    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
633    
634            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
635    
636    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
637    
638            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
639    
640            * R/reader.R (readPDF): Added PDF reader.
641    
642    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
643    
644            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
645    
646            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
647    
648            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
649    
650            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
651    
652    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
653    
654            * R/distmeasure.R (dissimilarity): Replaced dists call from
655            package cba by new dist call from package proxy.
656    
657    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
658    
659            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
660    
661    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
662    
663            * R/termdocmatrix.R: require() uses the quietly option to suppress
664            loading messages.
665    
666    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
667    
668            * R/dictionary.R: Added dictionary support.
669    
670    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
673            documents. This simplifies some functions, e.g., asPlain.
674    
675    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
676    
677            * inst/doc/tm.Rnw: Fixed some typos in vignette.
678    
679    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
680    
681            * R/textdoccol.R (replaceWords): Added method to replace a set of
682            words by a single word. Useful for synonyms.
683    
684    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
685    
686            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
687    
688    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
689    
690            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
691            vectors. Thanks to Ariel Maguyon for his error report.
692            (removeSparseTerms): New function to remove columns from a
693            term-document matrix exceeding a sparse factor.
694    
695    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
696    
697            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
698    
699    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
700    
701            * man/sFilter.Rd: Corrected documentation on statement format (use
702            '==' instead of '=').
703    
704    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
705    
706            * R/aobjects.R (StructuredTextDocument): Inherits from
707            TextDocument.
708    
709    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
710    
711            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
712            on sparse matrices as proposed by Martin Maechler.
713    
714    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
715    
716            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
717            \pkg{filehash} version deprecates them.
718    
719    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
720    
721            * R/termdocmatrix.R (textvector): Stemming is now performed before
722            erasing stopwords.
723            (weightMatrix): Adapted to handle sparse matrices.
724            (TermDocMatrix): Sparse matrix is now efficiently built by
725            direct stepwise insertion of row values into it.
726    
727    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
728    
729            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
730            due to ongoing problems. For our purposes the latter is as useful
731            as the replaced package.
732    
733    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
734    
735            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
736    
737            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
738    
739    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
740    
741            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
742            languages with available stopwords.
743    
744    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
745    
746            * inst/doc/tm.Rnw: Minor corrections in the vignette.
747    
748    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
749    
750            * DESCRIPTION: Update to version 0.2, since a lot of new features
751            have been integrated.
752    
753            * inst/stopwords: Updated existing stopwords and added stopwords
754            for various other languages.
755    
756    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
757    
758            * man/: Updated documentation.
759    
760            * Work/testDb.R: Script to test database stuff.
761    
762            * R/: Fixed various database related bugs. Seems to be rather
763            useable now, i.e., consider as alpha status for now.
764    
765    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
766    
767            * R/: Fixed some bugs related to database support.
768    
769    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
770    
771            * man/: Added a lot of examples to the manuals.
772    
773    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
774    
775            * man/: Updated parts of the documentation.
776    
777            * R/textdoccol.R (asPlain): Added conversion from newsgroup
778            documents to plain text documents.
779    
780    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
781    
782            * R/textdoccol.R: Finished experimental database support. Not yet
783            intensively tested.
784    
785            * R/source.R: Now each source has a default reader.
786    
787            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
788            class anymore.
789    
790            * R/plaintextdoc.R: Custom show method for plain text documents.
791    
792            * R/aobjects.R: Added a class for structured text documents.
793    
794            * R/reader.R: Replaced remaining \code{parser} occurrences with
795            \code{reader}.
796    
797            * R/textdoccol.R (summary): Indent tags.
798    
799            * R/textdoccol.R (removePunctuation): Transform method to remove
800            punctuation marks.
801    
802    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
803    
804            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
805            using prescindMeta().
806    
807    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
808    
809            * R/textdoccol.R: Improved database support.
810    
811    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
812    
813            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
814    
815            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
816            language code.
817    
818            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
819            into parserControl argument.
820    
821            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
822    
823    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
824    
825            * Work/tmDataSetup.R: The datasets acq and crude can now be
826            created on the fly.
827    
828            * R/stopwords.R: Introduced a function returning the stopwords for
829            a given language (English, German and French at the moment).
830    
831            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
832            otherwise falls back to Snowball package.
833    
834    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
835    
836            * man/dissimilarity-methods.Rd: Make clear that any method offered
837            by "dists" from package "cba" can be used.
838    
839    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
840    
841            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
842            to Kurt's LaTeX suggestion. Removed points and underscores in
843            variable names for consistent naming.
844    
845            * DESCRIPTION: Update to version 0.1-2.
846    
847            * man/TextRepository.Rd: Fixed bug in documentation.
848    
849    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
850    
851            * DESCRIPTION: Update to version 0.1-1.
852    
853    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
854    
855            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
856            wordStem.
857    
858    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
859    
860            * R/: Changes due to Kurt's review.
861    
862    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
863    
864            * R/: Implemented improvements based upon comments by David
865            Meyer.
866    
867    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * inst/doc/: Rewrote vignette.
870    
871            * man/: Improved documentation.
872    
873    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
874    
875            * man/: Updated documentation.
876    
877            * DESCRIPTION: Changed package name to "tm". Updated version to
878            0.1 for first CRAN release.
879    
880            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
881            list archive example.
882    
883            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
884            archive example.
885    
886            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
887            from (several mails per box) mbox format to (single mail per file)
888            eml format.
889    
890    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
891    
892            * data/crude.rda: Rebuilt.
893    
894            * data/acq.rda: Rebuilt.
895    
896            * R/reader.R: Factored out reader and parser methods from
897            textdoccol.R.
898    
899            * R/source.R: Factored out Source methods from aobjects.R and
900            textdoccol.R.
901            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
902            feeds.
903    
904            * R/textdoccol.R (DirSource): Added support for recursive
905            traversal of directories.
906    
907    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
908    
909            * R/textdoccol.R ([[): Loads the document corpus automatically
910            into memory upon access.
911            (tm_transform, tm_filter): Removed several checks whether the
912            document is already loaded ([[ ensures this now).
913            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
914            mailing list archive.
915    
916    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
917    
918            * R/aobjects.R (TextDocument): Is now a virtual class.
919            (Source): Is now a virtual class.
920    
921    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * R/textdoccol.R (c): Support for an arbitrary number of document
924            collections.
925    
926    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
927    
928            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
929            append_meta and remove_meta.
930    
931            * R/textdoccol.R: Removed modify_metadata method.
932    
933            * R/textrepo.R: Removed modify_metadata method.
934    
935            * R/textdoccol.R (remove_meta): Supports removal of document
936            collection metadata and document (= in data frame) metadata.
937    
938    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
939    
940            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
941    
942            * data/crude.rda: Rebuilt.
943    
944            * data/acq.rda: Rebuilt.
945    
946            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
947    
948            * R/textdoccol.R ([): Bug fix for subsetting a document
949            collection's data frame.
950    
951    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
952    
953            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
954            to s_filter.
955    
956            * R/textdoccol.R: Local text documents' metadata can now be copied
957            to a document collection's data frame with prescind_meta.
958    
959    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
960    
961            * R/: Text documents' slot metadata is now accessible in s_filter.
962    
963            * R/: Rewrote s_filter function (still has some restrictions).
964    
965    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
966    
967            * R/: Various fixes in handling metadata.
968    
969            * R/: Added update mechanism for text document collections.
970    
971    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
972    
973            * R/: Merging of document collections now creates a binary tree
974            for reconstructing merged document collections.
975    
976            * R/: Redesign of metadata for document collections.
977    
978    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            * R/: Messages now use \code{ngettext}.
981    
982    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
983    
984            * R/: Added functions for modifying and removing metadata.
985    
986    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
987    
988            * man/: Updated some documentation.
989    
990            * R/: Corrected some connection issues.
991    
992            * inst/doc: Worked on the vignette.
993    
994    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
995    
996            * inst/: Added texts and started vignette.
997    
998            * R/: Final changes based upon David's comments.
999    
1000    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1001    
1002            * NAMESPACE: Corrected exports (generic methods need exportMethods
1003            directives!).
1004    
1005    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1006    
1007            * R/: Modified the TextDocCol constructor and various parsers. It
1008            is now modular and supports various file formats via plugins (see
1009            the new "Source" class).
1010    
1011    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1012    
1013            * man/: Revised documentation after previous code changes.
1014    
1015    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1016    
1017            * R/: Remaining changes as discussed with David.
1018    
1019    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1020    
1021            * R/: Some changes as suggested by David. The rest will follow
1022            within the next few days.
1023    
1024    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1025    
1026            * man/: Finished documentation.
1027    
1028    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1029    
1030            * man/: Wrote some documentation.
1031    
1032    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1033    
1034            * R/: Further syntactic sugar in form of additional assignment and
1035            accessor methods.
1036    
1037    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * R/: Syntactic sugar in form of "length", "show" and "summary"
1040            operators.
1041    
1042    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * R/: Various updates, mainly to default operators ("[" or "c")
1045            and dissimilarities.
1046    
1047    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1048    
1049            * R/: Added similarity functions.
1050    
1051            * data/: Added English stopwords.
1052    
1053    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1054    
1055            * data/: Examples compiled for new features.
1056    
1057            * R/: Changes due to new structure.
1058    
1059            * NAMESPACE: Corrected namespace to reflect new structure.
1060    
1061            * R/termdocmatrix.R: Adapted for new naming scheme.
1062    
1063    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1064    
1065            * R/textdoccol.R: Adapted code for new class structure. Wrote
1066            several transform and filter functions operating on text document
1067            collections (alias text document databases).
1068    
1069            * R/aobjects.R: Adapted class structure with inheritance,
1070            repositories and additional meta data. Loading files on demand is
1071            now possible.
1072    
1073    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1074    
1075            * R/: Some cosmetic cleanups.
1076    
1077            * inst/: Removed vignette on clustering. That and much more is now
1078            described in the JSS paper on text mining. Based upon that
1079            article an elaborated vignette will be incorporated in the future.
1080    
1081    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1082    
1083            * R/: Updated generic S4 methods to comply with signature changes
1084            in newer versions of R (> 2.3).
1085    
1086    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1087    
1088            * ext/R/importRIS.R: Automatic RIS import is now possible.
1089    
1090    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1091    
1092            * R/textdoccol.R: Added RIS HTML input format.
1093    
1094    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1095    
1096            * R/textdoccol.R: Removed bug that caused invalid text document
1097            collections when handling many input files.
1098    
1099    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1100    
1101            * R/textdoccol.R: Restructured and extended file import
1102            mechanism.
1103    
1104            * inst/doc/clustering.Rnw: Adapted vignette for use with
1105            ReutNews.rda
1106    
1107            * man/ReutNews.Rd: Documentation for ReutNews.rda
1108    
1109            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1110    
1111    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1112    
1113            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1114            clustering facilities of this package.
1115    
1116    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1117    
1118            * R/aobjects.R: Changed package document structure to avoid class
1119            dependency problems.
1120    
1121    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1122    
1123            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
1124            data set.
1125    
1126            *  Finished documentation and reordered directory structure. Now "R
1127            CMD check textmin" works without errors.
1128    
1129    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1130    
1131            * src/: Various splits can now be easily created for the
1132            Reuters21578 data set.
1133    
1134    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1135    
1136            * Updated documentation.
1137    
1138    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1139    
1140            * Wrote R documentation for some classes and methods.
