[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 34, Thu Dec 22 15:18:10 2005 UTC
pkg/ChangeLog revision 1164, Mon Dec 12 06:42:28 2011 UTC

2011-12-12  Ingo Feinerer  <feinerer@logic.at>

	* R/utils.R (map_IETF_Snowball): Map empty input to "porter".

2011-12-07  Ingo Feinerer  <feinerer@logic.at>

	* R/transform.R (removePunctuation): Add option to preserve
	intra-word dashes.

2011-12-06  Ingo Feinerer  <feinerer@logic.at>

	* R/matrix.R (termFreq): Allow reordering of control option
	processing.

2011-11-17  Ingo Feinerer  <feinerer@logic.at>

	* R/reader.R (readPDF): Use tools:::pdf_info() instead of external
	pdfinfo tool.

	* inst/stopwords/SMART.dat: Add SMART information retrieval system
	stopwords (which are also used by the MC toolkit).

	* R/matrix.R (termFreq): Allow local option \code{bounds$local} to
	restrict how often a term may appear in each document (generalizes
	\code{minDocFreq}). Similarly, the local option \code{wordLengths}
	sets word length bounds (generalizes \code{minWordLength}).

	* R/matrix.R (TermDocumentMatrix.VCorpus): New global option
	\code{bounds$global} for restricting how often a term is allowed
	to appear in different documents.

	* R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
	local options delegated internally to termFreq() and global
	options which are processed by the term-document matrix
	constructor itself.

2011-11-15  Ingo Feinerer  <feinerer@logic.at>

	* man/getTokenizers.Rd: Document getTokenizers().

	* man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().

2011-11-04  Ingo Feinerer  <feinerer@logic.at>

	* man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.

	* man/combine.Rd: Document c.term_frequency().

2011-10-11  Ingo Feinerer  <feinerer@logic.at>

	* R/meta.R (`meta<-.Corpus`): Assume that the replacement value
	can be accessed via '[' and not '[['.

2011-08-24  Ingo Feinerer  <feinerer@logic.at>

	* R/stopwords.R (stopwords): Raise an error if no stopwords are
	available for requested language. Suggested by Derek M Jones.

2011-05-27  Ingo Feinerer  <feinerer@logic.at>

	* R/weight.R (weightSMART): Implement Cosine and pivoted unique
	normalization.

2011-02-17  Ingo Feinerer  <feinerer@logic.at>

	* R/transform.R (stemDocument.PlainTextDocument): Use language
	argument.

2011-02-04  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R: Store strings and connections instead of unevaluated
	calls.

2010-11-26  Ingo Feinerer  <feinerer@logic.at>

	* R/corpus.R (Corpus): Allow init and exit hooks for readers.

2010-10-22  Ingo Feinerer  <feinerer@logic.at>

	* R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
	(instead of a list element).

2010-10-16  Ingo Feinerer  <feinerer@logic.at>

	* R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
	documents by name (fall back to IDs if names are not set).

2010-08-25  Ingo Feinerer  <feinerer@logic.at>

	* R/corpus.R (c.Corpus): When concatenating corpora, the argument
	\code{recursive} now determines whether existing corpus meta data
	is used.

2010-08-06  Ingo Feinerer  <feinerer@logic.at>

	* R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.

2010-06-17  Ingo Feinerer  <feinerer@logic.at>

	* R/matrix.R (TermDocumentMatrix): If a dictionary is given, no
	longer remove terms that do not occur in the corpus.

2010-06-02  Ingo Feinerer  <feinerer@logic.at>

	* R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
	and Heaps' law.

2010-05-18  Ingo Feinerer  <feinerer@logic.at>

	* R/corpus.R (Corpus, PCorpus): Use element names as IDs if
	provided by a source.

2010-04-09  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (.Source): Provide document names.

2010-04-07  Ingo Feinerer  <feinerer@logic.at>

	* R/meta.R (`content_or_meta`): Utility function.

2010-03-19  Ingo Feinerer  <feinerer@logic.at>

	* R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
	TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.

2010-03-03  Ingo Feinerer  <feinerer@logic.at>

	* R/weight.R (weightTfIdf): Added normalization option.

	* man/tm_tag_score.Rd: Add General Inquirer example for sentiment
	analysis.

2010-02-25  Ingo Feinerer  <feinerer@logic.at>

	* R/score.R (tm_tag_score): Compute a score from the number of
	tags matching in a document.

2010-02-18  Ingo Feinerer  <feinerer@logic.at>

	* R/complete.R (stemCompletion): New completion heuristics.

2010-02-17  Ingo Feinerer  <feinerer@logic.at>

	* R/plot.R (plot.TermDocumentMatrix): Memory improvements.

2010-02-06  Ingo Feinerer  <feinerer@logic.at>

	* DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
	setOldClass(c(..., "list")) works.

2010-01-22  Ingo Feinerer  <feinerer@logic.at>

	* R/transform.R (stemDocument.character): If the input is a plain
	character vector, just delegate to the default Snowball stemmer.

2010-01-15  Ingo Feinerer  <feinerer@logic.at>

	* R/reader.R (readReut21578XML, readRCV1): Extract more meta
	data.

2010-01-12  Ingo Feinerer  <feinerer@logic.at>

	* R/doc.R (`Content<-`): Be careful with names attribute.

2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>

	* R/source.R (DirSource): Improved implementation especially when
	handling many (> 1M) files.

2009-12-22  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (getElem.URISource): Use encoding argument.

2009-12-11  Ingo Feinerer  <feinerer@logic.at>

	* R/doc.R (setOldClass): Register S3 document classes to be
	recognized by S4 methods.

2009-11-25  Ingo Feinerer  <feinerer@logic.at>

	* R/matrix.R (termFreq): Add option to remove punctuation
	characters.

2009-11-19  Ingo Feinerer  <feinerer@logic.at>

	* R/matrix.R (c.TermDocumentMatrix): Added combine method for
	merging multiple term-document matrices.

2009-11-17  Ingo Feinerer  <feinerer@logic.at>

	* R/corpus.R (setOldClass): Register S3 corpus classes to be
	recognized by S4 methods.

	* man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
	that CRAN Mac OS X builds no longer fail.

2009-11-15  Ingo Feinerer  <feinerer@logic.at>

	* R/matrix.R (tokenize): Use scan(..., what = "character") instead
	of RWeka::AlphabeticTokenizer() as default.

2009-11-14  Ingo Feinerer  <feinerer@logic.at>

	* R/transform.R (removeWords.PlainTextDocument): Fix bug which
	caused words at the beginning or the end of a line not to be
	removed. Do not delete whitespace anymore.

2009-11-12  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (DirSource): Default to working directory if no path
	is specified.

2009-11-11  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (DirSource): Stop on empty directories.

2009-11-07  Ingo Feinerer  <feinerer@logic.at>

	* R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
	named documents.

2009-10-21  Ingo Feinerer  <feinerer@logic.at>

	* R/transform.R (removeWords): Improve regular expressions.

2009-10-19  Ingo Feinerer  <feinerer@logic.at>

	* R/meta.R (DublinCore): Allow lower case tags.

2009-10-09  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
	instead of x$children.

2009-09-15  Ingo Feinerer  <feinerer@logic.at>

	* R/preprocess.R (preprocessReut21578XML): Fix generated file names.

2009-09-06  Ingo Feinerer  <feinerer@logic.at>

	* R/: Use S3 instead of S4 class system.

2009-08-11  Ingo Feinerer  <feinerer@logic.at>

	* R/reader.R (readMail): Moved to tm.plugin.mail package.

2009-07-04  Ingo Feinerer  <feinerer@logic.at>

	* R/reader.R (readNewsgroup): Rename to readMail as newsgroup
	postings are basically e-mails with some extra headers.

2009-07-03  Ingo Feinerer  <feinerer@logic.at>

	* R/transform.R: Move convertMboxEml, removeCitation,
	removeMultipart, and removeSignature to the tm.plugin.mail package
	since they are mainly utility functions (for handling e-mails) and
	not very framework specific.

2009-06-28  Ingo Feinerer  <feinerer@logic.at>

	* man/: Fix documentation.

2009-06-26  Ingo Feinerer  <feinerer@logic.at>

	* R/reader.R (readReut21578XMLasPlain): New reader which returns a
	plain text document instead of an XML document for texts of the
	Reuters-21578 dataset.

	* R/sparse.R: Removed since the slam package is now available on
	CRAN.

	* DESCRIPTION (Depends): Add slam package.

2009-06-17  Ingo Feinerer  <feinerer@logic.at>

	* R/transform.R (stemDoc): Fix character(0) handling.

2009-06-12  Ingo Feinerer  <feinerer@logic.at>

	* R/doc.R (show): Pretty print.

2009-05-27  Ingo Feinerer  <feinerer@logic.at>

	* R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
	gracefully.

2009-05-13  Ingo Feinerer  <feinerer@logic.at>

	* R/corpus.R: Make corpus virtual. Implement corpus with standard
	and permanent storage semantics.

	* DESCRIPTION: New major release. A *lot* of improvements.

2009-05-04  Ingo Feinerer  <feinerer@logic.at>

	* NAMESPACE: Export some simple_triplet_matrix functions.

2009-04-28  Ingo Feinerer  <feinerer@logic.at>

	* R/weight.R: Adapt tf-idf to new matrix format.

2009-04-27  Ingo Feinerer  <feinerer@logic.at>

	* R/matrix.R: Create two distinct classes for term-document and
	document-term matrices.

2009-04-26  Ingo Feinerer  <feinerer@logic.at>

	* R/termdocmatrix.R: No longer use Matrix package. This reduces
	package start-up time significantly.

2009-04-11  Ingo Feinerer  <feinerer@logic.at>

	* inst/doc/tm.Rnw: Fix code/documentation mismatch.

2009-04-04  Ingo Feinerer  <feinerer@logic.at>

	* R/transform.R (tmReduce): Combine multiple maps into one
	transformation.

2009-04-03  Ingo Feinerer  <feinerer@logic.at>

	* R/weight.R: Remove weightLogical since it does not return a
	dgCMatrix.

	* R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
	or TermDocumentMatrix instead.

2009-03-28  Ingo Feinerer  <feinerer@logic.at>

	* inst/doc/extensions.Rnw: Finished vignette.

2009-03-27  Ingo Feinerer  <feinerer@logic.at>

	* R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
	DocumentTermMatrix representations.

2009-03-23  Ingo Feinerer  <feinerer@logic.at>

	* R/reader.R (readXML): New reader for arbitrary XML files.

2009-03-22  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (CSVSource): Defunct (use DataframeSource instead).
	(XMLSource): New XMLSource class for arbitrary XML files.
	(Source): New slot Vectorized.

2009-03-21  Ingo Feinerer  <feinerer@logic.at>

	* R/reader.R (readTabular): Experimental reader for tabular data
	structures which can be customized via user-defined mappings.

	* R/reader.R: Always use the UTC time zone.

	* R/AAA.R (.onLoad): No longer try to start an MPI cluster.

2009-03-20  Ingo Feinerer  <feinerer@logic.at>

	* R/reader.R (readDOC): Options can be passed on to antiword.

	* R/reader.R (readPDF): Options can be passed on to pdfinfo and
	pdftotext.

2009-03-10  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (DirSource): Add pattern and ignore.case arguments
	which are internally passed on to list.files().

2009-03-02  Ingo Feinerer  <feinerer@logic.at>

	* inst/doc/tm.Rnw: Suppress pointless loading message.

2009-01-29  Ingo Feinerer  <feinerer@logic.at>

	* DESCRIPTION: Speed up package loading by moving packages not
	strictly necessary for normal operation from Depends to
	Suggests.

2009-01-08  Ingo Feinerer  <feinerer@logic.at>

	* R/reader.R (readNewsgroup): The date format is now configurable.

2008-12-20  Ingo Feinerer  <feinerer@logic.at>

	* R/preprocess.R (convertMboxEml): Fix off-by-one error.

2008-12-16  Ingo Feinerer  <feinerer@logic.at>

	* R/termdocmatrix.R (TermDocMatrix): Sort row indices.

2008-12-06  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (DataframeSource): New source class for data frames.

	* R/source.R: Fixed non-standard call evaluation.

2008-11-29  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (URISource): New source class for a single document.

2008-11-27  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R: Refactoring.

2008-11-25  Ingo Feinerer  <feinerer@logic.at>

	* R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
	Rmpi installations more gracefully.

2008-11-08  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (Source): Add Length slot.

2008-11-06  Ingo Feinerer  <feinerer@logic.at>

	* R/AAA.R: Unify duplicated .onLoad function.

2008-11-03  Ingo Feinerer  <feinerer@logic.at>

	* DESCRIPTION (Suggests): Added Rmpi.

2008-11-02  Ingo Feinerer  <feinerer@logic.at>

	* R/source.R (getElem): Fix 'no visible binding' warning.

	* man/WeightFunction.Rd: Fix signature.

2008-08-03  Ingo Feinerer  <feinerer@logic.at>

	* R/weight.R: Introduce name abbreviations for weighting functions.

2008-07-24  Ingo Feinerer  <feinerer@logic.at>

	* R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.

	* R/cluster.R: Provide convenience functions for using an MPI
	cluster.

	* R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
	available.

	* R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
	available.

2008-07-17  Ingo Feinerer  <feinerer@logic.at>

	* R/textdoccol.R (lapply): Removed debug printout.

2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/reader.R (readRCV1): Improved meta data extraction from
	Reuters Corpus Volume 1 documents.

2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/transform.R: Ensure that all mappings preserve multiline
	structures.

2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/filter.R: Every filter now has an attribute (doclevel)
	indicating whether it should be applied at document level.

	* R/textdoccol.R (tmFilter): Set searchFullText as new default
	filter.

2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/transform.R (replacePatterns): Replaced removeWords by
	replacePatterns. Suggested by Christian Buchta.

	* R/textdoccol.R (inspect): Improved formatting.

2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/CITATION: Updated JSS article information.

	* R/textdoccol.R (setAs): Added coerce method from list to
	corpus.

	* R/meta.R (meta): Improved meta data handling.

2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (materialize, tmMap): Improvements suggested by
	Christian Buchta.

	* inst/CITATION: Added template to include JSS article reference.

2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (tmMap): Introduced lazy mapping.

	* R/source.R: Added VectorSource.

2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Language codes should be in ISO 639-1 format.

	* R/textdoccol.R (asPlain): Preserve local meta data.

2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (writeCorpus): Function for writing a corpus
	containing plain text documents to disk.

2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
	always set correctly.

	* R/textdoccol.R: Set load = TRUE as default for load on demand
	since in most cases this is the desired behaviour.

2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Renamed TextDocCol to Corpus, and Corpus to Content.

	* DESCRIPTION: Updated version to 0.3 due to core name changes.

2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/meta.R (meta): New function for consistent access to meta data
	of document collections, repositories, and texts.

2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Better support for encodings.

2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
	selection when no reader argument is given.

2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/source.R (CSVSource): Now uses read.csv instead of scan
	internally.

2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/reader.R (getReaders): Returns available reader functions.

	* R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
	as default.

2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/stopwords.R (stopwords): Shortened code, removed codetools
	variable warnings.

	* man/: Added documentation for showMeta and an example for tmMap.

	* inst/doc/tm.Rnw: Updated vignette: comments on MS Word reader,
	fixed some minor typos.

2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/aobjects.R (showMeta): Added method for pretty printing a
	text document's meta data.

2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (TextDocCol): Better handling of empty
	arguments.

	* NAMESPACE: Exported readDOC.

	* man/completeStems.Rd: Added an example.

2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/stopwords.R (stopwords): Look up .dat files at every
	call. Allows users to modify stopword .dat files interactively.

2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (termFreq): Correct processing of empty
	documents.

2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Updated documentation.

2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/complete.R (completeStems): Completes (heuristically) word
	stems.

	* R/termdocmatrix.R (TermDocMatrix2): New modular
	constructor.

	* NAMESPACE: Exported termFreq.

2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/reader.R (readDOC): Added MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/weight.R: Weighting functions for TermDocMatrix.

2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
	functions for accessing dimension, column, and row names.

	* R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/removePunctuation.Rd: Added documentation. Function also
	exported to NAMESPACE.

2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/fungen.R: Use S4 class for function generators instead of S3
	attributes.

2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/reader.R (readPDF): Removed manual checks for pdftotext and
	pdfinfo. The system call gives a warning anyway.

2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (asPlain): Conversion from
	StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
	for accessing term-document matrices.

	* inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
	are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
	Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/doc/tm.Rnw: Update vignette (readPDF, readHTML,
	preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/reader.R (readHTML): Added very simple HTML reader to obtain
	StructuredTextDocuments.

	* R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* DESCRIPTION: Moved proxy from Depends to Imports to avoid name
	clashes.

	* inst/stopwords/english.dat: Added the term "yes" to stopwords.

	* R/termdocmatrix.R (dim): dim function for TermDocMatrix.

	* R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/distmeasure.R (dissimilarity): Replaced dists call from
	package cba by new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R: require() uses the quietly option to suppress
	loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/aobjects.R: Added classes for Reuters21578 XML and RCV1
	documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/doc/tm.Rnw: Fixed some typos in vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (replaceWords): Added method to replace a set of
	words by a single word. Useful for synonyms.

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/TermDocMatrix.Rd: Fixed documentation on Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (textvector): Small fix for dealing with empty
	vectors. Thanks to Ariel Maguyon for his error report.
	(removeSparseTerms): New function to remove columns from a
	term-document matrix exceeding a sparse factor.

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/tmUpdate.Rd: Corrected documentation on readerControl
	parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/sFilter.Rd: Corrected documentation on statement format (use
	'==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/aobjects.R (StructuredTextDocument): Inherits from
	TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (findFreqTerms): Perform efficient computation
	on sparse matrices as proposed by Martin Maechler.

2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
	\pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/termdocmatrix.R (textvector): Stemming is now performed before
	erasing stopwords.
	(weightMatrix): Adapted to handle sparse matrices.
	(TermDocMatrix): Sparse matrix is now efficiently built by
	direct stepwise insertion of row values into it.

2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
	due to ongoing problems. For our purposes the latter is as useful
	as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/TextDocCol.Rd: Replaced \code{readPlain} with
	\code{object@DefaultReader}.

	* man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
	languages with available stopwords.

2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* DESCRIPTION: Update to version 0.2, since a lot of new features
	have been integrated.

	* inst/stopwords: Updated existing stopwords and added stopwords
	for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Updated documentation.

	* Work/testDb.R: Script to test database stuff.

	* R/: Fixed various database-related bugs. Seems to be fairly
	usable now, i.e., consider it alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Added a lot of examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* man/: Updated parts of the documentation.

	* R/textdoccol.R (asPlain): Added conversion from newsgroup
	documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R: Finished experimental database support. Not yet
	extensively tested.

	* R/source.R: Now each source has a default reader.

	* R/reader.R: \code{FunctionGenerator} is now an attribute, not a
	class anymore.

	* R/plaintextdoc.R: Custom show method for plain text documents.

	* R/aobjects.R: Added a class for structured text documents.

	* R/reader.R: Replaced remaining \code{parser} occurrences with
	\code{reader}.

	* R/textdoccol.R (summary): Indent tags.

	* R/textdoccol.R (removePunctuation): Transform method to remove
	punctuation marks.

2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

	* R/textdoccol.R (sFilter): Simplified sFilter significantly by
814            using prescindMeta().
815    
816    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
817    
818            * R/textdoccol.R: Improved database support.
819    
820    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
821    
822            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
823    
824            * R/resolve.R (resolveISOcode): Extracts the language from a ISO
825            language code.
826    
827            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
828            into parserControl argument.
829    
830            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
831    
2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Work/tmDataSetup.R: The datasets acq and crude can now be
        created on the fly.

        * R/stopwords.R: Introduced a function returning the stopwords for
        a given language (English, German and French at the moment).

        * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
        otherwise falls back to the Snowball package.

2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/dissimilarity-methods.Rd: Make clear that any method offered
        by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed bug where quotes appeared as boxes,
        following Kurt's LaTeX suggestion. Removed points and underscores
        in variable names for consistent naming.

        * DESCRIPTION: Update to version 0.1-2.

        * man/TextRepository.Rd: Fixed bug in documentation.

2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.1-1.

2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
        wordStem.

2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Changes due to Kurt's review.

2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Implemented improvements based upon comments by David
        Meyer.

2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/: Rewrote vignette.

        * man/: Improved documentation.

2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * DESCRIPTION: Changed package name to "tm". Updated version to
        0.1 for the first CRAN release.

        * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
        list archive example.

        * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
        archive example.

        * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
        from mbox format (several mails per file) to eml format (a single
        mail per file).

2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * R/reader.R: Factored out reader and parser methods from
        textdoccol.R.

        * R/source.R: Factored out Source methods from aobjects.R and
        textdoccol.R.
        (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
        feeds.

        * R/textdoccol.R (DirSource): Added support for recursive
        traversal of directories.

2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R ([[): Loads the document corpus automatically
        into memory upon access.
        (tm_transform, tm_filter): Removed several checks whether the
        document is already loaded ([[ ensures this now).
        (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
        mailing list archive.

2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (TextDocument): Is now a virtual class.
        (Source): Is now a virtual class.

2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (c): Support for an arbitrary number of document
        collections.

2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textrepo.R: Updated TextRepository (constructor), append_elem,
        append_meta and remove_meta.

        * R/textdoccol.R: Removed modify_metadata method.

        * R/textrepo.R: Removed modify_metadata method.

        * R/textdoccol.R (remove_meta): Supports removal of document
        collection metadata and document (= in data frame) metadata.

2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.

        * data/crude.rda: Rebuilt.

        * data/acq.rda: Rebuilt.

        * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.

        * R/textdoccol.R ([): Bug fix for subsetting a document
        collection's data frame.

2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Bug fixes in s_filter. Added full query support
        to s_filter.

        * R/textdoccol.R: Local text documents' metadata can now be copied
        to a document collection's data frame with prescind_meta.

2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Text documents' slot metadata is now accessible in s_filter.

        * R/: Rewrote s_filter function (still has some restrictions).

2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various fixes in handling metadata.

        * R/: Added update mechanism for text document collections.

2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Merging of document collections now creates a binary tree
        for reconstructing merged document collections.

        * R/: Redesign of metadata for document collections.

2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Messages now use \code{ngettext}.

2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added functions for modifying and removing metadata.

2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated some documentation.

        * R/: Corrected some connection issues.

        * inst/doc: Worked on the vignette.

2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/: Added texts and started vignette.

        * R/: Final changes based upon David's comments.

2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * NAMESPACE: Corrected exports (generic methods need exportMethods
        directives!).

2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Modified the TextDocCol constructor and various parsers. The
        constructor is now modular and supports various file formats via
        plugins (see the new "Source" class).

2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Revised documentation after previous code changes.

2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Remaining changes as discussed with David.

2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some changes as suggested by David. The rest will follow
        in the next few days.

2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Finished documentation.

2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Wrote some documentation.

2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Further syntactic sugar in the form of additional assignment
        and accessor methods.

2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Syntactic sugar in the form of "length", "show" and "summary"
        operators.

2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Various updates, mainly to default operators ("[" or "c")
        and dissimilarities.

2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Added similarity functions.

        * data/: Added English stopwords.

2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * data/: Examples compiled for new features.

        * R/: Changes due to new structure.

        * NAMESPACE: Corrected namespace to reflect new structure.

        * R/termdocmatrix.R: Adapted for new naming scheme.

2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Adapted code for new class structure. Wrote
        several transform and filter functions operating on text document
        collections (also known as text document databases).

        * R/aobjects.R: Adapted class structure with inheritance,
        repositories and additional metadata. Loading files on demand is
        now possible.

2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Some cosmetic cleanups.

        * inst/: Removed vignette on clustering. That and much more is now
        described in the JSS paper on text mining. Based upon that
        article, a more elaborate vignette will be incorporated in the
        future.

2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Updated generic S4 methods to comply with signature changes
        in newer versions of R (> 2.3).

2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * ext/R/importRIS.R: Automatic RIS import is now possible.

2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Added RIS HTML input format.

2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Fixed a bug that caused invalid text document
        collections when handling many input files.

2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Restructured and extended file import
        mechanism.

        * inst/doc/clustering.Rnw: Adapted vignette for use with
        ReutNews.rda.

        * man/ReutNews.Rd: Documentation for ReutNews.rda.

        * data/ReutNews.rda: A tiny Reuters-21578 example data set.

2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/clustering.Rnw: Wrote a small vignette to present the
