[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 37, Wed Jan 11 17:49:17 2006 UTC pkg/ChangeLog revision 1216, Thu Apr 11 12:05:53 2013 UTC
1    2013-04-11  Ingo Feinerer <feinerer@logic.at>
2    
3            * R/transform.R (removeWords): Use PCRE UCP to use Unicode properties
4            to determine character types.
5    
6    2012-12-14  Ingo Feinerer <feinerer@logic.at>
7    
8            * R/matrix.R (TermDocumentMatrix): Ensure dimnames of type character
9            when generating a simple_triplet_matrix. Reported by Arho Suominen.
10    
11    2012-12-10  Ingo Feinerer <feinerer@logic.at>
12    
13            * man/tm_reduce.Rd: Document right to left folding order. Adapt
14            example as well. Suggested by Mark Rosenstein.
15    
16    2012-12-04  Ingo Feinerer <feinerer@logic.at>
17    
18            * R/filter.R (sFilter): Avoid attach() and simplify.
19    
20    2012-11-02  Ingo Feinerer <feinerer@logic.at>
21    
22            * R/doc.R (.TextDocument): Use casts to ensure data types and to avoid
23            removal of attributes.
24    
25    2012-10-03  Ingo Feinerer  <feinerer@logic.at>
26    
27            * R/weight.R (weightTfIdf, weightSMART): Gracefully handle empty
28            columns and rows (avoids blow-up due to NaN values). Suggested by Jaap
29            Frölich.
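
A minimal sketch of how empty rows and columns can produce NaN values in a tf-idf weighting, and one way to guard against them. The function tfidf_safe and its exact formula are assumptions for illustration; the entry above does not state how the package handles the case.

```r
# Hypothetical sketch (not the package's code).
# m: numeric matrix of raw counts, terms in rows, documents in columns.
tfidf_safe <- function(m) {
    col_tot <- colSums(m)
    col_tot[col_tot == 0] <- 1      # empty document: avoid 0 / 0
    tf <- sweep(m, 2, col_tot, "/") # term frequency per document

    df <- rowSums(m > 0)            # number of documents containing a term
    # Terms occurring nowhere would give log2(n / 0) = Inf and then
    # 0 * Inf = NaN, so their idf is set to 0 instead.
    idf <- ifelse(df > 0, log2(ncol(m) / df), 0)

    tf * idf                        # idf is recycled along the rows
}

m <- matrix(c(1, 0, 0,
              0, 0, 0,
              2, 1, 0), nrow = 3)   # document 2 and term 3 are empty
w <- tfidf_safe(m)
any(is.nan(w))
```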
30    
31    2012-07-27  Ingo Feinerer  <feinerer@logic.at>
32    
33            * R/transform.R (removeWords): Allow longer stopword lists.
34    
35    2012-01-31  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/reader.R (readXML): Readers can now set the document language
38            themselves.
39    
40    2012-01-14  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/source.R (XMLSource, getElem.XMLSource): Simplifications as
43            proposed by Milan Bouchet-Valat.
44    
45    2012-01-11  Ingo Feinerer  <feinerer@logic.at>
46    
47            * R/matrix.R (termFreq): Fix processing of user provided
48            stopwords. Reported by Bettina Grün.
49    
50    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
51    
52            * R/matrix.R (termFreq): Fix invalid handling of
53            control$wordLengths[1]. Reported by Steven C. Bagley.
54    
55    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
56    
57            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
58    
59    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
60    
61            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
62    
63    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
64    
65            * R/transform.R (removePunctuation): Add option to preserve
66            intra-word dashes.
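
One way such an option could work, sketched in base R. The function strip_punct and its argument name are hypothetical, and the placeholder technique shown here is an assumption rather than the package's implementation.

```r
# Hypothetical sketch (not the package's code).
strip_punct <- function(x, preserve_intra_word_dashes = FALSE) {
    if (!preserve_intra_word_dashes)
        return(gsub("[[:punct:]]+", "", x))
    # Protect dashes that sit between two word characters with a
    # placeholder control character, strip punctuation, then restore.
    x <- gsub("(?<=\\w)-(?=\\w)", "\001", x, perl = TRUE)
    x <- gsub("[[:punct:]]+", "", x)
    gsub("\001", "-", x, fixed = TRUE)
}

strip_punct("well-known, state-of-the-art!", TRUE)
strip_punct("well-known, state-of-the-art!", FALSE)
```

The lookarounds leave the neighbouring letters unconsumed, so chains such as "state-of-the-art" keep every dash.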
67    
68    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
69    
70            * R/matrix.R (termFreq): Allow reordering of control option
71            processing.
72    
73    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
74    
75            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
76            pdfinfo tool.
77    
78            * inst/stopwords/SMART.dat: Add SMART information retrieval system
79            stopwords (which are also used by the MC toolkit).
80    
81            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
82            restrict how often a term may appear in each document (generalizes
83            \code{minDocFreq}). Similarly, add the local option \code{wordLengths}
84            for word length bounds (generalizes \code{minWordLength}).
85    
86            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
87            \code{bounds$global} for restricting how often a term is allowed
88            to appear in different documents.
89    
90            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
91            local options delegated internally to termFreq() and global
92            options which are processed by the term-document matrix
93            constructor itself.
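
The local options described in the entries above (word length bounds and per-document frequency bounds) can be sketched for a single document as follows. The function term_freq_sketch and its argument names are hypothetical; only the idea of lower and upper bounds is taken from the entries.

```r
# Hypothetical sketch (not the package's code): term frequencies for one
# document, filtered by word length and by in-document frequency.
term_freq_sketch <- function(txt,
                             word_lengths = c(3, Inf),
                             local_bounds = c(1, Inf)) {
    # Split on whitespace; quote = "" keeps apostrophes literal.
    words <- scan(text = txt, what = "character", quote = "", quiet = TRUE)

    n <- nchar(words)
    words <- words[n >= word_lengths[1] & n <= word_lengths[2]]

    tab <- table(words)
    tab <- tab[tab >= local_bounds[1] & tab <= local_bounds[2]]

    counts <- as.vector(tab)        # plain named integer vector
    names(counts) <- names(tab)
    counts
}

res <- term_freq_sketch("a bb ccc ccc dddd",
                        word_lengths = c(3, Inf),
                        local_bounds = c(2, Inf))
res
```

A global bound would be applied one level up, across the per-document results, which matches the local/global split the entries describe.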
94    
95    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
96    
97            * man/getTokenizers.Rd: Document getTokenizers().
98    
99            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
100    
101    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
102    
103            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
104    
105            * man/combine.Rd: Document c.term_frequency().
106    
107    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
110            can be accessed via '[' and not '[['.
111    
112    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/stopwords.R (stopwords): Raise an error if no stopwords are
115            available for requested language. Suggested by Derek M Jones.
116    
117    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
118    
119            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
120            normalization.
121    
122    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
123    
124            * R/transform.R (stemDocument.PlainTextDocument): Use language
125            argument.
126    
127    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
128    
129            * R/source.R: Store strings and connections instead of unevaluated
130            calls.
131    
132    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
133    
134            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
135    
136    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
137    
138            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
139            (instead of a list element).
140    
141    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
142    
143            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
144            documents by names (fallback to IDs if names are not set).
145    
146    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
147    
148            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
149            \code{recursive} now determines whether existing corpus meta data
150            is used.
151    
152    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
153    
154            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
155    
156    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
157    
158            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
159            remove terms not occurring in the corpus anymore.
160    
161    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
162    
163            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
164            and Heaps' law.
165    
166    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
167    
168            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
169            provided by a source.
170    
171    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
172    
173            * R/source.R (.Source): Provide document names.
174    
175    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/meta.R (`content_or_meta`): Utility function.
178    
179    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
180    
181            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
182            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
183    
184    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
185    
186            * R/weight.R (weightTfIdf): Added normalization option.
187    
188            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
189            analysis.
190    
191    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
192    
193            * R/score.R (tm_tag_score): Compute a score from the number of
194            tags matching in a document.
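
A minimal sketch of such a score, assuming the document is already tokenized and that the score is simply the count of tokens found in the tag list. The name tag_score is hypothetical and the real function may differ in its inputs.

```r
# Hypothetical sketch (not the package's code): count how many tokens of
# a document appear in a given vector of tags.
tag_score <- function(tokens, tags) sum(tokens %in% tags)

tokens   <- c("good", "bad", "good", "plain")
positive <- c("good", "great")
tag_score(tokens, positive)   # 2
```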
195    
196    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
197    
198            * R/complete.R (stemCompletion): New completion heuristics.
199    
200    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
201    
202            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
203    
204    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
205    
206            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
207            setOldClass(c(..., "list")) works.
208    
209    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
210    
211            * R/transform.R (stemDocument.character): In case input is a
212            simple character just delegate to the default Snowball stemmer.
213    
214    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
215    
216            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
217            data.
218    
219    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
220    
221            * R/doc.R (`Content<-`): Be careful with names attribute.
222    
223    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
224    
225            * R/source.R (DirSource): Improved implementation especially when
226            handling many (> 1M) files.
227    
228    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
229    
230            * R/source.R (getElem.URISource): Use encoding argument.
231    
232    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
233    
234            * R/doc.R (setOldClass): Register S3 document classes to be
235            recognized by S4 methods.
236    
237    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
238    
239            * R/matrix.R (termFreq): Add option to remove punctuation
240            characters.
241    
242    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
243    
244            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
245            merging multiple term-document matrices.
246    
247    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
248    
249            * R/corpus.R (setOldClass): Register S3 corpus classes to be
250            recognized by S4 methods.
251    
252            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
253            that CRAN Mac OS X builds do not fail any longer.
254    
255    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
256    
257            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
258            of RWeka::AlphabeticTokenizer() as default.
259    
260    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
261    
262            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
263            caused words at the beginning or the end of a line not to be removed. Do
264            not delete whitespace anymore.
265    
266    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
267    
268            * R/source.R (DirSource): Default to working directory if no path
269            is specified.
270    
271    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
272    
273            * R/source.R (DirSource): Stop on empty directories.
274    
275    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
276    
277            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
278            named documents.
279    
280    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
281    
282            * R/transform.R (removeWords): Improve regular expressions.
283    
284    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
285    
286            * R/meta.R (DublinCore): Allow lower case tags.
287    
288    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
289    
290            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
291            instead of x$children.
292    
293    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
294    
295            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
296    
297    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
298    
299            * R/: Use S3 instead of S4 class system.
300    
301    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
302    
303            * R/reader.R (readMail): Moved to tm.plugin.mail package.
304    
305    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
306    
307            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
308            postings are basically e-mails with some extra headers.
309    
310    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
311    
312            * R/transform.R: Move convertMboxEml, removeCitation,
313            removeMultipart, and removeSignature to the tm.plugin.mail package
314            since they are mainly utility functions (for handling e-mails) and
315            not very framework specific.
316    
317    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
318    
319            * man/: Fix documentation.
320    
321    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
322    
323            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
324            plain text document instead of an XML document for texts of the
325            Reuters-21578 dataset.
326    
327            * R/sparse.R: Removed since the slam package is now available on
328            CRAN.
329    
330            * DESCRIPTION (Depends): Add slam package.
331    
332    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
333    
334            * R/transform.R (stemDoc): Fix character(0) handling.
335    
336    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
337    
338            * R/doc.R (show): Pretty print.
339    
340    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
341    
342            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
343            gracefully.
344    
345    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
346    
347            * R/corpus.R: Make corpus virtual. Implement corpus with standard
348            and permanent storage semantics.
349    
350            * DESCRIPTION: New major release. A *lot* of improvements.
351    
352    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
353    
354            * NAMESPACE: Export some simple_triplet_matrix functions.
355    
356    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
357    
358            * R/weight.R: Adapt tf-idf to new matrix format.
359    
360    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
361    
362            * R/matrix.R: Create two distinct classes for term-document and
363            document-term matrices.
364    
365    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
366    
367            * R/termdocmatrix.R: No longer use Matrix package. This reduces
368            package start-up time significantly.
369    
370    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
371    
372            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
373    
374    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
375    
376            * R/transform.R (tmReduce): Combine multiple maps into one
377            transformation.
378    
379    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
380    
381            * R/weight.R: Remove weightLogical since it does not return a
382            dgCMatrix.
383    
384            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
385            or TermDocumentMatrix instead.
386    
387    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
388    
389            * inst/doc/extensions.Rnw: Finished vignette.
390    
391    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
392    
393            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
394            DocumentTermMatrix representations.
395    
396    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
397    
398            * R/reader.R (readXML): New reader for arbitrary XML files.
399    
400    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
401    
402            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
403            (XMLSource): New XMLSource class for arbitrary XML files.
404            (Source): New slot Vectorized.
405    
406    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
407    
408            * R/reader.R (readTabular): Experimental reader for tabular data
409            structures which can be customized via user-defined mappings.
410    
411            * R/reader.R: Always use UTC time zone.
412    
413            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
414    
415    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
416    
417            * R/reader.R (readDOC): Options can be passed over to antiword.
418    
419            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
420            pdftotext.
421    
422    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
423    
424            * R/source.R (DirSource): Add pattern and ignore.case arguments
425            which are internally passed over to list.files().
426    
427    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
428    
429            * inst/doc/tm.Rnw: Suppress pointless loading message.
430    
431    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
432    
433            * DESCRIPTION: Speed up package loading (via moving packages not
434            strictly necessary for normal operation to Suggests instead of
435            Depends).
436    
437    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
438    
439            * R/reader.R (readNewsgroup): The date format is now configurable.
440    
441    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
442    
443            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
444    
445    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
446    
447            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
448    
449    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
450    
451            * R/source.R (DataframeSource): New source class for data frames.
452    
453            * R/source.R: Fixed non-standard call evaluation.
454    
455    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
456    
457            * R/source.R (URISource): New source class for a single document.
458    
459    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
460    
461            * R/source.R: Refactoring.
462    
463    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
464    
465            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
466            Rmpi installations more gracefully.
467    
468    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
469    
470            * R/source.R (Source): Add Length slot.
471    
472    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
473    
474            * R/AAA.R: Unify duplicated .onLoad function.
475    
476    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
477    
478            * DESCRIPTION (Suggests): Added Rmpi.
479    
480    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
481    
482            * R/source.R (getElem): Fix 'no visible binding' warning.
483    
484            * man/WeightFunction.Rd: Fix signature.
485    
486    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
487    
488            * R/weight.R: Introduce name abbreviations for weighting functions.
489    
490    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
491    
492            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
493    
494            * R/cluster.R: Provide convenience functions for using a MPI
495            cluster.
496    
497            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
498            available.
499    
500            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
501            available.
502    
503    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
504    
505            * R/textdoccol.R (lapply): Removed debug print out.
506    
507    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
508    
509            * R/reader.R (readRCV1): Improved meta data extraction from
510            Reuters Corpus Volume 1 documents.
511    
512    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
513    
514            * R/transform.R: Ensure that all mappings preserve multiline
515            structures.
516    
517    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * R/filter.R: Every filter now has an attribute indicating whether
520            it should be applied at the document level (doclevel).
521    
522            * R/textdoccol.R (tmFilter): Set searchFullText as new default
523            filter.
524    
525    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
526    
527            * R/transform.R (replacePatterns): Replaced removeWords by
528            replacePatterns. Suggested by Christian Buchta.
529    
530            * R/textdoccol.R (inspect): Improved formatting.
531    
532    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
533    
534            * inst/CITATION: Updated JSS article information.
535    
536            * R/textdoccol.R (setAs): Added coerce method from list to
537            corpus.
538    
539            * R/meta.R (meta): Improved meta data handling.
540    
541    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
542    
543            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
544            Christian Buchta.
545    
546            * inst/CITATION: Added template to include JSS article reference.
547    
548    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
549    
550            * R/textdoccol.R (tmMap): Introduced lazy mapping.
551    
552            * R/source.R: Added VectorSource.
553    
554    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * man/: Language codes should be in ISO 639-1 format.
557    
558            * R/textdoccol.R (asPlain): Preserve local meta data.
559    
560    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
561    
562            * R/textdoccol.R (writeCorpus): Function for writing a corpus
563            containing plain text documents to disk.
564    
565    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
566    
567            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
568            always set correctly.
569    
570            * R/textdoccol.R: Set load = TRUE as default for load on demand
571            since in most cases this is the wanted behaviour.
572    
573    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
576    
577            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
578    
579    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
580    
581            * R/meta.R (meta): New function for consistent access to meta data
582            of document collections, repositories, and texts.
583    
584    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
585    
586            * R/: Better support for encodings.
587    
588    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
591            selection when no reader argument is given.
592    
593    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
594    
595            * R/source.R (CSVSource): Now uses read.csv instead of scan
596            internally.
597    
598    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
599    
600            * R/reader.R (getReaders): Returns available reader functions.
601    
602            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
603            as default.
604    
605    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
606    
607            * R/stopwords.R (stopwords): Shortened code, removed codetools
608            variable warnings.
609    
610            * man/: Documentation for showMeta, added an example for tmMap.
611    
612            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
613            some minor typos fixed.
614    
615    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
616    
617            * R/aobjects.R (showMeta): Added method for pretty printing a
618            text document's meta data.
619    
620    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
621    
622            * R/textdoccol.R (TextDocCol): Better handling of empty
623            arguments.
624    
625            * NAMESPACE: Exported readDOC.
626    
627            * man/completeStems.Rd: Added an example.
628    
629    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
630    
631            * R/stopwords.R (stopwords): Look up .dat files at every
632            call. Allows users to modify stopword .dat files interactively.
633    
634    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
635    
636            * R/termdocmatrix.R (termFreq): Correct processing of empty
637            documents.
638    
639    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
640    
641            * man/: Updated documentation.
642    
643    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
644    
645            * R/complete.R (completeStems): Completes (heuristically) word
646            stems.
647    
648            * R/termdocmatrix.R (TermDocMatrix2): New modular
649            constructor.
650    
651            * NAMESPACE: Exported termFreq.
652    
653    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
654    
655            * R/reader.R (readDOC): Added MS Word reader (using antiword).
656    
657    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
658    
659            * R/weight.R: Weighting functions for TermDocMatrix.
660    
661    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
662    
663            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
664            functions for accessing dimension, column, and row names.
665    
666            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
667    
668    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
669    
670            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
671    
672    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
673    
674            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
675    
676    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
677    
678            * R/reader.R (readPDF): Removed manual checks for pdftotext and
679            pdfinfo. The system call gives a warning anyway.
680    
681    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
682    
683            * R/textdoccol.R (asPlain): Conversion from
684            StructuredTextDocuments to PlainTextDocuments.
685    
686    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
687    
688            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
689            for accessing term-document matrices.
690    
691            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
692            are installed.
693    
694    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
695    
696            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
697            Christian Buchta.
698    
699    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
700    
701            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
702    
703    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
704    
705            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
706    
707            * R/reader.R (readPDF): Added PDF reader.
708    
709    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
710    
711            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
712    
713            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
714    
715            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
716    
717            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
718    
719    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
720    
721            * R/distmeasure.R (dissimilarity): Replaced dists call from
722            package cba by new dist call from package proxy.
723    
724    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
725    
726            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
727    
728    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
729    
730            * R/termdocmatrix.R: require() uses the quietly option to suppress
731            loading messages.
732    
733    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
734    
735            * R/dictionary.R: Added dictionary support.
736    
737    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
738    
739            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
740            documents. This simplifies some functions, e.g., asPlain.
741    
742    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
743    
744            * inst/doc/tm.Rnw: Fixed some typos in vignette.
745    
746    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
747    
748            * R/textdoccol.R (replaceWords): Added method to replace a set of
749            words by a single word. Useful for synonyms.
750    
751    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
752    
753            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
754    
755    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
756    
757            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
758            vectors. Thanks to Ariel Maguyon for his error report.
759            (removeSparseTerms): New function to remove columns from a
760            term-document matrix exceeding a sparse factor.
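
One plausible reading of the sparse factor, sketched on a dense matrix with documents in rows and terms in columns. The function remove_sparse is hypothetical, and the strict comparison is an assumption; the entry above does not define the threshold precisely.

```r
# Hypothetical sketch (not the package's code): drop term columns whose
# share of zero entries reaches or exceeds the sparse factor.
remove_sparse <- function(m, sparse) {
    stopifnot(sparse > 0, sparse < 1)
    zero_share <- colMeans(m == 0)
    m[, zero_share < sparse, drop = FALSE]
}

m <- cbind(a = c(1, 1, 1, 0),   # 25% zeros
           b = c(0, 0, 0, 1),   # 75% zeros
           c = c(0, 0, 0, 0))   # 100% zeros
colnames(remove_sparse(m, 0.5))
```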
761    
762    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
763    
764            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
765    
766    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
767    
768            * man/sFilter.Rd: Corrected documentation on statement format (use
769            '==' instead of '=').
770    
771    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
772    
773            * R/aobjects.R (StructuredTextDocument): Inherits from
774            TextDocument.
775    
776    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
777    
778            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
779            on sparse matrices as proposed by Martin Maechler.
780    
781    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
782    
783            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
784            \pkg{filehash} version deprecates them.
785    
786    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
787    
788            * R/termdocmatrix.R (textvector): Stemming is now performed before
789            erasing stopwords.
790            (weightMatrix): Adapted to handle sparse matrices.
791            (TermDocMatrix): Sparse matrix is now efficiently built by
792            direct stepwise insertion of row values into it.
793    
794    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
795    
796            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
797            due to ongoing problems. For our purposes the latter is as useful
798            as the replaced package.
799    
800    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
801    
802            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
803    
804            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
805    
806    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
807    
808            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
809            languages with available stopwords.
810    
811    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
812    
813            * inst/doc/tm.Rnw: Minor corrections in the vignette.
814    
815    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
816    
817            * DESCRIPTION: Update to version 0.2, since a lot of new features
818            have been integrated.
819    
820            * inst/stopwords: Updated existing stopwords and added stopwords
821            for various other languages.
822    
823    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
824    
825            * man/: Updated documentation.
826    
827            * Work/testDb.R: Script to test database stuff.
828    
829            * R/: Fixed various database-related bugs. Seems to be rather
830            usable now, i.e., consider it alpha status for now.
831    
832    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
833    
834            * R/: Fixed some bugs related to database support.
835    
836    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
837    
838            * man/: Added a lot of examples to the manuals.
839    
840    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
841    
842            * man/: Updated parts of the documentation.
843    
844            * R/textdoccol.R (asPlain): Added conversion from newsgroup
845            documents to plain text documents.
846    
847    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
848    
849            * R/textdoccol.R: Finished experimental database support. Not yet
850            intensively tested.
851    
852            * R/source.R: Now each source has a default reader.
853    
854            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
855            class anymore.
856    
857            * R/plaintextdoc.R: Custom show method for plain text documents.
858    
859            * R/aobjects.R: Added a class for structured text documents.
860    
861            * R/reader.R: Replaced remaining \code{parser} occurrences with
862            \code{reader}.
863    
864            * R/textdoccol.R (summary): Indent tags.
865    
866            * R/textdoccol.R (removePunctuation): Transform method to remove
867            punctuation marks.
868    
869    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
870    
871            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
872            using prescindMeta().
873    
874    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
875    
876            * R/textdoccol.R: Improved database support.
877    
878    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
879    
880            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
881    
882            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
883            language code.
884    
885            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
886            into parserControl argument.
887    
888            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
889    
890    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
891    
892            * Work/tmDataSetup.R: The datasets acq and crude can now be
893            created on the fly.
894    
895            * R/stopwords.R: Introduced a function returning the stopwords for
896            a given language (English, German and French at the moment).
897    
898            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
899            otherwise falls back to Snowball package.
900    
901    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
902    
903            * man/dissimilarity-methods.Rd: Make clear that any method offered
904            by "dists" from package "cba" can be used.
905    
906    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
907    
908            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
909            to Kurt's latex suggestion. Removed points and underscores in
910            variable names for consistent naming.
911    
912            * DESCRIPTION: Update to version 0.1-2.
913    
914            * man/TextRepository.Rd: Fixed bug in documentation.
915    
916    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
917    
918            * DESCRIPTION: Update to version 0.1-1.
919    
920    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
921    
922            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
923            wordStem.
924    
925    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
926    
927            * R/: Changes due to Kurt's review.
928    
929    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
930    
931            * R/: Implemented improvements based upon comments by David
932            Meyer.
933    
934    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
935    
936            * inst/doc/: Rewrote vignette.
937    
938            * man/: Improved documentation.
939    
940    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
941    
942            * man/: Updated documentation.
943    
944            * DESCRIPTION: Changed package name to "tm". Updated version to
945            0.1 for first CRAN release.
946    
947            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
948            list archive example.
949    
950            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
951            archive example.
952    
953            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
954            from (several mails per box) mbox format to (single mail per file)
955            eml format.
956    
957    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
958    
959            * data/crude.rda: Rebuilt.
960    
961            * data/acq.rda: Rebuilt.
962    
963            * R/reader.R: Factored out reader and parser methods from
964            textdoccol.R.
965    
966            * R/source.R: Factored out Source methods from aobjects.R and
967            textdoccol.R.
968            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
969            feeds.
970    
971            * R/textdoccol.R (DirSource): Added support for recursive
972            traversal of directories.
973    
974    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
975    
976            * R/textdoccol.R ([[): Loads the document corpus automatically
977            into memory upon access.
978            (tm_transform, tm_filter): Removed several checks whether the
979            document is already loaded ([[ ensures this now).
980            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
981            mailing list archive.
982    
983    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * R/aobjects.R (TextDocument): Is now a virtual class.
986            (Source): Is now a virtual class.
987    
988    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            * R/textdoccol.R (c): Support for an arbitrary number of document
991            collections.
992    
993    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
994    
995            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
996            append_meta and remove_meta.
997    
998            * R/textdoccol.R: Removed modify_metadata method.
999    
1000            * R/textrepo.R: Removed modify_metadata method.
1001    
1002            * R/textdoccol.R (remove_meta): Supports removal of document
1003            collection metadata and document (= in data frame) metadata.
1004    
1005    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1006    
1007            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
1008    
1009            * data/crude.rda: Rebuilt.
1010    
1011            * data/acq.rda: Rebuilt.
1012    
1013            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
1014    
1015            * R/textdoccol.R ([): Bug fix for subsetting a document
1016            collection's data frame.
1017    
1018    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1019    
1020            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
1021            to s_filter.
1022    
1023            * R/textdoccol.R: Local text documents' metadata can now be copied
1024            to a document collection's data frame with prescind_meta.
1025    
1026    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1027    
1028            * R/: Text documents' slot metadata is now accessible in s_filter.
1029    
1030            * R/: Rewrote s_filter function (still has some restrictions).
1031    
1032    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1033    
1034            * R/: Various fixes in handling metadata.
1035    
1036            * R/: Added update mechanism for text document collections.
1037    
1038    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1039    
1040            * R/: Merging of document collections now creates a binary tree
1041            for reconstructing merged document collections.
1042    
1043            * R/: Redesign of metadata for document collections.
1044    
1045    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1046    
1047            * R/: Messages now use \code{ngettext}.
1048    
1049    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1050    
1051            * R/: Added functions for modifying and removing metadata.
1052    
1053    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1054    
1055            * man/: Updated some documentation.
1056    
1057            * R/: Corrected some connection issues.
1058    
1059            * inst/doc: Worked on the vignette.
1060    
1061    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1062    
1063            * inst/: Added texts and started vignette.
1064    
1065            * R/: Final changes based upon David's comments.
1066    
1067    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1068    
1069            * NAMESPACE: Corrected exports (generic methods need exportMethods
1070            directives!).
1071    
1072    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1073    
1074            * R/: Modified the TextDocCol constructor and various parsers. It
1075            is now modular and supports various file formats via plugins (see
1076            the new "Source" class).
1077    
1078    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1079    
1080            * man/: Revised documentation after previous code changes.
1081    
1082    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1083    
1084            * R/: Remaining changes as discussed with David.
1085    
1086    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1087    
1088            * R/: Some changes as suggested by David. The rest will follow
1089            within the next few days.
1090    
1091    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1092    
1093            * man/: Finished documentation.
1094    
1095    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1096    
1097            * man/: Wrote some documentation.
1098    
1099    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1100    
1101            * R/: Further syntactic sugar in form of additional assignment and
1102            accessor methods.
1103    
1104    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1105    
1106            * R/: Syntactic sugar in form of "length", "show" and "summary"
1107            operators.
1108    
1109    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1110    
1111            * R/: Diverse updates. Mainly on default operators ("[" or "c")
1112            and dissimilarities.
1113    
1114    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1115    
1116            * R/: Added similarity functions.
1117    
1118            * data/: Added English stopwords.
1119    
1120    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1121    
1122            * data/: Examples compiled for new features.
1123    
1124            * R/: Changes due to new structure.
1125    
1126            * NAMESPACE: Corrected namespace to reflect new structure.
1127    
1128            * R/termdocmatrix.R: Adapted for new naming scheme.
1129    
1130    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1131    
1132            * R/textdoccol.R: Adapted code for new class structure. Wrote
1133            several transform and filter functions operating on text document
1134            collections (alias text document databases).
1135    
1136            * R/aobjects.R: Adapted class structure with inheritance,
1137            repositories and additional meta data. Loading files on demand is
1138            now possible.
1139    
1140    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1141    
1142            * R/: Some cosmetic cleanups.
1143    
1144            * inst/: Removed vignette on clustering. That and much more is now
1145            described in the JSS paper on text mining. Based upon that
1146            article an elaborate vignette will be incorporated in the future.
1147    
1148    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1149    
1150            * R/: Updated generic S4 methods to comply with signature changes
1151            in newer versions of R (> 2.3).
1152    
1153    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1154    
1155            * ext/R/importRIS.R: Automatic RIS import is now possible.
1156    
1157    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1158    
1159            * R/textdoccol.R: Added RIS HTML input format.
1160    
1161    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1162    
1163            * R/textdoccol.R: Removed bug that caused invalid text document
1164            collections when handling many input files.
1165    
1166    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1167    
1168            * R/textdoccol.R: Restructured and extended file import
