[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 34, Thu Dec 22 15:18:10 2005 UTC pkg/ChangeLog revision 1196, Tue Dec 4 12:49:27 2012 UTC
1    2012-12-04  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/filter.R (sFilter): Avoid attach() and simplify.
4    
5    2012-11-02  Ingo Feinerer  <feinerer@logic.at>
6    
7            * R/doc.R (.TextDocument): Use casts to ensure data types and to avoid
8            removal of attributes.
9    
10    2012-10-03  Ingo Feinerer  <feinerer@logic.at>
11    
12            * R/weight.R (weightTfIdf, weightSMART): Gracefully handle empty
13            columns and rows (avoids blow-up due to NaN values). Suggested by Jaap
14            Frölich.
15    
16    2012-07-27  Ingo Feinerer  <feinerer@logic.at>
17    
18            * R/transform.R (removeWords): Allow longer stopword lists.
19    
20    2012-01-31  Ingo Feinerer  <feinerer@logic.at>
21    
22            * R/reader.R (readXML): Readers can now set the document language
23            themselves.
24    
25    2012-01-14  Ingo Feinerer  <feinerer@logic.at>
26    
27            * R/source.R (XMLSource, getElem.XMLSource): Simplifications as
28            proposed by Milan Bouchet-Valat.
29    
30    2012-01-11  Ingo Feinerer  <feinerer@logic.at>
31    
32            * R/matrix.R (termFreq): Fix processing of user provided
33            stopwords. Reported by Bettina Grün.
34    
35    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/matrix.R (termFreq): Fix invalid handling of
38            control$wordLengths[1]. Reported by Steven C. Bagley.
39    
40    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
41    
42            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
43    
44    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
45    
46            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
47    
48    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
49    
50            * R/transform.R (removePunctuation): Add option to preserve
51            intra-word dashes.
52    
53    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
54    
55            * R/matrix.R (termFreq): Allow reordering of control option
56            processing.
57    
58    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
59    
60            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
61            pdfinfo tool.
62    
63            * inst/stopwords/SMART.dat: Add SMART information retrieval system
64            stopwords (which are also used by the MC toolkit).
65    
66            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
67            restrict how often a term may appear in each document (generalizes
68            \code{minDocFreq}). Similarly, allow the local option
69            \code{wordLengths} for word length bounds (generalizes
69            \code{minWordLength}).
70    
71            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
72            \code{bounds$global} for restricting how often a term is allowed
73            to appear in different documents.
74    
75            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
76            local options delegated internally to termFreq() and global
77            options which are processed by the term-document matrix
78            constructor itself.
79    
80    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
81    
82            * man/getTokenizers.Rd: Document getTokenizers().
83    
84            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
85    
86    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
87    
88            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
89    
90            * man/combine.Rd: Document c.term_frequency().
91    
92    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
93    
94            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
95            can be accessed via '[' and not '[['.
96    
97    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/stopwords.R (stopwords): Raise an error if no stopwords are
100            available for requested language. Suggested by Derek M Jones.
101    
102    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/weight.R (weightSMART): Implement cosine and pivoted unique
105            normalization.
106    
107    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/transform.R (stemDocument.PlainTextDocument): Use language
110            argument.
111    
112    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/source.R: Store strings and connections instead of unevaluated
115            calls.
116    
117    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
118    
119            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
120    
121    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
122    
123            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
124            (instead of a list element).
125    
126    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
127    
128            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
129            documents by names (fallback to IDs if names are not set).
130    
131    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
134            \code{recursive} now determines whether existing corpus meta data
135            is used.
136    
137    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
138    
139            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
140    
141    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
142    
143            * R/matrix.R (TermDocumentMatrix): If a dictionary is given, no
144            longer remove terms that do not occur in the corpus.
145    
146    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
147    
148            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
149            and Heaps' law.
150    
151    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
152    
153            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
154            provided by a source.
155    
156    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
157    
158            * R/source.R (.Source): Provide document names.
159    
160    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
161    
162            * R/meta.R (`content_or_meta`): Utility function.
163    
164    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
165    
166            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
167            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
168    
169    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
170    
171            * R/weight.R (weightTfIdf): Added normalization option.
172    
173            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
174            analysis.
175    
176    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
177    
178            * R/score.R (tm_tag_score): Compute a score from the number of
179            tags matching in a document.
180    
181    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
182    
183            * R/complete.R (stemCompletion): New completion heuristics.
184    
185    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
186    
187            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
188    
189    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
190    
191            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
192            setOldClass(c(..., "list")) works.
193    
194    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
195    
196            * R/transform.R (stemDocument.character): In case input is a
197            simple character just delegate to the default Snowball stemmer.
198    
199    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
200    
201            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
202            data.
203    
204    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
205    
206            * R/doc.R (`Content<-`): Be careful with names attribute.
207    
208    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
209    
210            * R/source.R (DirSource): Improved implementation especially when
211            handling many (> 1M) files.
212    
213    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
214    
215            * R/source.R (getElem.URISource): Use encoding argument.
216    
217    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
218    
219            * R/doc.R (setOldClass): Register S3 document classes to be
220            recognized by S4 methods.
221    
222    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
223    
224            * R/matrix.R (termFreq): Add option to remove punctuation
225            characters.
226    
227    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
228    
229            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
230            merging multiple term-document matrices.
231    
232    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
233    
234            * R/corpus.R (setOldClass): Register S3 corpus classes to be
235            recognized by S4 methods.
236    
237            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
238            that CRAN Mac OS X builds do not fail any longer.
239    
240    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
241    
242            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
243            of RWeka::AlphabeticTokenizer() as default.
244    
245    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
246    
247            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
248            caused words at the beginning or the end of a line not to be removed. Do
249            not delete whitespace anymore.
250    
251    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
252    
253            * R/source.R (DirSource): Default to working directory if no path
254            is specified.
255    
256    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
257    
258            * R/source.R (DirSource): Stop on empty directories.
259    
260    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
261    
262            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
263            named documents.
264    
265    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
266    
267            * R/transform.R (removeWords): Improve regular expressions.
268    
269    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
270    
271            * R/meta.R (DublinCore): Allow lower case tags.
272    
273    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
274    
275            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
276            instead of x$children.
277    
278    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
279    
280            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
281    
282    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
283    
284            * R/: Use S3 instead of S4 class system.
285    
286    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
287    
288            * R/reader.R (readMail): Moved to tm.plugin.mail package.
289    
290    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
291    
292            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
293            postings are basically e-mails with some extra headers.
294    
295    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
296    
297            * R/transform.R: Move convertMboxEml, removeCitation,
298            removeMultipart, and removeSignature to the tm.plugin.mail package
299            since they are mainly utility functions (for handling e-mails) and
300            not very framework specific.
301    
302    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
303    
304            * man/: Fix documentation.
305    
306    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
307    
308            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
309            plain text document instead of an XML document for texts of the
310            Reuters-21578 dataset.
311    
312            * R/sparse.R: Removed since the slam package is now available on
313            CRAN.
314    
315            * DESCRIPTION (Depends): Add slam package.
316    
317    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
318    
319            * R/transform.R (stemDoc): Fix character(0) handling.
320    
321    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
322    
323            * R/doc.R (show): Pretty print.
324    
325    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
326    
327            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
328            gracefully.
329    
330    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
331    
332            * R/corpus.R: Make corpus virtual. Implement corpus with standard
333            and permanent storage semantics.
334    
335            * DESCRIPTION: New major release. A *lot* of improvements.
336    
337    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
338    
339            * NAMESPACE: Export some simple_triplet_matrix functions.
340    
341    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
342    
343            * R/weight.R: Adapt tf-idf to new matrix format.
344    
345    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
346    
347            * R/matrix.R: Create two distinct classes for term-document and
348            document-term matrices.
349    
350    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
351    
352            * R/termdocmatrix.R: No longer use Matrix package. This reduces
353            package start-up time significantly.
354    
355    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
356    
357            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
358    
359    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
360    
361            * R/transform.R (tmReduce): Combine multiple maps into one
362            transformation.
363    
364    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
365    
366            * R/weight.R: Remove weightLogical since it does not return a
367            dgCMatrix.
368    
369            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
370            or TermDocumentMatrix instead.
371    
372    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
373    
374            * inst/doc/extensions.Rnw: Finished vignette.
375    
376    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
377    
378            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
379            DocumentTermMatrix representations.
380    
381    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
382    
383            * R/reader.R (readXML): New reader for arbitrary XML files.
384    
385    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
386    
387            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
388            (XMLSource): New XMLSource class for arbitrary XML files.
389            (Source): New slot Vectorized.
390    
391    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
392    
393            * R/reader.R (readTabular): Experimental reader for tabular data
394            structures which can be customized via user-defined mappings.
395    
396            * R/reader.R: Always use UTC time zone.
397    
398            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
399    
400    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
401    
402            * R/reader.R (readDOC): Options can be passed over to antiword.
403    
404            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
405            pdftotext.
406    
407    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
408    
409            * R/source.R (DirSource): Add pattern and ignore.case arguments
410            which are internally passed over to list.files().
411    
412    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
413    
414            * inst/doc/tm.Rnw: Suppress pointless loading message.
415    
416    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
417    
418            * DESCRIPTION: Speed up package loading (via moving packages not
419            strictly necessary for normal operation to Suggests instead of
420            Depends).
421    
422    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
423    
424            * R/reader.R (readNewsgroup): The date format is now configurable.
425    
426    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
427    
428            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
429    
430    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
431    
432            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
433    
434    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
435    
436            * R/source.R (DataframeSource): New source class for data frames.
437    
438            * R/source.R: Fixed non-standard call evaluation.
439    
440    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
441    
442            * R/source.R (URISource): New source class for a single document.
443    
444    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
445    
446            * R/source.R: Refactoring.
447    
448    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
449    
450            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
451            Rmpi installations more gracefully.
452    
453    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
454    
455            * R/source.R (Source): Add Length slot.
456    
457    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
458    
459            * R/AAA.R: Unify duplicated .onLoad function.
460    
461    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
462    
463            * DESCRIPTION (Suggests): Added Rmpi.
464    
465    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
466    
467            * R/source.R (getElem): Fix 'no visible binding' warning.
468    
469            * man/WeightFunction.Rd: Fix signature.
470    
471    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
472    
473            * R/weight.R: Introduce name abbreviations for weighting functions.
474    
475    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
476    
477            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
478    
479            * R/cluster.R: Provide convenience functions for using an MPI
480            cluster.
481    
482            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
483            available.
484    
485            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
486            available.
487    
488    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
489    
490            * R/textdoccol.R (lapply): Removed debug print out.
491    
492    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
493    
494            * R/reader.R (readRCV1): Improved meta data extraction from
495            Reuters Corpus Volume 1 documents.
496    
497    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
498    
499            * R/transform.R: Ensure that all mappings preserve multiline
500            structures.
501    
502    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
503    
504            * R/filter.R: Every filter now has an attribute (doclevel)
505            indicating whether it should be applied at the document level.
506    
507            * R/textdoccol.R (tmFilter): Set searchFullText as new default
508            filter.
509    
510    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
511    
512            * R/transform.R (replacePatterns): Replaced removeWords by
513            replacePatterns. Suggested by Christian Buchta.
514    
515            * R/textdoccol.R (inspect): Improved formatting.
516    
517    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * inst/CITATION: Updated JSS article information.
520    
521            * R/textdoccol.R (setAs): Added coerce method from list to
522            corpus.
523    
524            * R/meta.R (meta): Improved meta data handling.
525    
526    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
529            Christian Buchta.
530    
531            * inst/CITATION: Added template to include JSS article reference.
532    
533    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
534    
535            * R/textdoccol.R (tmMap): Introduced lazy mapping.
536    
537            * R/source.R: Added VectorSource.
538    
539    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
540    
541            * man/: Language codes should be in ISO 639-1 format.
542    
543            * R/textdoccol.R (asPlain): Preserve local meta data.
544    
545    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
546    
547            * R/textdoccol.R (writeCorpus): Function for writing a corpus
548            containing plain text documents to disk.
549    
550    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
551    
552            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
553            always set correctly.
554    
555            * R/textdoccol.R: Set load = TRUE as default for load on demand
556            since in most cases this is the wanted behaviour.
557    
558    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
559    
560            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
561    
562            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
563    
564    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
565    
566            * R/meta.R (meta): New function for consistent access to meta data
567            of document collections, repositories, and texts.
568    
569    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
570    
571            * R/: Better support for encodings.
572    
573    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
576            selection when no reader argument is given.
577    
578    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
579    
580            * R/source.R (CSVSource): Now uses read.csv instead of scan
581            internally.
582    
583    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
584    
585            * R/reader.R (getReaders): Returns available reader functions.
586    
587            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
588            as default.
589    
590    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
591    
592            * R/stopwords.R (stopwords): Shortened code, removed codetools
593            variable warnings.
594    
595            * man/: Documentation for showMeta, added an example for tmMap.
596    
597            * inst/doc/tm.Rnw: Updated vignette, comments on MS Word reader,
598            some minor typos fixed.
599    
600    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
601    
602            * R/aobjects.R (showMeta): Added method for pretty printing a
603            text document's meta data.
604    
605    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
606    
607            * R/textdoccol.R (TextDocCol): Better handling of empty
608            arguments.
609    
610            * NAMESPACE: Exported readDOC.
611    
612            * man/completeStems.Rd: Added an example.
613    
614    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
615    
616            * R/stopwords.R (stopwords): Look up .dat files at every
617            call. Allows users to modify stopword .dat files interactively.
618    
619    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
620    
621            * R/termdocmatrix.R (termFreq): Correct processing of empty
622            documents.
623    
624    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
625    
626            * man/: Updated documentation.
627    
628    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
629    
630            * R/complete.R (completeStems): Completes (heuristically) word
631            stems.
632    
633            * R/termdocmatrix.R (TermDocMatrix2): New modular
634            constructor.
635    
636            * NAMESPACE: Exported termFreq.
637    
638    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
639    
640            * R/reader.R (readDOC): Added MS Word reader (using antiword).
641    
642    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
643    
644            * R/weight.R: Weighting functions for TermDocMatrix.
645    
646    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
647    
648            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
649            functions for accessing dimension, column, and row names.
650    
651            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
652    
653    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
654    
655            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
656    
657    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
658    
659            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
660    
661    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
662    
663            * R/reader.R (readPDF): Removed manual checks for pdftotext and
664            pdfinfo. The system call gives a warning anyway.
665    
666    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
667    
668            * R/textdoccol.R (asPlain): Conversion from
669            StructuredTextDocuments to PlainTextDocuments.
670    
671    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
672    
673            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
674            for accessing term-document matrices.
675    
676            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
677            are installed.
678    
679    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
680    
681            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
682            Christian Buchta.
683    
684    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
685    
686            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
687    
688    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
689    
690            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
691    
692            * R/reader.R (readPDF): Added PDF reader.
693    
694    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
695    
696            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
697    
698            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
699    
700            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
701    
702            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
703    
704    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
705    
706            * R/distmeasure.R (dissimilarity): Replaced dists call from
707            package cba by new dist call from package proxy.
708    
709    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
710    
711            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
712    
713    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
714    
715            * R/termdocmatrix.R: require() uses the quietly option to suppress
716            loading messages.
717    
718    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
719    
720            * R/dictionary.R: Added dictionary support.
721    
722    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
723    
724            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
725            documents. This simplifies some functions, e.g., asPlain.
726    
727    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
728    
729            * inst/doc/tm.Rnw: Fixed some typos in vignette.
730    
731    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
732    
733            * R/textdoccol.R (replaceWords): Added method to replace a set of
734            words by a single word. Useful for synonyms.
735    
736    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
737    
738            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
739    
740    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
741    
742            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
743            vectors. Thanks to Ariel Maguyon for his error report.
744            (removeSparseTerms): New function to remove columns from a
745            term-document matrix exceeding a sparse factor.
746    
747    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
748    
749            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
750    
751    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
752    
753            * man/sFilter.Rd: Corrected documentation on statement format (use
754            '==' instead of '=').
755    
756    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
757    
758            * R/aobjects.R (StructuredTextDocument): Inherits from
759            TextDocument.
760    
761    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
764            on sparse matrices as proposed by Martin Maechler.
765    
766    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
767    
768            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
769            \pkg{filehash} version deprecates them.
770    
771    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
772    
773            * R/termdocmatrix.R (textvector): Stemming is now performed before
774            erasing stopwords.
775            (weightMatrix): Adapted to handle sparse matrices.
776            (TermDocMatrix): Sparse matrix is now efficiently built by
777            direct stepwise insertion of row values into it.
778    
779    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
780    
781            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
782            due to ongoing problems. For our purposes the latter is as useful
783            as the replaced package.
784    
785    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
786    
787            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
788    
789            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
790    
791    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
792    
793            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
794            languages with available stopwords.
795    
796    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
797    
798            * inst/doc/tm.Rnw: Minor corrections in the vignette.
799    
800    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
801    
802            * DESCRIPTION: Update to version 0.2, since a lot of new features
803            have been integrated.
804    
805            * inst/stopwords: Updated existing stopwords and added stopwords
806            for various other languages.
807    
808    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
809    
810            * man/: Updated documentation.
811    
812            * Work/testDb.R: Script to test database stuff.
813    
814            * R/: Fixed various database-related bugs. Database support seems
815            rather usable now, but should be considered alpha status.
816    
817    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
818    
819            * R/: Fixed some bugs related to database support.
820    
821    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
822    
823            * man/: Added a lot of examples to the manuals.
824    
825    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
826    
827            * man/: Updated parts of the documentation.
828    
829            * R/textdoccol.R (asPlain): Added conversion from newsgroup
830            documents to plain text documents.
831    
832    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
833    
834            * R/textdoccol.R: Finished experimental database support. Not yet
835            intensively tested.
836    
837            * R/source.R: Now each source has a default reader.
838    
839            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
840            class anymore.
841    
842            * R/plaintextdoc.R: Custom show method for plain text documents.
843    
844            * R/aobjects.R: Added a class for structured text documents.
845    
846            * R/reader.R: Replaced remaining \code{parser} occurrences with
847            \code{reader}.
848    
849            * R/textdoccol.R (summary): Indent tags.
850    
851            * R/textdoccol.R (removePunctuation): Transform method to remove
852            punctuation marks.
853    
854    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
855    
856            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
857            using prescindMeta().
858    
859    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
860    
861            * R/textdoccol.R: Improved database support.
862    
863    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
864    
865            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
866    
867            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
868            language code.
869    
870            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
871            into parserControl argument.
872    
873            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
874    
875    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
876    
877            * Work/tmDataSetup.R: The datasets acq and crude can now be
878            created on the fly.
879    
880            * R/stopwords.R: Introduced a function returning the stopwords for
881            a given language (English, German and French at the moment).
882    
883            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
884            otherwise falls back to Snowball package.
885    
886    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
887    
888            * man/dissimilarity-methods.Rd: Make clear that any method offered
889            by "dists" from package "cba" can be used.
890    
891    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
892    
893            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
894            to Kurt's latex suggestion. Removed points and underscores in
895            variable names for consistent naming.
896    
897            * DESCRIPTION: Update to version 0.1-2.
898    
899            * man/TextRepository.Rd: Fixed bug in documentation.
900    
901    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
902    
903            * DESCRIPTION: Update to version 0.1-1.
904    
905    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
906    
907            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
908            wordStem.
909    
910    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
911    
912            * R/: Changes due to Kurt's review.
913    
914    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
915    
916            * R/: Implemented improvements based upon comments by David
917            Meyer.
918    
919    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
920    
921            * inst/doc/: Rewrote vignette.
922    
923            * man/: Improved documentation.
924    
925    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
926    
927            * man/: Updated documentation.
928    
929            * DESCRIPTION: Changed package name to "tm". Updated version to
930            0.1 for first CRAN release.
931    
932            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
933            list archive example.
934    
935            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
936            archive example.
937    
938            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
939            from (several mails per box) mbox format to (single mail per file)
940            eml format.
941    
942    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
943    
944            * data/crude.rda: Rebuilt.
945    
946            * data/acq.rda: Rebuilt.
947    
948            * R/reader.R: Factored out reader and parser methods from
949            textdoccol.R.
950    
951            * R/source.R: Factored out Source methods from aobjects.R and
952            textdoccol.R.
953            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
954            feeds.
955    
956            * R/textdoccol.R (DirSource): Added support for recursive
957            traversal of directories.
958    
959    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
960    
961            * R/textdoccol.R ([[): Loads the document corpus automatically
962            into memory upon access.
963            (tm_transform, tm_filter): Removed several checks whether the
964            document is already loaded ([[ ensures this now).
965            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
966            mailing list archive.
967    
968    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
969    
970            * R/aobjects.R (TextDocument): Is now a virtual class.
971            (Source): Is now a virtual class.
972    
973    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
974    
975            * R/textdoccol.R (c): Support for an arbitrary number of document
976            collections.
977    
978    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
981            append_meta and remove_meta.
982    
983            * R/textdoccol.R: Removed modify_metadata method.
984    
985            * R/textrepo.R: Removed modify_metadata method.
986    
987            * R/textdoccol.R (remove_meta): Supports removal of document
988            collection metadata and document (= in data frame) metadata.
989    
990    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
991    
992            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
993    
994            * data/crude.rda: Rebuilt.
995    
996            * data/acq.rda: Rebuilt.
997    
998            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
999    
1000            * R/textdoccol.R ([): Bug fix for subsetting a document
1001            collection's data frame.
1002    
1003    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1004    
1005            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
1006            to s_filter.
1007    
1008            * R/textdoccol.R: Local text documents' metadata can now be copied
1009            to a document collection's data frame with prescind_meta.
1010    
1011    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1012    
1013            * R/: Text documents' slot metadata is now accessible in s_filter.
1014    
1015            * R/: Rewrote s_filter function (still has some restrictions).
1016    
1017    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1018    
1019            * R/: Various fixes in handling metadata.
1020    
1021            * R/: Added update mechanism for text document collections.
1022    
1023    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1024    
1025            * R/: Merging of document collections now creates a binary tree
1026            for reconstructing merged document collections.
1027    
1028            * R/: Redesign of metadata for document collections.
1029    
1030    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1031    
1032            * R/: Messages now use \code{ngettext}.
1033    
1034    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1035    
1036            * R/: Added functions for modifying and removing metadata.
1037    
1038    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1039    
1040            * man/: Updated some documentation.
1041    
1042            * R/: Corrected some connection issues.
1043    
1044            * inst/doc: Worked on the vignette.
1045    
1046    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1047    
1048            * inst/: Added texts and started vignette.
1049    
1050            * R/: Final changes based upon David's comments.
1051    
1052    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1053    
1054            * NAMESPACE: Corrected exports (generic methods need exportMethods
1055            directives!).
1056    
1057    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1058    
1059            * R/: Modified the TextDocCol constructor and various parsers. It
1060            is now modular and supports various file formats via plugins (see
1061            the new "Source" class).
1062    
1063    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1064    
1065            * man/: Revised documentation after previous code changes.
1066    
1067    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1068    
1069            * R/: Remaining changes as discussed with David.
1070    
1071    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1072    
1073            * R/: Some changes as suggested by David. The rest will follow
1074            within the next days.
1075    
1076    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1077    
1078            * man/: Finished documentation.
1079    
1080    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1081    
1082            * man/: Wrote some documentation.
1083    
1084    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1085    
1086            * R/: Further syntactic sugar in form of additional assignment and
1087            accessor methods.
1088    
1089    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1090    
1091            * R/: Syntactic sugar in form of "length", "show" and "summary"
1092            operators.
1093    
1094    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1095    
1096            * R/: Diverse updates. Mainly on default operators ("[" or "c")
1097            and dissimilarities.
1098    
1099    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1100    
1101            * R/: Added similarity functions.
1102    
1103            * data/: Added English stopwords.
1104    
1105    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1106    
1107            * data/: Examples compiled for new features.
1108    
1109            * R/: Changes due to new structure.
1110    
1111            * NAMESPACE: Corrected namespace to reflect new structure.
1112    
1113            * R/termdocmatrix.R: Adapted for new naming scheme.
1114    
1115    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1116    
1117            * R/textdoccol.R: Adapted code for new class structure. Wrote
1118            several transform and filter functions operating on text document
1119            collections (alias text document databases).
1120    
1121            * R/aobjects.R: Adapted class structure with inheritance,
1122            repositories and additional meta data. Loading files on demand is
1123            now possible.
1124    
1125    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1126    
1127            * R/: Some cosmetic cleanups.
1128    
1129            * inst/: Removed vignette on clustering. That and much more is now
1130            described in the JSS paper on text mining. Based upon that
1131            article, a more elaborate vignette will be incorporated in the future.
1132    
1133    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1134    
1135            * R/: Updated generic S4 methods to comply with signature changes
1136            in newer versions of R (> 2.3).
1137    
1138    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1139    
1140            * ext/R/importRIS.R: Automatic RIS import is now possible.
1141    
1142    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1143    
1144            * R/textdoccol.R: Added RIS HTML input format.
1145    
1146    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1147    
1148            * R/textdoccol.R: Removed bug that caused invalid text document
1149            collections when handling many input files.
1150    
1151    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1152    
1153            * R/textdoccol.R: Restructured and extended file import
1154            mechanism.
1155    
1156            * inst/doc/clustering.Rnw: Adapted vignette for use with
1157            ReutNews.rda.
1158    
1159            * man/ReutNews.Rd: Documentation for ReutNews.rda.
1160    
1161            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1162    
1163    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1164    
1165            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
