[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC vs. pkg/ChangeLog revision 1194, Fri Nov 2 15:15:03 2012 UTC

2012-11-02  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (.TextDocument): Use casts to ensure data types and to avoid
        removal of attributes.

2012-10-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R (weightTfIdf, weightSMART): Gracefully handle empty
        columns and rows (avoids blow-up due to NaN values). Suggested by Jaap
        Frölich.
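The R sketch below is a rough illustration of where NaN values come from in tf-idf weighting and how to guard against them. An empty document makes length normalization compute 0/0. A term that occurs in no document gets idf = log2(N/0) = Inf, and multiplying that by a zero term frequency gives NaN. The helper `tfidf_sketch` is hypothetical and is not tm's weightTfIdf(). The idf definition log2(N/df) and the choice to map both cases to zero are assumptions of this sketch.

```r
# Hypothetical helper, not tm's weightTfIdf(): tf-idf for a dense
# term-document count matrix (terms in rows, documents in columns).
tfidf_sketch <- function(m, normalize = TRUE) {
  n_docs <- ncol(m)
  df <- rowSums(m > 0)                  # documents containing each term
  if (normalize) {
    col_totals <- colSums(m)            # tokens per document
    # Empty documents are divided by 1 instead of 0, so their columns stay
    # all-zero rather than becoming 0/0 = NaN.
    m <- sweep(m, 2, ifelse(col_totals > 0, col_totals, 1), "/")
  }
  # Terms that occur in no document get idf 0 instead of log2(N / 0) = Inf,
  # which would otherwise turn 0 * Inf into NaN.
  idf <- ifelse(df > 0, log2(n_docs / df), 0)
  m * idf                               # row i is multiplied by idf[i]
}

m <- matrix(c(1, 1, 0,    # document 1
              0, 0, 0,    # document 2 (empty)
              2, 0, 0),   # document 3; term 3 occurs nowhere
            nrow = 3)
tfidf_sketch(m)
```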

2012-07-27  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removeWords): Allow longer stopword lists.

2012-01-31  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readXML): Readers can now set the document language
        themselves.

2012-01-14  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (XMLSource, getElem.XMLSource): Simplifications as
        proposed by Milan Bouchet-Valat.

2012-01-11  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (termFreq): Fix processing of user-provided
        stopwords. Reported by Bettina Grün.

2011-12-23  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (termFreq): Fix invalid handling of
        control$wordLengths[1]. Reported by Steven C. Bagley.

2011-12-17  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION (Version): Prepare for CRAN Christmas release.

2011-12-12  Ingo Feinerer  <feinerer@logic.at>

        * R/utils.R (map_IETF_Snowball): Map empty input to "porter".

2011-12-07  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removePunctuation): Add option to preserve
        intra-word dashes.
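One way to preserve intra-word dashes while removing all other punctuation takes three steps: protect those dashes temporarily, strip the remaining punctuation, then restore them. The R sketch below follows that approach. The helper `remove_punct_keep_dashes` is hypothetical and is not tm's actual implementation. Treating a dash as intra-word when it sits between two word characters (`\\w`) is an assumption.

```r
# Hypothetical helper, not tm's removePunctuation(): remove punctuation but
# keep dashes that sit between two word characters.
remove_punct_keep_dashes <- function(x) {
  # 1. Protect intra-word dashes with the control character \001, which is
  #    not matched by [[:punct:]]. Lookarounds let "a-b-c" match twice.
  x <- gsub("(?<=\\w)-(?=\\w)", "\001", x, perl = TRUE)
  # 2. Strip all remaining punctuation.
  x <- gsub("[[:punct:]]+", "", x)
  # 3. Restore the protected dashes.
  gsub("\001", "-", x, fixed = TRUE)
}

remove_punct_keep_dashes("state-of-the-art, really!")  # "state-of-the-art really"
```

Lookarounds are used instead of a capturing pattern such as `(\\w)-(\\w)`. A capturing pattern consumes the letter after each dash, so in "a-b-c" the second dash would not be protected.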

2011-12-06  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (termFreq): Allow reordering of control option
        processing.

2011-11-17  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readPDF): Use tools:::pdf_info() instead of the
        external pdfinfo tool.

        * inst/stopwords/SMART.dat: Add SMART information retrieval system
        stopwords (which are also used by the MC toolkit).

        * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
        restrict how often a term may appear in each document (generalizes
        \code{minDocFreq}). Similarly, add the local option
        \code{wordLengths} for word length bounds (generalizes
        \code{minWordLength}).

        * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
        \code{bounds$global} for restricting the number of documents in
        which a term may appear.

        * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
        local options delegated internally to termFreq() and global
        options which are processed by the term-document matrix
        constructor itself.
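A global bound can be pictured as a filter on document frequency. The R sketch below keeps only those rows (terms) of a dense term-document count matrix whose document frequency falls within [lower, upper]. The helper `filter_global_bounds` is hypothetical, and the inclusive comparison is an assumption about the intended semantics. In tm itself the bound is passed through the control list, for example `control = list(bounds = list(global = c(2, Inf)))`.

```r
# Hypothetical helper: keep only the terms (rows) whose document frequency
# lies in [lower, upper], in the spirit of bounds$global.
filter_global_bounds <- function(m, lower = 1, upper = Inf) {
  df <- rowSums(m > 0)                  # number of documents containing term
  m[df >= lower & df <= upper, , drop = FALSE]
}

m <- matrix(c(1, 0, 2,  0, 0, 1,  3, 1, 1), nrow = 3,
            dimnames = list(c("a", "b", "c"), paste0("doc", 1:3)))
rownames(filter_global_bounds(m, lower = 2))  # "a" "c"
```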

2011-11-15  Ingo Feinerer  <feinerer@logic.at>

        * man/getTokenizers.Rd: Document getTokenizers().

        * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().

2011-11-04  Ingo Feinerer  <feinerer@logic.at>

        * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.

        * man/combine.Rd: Document c.term_frequency().

2011-10-11  Ingo Feinerer  <feinerer@logic.at>

        * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
        can be accessed via '[' and not '[['.

2011-08-24  Ingo Feinerer  <feinerer@logic.at>

        * R/stopwords.R (stopwords): Raise an error if no stopwords are
        available for requested language. Suggested by Derek M Jones.

2011-05-27  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R (weightSMART): Implement Cosine and pivoted unique
        normalization.
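Cosine normalization scales each document vector to unit Euclidean length. The minimal R sketch below applies it to a dense matrix with documents in columns. The helper `cosine_normalize` is hypothetical. Leaving all-zero columns unchanged is an assumption of the sketch, and pivoted unique normalization is not shown.

```r
# Hypothetical helper: cosine (L2) normalization of each document column.
# All-zero columns are left unchanged to avoid division by zero.
cosine_normalize <- function(m) {
  norms <- sqrt(colSums(m^2))           # Euclidean length of each column
  sweep(m, 2, ifelse(norms > 0, norms, 1), "/")
}

cosine_normalize(matrix(c(3, 4, 0, 0), nrow = 2))  # first column becomes 0.6, 0.8
```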

2011-02-17  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (stemDocument.PlainTextDocument): Use language
        argument.

2011-02-04  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R: Store strings and connections instead of unevaluated
        calls.

2010-11-26  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (Corpus): Allow init and exit hooks for readers.

2010-10-22  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
        (instead of a list element).

2010-10-16  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
        documents by name (falling back to IDs if names are not set).

2010-08-25  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (c.Corpus): When concatenating corpora, the argument
        \code{recursive} now determines whether existing corpus meta data
        is used.

2010-08-06  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.

2010-06-17  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (TermDocumentMatrix): If a dictionary is given, no
        longer remove dictionary terms that do not occur in the corpus.

2010-06-02  Ingo Feinerer  <feinerer@logic.at>

        * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
        and Heaps' law.
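Zipf's law states that a term's frequency is roughly inversely proportional to its frequency rank. Plotted as log frequency against log rank, the data should therefore lie close to a straight line with slope near -1. The R sketch below fits that line with lm(). The helper `zipf_fit` is hypothetical and is not the code behind Zipf_plot.

```r
# Hypothetical helper: fit log(frequency) ~ log(rank), the straight line a
# Zipf plot is compared against. A slope near -1 is what Zipf's law predicts.
zipf_fit <- function(freqs) {
  freqs <- sort(freqs[freqs > 0], decreasing = TRUE)  # rank 1 = most frequent
  rank <- seq_along(freqs)
  coef(lm(log(freqs) ~ log(rank)))
}

zipf_fit(c(120, 60, 40, 30, 24, 20))  # exactly 120 / rank, so slope is -1 up to rounding
```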

2010-05-18  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
        provided by a source.

2010-04-09  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (.Source): Provide document names.

2010-04-07  Ingo Feinerer  <feinerer@logic.at>

        * R/meta.R (`content_or_meta`): Utility function.

2010-03-19  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
        TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.

2010-03-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R (weightTfIdf): Added normalization option.

        * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
        analysis.

2010-02-25  Ingo Feinerer  <feinerer@logic.at>

        * R/score.R (tm_tag_score): Compute a score from the number of
        tags matching in a document.
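In its simplest form, a tag score counts how many tokens of a document appear in a given tag list, for instance a list of positive words from a sentiment lexicon. The R sketch below is hypothetical: `tag_score` is not tm's API, and the sketch assumes the document has already been tokenized.

```r
# Hypothetical helper, not tm's tm_tag_score(): count the tokens of one
# (already tokenized) document that occur in a tag list.
tag_score <- function(tokens, tags) {
  sum(tokens %in% tags)   # every matching token occurrence counts once
}

tag_score(c("good", "bad", "good", "fine"), c("good", "fine"))  # 3
```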

2010-02-18  Ingo Feinerer  <feinerer@logic.at>

        * R/complete.R (stemCompletion): New completion heuristics.
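One plausible completion heuristic maps each stem to the most frequent dictionary word that begins with it. The R sketch below implements only that heuristic. The helper `complete_stem` is hypothetical, and the sketch does not claim to reproduce any particular stemCompletion() option.

```r
# Hypothetical helper: complete each stem to the most frequent dictionary
# word that starts with it; stems without a match are returned unchanged.
complete_stem <- function(stems, dictionary) {
  freq <- sort(table(dictionary), decreasing = TRUE)  # most frequent first
  words <- names(freq)
  vapply(stems, function(s) {
    hits <- words[startsWith(words, s)]
    if (length(hits) > 0) hits[1] else s
  }, character(1))
}

dict <- c("update", "updated", "updated", "updating", "other")
complete_stem(c("updat", "xyz"), dict)  # updat -> "updated", xyz -> "xyz"
```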

2010-02-17  Ingo Feinerer  <feinerer@logic.at>

        * R/plot.R (plot.TermDocumentMatrix): Memory improvements.

2010-02-06  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
        setOldClass(c(..., "list")) works.

2010-01-22  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (stemDocument.character): If the input is a simple
        character vector, delegate directly to the default Snowball stemmer.

2010-01-15  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XML, readRCV1): Extract more meta
        data.

2010-01-12  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (`Content<-`): Be careful with names attribute.

2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>

        * R/source.R (DirSource): Improved implementation, especially when
        handling many (> 1M) files.

2009-12-22  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (getElem.URISource): Use encoding argument.

2009-12-11  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (setOldClass): Register S3 document classes to be
        recognized by S4 methods.

2009-11-25  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (termFreq): Add option to remove punctuation
        characters.

2009-11-19  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (c.TermDocumentMatrix): Added combine method for
        merging multiple term-document matrices.

2009-11-17  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (setOldClass): Register S3 corpus classes to be
        recognized by S4 methods.

        * man/plot.Rd: Use \dontrun{} in the \examples{} section in the
        hope that CRAN Mac OS X builds no longer fail.

2009-11-15  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (tokenize): Use scan(..., what = "character") instead
        of RWeka::AlphabeticTokenizer() as default.

2009-11-14  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removeWords.PlainTextDocument): Fix a bug which
        prevented words at the beginning or the end of a line from being
        removed. No longer delete whitespace.

2009-11-12  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Default to working directory if no path
        is specified.

2009-11-11  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Stop on empty directories.

2009-11-07  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
        named documents.

2009-10-21  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removeWords): Improve regular expressions.

2009-10-19  Ingo Feinerer  <feinerer@logic.at>

        * R/meta.R (DublinCore): Allow lower case tags.

2009-10-09  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
        instead of x$children.

2009-09-15  Ingo Feinerer  <feinerer@logic.at>

        * R/preprocess.R (preprocessReut21578XML): Fix generated file names.

2009-09-06  Ingo Feinerer  <feinerer@logic.at>

        * R/: Use S3 instead of S4 class system.

2009-08-11  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readMail): Moved to tm.plugin.mail package.

2009-07-04  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
        postings are basically e-mails with some extra headers.

2009-07-03  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R: Move convertMboxEml, removeCitation,
        removeMultipart, and removeSignature to the tm.plugin.mail package
        since they are mainly utility functions (for handling e-mails) and
        not very framework specific.

2009-06-28  Ingo Feinerer  <feinerer@logic.at>

        * man/: Fix documentation.

2009-06-26  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XMLasPlain): New reader which returns a
        plain text document instead of an XML document for texts of the
        Reuters-21578 dataset.

        * R/sparse.R: Removed since the slam package is now available on
        CRAN.

        * DESCRIPTION (Depends): Add slam package.

2009-06-17  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (stemDoc): Fix character(0) handling.

2009-06-12  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (show): Pretty print.

2009-05-27  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
        gracefully.

2009-05-13  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R: Make corpus virtual. Implement corpus with standard
        and permanent storage semantics.

        * DESCRIPTION: New major release. A *lot* of improvements.

2009-05-04  Ingo Feinerer  <feinerer@logic.at>

        * NAMESPACE: Export some simple_triplet_matrix functions.

2009-04-28  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Adapt tf-idf to new matrix format.

2009-04-27  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R: Create two distinct classes for term-document and
        document-term matrices.

2009-04-26  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R: No longer use the Matrix package. This
        reduces package start-up time significantly.

2009-04-11  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/tm.Rnw: Fix code/documentation mismatch.

2009-04-04  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (tmReduce): Combine multiple maps into one
        transformation.
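Combining several maps into one transformation amounts to function composition, so each document only needs to be processed once. The R sketch below folds a list of functions with Reduce(). The helper `compose_transforms` is hypothetical and is not the code of tmReduce.

```r
# Hypothetical helper: compose a list of one-argument functions into a single
# function that applies them from left to right.
compose_transforms <- function(fs) {
  Reduce(function(f, g) {
    force(f); force(g)          # capture the current pair of functions
    function(x) g(f(x))
  }, fs)
}

clean <- compose_transforms(list(tolower, trimws,
                                 function(x) gsub("\\s+", " ", x)))
clean("  Hello   World ")       # "hello world"
```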

2009-04-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Remove weightLogical since it does not return a
        dgCMatrix.

        * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
        or TermDocumentMatrix instead.

2009-03-28  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/extensions.Rnw: Finished vignette.

2009-03-27  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
        DocumentTermMatrix representations.

2009-03-23  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readXML): New reader for arbitrary XML files.

2009-03-22  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (CSVSource): Defunct (use DataframeSource instead).
        (XMLSource): New XMLSource class for arbitrary XML files.
        (Source): New slot Vectorized.

2009-03-21  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readTabular): Experimental reader for tabular data
        structures which can be customized via user-defined mappings.

        * R/reader.R: Always use UTC time zone.

        * R/AAA.R (.onLoad): No longer try to start an MPI cluster.

2009-03-20  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readDOC): Options can be passed on to antiword.

        * R/reader.R (readPDF): Options can be passed on to pdfinfo and
        pdftotext.

2009-03-10  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Add pattern and ignore.case arguments
        which are passed on internally to list.files().

2009-03-02  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/tm.Rnw: Suppress pointless loading message.

2009-01-29  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION: Speed up package loading by moving packages not
        strictly necessary for normal operation from Depends to Suggests.

2009-01-08  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readNewsgroup): The date format is now configurable.

2008-12-20  Ingo Feinerer  <feinerer@logic.at>

        * R/preprocess.R (convertMboxEml): Fix off-by-one error.

2008-12-16  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R (TermDocMatrix): Sort row indices.

2008-12-06  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DataframeSource): New source class for data frames.

        * R/source.R: Fixed non-standard call evaluation.

2008-11-29  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (URISource): New source class for a single document.

2008-11-27  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R: Refactoring.

2008-11-25  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
        Rmpi installations more gracefully.

2008-11-08  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (Source): Add Length slot.

2008-11-06  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R: Unify duplicated .onLoad function.

2008-11-03  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION (Suggests): Added Rmpi.

2008-11-02  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (getElem): Fix 'no visible binding' warning.

        * man/WeightFunction.Rd: Fix signature.

2008-08-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Introduce name abbreviations for weighting functions.

2008-07-24  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.

        * R/cluster.R: Provide convenience functions for using an MPI
        cluster.

        * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
        available.

        * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
        available.

2008-07-17  Ingo Feinerer  <feinerer@logic.at>

        * R/textdoccol.R (lapply): Removed debug printout.

2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readRCV1): Improved meta data extraction from
        Reuters Corpus Volume 1 documents.

2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/transform.R: Ensure that all mappings preserve multiline
        structures.

2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/filter.R: Every filter now has an attribute (doclevel)
        indicating whether it should be applied at document level.

        * R/textdoccol.R (tmFilter): Set searchFullText as the new default
        filter.

2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/transform.R (replacePatterns): Replaced removeWords by
        replacePatterns. Suggested by Christian Buchta.

        * R/textdoccol.R (inspect): Improved formatting.

2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/CITATION: Updated JSS article information.

        * R/textdoccol.R (setAs): Added coerce method from list to
        corpus.

        * R/meta.R (meta): Improved meta data handling.

2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (materialize, tmMap): Improvements suggested by
        Christian Buchta.

        * inst/CITATION: Added template to include JSS article reference.

2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (tmMap): Introduced lazy mapping.

        * R/source.R: Added VectorSource.

2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Language codes should be in ISO 639-1 format.

        * R/textdoccol.R (asPlain): Preserve local meta data.

2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (writeCorpus): Function for writing a corpus
        containing plain text documents to disk.

2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
        always set correctly.

        * R/textdoccol.R: Set load = TRUE as default for load on demand
        since in most cases this is the desired behaviour.

2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Renamed TextDocCol to Corpus, and Corpus to Content.

        * DESCRIPTION: Updated Version to 0.3 due to core name changes.

2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/meta.R (meta): New function for consistent access to meta data
        of document collections, repositories, and texts.

2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Better support for encodings.

2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
        selection when no reader argument is given.

2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/source.R (CSVSource): Now uses read.csv instead of scan
        internally.

2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (getReaders): Returns available reader functions.

        * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
        as default.

2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Shortened code, removed codetools
        variable warnings.

        * man/: Documentation for showMeta, added an example for tmMap.

        * inst/doc/tm.Rnw: Updated vignette (comments on the MS Word
        reader), fixed some minor typos.

2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (showMeta): Added method for pretty printing a
        text document's meta data.

2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Better handling of empty
        arguments.

        * NAMESPACE: Exported readDOC.

        * man/completeStems.Rd: Added an example.

2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Look up .dat files at every
        call. Allows users to modify stopword .dat files interactively.

2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (termFreq): Correct processing of empty
        documents.

2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/complete.R (completeStems): Completes (heuristically) word
        stems.

        * R/termdocmatrix.R (TermDocMatrix2): New modular
        constructor.

        * NAMESPACE: Exported termFreq.

2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readDOC): Added MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/weight.R: Weighting functions for TermDocMatrix.

2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
        functions for accessing dimension, column, and row names.

        * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/removePunctuation.Rd: Added documentation. The function is
        also exported in NAMESPACE.

2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/fungen.R: Use S4 class for function generators instead of S3
        attributes.

2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readPDF): Removed manual checks for pdftotext and
        pdfinfo. The system call gives a warning anyway.

2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (asPlain): Conversion from
        StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
        for accessing term-document matrices.

        * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
        are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
        Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML,
        preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readHTML): Added a very simple HTML reader to obtain
        StructuredTextDocuments.

        * R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Moved proxy from Depends to Imports to avoid name
        clashes.

        * inst/stopwords/english.dat: Added the term "yes" to stopwords.

        * R/termdocmatrix.R (dim): dim function for TermDocMatrix.

        * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/distmeasure.R (dissimilarity): Replaced dists call from
        package cba by new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: require() uses the quietly option to suppress
        loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
        documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed some typos in vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (replaceWords): Added method to replace a set of
        words by a single word. Useful for synonyms.
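A set of synonyms can be replaced by one word with a single regular expression that matches any of the words as a whole word. The R sketch below is hypothetical: `replace_words` is not the package's API, and the sketch assumes the words contain no regular-expression metacharacters.

```r
# Hypothetical helper: replace whole-word occurrences of any of `words` by
# `by`. Assumes `words` contain no regular-expression metacharacters.
replace_words <- function(x, words, by) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  gsub(pattern, by, x, perl = TRUE)
}

replace_words("a big and large house", c("big", "large"), "huge")
# "a huge and huge house"
```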

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TermDocMatrix.Rd: Fixed documentation on Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Small fix for dealing with empty
        vectors. Thanks to Ariel Maguyon for his error report.
        (removeSparseTerms): New function to remove columns from a
        term-document matrix whose sparsity exceeds a given factor.
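A term's sparsity can be taken as the share of documents in which the term does not occur. The R sketch below drops terms whose sparsity reaches the given threshold. The helper `remove_sparse_terms` is hypothetical. It stores terms in rows, although the entry above speaks of columns, and the strict `<` comparison is also an assumption.

```r
# Hypothetical helper: drop terms (rows) whose sparsity, i.e. the share of
# documents in which they do not occur, is at least `sparse`.
remove_sparse_terms <- function(m, sparse) {
  stopifnot(sparse > 0, sparse < 1)
  sparsity <- rowMeans(m == 0)
  m[sparsity < sparse, , drop = FALSE]
}

m <- matrix(c(1, 0, 1,  2, 0, 0,  1, 1, 1,  1, 0, 0), nrow = 3,
            dimnames = list(c("a", "b", "c"), NULL))
rownames(remove_sparse_terms(m, 0.6))  # "a" "c"
```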

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/tmUpdate.Rd: Corrected documentation on readerControl
        parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/sFilter.Rd: Corrected documentation on statement format (use
        '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (StructuredTextDocument): Inherits from
        TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
        on sparse matrices as proposed by Martin Maechler.
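Finding frequent terms reduces to comparing each term's total count with the bounds. The R sketch below does this with row sums on a dense term-document matrix. The helper `find_freq_terms` is hypothetical. For sparse matrices, the Matrix package provides rowSums() methods, so the same idea carries over without converting to a dense matrix.

```r
# Hypothetical helper: terms whose total frequency over all documents lies in
# [lowfreq, highfreq]. Dense matrix with terms in rows.
find_freq_terms <- function(m, lowfreq = 0, highfreq = Inf) {
  totals <- rowSums(m)
  rownames(m)[totals >= lowfreq & totals <= highfreq]
}

m <- matrix(c(1, 0, 1,  2, 0, 0,  1, 1, 1,  1, 0, 0), nrow = 3,
            dimnames = list(c("a", "b", "c"), NULL))
find_freq_terms(m, lowfreq = 2)  # "a" "c"
```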
761    
762    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
763    
764            * R/textdoccol.R: Removed \code{dbDisconnect} calls since last
765            \pkg{filehash} version makes them deprecated.
766    
767    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
768    
769            * R/termdocmatrix.R (textvector): Stemming is now performed before
770            erasing stopwords.
771            (weightMatrix): Adapted to handle sparse matrices.
772            (TermDocMatrix): Sparse matrix is now efficiently built by
773            direct stepwise insertion of row values into it.
774    
775    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
776    
777            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
778            due to ongoing problems. For our purposes the latter is as useful
779            as the replaced package.
780    
781    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
782    
783            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
784    
785            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
786    
787    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
788    
789            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
790            languages with available stopwords.
791    
792    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
793    
794            * inst/doc/tm.Rnw: Minor corrections in the vignette.
795    
796    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
797    
798            * DESCRIPTION: Update to version 0.2, since a lot of new features
799            have been integrated.
800    
801            * inst/stopwords: Updated existing stopwords and added stopwords
802            for various other languages.
803    
804    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
805    
806            * man/: Updated documentation.
807    
808            * Work/testDb.R: Script to test database stuff.
809    
810            * R/: Fixed various database related bugs. Seems to be rather
811            useable now, i.e., consider as alpha status for now.
812    
813    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
814    
815            * R/: Fixed some bugs related to database support.
816    
817    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
818    
819            * man/: Added a lot of examples to the manuals.
820    
821    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
822    
823            * man/: Updated parts of the documentation.
824    
825            * R/textdoccol.R (asPlain): Added conversion from newsgroup
826            documents to plain text documents.
827    
828    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
829    
830            * R/textdoccol.R: Finished experimental database support. Not yet
831            thoroughly tested.
832    
833            * R/source.R: Now each source has a default reader.
834    
835            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
836            class anymore.
837    
838            * R/plaintextdoc.R: Custom show method for plain text documents.
839    
840            * R/aobjects.R: Added a class for structured text documents.
841    
842            * R/reader.R: Replaced remaining \code{parser} occurrences with
843            \code{reader}.
844    
845            * R/textdoccol.R (summary): Indent tags.
846    
847            * R/textdoccol.R (removePunctuation): Transform method to remove
848            punctuation marks.
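A removePunctuation-style transform can be sketched in Python with a character-deletion table. This assumes "punctuation" means the ASCII characters in Python's `string.punctuation`. That is not necessarily the same character class tm's R implementation uses.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Return text with all characters from string.punctuation deleted."""
    return text.translate(_PUNCT_TABLE)
```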
849    
850    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
851    
852            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
853            using prescindMeta().
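The sFilter simplification rests on prescindMeta: per-document metadata is first copied into a collection-level table, and the filter then queries that table. The Python sketch below illustrates the idea under stated assumptions. Documents are hypothetical dicts with a "meta" field, the table is a list of row dicts standing in for the data frame, and the query is a Python predicate rather than tm's statement syntax.

```python
def prescind_meta(docs, keys):
    """Copy the named metadata fields of each document into one row per document."""
    return [{k: doc["meta"].get(k) for k in keys} for doc in docs]


def s_filter(docs, keys, predicate):
    """Keep the documents whose prescinded metadata row satisfies predicate."""
    rows = prescind_meta(docs, keys)
    return [doc for doc, row in zip(docs, rows) if predicate(row)]
```

Usage: `s_filter(docs, ["topic"], lambda row: row["topic"] == "crude")`.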
854    
855    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
856    
857            * R/textdoccol.R: Improved database support.
858    
859    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
860    
861            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
862    
863            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
864            language code.
865    
866            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
867            into a parserControl argument.
868    
869            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
870    
871    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
872    
873            * Work/tmDataSetup.R: The datasets acq and crude can now be
874            created on the fly.
875    
876            * R/stopwords.R: Introduced a function returning the stopwords for
877            a given language (English, German and French at the moment).
878    
879            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
880            otherwise falls back to the Snowball package.
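The stemDoc entry describes an optional-dependency fallback: use one package if it is installed, otherwise another. A Python sketch of that pattern follows. Rstem and Snowball are R packages with no Python import names, so any module names passed in are placeholders.

```python
import importlib
from types import ModuleType


def first_available(preferred: str, fallback: str) -> ModuleType:
    """Import and return `preferred` if installed, otherwise `fallback`."""
    for name in (preferred, fallback):
        try:
            return importlib.import_module(name)
        except ImportError:  # ModuleNotFoundError is a subclass of ImportError
            continue
    raise ImportError(f"neither {preferred!r} nor {fallback!r} is installed")
```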
881    
882    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
883    
884            * man/dissimilarity-methods.Rd: Make clear that any method offered
885            by "dists" from package "cba" can be used.
886    
887    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
888    
889            * inst/doc/tm.Rnw: Fixed a bug where quotes appeared as boxes,
890            following Kurt's LaTeX suggestion. Removed dots and underscores
891            from variable names for consistent naming.
892    
893            * DESCRIPTION: Update to version 0.1-2.
894    
895            * man/TextRepository.Rd: Fixed a bug in the documentation.
896    
897    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
898    
899            * DESCRIPTION: Update to version 0.1-1.
900    
901    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
902    
903            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
904            wordStem.
905    
906    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
907    
908            * R/: Changes due to Kurt's review.
909    
910    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
911    
912            * R/: Implemented improvements based upon comments by David
913            Meyer.
914    
915    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
916    
917            * inst/doc/: Rewrote vignette.
918    
919            * man/: Improved documentation.
920    
921    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * man/: Updated documentation.
924    
925            * DESCRIPTION: Changed package name to "tm". Updated version to
926            0.1 for the first CRAN release.
927    
928            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
929            list archive example.
930    
931            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
932            archive example.
933    
934            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
935            from (several mails per box) mbox format to (single mail per file)
936            eml format.
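convert_mbox_eml splits an mbox file, which holds many mails, into one eml file per mail. The sketch below does the same with Python's standard `mailbox` module. It is not the R implementation. The output naming scheme (`1.eml`, `2.eml`, ...) is a hypothetical choice.

```python
import mailbox
import os


def convert_mbox_eml(mbox_path: str, out_dir: str) -> int:
    """Write each message of an mbox file to its own .eml file; return the count."""
    os.makedirs(out_dir, exist_ok=True)
    box = mailbox.mbox(mbox_path, create=False)  # error if the file is missing
    count = 0
    try:
        for i, msg in enumerate(box, start=1):
            # Hypothetical naming scheme: sequential numbers.
            with open(os.path.join(out_dir, f"{i}.eml"), "wb") as fh:
                fh.write(msg.as_bytes())
            count += 1
    finally:
        box.close()
    return count
```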
937    
938    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
939    
940            * data/crude.rda: Rebuilt.
941    
942            * data/acq.rda: Rebuilt.
943    
944            * R/reader.R: Factored out reader and parser methods from
945            textdoccol.R.
946    
947            * R/source.R: Factored out Source methods from aobjects.R and
948            textdoccol.R.
949            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
950            feeds.
951    
952            * R/textdoccol.R (DirSource): Added support for recursive
953            traversal of directories.
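Recursive traversal for a directory source can be sketched in Python with `os.walk`. The function name and the `recursive` flag are illustrative and do not reproduce DirSource's actual R signature. The non-recursive branch lists only regular files directly inside the directory.

```python
import os


def dir_source(directory: str, recursive: bool = False) -> list:
    """Return sorted paths of regular files, descending into subdirectories if recursive."""
    if not recursive:
        return sorted(
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))
        )
    paths = []
    for root, _dirs, files in os.walk(directory):
        paths.extend(os.path.join(root, f) for f in files)
    return sorted(paths)
```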
954    
955    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
956    
957            * R/textdoccol.R ([[): Loads the document corpus automatically
958            into memory upon access.
959            (tm_transform, tm_filter): Removed several checks of whether the
960            document is already loaded ([[ ensures this now).
961            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
962            mailing list archive.
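The `[[` change makes loading happen on access, so transform and filter code no longer has to check whether a document is loaded. A Python analogue overrides `__getitem__`. `LazyCorpus` and its `reader` parameter are hypothetical names. By default each path is read as a UTF-8 text file.

```python
class LazyCorpus:
    """Collection that loads each document only when it is first indexed."""

    def __init__(self, paths, reader=None):
        self._paths = list(paths)
        self._cache = {}  # index -> loaded document
        self._reader = reader or self._read_file

    @staticmethod
    def _read_file(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, i):
        # Load on first access; later accesses hit the cache.
        if i not in self._cache:
            self._cache[i] = self._reader(self._paths[i])
        return self._cache[i]
```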
963    
964    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
965    
966            * R/aobjects.R (TextDocument): Is now a virtual class.
967            (Source): Is now a virtual class.
968    
969    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
970    
971            * R/textdoccol.R (c): Support for an arbitrary number of document
972            collections.
973    
974    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
975    
976            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
977            append_meta and remove_meta.
978    
979            * R/textdoccol.R: Removed modify_metadata method.
980    
981            * R/textrepo.R: Removed modify_metadata method.
982    
983            * R/textdoccol.R (remove_meta): Supports removal of document
984            collection metadata and document (= in data frame) metadata.
985    
986    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
987    
988            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
989    
990            * data/crude.rda: Rebuilt.
991    
992            * data/acq.rda: Rebuilt.
993    
994            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
995    
996            * R/textdoccol.R ([): Bug fix for subsetting a document
997            collection's data frame.
998    
999    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1000    
1001            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
1002            to s_filter.
1003    
1004            * R/textdoccol.R: Local text documents' metadata can now be copied
1005            to a document collection's data frame with prescind_meta.
1006    
1007    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1008    
1009            * R/: Text documents' slot metadata is now accessible in s_filter.
1010    
1011            * R/: Rewrote the s_filter function (it still has some restrictions).
1012    
1013    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1014    
1015            * R/: Various fixes in handling metadata.
1016    
1017            * R/: Added update mechanism for text document collections.
1018    
1019    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1020    
1021            * R/: Merging of document collections now creates a binary tree
1022            for reconstructing merged document collections.
1023    
1024            * R/: Redesign of metadata for document collections.
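The 2006-11-19 entry records merged collections as a binary tree so that the originals can be reconstructed. The 2006-12-05 entry extends c() to any number of collections. The Python sketch below combines both ideas under assumptions. `Leaf`, `Merge`, and `merge_all` are hypothetical names, and folding left is one possible way to build the tree from several inputs.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Union


@dataclass
class Leaf:
    docs: List[str]  # an original, unmerged collection


@dataclass
class Merge:
    left: "Tree"   # first merged input, kept intact for reconstruction
    right: "Tree"  # second merged input


Tree = Union[Leaf, Merge]


def merge_all(first: Tree, *rest: Tree) -> Tree:
    """Merge any number of collections by folding pairwise into a left-nested tree."""
    return reduce(Merge, rest, first)


def flatten(t: Tree) -> List[str]:
    """Concatenate all documents in left-to-right order."""
    if isinstance(t, Leaf):
        return list(t.docs)
    return flatten(t.left) + flatten(t.right)
```

Each `Merge` node still holds both of its inputs, so `t.left` and `t.right` recover the collections that were combined.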
1025    
1026    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1027    
1028            * R/: Messages now use \code{ngettext}.
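ngettext picks a singular or plural message template from a count. In R the base function is called as `ngettext(n, msg1, msg2)`. Python's standard `gettext.ngettext(singular, plural, n)` behaves the same way, with `n` last. The message text below is hypothetical.

```python
from gettext import ngettext


def loaded_message(n: int) -> str:
    # Without a translation catalog, ngettext returns the singular
    # template when n == 1 and the plural template otherwise.
    return ngettext("%d document loaded", "%d documents loaded", n) % n
```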
1029    
1030    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1031    
1032            * R/: Added functions for modifying and removing metadata.
1033    
1034    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1035    
1036            * man/: Updated some documentation.
1037    
1038            * R/: Corrected some connection issues.
1039    
1040            * inst/doc: Worked on the vignette.
1041    
1042    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * inst/: Added texts and started vignette.
1045    
1046            * R/: Final changes based upon David's comments.
1047    
1048    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1049    
1050            * NAMESPACE: Corrected exports (generic methods need exportMethods
1051            directives!).
1052    
1053    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1054    
1055            * R/: Modified the TextDocCol constructor and various parsers. It
1056            is now modular and supports various file formats via plugins (see
1057            the new "Source" class).
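The "Source" class decouples where documents come from from how a collection is built. A Python sketch of that plugin design uses an abstract base class. `Source`, `VectorSource`, `elements`, and `text_doc_col` are illustrative names, not tm's R signatures. The default reader, which only strips whitespace, is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class Source(ABC):
    """Abstract source; each concrete subclass handles one input format."""

    @abstractmethod
    def elements(self) -> Iterator[str]:
        """Yield raw documents one at a time."""


class VectorSource(Source):
    """Source backed by an in-memory sequence of strings."""

    def __init__(self, items):
        self._items = list(items)

    def elements(self) -> Iterator[str]:
        yield from self._items


def text_doc_col(source: Source, reader=str.strip):
    """Build a collection by applying reader to every raw element of source."""
    return [reader(raw) for raw in source.elements()]
```

Supporting a new file format then means adding one `Source` subclass. The collection constructor does not change.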
1058    
1059    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1060    
1061            * man/: Revised documentation after previous code changes.
1062    
1063    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1064    
1065            * R/: Remaining changes as discussed with David.
1066    
1067    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1068    
1069            * R/: Some changes as suggested by David. The rest will follow
1070            in the next few days.
1071    
1072    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1073    
1074            * man/: Finished documentation.
1075    
1076    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1077    
1078            * man/: Wrote some documentation.
1079    
1080    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1081    
1082            * R/: Further syntactic sugar in the form of additional assignment and
1083            accessor methods.
1084    
1085    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1086    
1087            * R/: Syntactic sugar in the form of "length", "show" and "summary"
1088            methods.
1089    
1090    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1091    
1092            * R/: Various updates, mainly to default operators ("[" or "c")
1093            and dissimilarities.
1094    
1095    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1096    
1097            * R/: Added similarity functions.
1098    
1099            * data/: Added English stopwords.
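The entry does not say which similarity functions were added. As one common example for term-frequency vectors, here is cosine similarity in Python. It is an illustration, not necessarily a measure tm provides. Returning 0.0 for an all-zero vector is an assumed convention.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length term-frequency vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0  # assumed convention for zero vectors
```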
1100    
1101    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1102    
1103            * data/: Compiled examples for the new features.
1104    
1105            * R/: Changes due to new structure.
1106    
1107            * NAMESPACE: Corrected namespace to reflect new structure.
1108    
1109            * R/termdocmatrix.R: Adapted for new naming scheme.
1110    
1111    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1112    
1113            * R/textdoccol.R: Adapted code for new class structure. Wrote
1114            several transform and filter functions operating on text document
1115            collections (alias text document databases).
1116    
1117            * R/aobjects.R: Adapted class structure with inheritance,
1118            repositories and additional metadata. Loading files on demand is
1119            now possible.
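Transform and filter functions over a document collection correspond to map and filter. The Python sketch below borrows the names tm_transform and tm_filter from the 2006-12-07 entry. The signatures, a collection as a plain list and a function or predicate as the second argument, are assumptions and do not reproduce the R code.

```python
def tm_transform(collection, fn):
    """Return a new collection with fn applied to every document."""
    return [fn(doc) for doc in collection]


def tm_filter(collection, predicate):
    """Return a new collection with only the documents satisfying predicate."""
    return [doc for doc in collection if predicate(doc)]
```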
1120    
1121    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1122    
1123            * R/: Some cosmetic cleanups.
1124    
1125            * inst/: Removed vignette on clustering. That and much more is now
1126            described in the JSS paper on text mining. Based upon that
1127            article, a more elaborate vignette will be incorporated in the future.
1128    
1129    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1130    
1131            * R/: Updated generic S4 methods to comply with signature changes
1132            in newer versions of R (> 2.3).
1133    
1134    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1135    
1136            * ext/R/importRIS.R: Automatic RIS import is now possible.
1137    
1138    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1139    
1140            * R/textdoccol.R: Added RIS HTML input format.
1141    
1142    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1143    
1144            * R/textdoccol.R: Fixed a bug that caused invalid text document
1145            collections when handling many input files.
1146    
1147    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1148    
1149            * R/textdoccol.R: Restructured and extended file import
1150            mechanism.
1151    
1152            * inst/doc/clustering.Rnw: Adapted vignette for use with
1153            ReutNews.rda.
1154    
1155            * man/ReutNews.Rd: Documentation for ReutNews.rda
1156    
1157            * data/ReutNews.rda: A tiny Reuters-21578 example data set.
1158    
1159    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1160    
1161            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1162            clustering facilities of this package.
1163    
1164    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1165    
1166            * R/aobjects.R: Changed package document structure to avoid class
1167            dependency problems.
1168    
1169  2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1170    
1171            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
1172            data set.
1173    
1174          * Finished documentation and reordered directory structure. Now "R
1175          CMD check textmin" works without errors.
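The ModLewis split assigns each Reuters-21578 article to training, test, or unused based on attributes of its REUTERS tag. The Python sketch below assumes the rule as recalled from the Reuters-21578 README: LEWISSPLIT="TRAIN" gives training, LEWISSPLIT="TEST" gives test, and LEWISSPLIT="NOT-USED" or TOPICS="BYPASS" gives unused. It also assumes the XML version keeps these attribute names. Verify both against the data set's own documentation before relying on it.

```python
import re

# Matches NAME="value" attribute pairs inside a single opening tag.
_ATTR = re.compile(r'(\w+)="([^"]*)"')


def modlewis_split(reuters_tag: str) -> str:
    """Classify one <REUTERS ...> opening tag as 'train', 'test' or 'unused'."""
    attrs = dict(_ATTR.findall(reuters_tag))
    lewis = attrs.get("LEWISSPLIT")
    if lewis == "NOT-USED" or attrs.get("TOPICS") == "BYPASS":
        return "unused"
    if lewis == "TRAIN":
        return "train"
    if lewis == "TEST":
        return "test"
    return "unused"  # assumption: anything unrecognised is left out
```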
1176    
