[tm] Diff of /pkg/ChangeLog
trunk/R/trunk/ChangeLog revision 34, Thu Dec 22 15:18:10 2005 UTC
pkg/ChangeLog revision 1175, Wed Feb 1 06:08:02 2012 UTC
1    2012-01-31  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/reader.R (readXML): Readers can now set the document language
4            themselves.
5    
6    2012-01-14  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/source.R (XMLSource, getElem.XMLSource): Simplifications as
9            proposed by Milan Bouchet-Valat.
10    
11    2012-01-11  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/matrix.R (termFreq): Fix processing of user provided
14            stopwords. Reported by Bettina Grün.
15    
16    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
17    
18            * R/matrix.R (termFreq): Fix invalid handling of
19            control$wordLengths[1]. Reported by Steven C. Bagley.
20    
21    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
22    
23            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
24    
25    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
26    
27            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
28    
29    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
30    
31            * R/transform.R (removePunctuation): Add option to preserve
32            intra-word dashes.
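
The option described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the package's implementation; the function name, the parameter name, and the placeholder technique are assumptions chosen for illustration. Dashes that sit between two alphanumeric characters are shielded with a placeholder, all ASCII punctuation is stripped, and the shielded dashes are then restored.

```python
import re
import string

# Hypothetical placeholder character, assumed not to occur in the input text.
_PLACEHOLDER = "\x01"


def remove_punctuation(text, preserve_intra_word_dashes=False):
    """Strip ASCII punctuation, optionally keeping dashes inside words."""
    if preserve_intra_word_dashes:
        # Shield a dash only when it has an alphanumeric character on both sides.
        text = re.sub(r"(?<=[^\W_])-(?=[^\W_])", _PLACEHOLDER, text)
    # Delete every character listed in string.punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Restore the shielded dashes.
    return text.replace(_PLACEHOLDER, "-")


print(remove_punctuation("state-of-the-art, really -- yes!", True))
print(remove_punctuation("state-of-the-art, really -- yes!"))
```

Free-standing dashes such as "--" are still removed in both modes; only dashes inside a word survive when the option is set.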
33    
34    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
35    
36            * R/matrix.R (termFreq): Allow reordering of control option
37            processing.
38    
39    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
40    
41            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
42            pdfinfo tool.
43    
44            * inst/stopwords/SMART.dat: Add SMART information retrieval system
45            stopwords (which are also used by the MC toolkit).
46    
47            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
48            restrict how often a term may appear in each document (generalizes
49            \code{minDocFreq}). Similarly the local option \code{wordLengths}
50            for word length bounds (generalizes \code{minWordLength}).
51    
52            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
53            \code{bounds$global} for restricting how often a term is allowed
54            to appear in different documents.
55    
56            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
57            local options delegated internally to termFreq() and global
58            options which are processed by the term-document matrix
59            constructor itself.
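
The split between local and global options in the entries above can be sketched as follows. This is a Python illustration under stated assumptions, not the R code of the package: the function names, the dictionary-based matrix, and the default bounds are hypothetical. Local options (word length bounds and per-document frequency bounds) are handled by the per-document counter, while the global bound on document frequency is applied by the matrix constructor after all documents have been counted.

```python
from collections import Counter

INF = float("inf")


def term_freq(tokens, word_lengths=(3, INF), local_bounds=(1, INF)):
    """Count terms in one document, applying the local options only."""
    min_len, max_len = word_lengths
    min_tf, max_tf = local_bounds
    counts = Counter(t for t in tokens if min_len <= len(t) <= max_len)
    return {t: n for t, n in counts.items() if min_tf <= n <= max_tf}


def term_document_matrix(docs, global_bounds=(1, INF), **local_options):
    """Build {term: [count per document]}; local options go to term_freq()."""
    per_doc = [term_freq(tokens, **local_options) for tokens in docs]
    # Document frequency: in how many documents does each term survive?
    doc_freq = Counter(t for tf in per_doc for t in tf)
    min_df, max_df = global_bounds
    kept = sorted(t for t, df in doc_freq.items() if min_df <= df <= max_df)
    return {t: [tf.get(t, 0) for tf in per_doc] for t in kept}


docs = [
    ["oil", "oil", "price", "up", "crude"],
    ["oil", "market"],
    ["price", "market", "market"],
]
print(term_document_matrix(docs, global_bounds=(2, INF)))
print(term_document_matrix(docs, local_bounds=(2, INF)))
```

In the first call "crude" is dropped because it occurs in only one document; in the second call only terms occurring at least twice within a single document are counted at all.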
60    
61    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
62    
63            * man/getTokenizers.Rd: Document getTokenizers().
64    
65            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
66    
67    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
68    
69            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
70    
71            * man/combine.Rd: Document c.term_frequency().
72    
73    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
74    
75            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
76            can be accessed via '[' and not '[['.
77    
78    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
79    
80            * R/stopwords.R (stopwords): Raise an error if no stopwords are
81            available for requested language. Suggested by Derek M Jones.
82    
83    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
84    
85            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
86            normalization.
87    
88    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/transform.R (stemDocument.PlainTextDocument): Use language
91            argument.
92    
93    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
94    
95            * R/source.R: Store strings and connections instead of unevaluated
96            calls.
97    
98    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
99    
100            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
101    
102    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
105            (instead of a list element).
106    
107    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
110            documents by names (fallback to IDs if names are not set).
111    
112    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
115            \code{recursive} now determines whether existing corpus meta data
116            is used.
117    
118    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
119    
120            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
121    
122    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
123    
124            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
125            remove terms not occurring in the corpus anymore.
126    
127    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
128    
129            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
130            and Heaps' law.
131    
132    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
133    
134            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
135            provided by a source.
136    
137    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
138    
139            * R/source.R (.Source): Provide document names.
140    
141    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
142    
143            * R/meta.R (`content_or_meta`): Utility function.
144    
145    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
146    
147            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
148            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
149    
150    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
151    
152            * R/weight.R (weightTfIdf): Added normalization option.
153    
154            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
155            analysis.
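
The normalization option noted for weightTfIdf in this entry can be sketched as below. This is a Python illustration, not the package's R code; it assumes that normalization means dividing each term count by the document's total term count and that the inverse document frequency uses a base-2 logarithm. The function name and the dictionary layout are hypothetical.

```python
import math


def weight_tf_idf(tdm, normalize=True):
    """tdm maps term -> list of counts, one entry per document."""
    if not tdm:
        return {}
    n_docs = len(next(iter(tdm.values())))
    # Total number of term occurrences in each document.
    doc_len = [sum(row[j] for row in tdm.values()) for j in range(n_docs)]
    weighted = {}
    for term, row in tdm.items():
        doc_freq = sum(1 for count in row if count > 0)
        idf = math.log2(n_docs / doc_freq) if doc_freq else 0.0
        weighted[term] = [
            (count / doc_len[j] if normalize and doc_len[j] else count) * idf
            for j, count in enumerate(row)
        ]
    return weighted


tdm = {"oil": [2, 1, 0], "price": [1, 0, 1], "crude": [1, 0, 0]}
print(weight_tf_idf(tdm)["crude"])
print(weight_tf_idf(tdm, normalize=False)["crude"])
```

With normalization, long documents no longer dominate merely because they contain more tokens.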
156    
157    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
158    
159            * R/score.R (tm_tag_score): Compute a score from the number of
160            tags matching in a document.
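
A score of this kind can be sketched in a few lines. The Python function below is a hypothetical illustration, not the package's implementation; it assumes the document is already tokenized and that the score is simply the number of tokens found in the tag list.

```python
def tag_score(tokens, tags):
    """Count how many tokens of a document appear in a set of tags."""
    tag_set = set(tags)
    return sum(1 for token in tokens if token in tag_set)


tokens = "a good day but a bad and ugly ending".split()
positive = {"good", "great"}   # made-up tag lists for illustration
negative = {"bad", "ugly"}
print(tag_score(tokens, positive) - tag_score(tokens, negative))
```

Subtracting the negative score from the positive one, as in the last line, gives a crude sentiment measure.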
161    
162    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
163    
164            * R/complete.R (stemCompletion): New completion heuristics.
165    
166    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
167    
168            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
169    
170    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
171    
172            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
173            setOldClass(c(..., "list")) works.
174    
175    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/transform.R (stemDocument.character): In case input is a
178            simple character just delegate to the default Snowball stemmer.
179    
180    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
181    
182            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
183            data.
184    
185    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
186    
187            * R/doc.R (`Content<-`): Be careful with names attribute.
188    
189    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
190    
191            * R/source.R (DirSource): Improved implementation especially when
192            handling many (> 1M) files.
193    
194    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
195    
196            * R/source.R (getElem.URISource): Use encoding argument.
197    
198    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
199    
200            * R/doc.R (setOldClass): Register S3 document classes to be
201            recognized by S4 methods.
202    
203    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
204    
205            * R/matrix.R (termFreq): Add option to remove punctuation
206            characters.
207    
208    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
209    
210            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
211            merging multiple term-document matrices.
212    
213    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
214    
215            * R/corpus.R (setOldClass): Register S3 corpus classes to be
216            recognized by S4 methods.
217    
218            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
219            that CRAN Mac OS X builds do not fail any longer.
220    
221    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
222    
223            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
224            of RWeka::AlphabeticTokenizer() as default.
225    
226    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
227    
228            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
229            caused words at the beginning or the end of a line not to be removed. Do
230            not delete whitespace anymore.
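
The behaviour described in this fix can be sketched with word-boundary anchors. The Python function below is an illustration under assumptions, not the regular expression used in the package: words are matched with \b on both sides, so a match at the start or end of a line is found as well, and surrounding whitespace is left untouched.

```python
import re


def remove_words(text, words):
    """Delete whole-word occurrences of `words`; whitespace is kept as is."""
    if not words:
        return text
    # Longest alternatives first, each escaped so it is matched literally.
    alternatives = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    return re.sub(r"\b(?:" + alternatives + r")\b", "", text)


print(repr(remove_words("the cat and the hat\nthe end of the", ["the"])))
```

Because only the word itself is deleted, removed words leave their neighbouring spaces behind, and words that merely contain a listed word (such as "there") are not affected.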
231    
232    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
233    
234            * R/source.R (DirSource): Default to working directory if no path
235            is specified.
236    
237    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
238    
239            * R/source.R (DirSource): Stop on empty directories.
240    
241    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
242    
243            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
244            named documents.
245    
246    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
247    
248            * R/transform.R (removeWords): Improve regular expressions.
249    
250    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
251    
252            * R/meta.R (DublinCore): Allow lower case tags.
253    
254    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
255    
256            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
257            instead of x$children.
258    
259    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
260    
261            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
262    
263    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
264    
265            * R/: Use S3 instead of S4 class system.
266    
267    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
268    
269            * R/reader.R (readMail): Moved to tm.plugin.mail package.
270    
271    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
272    
273            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
274            postings are basically e-mails with some extra headers.
275    
276    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
277    
278            * R/transform.R: Move convertMboxEml, removeCitation,
279            removeMultipart, and removeSignature to the tm.plugin.mail package
280            since they are mainly utility functions (for handling e-mails) and
281            not very framework specific.
282    
283    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
284    
285            * man/: Fix documentation.
286    
287    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
288    
289            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
290            plain text document instead of an XML document for texts of the
291            Reuters-21578 dataset.
292    
293            * R/sparse.R: Removed since the slam package is now available on
294            CRAN.
295    
296            * DESCRIPTION (Depends): Add slam package.
297    
298    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
299    
300            * R/transform.R (stemDoc): Fix character(0) handling.
301    
302    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
303    
304            * R/doc.R (show): Pretty print.
305    
306    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
307    
308            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
309            gracefully.
310    
311    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
312    
313            * R/corpus.R: Make corpus virtual. Implement corpus with standard
314            and permanent storage semantics.
315    
316            * DESCRIPTION: New major release. A *lot* of improvements.
317    
318    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
319    
320            * NAMESPACE: Export some simple_triplet_matrix functions.
321    
322    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
323    
324            * R/weight.R: Adapt tf-idf to new matrix format.
325    
326    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
327    
328            * R/matrix.R: Create two distinct classes for term-document and
329            document-term matrices.
330    
331    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
332    
333            * R/termdocmatrix.R: No longer use Matrix package. This reduces
334            package start-up time significantly.
335    
336    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
337    
338            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
339    
340    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
341    
342            * R/transform.R (tmReduce): Combine multiple maps into one
343            transformation.
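
Combining several maps into one transformation amounts to function composition. The Python sketch below illustrates the idea only; the helper name and the left-to-right application order are assumptions and may differ from the order used by the package.

```python
from functools import reduce


def compose(*funcs):
    """Return one transformation that applies `funcs` from left to right."""
    def combined(text):
        return reduce(lambda acc, func: func(acc), funcs, text)
    return combined


pipeline = compose(str.lower, str.strip, lambda s: s.replace(",", ""))
print(pipeline("  Hello, World "))
```

The combined function can then be passed to a single mapping step instead of mapping once per transformation.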
344    
345    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
346    
347            * R/weight.R: Remove weightLogical since it does not return a
348            dgCMatrix.
349    
350            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
351            or TermDocumentMatrix instead.
352    
353    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
354    
355            * inst/doc/extensions.Rnw: Finished vignette.
356    
357    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
358    
359            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
360            DocumentTermMatrix representations.
361    
362    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
363    
364            * R/reader.R (readXML): New reader for arbitrary XML files.
365    
366    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
367    
368            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
369            (XMLSource): New XMLSource class for arbitrary XML files.
370            (Source): New slot Vectorized.
371    
372    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
373    
374            * R/reader.R (readTabular): Experimental reader for tabular data
375            structures which can be customized via user-defined mappings.
376    
377            * R/reader.R: Always use UTC time zone.
378    
379            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
380    
381    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
382    
383            * R/reader.R (readDOC): Options can be passed over to antiword.
384    
385            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
386            pdftotext.
387    
388    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
389    
390            * R/source.R (DirSource): Add pattern and ignore.case arguments
391            which are internally passed over to list.files().
392    
393    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
394    
395            * inst/doc/tm.Rnw: Suppress pointless loading message.
396    
397    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
398    
399            * DESCRIPTION: Speed up package loading (via moving packages not
400            strictly necessary for normal operation to Suggests instead of
401            Depends).
402    
403    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
404    
405            * R/reader.R (readNewsgroup): The date format is now configurable.
406    
407    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
408    
409            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
410    
411    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
412    
413            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
414    
415    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
416    
417            * R/source.R (DataframeSource): New source class for data frames.
418    
419            * R/source.R: Fixed non-standard call evaluation.
420    
421    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
422    
423            * R/source.R (URISource): New source class for a single document.
424    
425    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
426    
427            * R/source.R: Refactoring.
428    
429    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
430    
431            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
432            Rmpi installations more gracefully.
433    
434    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
435    
436            * R/source.R (Source): Add Length slot.
437    
438    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
439    
440            * R/AAA.R: Unify duplicated .onLoad function.
441    
442    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
443    
444            * DESCRIPTION (Suggests): Added Rmpi.
445    
446    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
447    
448            * R/source.R (getElem): Fix 'no visible binding' warning.
449    
450            * man/WeightFunction.Rd: Fix signature.
451    
452    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
453    
454            * R/weight.R: Introduce name abbreviations for weighting functions.
455    
456    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
457    
458            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
459    
460            * R/cluster.R: Provide convenience functions for using a MPI
461            cluster.
462    
463            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
464            available.
465    
466            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
467            available.
468    
469    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
470    
471            * R/textdoccol.R (lapply): Removed debug print out.
472    
473    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/reader.R (readRCV1): Improved meta data extraction from
476            Reuters Corpus Volume 1 documents.
477    
478    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
479    
480            * R/transform.R: Ensure that all mappings preserve multiline
481            structures.
482    
483    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
484    
485            * R/filter.R: Every filter now has an attribute indicating whether
486            it should be applied at the document level (doclevel).
487    
488            * R/textdoccol.R (tmFilter): Set searchFullText as new default
489            filter.
490    
491    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
492    
493            * R/transform.R (replacePatterns): Replaced removeWords by
494            replacePatterns. Suggested by Christian Buchta.
495    
496            * R/textdoccol.R (inspect): Improved formatting.
497    
498    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
499    
500            * inst/CITATION: Updated JSS article information.
501    
502            * R/textdoccol.R (setAs): Added coerce method from list to
503            corpus.
504    
505            * R/meta.R (meta): Improved meta data handling.
506    
507    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
508    
509            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
510            Christian Buchta.
511    
512            * inst/CITATION: Added template to include JSS article reference.
513    
514    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
515    
516            * R/textdoccol.R (tmMap): Introduced lazy mapping.
517    
518            * R/source.R: Added VectorSource.
519    
520    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
521    
522            * man/: Language codes should be in ISO 639-1 format.
523    
524            * R/textdoccol.R (asPlain): Preserve local meta data.
525    
526    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * R/textdoccol.R (writeCorpus): Function for writing a corpus
529            containing plain text documents to disk.
530    
531    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
532    
533            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
534            always set correctly.
535    
536            * R/textdoccol.R: Set load = TRUE as default for load on demand
537            since in most cases this is the wanted behaviour.
538    
539    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
540    
541            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
542    
543            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
544    
545    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
546    
547            * R/meta.R (meta): New function for consistent access to meta data
548            of document collections, repositories, and texts.
549    
550    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
551    
552            * R/: Better support for encodings.
553    
554    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
557            selection when no reader argument is given.
558    
559    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
560    
561            * R/source.R (CSVSource): Now uses read.csv instead of scan
562            internally.
563    
564    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
565    
566            * R/reader.R (getReaders): Returns available reader functions.
567    
568            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
569            as default.
570    
571    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
572    
573            * R/stopwords.R (stopwords): Shortened code, removed codetools
574            variable warnings.
575    
576            * man/: Documentation for showMeta, added an example for tmMap.
577    
578            * inst/doc/tm.Rnw: Updated vignette, comments on MS Word reader,
579            some minor typos fixed.
580    
581    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
582    
583            * R/aobjects.R (showMeta): Added method for pretty printing a
584            text document's meta data.
585    
586    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
587    
588            * R/textdoccol.R (TextDocCol): Better handling of empty
589            arguments.
590    
591            * NAMESPACE: Exported readDOC.
592    
593            * man/completeStems.Rd: Added an example.
594    
595    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
596    
597            * R/stopwords.R (stopwords): Look up .dat files at every
598            call. Allows users to modify stopword .dat files interactively.
599    
600    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
601    
602            * R/termdocmatrix.R (termFreq): Correct processing of empty
603            documents.
604    
605    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
606    
607            * man/: Updated documentation.
608    
609    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
610    
611            * R/complete.R (completeStems): Completes (heuristically) word
612            stems.
613    
614            * R/termdocmatrix.R (TermDocMatrix2): New modular
615            constructor.
616    
617            * NAMESPACE: Exported termFreq.
618    
619    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
620    
621            * R/reader.R (readDOC): Added MS Word reader (using antiword).
622    
623    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
624    
625            * R/weight.R: Weighting functions for TermDocMatrix.
626    
627    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
628    
629            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
630            functions for accessing dimension, column, and row names.
631    
632            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
633    
634    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
635    
636            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
637    
638    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
639    
640            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
641    
642    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
643    
644            * R/reader.R (readPDF): Removed manual checks for pdftotext and
645            pdfinfo. The system call gives a warning anyway.
646    
647    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
648    
649            * R/textdoccol.R (asPlain): Conversion from
650            StructuredTextDocuments to PlainTextDocuments.
651    
652    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
653    
654            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
655            for accessing term-document matrices.
656    
657            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
658            are installed.
659    
660    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
663            Christian Buchta.
664    
665    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
666    
667            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
668    
669    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
670    
671            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
672    
673            * R/reader.R (readPDF): Added PDF reader.
674    
675    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
676    
677            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
678    
679            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
680    
681            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
682    
683            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
684    
685    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
686    
687            * R/distmeasure.R (dissimilarity): Replaced dists call from
688            package cba by new dist call from package proxy.
689    
690    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
691    
692            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
693    
694    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
695    
696            * R/termdocmatrix.R: require() uses the quietly option to suppress
697            loading messages.
698    
699    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
700    
701            * R/dictionary.R: Added dictionary support.
702    
703    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
704    
705            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
706            documents. This simplifies some functions, e.g., asPlain.
707    
708    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
709    
710            * inst/doc/tm.Rnw: Fixed some typos in vignette.
711    
712    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
713    
714            * R/textdoccol.R (replaceWords): Added method to replace a set of
715            words by a single word. Useful for synonyms.
716    
717    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
718    
719            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
720    
721    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
722    
723            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
724            vectors. Thanks to Ariel Maguyon for his error report.
725            (removeSparseTerms): New function to remove columns from a
726            term-document matrix exceeding a sparse factor.
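
Removing sparse terms can be sketched as follows. This Python illustration is not the package's code: it assumes the matrix is given as a mapping from each term to its per-document counts, and that a term is dropped when its share of empty (zero) entries is not strictly below the sparse factor. The function name and the handling of the boundary case are assumptions.

```python
def remove_sparse_terms(tdm, sparse):
    """Keep terms whose share of zero entries is below `sparse` (0 < sparse < 1)."""
    if not 0 < sparse < 1:
        raise ValueError("sparse must be strictly between 0 and 1")
    kept = {}
    for term, row in tdm.items():
        if not row:
            continue  # no documents, nothing to measure
        empty_share = sum(1 for count in row if count == 0) / len(row)
        if empty_share < sparse:
            kept[term] = row
    return kept


tdm = {"oil": [2, 1, 0, 3], "crude": [1, 0, 0, 0], "price": [1, 0, 1, 0]}
print(sorted(remove_sparse_terms(tdm, 0.6)))
```

Here "crude" is missing from three of four documents (75 percent empty) and is removed, while "oil" and "price" stay.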
727    
728    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
729    
730            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
731    
732    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
733    
734            * man/sFilter.Rd: Corrected documentation on statement format (use
735            '==' instead of '=').
736    
737    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
738    
739            * R/aobjects.R (StructuredTextDocument): Inherits from
740            TextDocument.
741    
742    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
743    
744            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
745            on sparse matrices as proposed by Martin Maechler.
746    
747    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
748    
749            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
750            \pkg{filehash} version deprecates them.
751    
752    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
753    
754            * R/termdocmatrix.R (textvector): Stemming is now performed before
755            erasing stopwords.
756            (weightMatrix): Adapted to handle sparse matrices.
757            (TermDocMatrix): Sparse matrix is now efficiently built by
758            direct stepwise insertion of row values into it.
759    
760    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
761    
762            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
763            due to ongoing problems. For our purposes the latter is as useful
764            as the replaced package.
765    
766    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
767    
768            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
769    
770            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
771    
772    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
773    
774            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
775            languages with available stopwords.
776    
777    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
778    
779            * inst/doc/tm.Rnw: Minor corrections in the vignette.
780    
781    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
782    
783            * DESCRIPTION: Update to version 0.2, since a lot of new features
784            have been integrated.
785    
786            * inst/stopwords: Updated existing stopwords and added stopwords
787            for various other languages.
788    
789    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
790    
791            * man/: Updated documentation.
792    
793            * Work/testDb.R: Script to test database stuff.
794    
795            * R/: Fixed various database related bugs. Seems to be rather
796            useable now, i.e., consider as alpha status for now.
797    
798    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
799    
800            * R/: Fixed some bugs related to database support.
801    
802    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
803    
804            * man/: Added a lot of examples to the manuals.
805    
806    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
807    
808            * man/: Updated parts of the documentation.
809    
810            * R/textdoccol.R (asPlain): Added conversion from newsgroup
811            documents to plain text documents.
812    
813    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
814    
815            * R/textdoccol.R: Finished experimental database support. Not yet
816            intensively tested.
817    
818            * R/source.R: Now each source has a default reader.
819    
820            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
821            class anymore.
822    
823            * R/plaintextdoc.R: Custom show method for plain text documents.
824    
825            * R/aobjects.R: Added a class for structured text documents.
826    
827            * R/reader.R: Replaced remaining \code{parser} occurrences with
828            \code{reader}.
829    
830            * R/textdoccol.R (summary): Indent tags.
831    
832            * R/textdoccol.R (removePunctuation): Transform method to remove
833            punctuation marks.
834    
835    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
836    
837            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
838            using prescindMeta().
839    
840    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
841    
842            * R/textdoccol.R: Improved database support.
843    
844    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
845    
846            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
847    
848            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
849            language code.
850    
851            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
852            into parserControl argument.
853    
854            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
855    
856    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
857    
858            * Work/tmDataSetup.R: The datasets acq and crude can now be
859            created on the fly.
860    
861            * R/stopwords.R: Introduced a function returning the stopwords for
862            a given language (English, German and French at the moment).
863    
864            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
865            otherwise falls back to Snowball package.
866    
867    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * man/dissimilarity-methods.Rd: Make clear that any method offered
870            by "dists" from package "cba" can be used.
871    
872    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
873    
874            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
875            to Kurt's LaTeX suggestion. Removed points and underscores in
876            variable names for consistent naming.
877    
878            * DESCRIPTION: Update to version 0.1-2.
879    
880            * man/TextRepository.Rd: Fixed bug in documentation.
881    
882    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
883    
884            * DESCRIPTION: Update to version 0.1-1.
885    
886    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
887    
888            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
889            wordStem.
890    
891    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
892    
893            * R/: Changes due to Kurt's review.
894    
895    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
896    
897            * R/: Implemented improvements based upon comments by David
898            Meyer.
899    
900    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
901    
902            * inst/doc/: Rewrote vignette.
903    
904            * man/: Improved documentation.
905    
906    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
907    
908            * man/: Updated documentation.
909    
910            * DESCRIPTION: Changed package name to "tm". Updated version to
911            0.1 for first CRAN release.
912    
913            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
914            list archive example.
915    
916            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
917            archive example.
918    
919            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
920            from (several mails per box) mbox format to (single mail per file)
921            eml format.
922    
923    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
924    
925            * data/crude.rda: Rebuilt.
926    
927            * data/acq.rda: Rebuilt.
928    
929            * R/reader.R: Factored out reader and parser methods from
930            textdoccol.R.
931    
932            * R/source.R: Factored out Source methods from aobjects.R and
933            textdoccol.R.
934            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
935            feeds.
936    
937            * R/textdoccol.R (DirSource): Added support for recursive
938            traversal of directories.
939    
940    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
941    
942            * R/textdoccol.R ([[): Loads the document corpus automatically
943            into memory upon access.
944            (tm_transform, tm_filter): Removed several checks whether the
945            document is already loaded ([[ ensures this now).
946            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
947            mailing list archive.
948    
949    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
950    
951            * R/aobjects.R (TextDocument): Is now a virtual class.
952            (Source): Is now a virtual class.
953    
954    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
955    
956            * R/textdoccol.R (c): Support for an arbitrary number of document
957            collections.
958    
959    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
960    
961            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
962            append_meta and remove_meta.
963    
964            * R/textdoccol.R: Removed modify_metadata method.
965    
966            * R/textrepo.R: Removed modify_metadata method.
967    
968            * R/textdoccol.R (remove_meta): Supports removal of document
969            collection metadata and document (= in data frame) metadata.
970    
971    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
972    
973            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
974    
975            * data/crude.rda: Rebuilt.
976    
977            * data/acq.rda: Rebuilt.
978    
979            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
980    
981            * R/textdoccol.R ([): Bug fix for subsetting a document
982            collection's data frame.
983    
984    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
985    
986            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
987            to s_filter.
988    
989            * R/textdoccol.R: Local text documents' metadata can now be copied
990            to a document collection's data frame with prescind_meta.
991    
992    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
993    
994            * R/: Text documents' slot metadata is now accessible in s_filter.
995    
996            * R/: Rewrote s_filter function (still has some restrictions).
997    
998    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
999    
1000            * R/: Various fixes in handling metadata.
1001    
1002            * R/: Added update mechanism for text document collections.
1003    
1004    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1005    
1006            * R/: Merging of document collections now creates a binary tree
1007            for reconstructing merged document collections.
1008    
1009            * R/: Redesign of metadata for document collections.
1010    
1011    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1012    
1013            * R/: Messages now use \code{ngettext}.
1014    
1015    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1016    
1017            * R/: Added functions for modifying and removing metadata.
1018    
1019    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1020    
1021            * man/: Updated some documentation.
1022    
1023            * R/: Corrected some connection issues.
1024    
1025            * inst/doc: Worked on the vignette.
1026    
1027    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1028    
1029            * inst/: Added texts and started vignette.
1030    
1031            * R/: Final changes based upon David's comments.
1032    
1033    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1034    
1035            * NAMESPACE: Corrected exports (generic methods need exportMethods
1036            directives!).
1037    
1038    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1039    
1040            * R/: Modified the TextDocCol constructor and various parsers. It
1041            is now modular and supports various file formats via plugins (see
1042            the new "Source" class).
1043    
1044    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1045    
1046            * man/: Revised documentation after previous code changes.
1047    
1048    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1049    
1050            * R/: Remaining changes as discussed with David.
1051    
1052    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1053    
1054            * R/: Some changes as suggested by David. The rest will follow
1055            within the next few days.
1056    
1057    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1058    
1059            * man/: Finished documentation.
1060    
1061    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1062    
1063            * man/: Wrote some documentation.
1064    
1065    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1066    
1067            * R/: Further syntactic sugar in the form of additional assignment and
1068            accessor methods.
1069    
1070    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1071    
1072            * R/: Syntactic sugar in the form of "length", "show" and "summary"
1073            operators.
1074    
1075    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1076    
1077            * R/: Diverse updates. Mainly on default operators ("[" or "c")
1078            and dissimilarities.
1079    
1080    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1081    
1082            * R/: Added similarity functions.
1083    
1084            * data/: Added English stopwords.
1085    
1086    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1087    
1088            * data/: Examples compiled for new features.
1089    
1090            * R/: Changes due to new structure.
1091    
1092            * NAMESPACE: Corrected namespace to reflect new structure.
1093    
1094            * R/termdocmatrix.R: Adapted for new naming scheme.
1095    
1096    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1097    
1098            * R/textdoccol.R: Adapted code for new class structure. Wrote
1099            several transform and filter functions operating on text document
1100            collections (alias text document databases).
1101    
1102            * R/aobjects.R: Adapted class structure with inheritance,
1103            repositories and additional meta data. Loading files on demand is
1104            now possible.
1105    
1106    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1107    
1108            * R/: Some cosmetic cleanups.
1109    
1110            * inst/: Removed vignette on clustering. That and much more is now
1111            described in the JSS paper on text mining. Based upon that
1112            article, a more elaborate vignette will be incorporated in the future.
1113    
1114    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1115    
1116            * R/: Updated generic S4 methods to comply with signature changes
1117            in newer versions of R (> 2.3).
1118    
1119    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1120    
1121            * ext/R/importRIS.R: Automatic RIS import is now possible.
1122    
1123    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1124    
1125            * R/textdoccol.R: Added RIS HTML input format.
1126    
1127    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1128    
1129            * R/textdoccol.R: Removed bug that caused invalid text document
1130            collections when handling many input files.
1131    
1132    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1133    
1134            * R/textdoccol.R: Restructured and extended file import
1135            mechanism.
1136    
1137            * inst/doc/clustering.Rnw: Adapted vignette for use with
1138            ReutNews.rda
1139    
1140            * man/ReutNews.Rd: Documentation for ReutNews.rda
1141    
1142            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1143    
1144  2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1145    
1146          * inst/doc/clustering.Rnw: Wrote a small vignette to present the
