[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
pkg/ChangeLog revision 1117, Fri Feb 4 20:44:37 2011 UTC
1    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/source.R: Store strings and connections instead of unevaluated
4            calls.
5    
6    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
9    
10    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
11    
12            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
13            (instead of a list element).
14    
15    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
16    
17            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
18            documents by names (fallback to IDs if names are not set).
19    
20    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
21    
22            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
23            \code{recursive} now determines whether existing corpus meta data
24            is used.
25    
26    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
27    
28            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
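
The replacement named in this entry, enc2utf8(), is part of base R. A minimal sketch of what it does, assuming a Latin-1 input string (the example word and variable names are illustrative and not taken from the package):

```r
# Build "facade" with a c-cedilla as a single Latin-1 byte (0xE7).
x <- rawToChar(as.raw(c(0x66, 0x61, 0xe7, 0x61, 0x64, 0x65)))
Encoding(x) <- "latin1"   # declare how the bytes should be interpreted

# enc2utf8() re-encodes the string as UTF-8 and marks it accordingly.
y <- enc2utf8(x)

Encoding(y)                 # declared encoding after conversion
nchar(x, type = "bytes")    # one byte per character in Latin-1
nchar(y, type = "bytes")    # the c-cedilla takes two bytes in UTF-8
```

The conversion relies on the declared encoding, so strings whose encoding is unknown or wrongly marked may still need Encoding<- or iconv() first.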
29    
30    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
31    
32            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
33            remove terms not occurring in the corpus anymore.
34    
35    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
38            and Heaps' law.
39    
40    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
43            provided by a source.
44    
45    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
46    
47            * R/source.R (.Source): Provide document names.
48    
49    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
50    
51            * R/meta.R (`content_or_meta`): Utility function.
52    
53    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
54    
55            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
56            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
57    
58    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
59    
60            * R/weight.R (weightTfIdf): Added normalization option.
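
To illustrate what a normalization option for tf-idf weighting can mean, here is a sketch in base R. It is not the package's implementation: the function name tf_idf, the base-2 logarithm, and normalizing term counts by document length are all assumptions made for the example.

```r
# m: term-document count matrix (terms in rows, documents in columns).
tf_idf <- function(m, normalize = TRUE) {
  # Optionally divide each column by the document length (total term count).
  tf <- if (normalize) sweep(m, 2, colSums(m), "/") else m
  # Inverse document frequency: log2(number of docs / docs containing term).
  idf <- log2(ncol(m) / rowSums(m > 0))
  # idf has one value per row; R recycles it down the columns, so every
  # entry in row i is multiplied by idf[i].
  tf * idf
}

m <- matrix(c(2, 0, 1,
              1, 1, 0),
            nrow = 3,
            dimnames = list(c("oil", "price", "stock"), c("d1", "d2")))
w <- tf_idf(m)
w
```

A term that occurs in every document ("oil" here) receives weight zero. Empty documents would produce NaN under normalization and need separate handling.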
61    
62            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
63            analysis.
64    
65    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
66    
67            * R/score.R (tm_tag_score): Compute a score from the number of
68            tags matching in a document.
69    
70    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
71    
72            * R/complete.R (stemCompletion): New completion heuristics.
73    
74    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
75    
76            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
77    
78    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
79    
80            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
81            setOldClass(c(..., "list")) works.
82    
83    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
84    
85            * R/transform.R (stemDocument.character): In case input is a
86            simple character just delegate to the default Snowball stemmer.
87    
88    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
91            data.
92    
93    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
94    
95            * R/doc.R (`Content<-`): Be careful with names attribute.
96    
97    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
98    
99            * R/source.R (DirSource): Improved implementation especially when
100            handling many (> 1M) files.
101    
102    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/source.R (getElem.URISource): Use encoding argument.
105    
106    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
107    
108            * R/doc.R (setOldClass): Register S3 document classes to be
109            recognized by S4 methods.
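
The mechanism behind this entry can be sketched with the methods package alone. The class name MyPlainDoc and the generic describe are hypothetical and exist only for this example; they are not part of the package.

```r
library(methods)

# Register an S3 class vector so S4 dispatch recognizes it.
setOldClass(c("MyPlainDoc", "character"))

# A hypothetical S4 generic with a method for the registered S3 class.
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "MyPlainDoc", function(x) {
  paste("document with", length(x), "line(s)")
})

# An ordinary S3 object: a character vector with a class attribute.
doc <- structure(c("line one", "line two"),
                 class = c("MyPlainDoc", "character"))
describe(doc)
```

Without the setOldClass() call, setMethod() would complain that the class in the signature is undefined.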
110    
111    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
112    
113            * R/matrix.R (termFreq): Add option to remove punctuation
114            characters.
115    
116    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
117    
118            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
119            merging multiple term-document matrices.
120    
121    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
122    
123            * R/corpus.R (setOldClass): Register S3 corpus classes to be
124            recognized by S4 methods.
125    
126            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
127            that CRAN Mac OS X builds do not fail any longer.
128    
129    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
130    
131            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
132            of RWeka::AlphabeticTokenizer() as default.
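
Tokenizing with scan() amounts to splitting on whitespace. A minimal sketch, assuming the text is passed through scan()'s text argument (the wrapper name tokenize_ws is hypothetical, and the package may have fed scan() differently):

```r
# Split a character string into whitespace-separated tokens.
tokenize_ws <- function(x) {
  # quote = "" keeps quotation marks from being treated as delimiters;
  # quiet = TRUE suppresses the "Read N items" message.
  scan(text = x, what = "character", quote = "", quiet = TRUE)
}

tokens <- tokenize_ws("Text mining  in R:\nsimple, fast.")
tokens
```

Unlike an alphabetic tokenizer, this keeps punctuation attached to the tokens ("R:", "simple,"), so punctuation has to be removed in a separate step if it is not wanted.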
133    
134    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
135    
136            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
137            caused words at the beginning or the end of a line not to be removed. Do
138            not delete whitespace anymore.
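
The behaviour described in this entry can be approximated with word-boundary anchors, which match at the start and end of a line as well as between words. This is a sketch under assumptions, not the package's code: remove_words is a hypothetical helper, and the words are assumed to contain no regular expression metacharacters.

```r
# Remove whole words from each string, leaving whitespace untouched.
remove_words <- function(x, words) {
  # \\b matches at word boundaries, including line start and line end.
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

out <- remove_words(c("the cat sat on the mat", "on the roof"),
                    c("the", "on"))
out
```

Because surrounding whitespace is kept, removed words leave runs of spaces behind; a separate whitespace-stripping step can collapse them.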
139    
140    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
141    
142            * R/source.R (DirSource): Default to working directory if no path
143            is specified.
144    
145    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
146    
147            * R/source.R (DirSource): Stop on empty directories.
148    
149    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
150    
151            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
152            named documents.
153    
154    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
155    
156            * R/transform.R (removeWords): Improve regular expressions.
157    
158    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
159    
160            * R/meta.R (DublinCore): Allow lower case tags.
161    
162    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
163    
164            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
165            instead of x$children.
166    
167    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
168    
169            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
170    
171    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
172    
173            * R/: Use S3 instead of S4 class system.
174    
175    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/reader.R (readMail): Moved to tm.plugin.mail package.
178    
179    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
180    
181            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
182            postings are basically e-mails with some extra headers.
183    
184    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
185    
186            * R/transform.R: Move convertMboxEml, removeCitation,
187            removeMultipart, and removeSignature to the tm.plugin.mail package
188            since they are mainly utility functions (for handling e-mails) and
189            not very framework specific.
190    
191    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
192    
193            * man/: Fix documentation.
194    
195    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
196    
197            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
198            plain text document instead of an XML document for texts of the
199            Reuters-21578 dataset.
200    
201            * R/sparse.R: Removed since the slam package is now available on
202            CRAN.
203    
204            * DESCRIPTION (Depends): Add slam package.
205    
206    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
207    
208            * R/transform.R (stemDoc): Fix character(0) handling.
209    
210    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
211    
212            * R/doc.R (show): Pretty print.
213    
214    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
215    
216            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
217            gracefully.
218    
219    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
220    
221            * R/corpus.R: Make corpus virtual. Implement corpus with standard
222            and permanent storage semantics.
223    
224            * DESCRIPTION: New major release. A *lot* of improvements.
225    
226    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
227    
228            * NAMESPACE: Export some simple_triplet_matrix functions.
229    
230    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
231    
232            * R/weight.R: Adapt tf-idf to new matrix format.
233    
234    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
235    
236            * R/matrix.R: Create two distinct classes for term-document and
237            document-term matrices.
238    
239    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
240    
241            * R/termdocmatrix.R: No longer use Matrix package. This reduces
242            package start-up time significantly.
243    
244    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
245    
246            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
247    
248    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
249    
250            * R/transform.R (tmReduce): Combine multiple maps into one
251            transformation.
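
Combining several maps into one transformation is function composition. A minimal base R sketch, assuming the maps are applied from left to right (the helper name compose_maps is hypothetical, and the package's own ordering is not stated in this entry):

```r
# Return a single function that applies the given maps in order.
compose_maps <- function(...) {
  maps <- list(...)
  # Reduce() threads the intermediate result through each map in turn,
  # starting from the input x.
  function(x) Reduce(function(acc, f) f(acc), maps, x)
}

clean <- compose_maps(
  tolower,
  function(s) gsub("[[:punct:]]", "", s),
  trimws
)
result <- clean("  Hello, World!  ")
result
```

Applying one composed function makes a single pass over the documents instead of one pass per map.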
252    
253    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
254    
255            * R/weight.R: Remove weightLogical since it does not return a
256            dgCMatrix.
257    
258            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
259            or TermDocumentMatrix instead.
260    
261    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
262    
263            * inst/doc/extensions.Rnw: Finished vignette.
264    
265    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
266    
267            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
268            DocumentTermMatrix representations.
269    
270    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
271    
272            * R/reader.R (readXML): New reader for arbitrary XML files.
273    
274    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
275    
276            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
277            (XMLSource): New XMLSource class for arbitrary XML files.
278            (Source): New slot Vectorized.
279    
280    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
281    
282            * R/reader.R (readTabular): Experimental reader for tabular data
283            structures which can be customized via user-defined mappings.
284    
285            * R/reader.R: Always use UTC time zone.
286    
287            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
288    
289    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
290    
291            * R/reader.R (readDOC): Options can be passed over to antiword.
292    
293            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
294            pdftotext.
295    
296    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
297    
298            * R/source.R (DirSource): Add pattern and ignore.case arguments
299            which are internally passed over to list.files().
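
The two arguments behave as they do in list.files() itself. A short sketch of that underlying call, using a temporary directory with made-up file names:

```r
# Create a scratch directory with three illustrative files.
dir <- tempfile("corpus_")
dir.create(dir)
invisible(file.create(file.path(dir, c("a.txt", "B.TXT", "notes.md"))))

# pattern is a regular expression; ignore.case makes it match B.TXT too.
files <- list.files(dir, pattern = "\\.txt$", ignore.case = TRUE,
                    full.names = TRUE)
basename(files)
```

Note that pattern is a regular expression rather than a shell glob, so the dot has to be escaped to match literally.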
300    
301    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
302    
303            * inst/doc/tm.Rnw: Suppress pointless loading message.
304    
305    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
306    
307            * DESCRIPTION: Speed up package loading (via moving packages not
308            strictly necessary for normal operation to Suggests instead of
309            Depends).
310    
311    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
312    
313            * R/reader.R (readNewsgroup): The date format is now configurable.
314    
315    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
316    
317            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
318    
319    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
320    
321            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
322    
323    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
324    
325            * R/source.R (DataframeSource): New source class for data frames.
326    
327            * R/source.R: Fixed non-standard call evaluation.
328    
329    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
330    
331            * R/source.R (URISource): New source class for a single document.
332    
333    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
334    
335            * R/source.R: Refactoring.
336    
337    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
338    
339            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
340            Rmpi installations more gracefully.
341    
342    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
343    
344            * R/source.R (Source): Add Length slot.
345    
346    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
347    
348            * R/AAA.R: Unify duplicated .onLoad function.
349    
350    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
351    
352            * DESCRIPTION (Suggests): Added Rmpi.
353    
354    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
355    
356            * R/source.R (getElem): Fix 'no visible binding' warning.
357    
358            * man/WeightFunction.Rd: Fix signature.
359    
360    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
361    
362            * R/weight.R: Introduce name abbreviations for weighting functions.
363    
364    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
365    
366            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
367    
368            * R/cluster.R: Provide convenience functions for using a MPI
369            cluster.
370    
371            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
372            available.
373    
374            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
375            available.
376    
377    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
378    
379            * R/textdoccol.R (lapply): Removed debug print out.
380    
381    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
382    
383            * R/reader.R (readRCV1): Improved meta data extraction from
384            Reuters Corpus Volume 1 documents.
385    
386    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
387    
388            * R/transform.R: Ensure that all mappings preserve multiline
389            structures.
390    
391    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
392    
393            * R/filter.R: Every filter now has an attribute indicating whether
394            it should be applied at the document level (doclevel).
395    
396            * R/textdoccol.R (tmFilter): Set searchFullText as new default
397            filter.
398    
399    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
400    
401            * R/transform.R (replacePatterns): Replaced removeWords by
402            replacePatterns. Suggested by Christian Buchta.
403    
404            * R/textdoccol.R (inspect): Improved formatting.
405    
406    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
407    
408            * inst/CITATION: Updated JSS article information.
409    
410            * R/textdoccol.R (setAs): Added coerce method from list to
411            corpus.
412    
413            * R/meta.R (meta): Improved meta data handling.
414    
415    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
416    
417            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
418            Christian Buchta.
419    
420            * inst/CITATION: Added template to include JSS article reference.
421    
422    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
423    
424            * R/textdoccol.R (tmMap): Introduced lazy mapping.
425    
426            * R/source.R: Added VectorSource.
427    
428    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
429    
430            * man/: Language codes should be in ISO 639-1 format.
431    
432            * R/textdoccol.R (asPlain): Preserve local meta data.
433    
434    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
435    
436            * R/textdoccol.R (writeCorpus): Function for writing a corpus
437            containing plain text documents to disk.
438    
439    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
440    
441            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
442            always set correctly.
443    
444            * R/textdoccol.R: Set load = TRUE as default for load on demand
445            since in most cases this is the wanted behaviour.
446    
447    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
448    
449            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
450    
451            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
452    
453    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/meta.R (meta): New function for consistent access to meta data
456            of document collections, repositories, and texts.
457    
458    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/: Better support for encodings.
461    
462    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
463    
464            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
465            selection when no reader argument is given.
466    
467    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
468    
469            * R/source.R (CSVSource): Now uses read.csv instead of scan
470            internally.
471    
472    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
473    
474            * R/reader.R (getReaders): Returns available reader functions.
475    
476            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
477            as default.
478    
479    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
480    
481            * R/stopwords.R (stopwords): Shortened code, removed codetools
482            variable warnings.
483    
484            * man/: Documentation for showMeta, added an example for tmMap.
485    
486            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
487            some minor typos fixed.
488    
489    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
490    
491            * R/aobjects.R (showMeta): Added method for pretty printing a
492            text document's meta data.
493    
494    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * R/textdoccol.R (TextDocCol): Better handling of empty
497            arguments.
498    
499            * NAMESPACE: Exported readDOC.
500    
501            * man/completeStems.Rd: Added an example.
502    
503    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
504    
505            * R/stopwords.R (stopwords): Look up .dat files at every
506            call. Allows users to modify stopword .dat files interactively.
507    
508    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * R/termdocmatrix.R (termFreq): Correct processing of empty
511            documents.
512    
513    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
514    
515            * man/: Updated documentation.
516    
517    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * R/complete.R (completeStems): Completes (heuristically) word
520            stems.
521    
522            * R/termdocmatrix.R (TermDocMatrix2): New modular
523            constructor.
524    
525            * NAMESPACE: Exported termFreq.
526    
527    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
528    
529            * R/reader.R (readDOC): Added MS Word reader (using antiword).
530    
531    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
532    
533            * R/weight.R: Weighting functions for TermDocMatrix.
534    
535    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
536    
537            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
538            functions for accessing dimension, column, and row names.
539    
540            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
541    
542    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
543    
544            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
545    
546    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
547    
548            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
549    
550    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
551    
552            * R/reader.R (readPDF): Removed manual checks for pdftotext and
553            pdfinfo. The system call gives a warning anyway.
554    
555    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
556    
557            * R/textdoccol.R (asPlain): Conversion from
558            StructuredTextDocuments to PlainTextDocuments.
559    
560    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
561    
562            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
563            for accessing term-document matrices.
564    
565            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
566            are installed.
567    
568    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
571            Christian Buchta.
572    
573    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
576    
577    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
578    
579            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
580    
581            * R/reader.R (readPDF): Added PDF reader.
582    
583    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
584    
585            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
586    
587            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
588    
589            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
590    
591            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
592    
593    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
594    
595            * R/distmeasure.R (dissimilarity): Replaced dists call from
596            package cba by new dist call from package proxy.
597    
598    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
599    
600            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
601    
602    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
603    
604            * R/termdocmatrix.R: require() uses the quietly option to suppress
605            loading messages.
606    
607    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
608    
609            * R/dictionary.R: Added dictionary support.
610    
611    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
612    
613            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
614            documents. This simplifies some functions, e.g., asPlain.
615    
616    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
617    
618            * inst/doc/tm.Rnw: Fixed some typos in vignette.
619    
620    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
621    
622            * R/textdoccol.R (replaceWords): Added method to replace a set of
623            words by a single word. Useful for synonyms.
624    
625    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
626    
627            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
628    
629    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
630    
631            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
632            vectors. Thanks to Ariel Maguyon for his error report.
633            (removeSparseTerms): New function to remove columns from a
634            term-document matrix exceeding a sparse factor.
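
The idea of dropping columns by sparsity can be sketched on a dense matrix. This is an illustration under assumptions, not the package's code: remove_sparse_columns is a hypothetical name, terms are assumed to be in columns, and a column is dropped when its share of zero entries exceeds the factor.

```r
# Keep only columns whose proportion of zero entries is at most `sparse`.
remove_sparse_columns <- function(m, sparse) {
  stopifnot(sparse > 0, sparse < 1)
  sparsity <- colMeans(m == 0)   # share of zero entries per column
  m[, sparsity <= sparse, drop = FALSE]
}

m <- matrix(c(1, 0, 0, 0,
              1, 1, 0, 1,
              0, 0, 0, 1),
            nrow = 4,
            dimnames = list(paste0("doc", 1:4),
                            c("rare", "common", "once")))
kept <- remove_sparse_columns(m, sparse = 0.5)
kept
```

Here "rare" and "once" are zero in three of four documents (sparsity 0.75) and are removed, while "common" (sparsity 0.25) is kept.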
635    
636    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
637    
638            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
639    
640    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
641    
642            * man/sFilter.Rd: Corrected documentation on statement format (use
643            '==' instead of '=').
644    
645    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
646    
647            * R/aobjects.R (StructuredTextDocument): Inherits from
648            TextDocument.
649    
650    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
651    
652            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
653            on sparse matrices as proposed by Martin Maechler.
654    
655    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * R/textdoccol.R: Removed \code{dbDisconnect} calls since last
658            \pkg{filehash} version makes them deprecated.
659    
660    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * R/termdocmatrix.R (textvector): Stemming is now performed before
663            erasing stopwords.
664            (weightMatrix): Adapted to handle sparse matrices.
665            (TermDocMatrix): Sparse matrix is now efficiently built by
666            direct stepwise insertion of row values into it.
667    
668    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
669    
670            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
671            due to ongoing problems. For our purposes the latter is as useful
672            as the replaced package.
673    
674    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
675    
676            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
677    
678            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
679    
680    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
681    
682            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
683            languages with available stopwords.
684    
685    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
686    
687            * inst/doc/tm.Rnw: Minor corrections in the vignette.
688    
689    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
690    
691            * DESCRIPTION: Update to version 0.2, since a lot of new features
692            have been integrated.
693    
694            * inst/stopwords: Updated existing stopwords and added stopwords
695            for various other languages.
696    
697    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
698    
699            * man/: Updated documentation.
700    
701            * Work/testDb.R: Script to test database stuff.
702    
703            * R/: Fixed various database related bugs. Seems to be rather
704            useable now, i.e., consider as alpha status for now.
705    
706    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
707    
708            * R/: Fixed some bugs related to database support.
709    
710    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
711    
712            * man/: Added a lot of examples to the manuals.
713    
714    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
715    
716            * man/: Updated parts of the documentation.
717    
718            * R/textdoccol.R (asPlain): Added conversion from newsgroup
719            documents to plain text documents.
720    
721    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
722    
723            * R/textdoccol.R: Finished experimental database support. Not yet
724            intensively tested.
725    
726            * R/source.R: Now each source has a default reader.
727    
728            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
729            class anymore.
730    
731            * R/plaintextdoc.R: Custom show method for plain text documents.
732    
733            * R/aobjects.R: Added a class for structured text documents.
734    
735            * R/reader.R: Replaced remaining \code{parser} occurrences with
736            \code{reader}.
737    
738            * R/textdoccol.R (summary): Indent tags.
739    
740            * R/textdoccol.R (removePunctuation): Transform method to remove
741            punctuation marks.
742    
743    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
744    
745            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
746            using prescindMeta().
747    
748    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
749    
750            * R/textdoccol.R: Improved database support.
751    
752    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
753    
754            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
755    
756            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
757            language code.
758    
759            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
760            into parserControl argument.
761    
762            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
763    
764    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
765    
766            * Work/tmDataSetup.R: The datasets acq and crude can now be
767            created on the fly.
768    
769            * R/stopwords.R: Introduced a function returning the stopwords for
770            a given language (English, German and French at the moment).
771    
772            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
773            otherwise falls back to Snowball package.
774    
775    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
776    
777            * man/dissimilarity-methods.Rd: Make clear that any method offered
778            by "dists" from package "cba" can be used.
779    
780    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
781    
782            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
783            to Kurt's latex suggestion. Removed points and underscores in
784            variable names for consistent naming.
785    
786            * DESCRIPTION: Update to version 0.1-2.
787    
788            * man/TextRepository.Rd: Fixed bug in documentation.
789    
790    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
791    
792            * DESCRIPTION: Update to version 0.1-1.
793    
794    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
795    
796            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
797            wordStem.
798    
799    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
800    
801            * R/: Changes due to Kurt's review.
802    
803    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
804    
805            * R/: Implemented improvements based upon comments by David
806            Meyer.
807    
808    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
809    
810            * inst/doc/: Rewrote vignette.
811    
812            * man/: Improved documentation.
813    
814    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
815    
816            * man/: Updated documentation.
817    
818            * DESCRIPTION: Changed package name to "tm". Updated version to
819            0.1 for first CRAN release.
820    
821            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
822            list archive example.
823    
824            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
825            archive example.
826    
827            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
828            from (several mails per box) mbox format to (single mail per file)
829            eml format.
830    
831    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
832    
833            * data/crude.rda: Rebuilt.
834    
835            * data/acq.rda: Rebuilt.
836    
837            * R/reader.R: Factored out reader and parser methods from
838            textdoccol.R.
839    
840            * R/source.R: Factored out Source methods from aobjects.R and
841            textdoccol.R.
842            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
843            feeds.
844    
845            * R/textdoccol.R (DirSource): Added support for recursive
846            traversal of directories.
847    
848    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
849    
850            * R/textdoccol.R ([[): Loads the document corpus automatically
851            into memory upon access.
852            (tm_transform, tm_filter): Removed several checks whether the
853            document is already loaded ([[ ensures this now).
854            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
855            mailing list archive.
856    
857    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
858    
859            * R/aobjects.R (TextDocument): Is now a virtual class.
860            (Source): Is now a virtual class.
861    
862    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
863    
864            * R/textdoccol.R (c): Support for an arbitrary number of document
865            collections.
866    
867    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
870            append_meta and remove_meta.
871    
872            * R/textdoccol.R: Removed modify_metadata method.
873    
874            * R/textrepo.R: Removed modify_metadata method.
875    
876            * R/textdoccol.R (remove_meta): Supports removal of document
877            collection metadata and document (= in data frame) metadata.
878    
879    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
880    
881            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
882    
883            * data/crude.rda: Rebuilt.
884    
885            * data/acq.rda: Rebuilt.
886    
887            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
888    
889            * R/textdoccol.R ([): Bug fix for subsetting a document
890            collection's data frame.
891    
892    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
893    
894            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
895            to s_filter.
896    
897            * R/textdoccol.R: Local text documents' metadata can now be copied
898            to a document collection's data frame with prescind_meta.
899    
900    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
901    
902            * R/: Text documents' slot metadata is now accessible in s_filter.
903    
904            * R/: Rewrote s_filter function (still has some restrictions).
905    
906    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
907    
908            * R/: Various fixes in handling metadata.
909    
910            * R/: Added update mechanism for text document collections.
911    
912    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
913    
914            * R/: Merging of document collections now creates a binary tree
915            for reconstructing merged document collections.
916    
917            * R/: Redesign of metadata for document collections.
918    
919    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
920    
921            * R/: Messages now use \code{ngettext}.
922    
923    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
924    
925            * R/: Added functions for modifying and removing metadata.
926    
927    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * man/: Updated some documentation.
930    
931            * R/: Corrected some connection issues.
932    
933            * inst/doc: Worked on the vignette.
934    
935    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
936    
937            * inst/: Added texts and started vignette.
938    
939            * R/: Final changes based upon David's comments.
940    
941    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
942    
943            * NAMESPACE: Corrected exports (generic methods need exportMethods
944            directives!).
945    
946    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
947    
948            * R/: Modified the TextDocCol constructor and various parsers. It
949            is now modular and supports various file formats via plugins (see
950            the new "Source" class).
951    
952    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
953    
954            * man/: Revised documentation after previous code changes.
955    
956    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
957    
958            * R/: Remaining changes as discussed with David.
959    
960    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
961    
962            * R/: Some changes as suggested by David. The rest will follow
963            within the next few days.
964    
965    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
966    
967            * man/: Finished documentation.
968    
969    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
970    
971            * man/: Wrote some documentation.
972    
973    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
974    
975            * R/: Further syntactic sugar in form of additional assignment and
976            accessor methods.
977    
978    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            * R/: Syntactic sugar in form of "length", "show" and "summary"
981            operators.
982    
983    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * R/: Various updates, mainly to default operators ("[" or "c")
986            and dissimilarities.
987    
988    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            * R/: Added similarity functions.
991    
992            * data/: Added English stopwords.
993    
994    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
995    
996            * data/: Examples compiled for new features.
997    
998            * R/: Changes due to new structure.
999    
1000            * NAMESPACE: Corrected namespace to reflect new structure.
1001    
1002            * R/termdocmatrix.R: Adapted for new naming scheme.
1003    
1004    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1005    
1006            * R/textdoccol.R: Adapted code for new class structure. Wrote
1007            several transform and filter functions operating on text document
1008            collections (alias text document databases).
1009    
1010            * R/aobjects.R: Adapted class structure with inheritance,
1011            repositories and additional meta data. Loading files on demand is
1012            now possible.
1013    
1014    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1015    
1016            * R/: Some cosmetic cleanups.
1017    
1018            * inst/: Removed vignette on clustering. That and much more is now
1019            described in the JSS paper on text mining. Based upon that
1020            article an elaborated vignette will be incorporated in the future.
1021    
1022    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1023    
1024            * R/: Updated generic S4 methods to comply with signature changes
1025            in newer versions of R (> 2.3).
1026    
1027    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1028    
1029            * ext/R/importRIS.R: Automatic RIS import is now possible.
1030    
1031    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1032    
1033            * R/textdoccol.R: Added RIS HTML input format.
1034    
1035    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1036    
1037            * R/textdoccol.R: Removed bug that caused invalid text document
1038            collections when handling many input files.
1039    
1040    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1041    
1042            * R/textdoccol.R: Restructured and extended file import
1043            mechanism.
1044    
1045            * inst/doc/clustering.Rnw: Adapted vignette for use with
1046            ReutNews.rda
1047    
1048            * man/ReutNews.Rd: Documentation for ReutNews.rda
1049    
1050            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1051    
1052    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1053    
1054            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1055            clustering facilities of this package.
1056    
1057    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1058    
1059            * R/aobjects.R: Changed package document structure to avoid class
1060            dependency problems.
1061    
1062    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1063    
1064            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
1065            data set.
1066    
1067            * Finished documentation and reordered directory structure. Now "R
1068            CMD check textmin" works without errors.
1069    
