[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
pkg/ChangeLog revision 1149, Fri Nov 4 15:48:50 2011 UTC
1    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
2    
3            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
4    
5            * man/combine.Rd: Document c.term_frequency.
6    
7    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
8    
9            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
10            can be accessed via '[' and not '[['.
11    
12    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
13    
14            * R/stopwords.R (stopwords): Raise an error if no stopwords are
15            available for requested language. Suggested by Derek M Jones.
16    
17    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
18    
19            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
20            normalization.
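
Cosine normalization, as named in the entry above, scales a document's term weights to unit Euclidean length. The following is a minimal Python sketch of that idea only; it is not the weightSMART code, and the function name and dictionary layout are assumptions made for illustration.

```python
import math

def cosine_normalize(weights):
    """Scale a document's term weights to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)  # empty or all-zero document: nothing to scale
    return {term: w / norm for term, w in weights.items()}

# Hypothetical document with two weighted terms.
doc = {"oil": 3.0, "price": 4.0}
normalized = cosine_normalize(doc)
```

After normalization the squared weights of a non-empty document sum to one, so documents of different lengths become comparable.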
21    
22    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
23    
24            * R/transform.R (stemDocument.PlainTextDocument): Use language
25            argument.
26    
27    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
28    
29            * R/source.R: Store strings and connections instead of unevaluated
30            calls.
31    
32    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
33    
34            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
35    
36    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
37    
38            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
39            (instead of a list element).
40    
41    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
42    
43            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
44            documents by names (fallback to IDs if names are not set).
45    
46    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
47    
48            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
49            \code{recursive} now determines whether existing corpus meta data
50            is used.
51    
52    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
53    
54            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
55    
56    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
57    
58            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
59            remove terms not occurring in the corpus anymore.
60    
61    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
62    
63            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
64            and Heaps' law.
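
Zipf's law relates a term's frequency to its frequency rank, and Heaps' law relates vocabulary size to the number of tokens read. The Python sketch below computes the raw series one might plot on log-log axes; it illustrates the two laws only and is not the Zipf_plot or Heaps_plot code. Both function names are hypothetical.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent term first (Zipf)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))

def vocabulary_growth(tokens):
    """Return (tokens read, distinct terms seen) pairs (Heaps)."""
    seen, growth = set(), []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        growth.append((n, len(seen)))
    return growth

tokens = "a b a c a b".split()
zipf_points = rank_frequency(tokens)
heaps_points = vocabulary_growth(tokens)
```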
65    
66    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
67    
68            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
69            provided by a source.
70    
71    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
72    
73            * R/source.R (.Source): Provide document names.
74    
75    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
76    
77            * R/meta.R (`content_or_meta`): Utility function.
78    
79    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
80    
81            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
82            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
83    
84    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
85    
86            * R/weight.R (weightTfIdf): Added normalization option.
87    
88            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
89            analysis.
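
A tf-idf weighting with an optional normalization step can be sketched as follows. This Python sketch assumes that idf is log2(N / df) and that normalization divides each term count by the document's token count; both are assumptions for illustration, and the code is not the weightTfIdf implementation.

```python
import math

def tf_idf(docs, normalize=False):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = {}
    for tokens in docs:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    weighted = []
    for tokens in docs:
        counts = {}
        for term in tokens:
            counts[term] = counts.get(term, 0) + 1
        # Assumed normalization: relative instead of absolute term frequency.
        length = len(tokens) if normalize and tokens else 1
        weighted.append({
            term: (count / length) * math.log2(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted

docs = [["oil", "oil", "price"], ["price", "market"]]
weights = tf_idf(docs, normalize=True)
```

A term that occurs in every document receives weight zero under this definition, because log2(N / N) is zero.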
90    
91    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
92    
93            * R/score.R (tm_tag_score): Compute a score from the number of
94            tags matching in a document.
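
Scoring a document by the number of tokens that match a tag list can be sketched in a few lines. The Python below is a hypothetical illustration of that counting idea, not the tm_tag_score code; the word list is made up.

```python
def tag_score(tokens, tags):
    """Count how many tokens of a document appear in the tag list."""
    tag_set = set(tags)
    return sum(1 for token in tokens if token in tag_set)

# Made-up positive word list and document.
positive = ["good", "great", "gain"]
tokens = "great quarter with a good gain and a good outlook".split()
score = tag_score(tokens, positive)
```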
95    
96    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
97    
98            * R/complete.R (stemCompletion): New completion heuristics.
99    
100    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
101    
102            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
103    
104    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
105    
106            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
107            setOldClass(c(..., "list")) works.
108    
109    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
110    
111            * R/transform.R (stemDocument.character): In case input is a
112            simple character just delegate to the default Snowball stemmer.
113    
114    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
115    
116            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
117            data.
118    
119    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
120    
121            * R/doc.R (`Content<-`): Be careful with names attribute.
122    
123    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
124    
125            * R/source.R (DirSource): Improved implementation especially when
126            handling many (> 1M) files.
127    
128    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
129    
130            * R/source.R (getElem.URISource): Use encoding argument.
131    
132    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
133    
134            * R/doc.R (setOldClass): Register S3 document classes to be
135            recognized by S4 methods.
136    
137    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
138    
139            * R/matrix.R (termFreq): Add option to remove punctuation
140            characters.
141    
142    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
143    
144            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
145            merging multiple term-document matrices.
146    
147    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
148    
149            * R/corpus.R (setOldClass): Register S3 corpus classes to be
150            recognized by S4 methods.
151    
152            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
153            that CRAN Mac OS X builds do not fail any longer.
154    
155    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
156    
157            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
158            of RWeka::AlphabeticTokenizer() as default.
159    
160    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
161    
162            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
163            caused words at the beginning or the end of a line not to be removed. Do
164            not delete whitespace anymore.
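
One common way to remove words regardless of their position in a line is to match them between word boundaries instead of between spaces. The Python sketch below shows that technique under this assumption; it is not the removeWords implementation, and the function name is hypothetical. Surrounding whitespace is left in place.

```python
import re

def remove_words(text, words):
    """Delete whole-word occurrences of `words`, keeping all whitespace."""
    if not words:
        return text
    alternatives = "|".join(re.escape(word) for word in words)
    # \b matches at line starts and ends as well as next to spaces.
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return pattern.sub("", text)

text = "the cat sat on the\nmat near the"
cleaned = remove_words(text, ["the", "on"])
```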
165    
166    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
167    
168            * R/source.R (DirSource): Default to working directory if no path
169            is specified.
170    
171    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
172    
173            * R/source.R (DirSource): Stop on empty directories.
174    
175    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
178            named documents.
179    
180    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
181    
182            * R/transform.R (removeWords): Improve regular expressions.
183    
184    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
185    
186            * R/meta.R (DublinCore): Allow lower case tags.
187    
188    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
189    
190            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
191            instead of x$children.
192    
193    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
194    
195            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
196    
197    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
198    
199            * R/: Use S3 instead of S4 class system.
200    
201    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
202    
203            * R/reader.R (readMail): Moved to tm.plugin.mail package.
204    
205    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
206    
207            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
208            postings are basically e-mails with some extra headers.
209    
210    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
211    
212            * R/transform.R: Move convertMboxEml, removeCitation,
213            removeMultipart, and removeSignature to the tm.plugin.mail package
214            since they are mainly utility functions (for handling e-mails) and
215            not very framework specific.
216    
217    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
218    
219            * man/: Fix documentation.
220    
221    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
222    
223            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
224            plain text document instead of an XML document for texts of the
225            Reuters-21578 dataset.
226    
227            * R/sparse.R: Removed since the slam package is now available on
228            CRAN.
229    
230            * DESCRIPTION (Depends): Add slam package.
231    
232    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
233    
234            * R/transform.R (stemDoc): Fix character(0) handling.
235    
236    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
237    
238            * R/doc.R (show): Pretty print.
239    
240    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
241    
242            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
243            gracefully.
244    
245    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
246    
247            * R/corpus.R: Make corpus virtual. Implement corpus with standard
248            and permanent storage semantics.
249    
250            * DESCRIPTION: New major release. A *lot* of improvements.
251    
252    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
253    
254            * NAMESPACE: Export some simple_triplet_matrix functions.
255    
256    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
257    
258            * R/weight.R: Adapt tf-idf to new matrix format.
259    
260    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
261    
262            * R/matrix.R: Create two distinct classes for term-document and
263            document-term matrices.
264    
265    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
266    
267            * R/termdocmatrix.R: No longer use Matrix package. This reduces
268            package start-up time significantly.
269    
270    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
271    
272            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
273    
274    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
275    
276            * R/transform.R (tmReduce): Combine multiple maps into one
277            transformation.
278    
279    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
280    
281            * R/weight.R: Remove weightLogical since it does not return a
282            dgCMatrix.
283    
284            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
285            or TermDocumentMatrix instead.
286    
287    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
288    
289            * inst/doc/extensions.Rnw: Finished vignette.
290    
291    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
292    
293            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
294            DocumentTermMatrix representations.
295    
296    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
297    
298            * R/reader.R (readXML): New reader for arbitrary XML files.
299    
300    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
301    
302            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
303            (XMLSource): New XMLSource class for arbitrary XML files.
304            (Source): New slot Vectorized.
305    
306    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
307    
308            * R/reader.R (readTabular): Experimental reader for tabular data
309            structures which can be customized via user-defined mappings.
310    
311            * R/reader.R: Always use UTC time zone.
312    
313            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
314    
315    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
316    
317            * R/reader.R (readDOC): Options can be passed over to antiword.
318    
319            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
320            pdftotext.
321    
322    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
323    
324            * R/source.R (DirSource): Add pattern and ignore.case arguments
325            which are internally passed over to list.files().
326    
327    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
328    
329            * inst/doc/tm.Rnw: Suppress pointless loading message.
330    
331    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
332    
333            * DESCRIPTION: Speed up package loading (via moving packages not
334            strictly necessary for normal operation to Suggests instead of
335            Depends).
336    
337    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
338    
339            * R/reader.R (readNewsgroup): The date format is now configurable.
340    
341    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
342    
343            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
344    
345    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
346    
347            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
348    
349    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
350    
351            * R/source.R (DataframeSource): New source class for data frames.
352    
353            * R/source.R: Fixed non-standard call evaluation.
354    
355    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
356    
357            * R/source.R (URISource): New source class for a single document.
358    
359    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
360    
361            * R/source.R: Refactoring.
362    
363    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
364    
365            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
366            Rmpi installations more gracefully.
367    
368    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
369    
370            * R/source.R (Source): Add Length slot.
371    
372    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
373    
374            * R/AAA.R: Unify duplicated .onLoad function.
375    
376    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
377    
378            * DESCRIPTION (Suggests): Added Rmpi.
379    
380    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
381    
382            * R/source.R (getElem): Fix 'no visible binding' warning.
383    
384            * man/WeightFunction.Rd: Fix signature.
385    
386    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
387    
388            * R/weight.R: Introduce name abbreviations for weighting functions.
389    
390    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
391    
392            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
393    
394            * R/cluster.R: Provide convenience functions for using a MPI
395            cluster.
396    
397            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
398            available.
399    
400            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
401            available.
402    
403    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
404    
405            * R/textdoccol.R (lapply): Removed debug print out.
406    
407    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
408    
409            * R/reader.R (readRCV1): Improved meta data extraction from
410            Reuters Corpus Volume 1 documents.
411    
412    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
413    
414            * R/transform.R: Ensure that all mappings preserve multiline
415            structures.
416    
417    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
418    
419            * R/filter.R: Every filter now has an attribute indicating whether
420            it should be applied at document level (doclevel).
421    
422            * R/textdoccol.R (tmFilter): Set searchFullText as new default
423            filter.
424    
425    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
426    
427            * R/transform.R (replacePatterns): Replaced removeWords by
428            replacePatterns. Suggested by Christian Buchta.
429    
430            * R/textdoccol.R (inspect): Improved formatting.
431    
432    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
433    
434            * inst/CITATION: Updated JSS article information.
435    
436            * R/textdoccol.R (setAs): Added coerce method from list to
437            corpus.
438    
439            * R/meta.R (meta): Improved meta data handling.
440    
441    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
442    
443            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
444            Christian Buchta.
445    
446            * inst/CITATION: Added template to include JSS article reference.
447    
448    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
449    
450            * R/textdoccol.R (tmMap): Introduced lazy mapping.
451    
452            * R/source.R: Added VectorSource.
453    
454    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * man/: Language codes should be in ISO 639-1 format.
457    
458            * R/textdoccol.R (asPlain): Preserve local meta data.
459    
460    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
461    
462            * R/textdoccol.R (writeCorpus): Function for writing a corpus
463            containing plain text documents to disk.
464    
465    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
466    
467            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
468            always set correctly.
469    
470            * R/textdoccol.R: Set load = TRUE as default for load on demand
471            since in most cases this is the wanted behaviour.
472    
473    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
476    
477            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
478    
479    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
480    
481            * R/meta.R (meta): New function for consistent access to meta data
482            of document collections, repositories, and texts.
483    
484    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
485    
486            * R/: Better support for encodings.
487    
488    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
489    
490            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
491            selection when no reader argument is given.
492    
493    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
494    
495            * R/source.R (CSVSource): Now uses read.csv instead of scan
496            internally.
497    
498    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
499    
500            * R/reader.R (getReaders): Returns available reader functions.
501    
502            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
503            as default.
504    
505    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
506    
507            * R/stopwords.R (stopwords): Shortened code, removed codetools
508            variable warnings.
509    
510            * man/: Documentation for showMeta, added an example for tmMap.
511    
512            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
513            some minor typos fixed.
514    
515    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
516    
517            * R/aobjects.R (showMeta): Added method for pretty printing a
518            text document's meta data.
519    
520    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
521    
522            * R/textdoccol.R (TextDocCol): Better handling of empty
523            arguments.
524    
525            * NAMESPACE: Exported readDOC.
526    
527            * man/completeStems.Rd: Added an example.
528    
529    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
530    
531            * R/stopwords.R (stopwords): Look up .dat files at every
532            call. Allows users to modify stopword .dat files interactively.
533    
534    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
535    
536            * R/termdocmatrix.R (termFreq): Correct processing of empty
537            documents.
538    
539    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
540    
541            * man/: Updated documentation.
542    
543    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
544    
545            * R/complete.R (completeStems): Completes (heuristically) word
546            stems.
547    
548            * R/termdocmatrix.R (TermDocMatrix2): New modular
549            constructor.
550    
551            * NAMESPACE: Exported termFreq.
552    
553    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
554    
555            * R/reader.R (readDOC): Added MS Word reader (using antiword).
556    
557    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
558    
559            * R/weight.R: Weighting functions for TermDocMatrix.
560    
561    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
562    
563            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
564            functions for accessing dimension, column, and row names.
565    
566            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
567    
568    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
571    
572    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
573    
574            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
575    
576    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
577    
578            * R/reader.R (readPDF): Removed manual checks for pdftotext and
579            pdfinfo. The system call gives a warning anyway.
580    
581    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
582    
583            * R/textdoccol.R (asPlain): Conversion from
584            StructuredTextDocuments to PlainTextDocuments.
585    
586    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
587    
588            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
589            for accessing term-document matrices.
590    
591            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
592            are installed.
593    
594    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
595    
596            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
597            Christian Buchta.
598    
599    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
600    
601            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
602    
603    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
604    
605            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
606    
607            * R/reader.R (readPDF): Added PDF reader.
608    
609    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
610    
611            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
612    
613            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
614    
615            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
616    
617            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
618    
619    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
620    
621            * R/distmeasure.R (dissimilarity): Replaced dists call from
622            package cba by new dist call from package proxy.
623    
624    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
625    
626            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
627    
628    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
629    
630            * R/termdocmatrix.R: require() uses the quietly option to suppress
631            loading messages.
632    
633    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
634    
635            * R/dictionary.R: Added dictionary support.
636    
637    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
638    
639            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
640            documents. This simplifies some functions, e.g., asPlain.
641    
642    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
643    
644            * inst/doc/tm.Rnw: Fixed some typos in vignette.
645    
646    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
647    
648            * R/textdoccol.R (replaceWords): Added method to replace a set of
649            words by a single word. Useful for synonyms.
650    
651    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
652    
653            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
654    
655    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
658            vectors. Thanks to Ariel Maguyon for his error report.
659            (removeSparseTerms): New function to remove columns from a
660            term-document matrix exceeding a sparse factor.
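
Removing terms that exceed a sparsity threshold can be sketched as below. This Python sketch assumes a term is dropped when its share of zero entries across documents is greater than the threshold; the exact comparison used by removeSparseTerms is not stated in this entry, so treat it as an assumption. The dictionary layout is also hypothetical.

```python
def remove_sparse_terms(matrix, sparse):
    """matrix: {term: [count per document]}, with non-empty count lists.
    Keep terms whose share of zero entries is at most `sparse`."""
    if not 0 < sparse < 1:
        raise ValueError("sparse must lie strictly between 0 and 1")
    kept = {}
    for term, counts in matrix.items():
        zero_share = sum(1 for c in counts if c == 0) / len(counts)
        if zero_share <= sparse:
            kept[term] = counts
    return kept

tdm = {"oil": [2, 1, 0, 3], "opec": [0, 0, 0, 1], "price": [1, 1, 1, 1]}
dense = remove_sparse_terms(tdm, 0.5)
```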
661    
662    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
663    
664            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
665    
666    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
667    
668            * man/sFilter.Rd: Corrected documentation on statement format (use
669            '==' instead of '=').
670    
671    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
672    
673            * R/aobjects.R (StructuredTextDocument): Inherits from
674            TextDocument.
675    
676    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
677    
678            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
679            on sparse matrices as proposed by Martin Maechler.
680    
681    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
682    
683            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
684            \pkg{filehash} version deprecates them.
685    
686    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
687    
688            * R/termdocmatrix.R (textvector): Stemming is now performed before
689            erasing stopwords.
690            (weightMatrix): Adapted to handle sparse matrices.
691            (TermDocMatrix): Sparse matrix is now efficiently built by
692            direct stepwise insertion of row values into it.
693    
694    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
695    
696            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
697            due to ongoing problems. For our purposes the latter is as useful
698            as the replaced package.
699    
700    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
701    
702            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
703    
704            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
705    
706    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
707    
708            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
709            languages with available stopwords.
710    
711    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
712    
713            * inst/doc/tm.Rnw: Minor corrections in the vignette.
714    
715    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
716    
717            * DESCRIPTION: Update to version 0.2, since a lot of new features
718            have been integrated.
719    
720            * inst/stopwords: Updated existing stopwords and added stopwords
721            for various other languages.
722    
723    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
724    
725            * man/: Updated documentation.
726    
727            * Work/testDb.R: Script to test database stuff.
728    
729            * R/: Fixed various database related bugs. Seems to be rather
730            usable now, i.e., consider it alpha status for now.
731    
732    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
733    
734            * R/: Fixed some bugs related to database support.
735    
736    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
737    
738            * man/: Added a lot of examples to the manuals.
739    
740    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
741    
742            * man/: Updated parts of the documentation.
743    
744            * R/textdoccol.R (asPlain): Added conversion from newsgroup
745            documents to plain text documents.
746    
747    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
748    
749            * R/textdoccol.R: Finished experimental database support. Not yet
750            intensively tested.
751    
752            * R/source.R: Now each source has a default reader.
753    
754            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
755            class anymore.
756    
757            * R/plaintextdoc.R: Custom show method for plain text documents.
758    
759            * R/aobjects.R: Added a class for structured text documents.
760    
761            * R/reader.R: Replaced remaining \code{parser} occurrences with
762            \code{reader}.
763    
764            * R/textdoccol.R (summary): Indent tags.
765    
766            * R/textdoccol.R (removePunctuation): Transform method to remove
767            punctuation marks.
768    
769    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
770    
771            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
772            using prescindMeta().
773    
774    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
775    
776            * R/textdoccol.R: Improved database support.
777    
778    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
779    
780            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
781    
782            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
783            language code.
784    
785            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
786            into parserControl argument.
787    
788            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
789    
790    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
791    
792            * Work/tmDataSetup.R: The datasets acq and crude can now be
793            created on the fly.
794    
795            * R/stopwords.R: Introduced a function returning the stopwords for
796            a given language (English, German and French at the moment).
797    
798            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
799            otherwise falls back to Snowball package.
800    
801    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
802    
803            * man/dissimilarity-methods.Rd: Make clear that any method offered
804            by "dists" from package "cba" can be used.
805    
806    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
807    
808            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
809            to Kurt's latex suggestion. Removed points and underscores in
810            variable names for consistent naming.
811    
812            * DESCRIPTION: Update to version 0.1-2.
813    
814            * man/TextRepository.Rd: Fixed bug in documentation.
815    
816    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
817    
818            * DESCRIPTION: Update to version 0.1-1.
819    
820    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
821    
822            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
823            wordStem.
824    
825    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
826    
827            * R/: Changes due to Kurt's review.
828    
829    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
830    
831            * R/: Implemented improvements based upon comments by David
832            Meyer.
833    
834    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
835    
836            * inst/doc/: Rewrote vignette.
837    
838            * man/: Improved documentation.
839    
840    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
841    
842            * man/: Updated documentation.
843    
844            * DESCRIPTION: Changed package name to "tm". Updated version to
845            0.1 for first CRAN release.
846    
847            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
848            list archive example.
849    
850            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
851            archive example.
852    
853            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
854            from (several mails per box) mbox format to (single mail per file)
855            eml format.
856    
857    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
858    
859            * data/crude.rda: Rebuilt.
860    
861            * data/acq.rda: Rebuilt.
862    
863            * R/reader.R: Factored out reader and parser methods from
864            textdoccol.R.
865    
866            * R/source.R: Factored out Source methods from aobjects.R and
867            textdoccol.R.
868            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
869            feeds.
870    
871            * R/textdoccol.R (DirSource): Added support for recursive
872            traversal of directories.
873    
874    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
875    
876            * R/textdoccol.R ([[): Loads the document corpus automatically
877            into memory upon access.
878            (tm_transform, tm_filter): Removed several checks whether the
879            document is already loaded ([[ ensures this now).
880            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
881            mailing list archive.
882    
883    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
884    
885            * R/aobjects.R (TextDocument): Is now a virtual class.
886            (Source): Is now a virtual class.
887    
888    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
889    
890            * R/textdoccol.R (c): Support for an arbitrary number of document
891            collections.
892    
893    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
894    
895            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
896            append_meta and remove_meta.
897    
898            * R/textdoccol.R: Removed modify_metadata method.
899    
900            * R/textrepo.R: Removed modify_metadata method.
901    
902            * R/textdoccol.R (remove_meta): Supports removal of document
903            collection metadata and document (= in data frame) metadata.
904    
905    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
906    
907            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
908    
909            * data/crude.rda: Rebuilt.
910    
911            * data/acq.rda: Rebuilt.
912    
913            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
914    
915            * R/textdoccol.R ([): Bug fix for subsetting a document
916            collection's data frame.
917    
918    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
919    
920            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
921            to s_filter.
922    
923            * R/textdoccol.R: Local text documents' metadata can now be copied
924            to a document collection's data frame with prescind_meta.
925    
926    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
927    
928            * R/: Text documents' slot metadata is now accessible in s_filter.
929    
930            * R/: Rewrote s_filter function (still has some restrictions).
931    
932    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
933    
934            * R/: Various fixes in handling metadata.
935    
936            * R/: Added update mechanism for text document collections.
937    
938    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
939    
940            * R/: Merging of document collections now creates a binary tree
941            for reconstructing merged document collections.
942    
943            * R/: Redesign of metadata for document collections.
944    
945    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
946    
947            * R/: Messages now use \code{ngettext}.
948    
949    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
950    
951            * R/: Added functions for modifying and removing metadata.
952    
953    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
954    
955            * man/: Updated some documentation.
956    
957            * R/: Corrected some connection issues.
958    
959            * inst/doc: Worked on the vignette.
960    
961    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
962    
963            * inst/: Added texts and started vignette.
964    
965            * R/: Final changes based upon David's comments.
966    
967    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
968    
969            * NAMESPACE: Corrected exports (generic methods need exportMethods
970            directives!).
971    
972    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
973    
974            * R/: Modified the TextDocCol constructor and various parsers. It
975            is now modular and supports various file formats via plugins (see
976            the new "Source" class).
977    
978    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            * man/: Revised documentation after previous code changes.
981    
982    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
983    
984            * R/: Remaining changes as discussed with David.
985    
986    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
987    
988            * R/: Some changes as suggested by David. The rest will follow
989            within the next few days.
990    
991    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
992    
993            * man/: Finished documentation.
994    
995    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
996    
997            * man/: Wrote some documentation.
998    
999    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1000    
1001            * R/: Further syntactic sugar in form of additional assignment and
1002            accessor methods.
1003    
1004    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1005    
1006            * R/: Syntactic sugar in form of "length", "show" and "summary"
1007            operators.
1008    
1009    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1010    
1011            * R/: Various updates, mainly to default operators ("[" or "c")
1012            and dissimilarities.
1013    
1014    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1015    
1016            * R/: Added similarity functions.
1017    
1018            * data/: Added English stopwords.
1019    
1020    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1021    
1022            * data/: Examples compiled for new features.
1023    
1024            * R/: Changes due to new structure.
1025    
1026            * NAMESPACE: Corrected namespace to reflect new structure.
1027    
1028            * R/termdocmatrix.R: Adapted for new naming scheme.
1029    
1030    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1031    
1032            * R/textdoccol.R: Adapted code for new class structure. Wrote
1033            several transform and filter functions operating on text document
1034            collections (alias text document databases).
1035    
1036            * R/aobjects.R: Adapted class structure with inheritance,
1037            repositories and additional meta data. Loading files on demand is
1038            now possible.
1039    
1040    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1041    
1042            * R/: Some cosmetic cleanups.
1043    
1044            * inst/: Removed vignette on clustering. That and much more is now
1045            described in the JSS paper on text mining. A more elaborate
1046            vignette based on that article will be incorporated in the future.
1047    
1048    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1049    
1050            * R/: Updated generic S4 methods to comply with signature changes
1051            in newer versions of R (> 2.3).
1052    
1053    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1054    
1055            * ext/R/importRIS.R: Automatic RIS import is now possible.
1056    
1057    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1058    
1059            * R/textdoccol.R: Added RIS HTML input format.
1060    
1061    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1062    
1063            * R/textdoccol.R: Removed bug that caused invalid text document
1064            collections when handling many input files.
1065    
1066    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1067    
1068            * R/textdoccol.R: Restructured and extended file import
1069            mechanism.
1070    
1071            * inst/doc/clustering.Rnw: Adapted vignette for use with
1072            ReutNews.rda.
1073    
1074            * man/ReutNews.Rd: Documentation for ReutNews.rda.
1075    
1076            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1077    
1078    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1079    
1080            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1081            clustering facilities of this package.
1082    
1083    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1084    
1085            * R/aobjects.R: Changed package document structure to avoid class
1086            dependency problems.
1087    
1088    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1089    
1090            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
1091            data set.
1092    
1093            * Finished documentation and reordered directory structure. Now "R
1094            CMD check textmin" works without errors.
1095    
