[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
pkg/ChangeLog revision 1084, Fri Aug 6 21:47:23 2010 UTC
1    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
4    
5    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
6    
7            * R/matrix.R (TermDocumentMatrix): If a dictionary is given, no
8            longer remove terms that do not occur in the corpus.
9    
10    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
11    
12            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
13            and Heaps' law.
14    
15    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
16    
17            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
18            provided by a source.
19    
20    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
21    
22            * R/source.R (.Source): Provide document names.
23    
24    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
25    
26            * R/meta.R (`content_or_meta`): Utility function.
27    
28    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
29    
30            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
31            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
32    
33    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
34    
35            * R/weight.R (weightTfIdf): Added normalization option.
36    
37            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
38            analysis.
39    
40    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/score.R (tm_tag_score): Compute a score from the number of
43            tags matching in a document.
44    
45    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
46    
47            * R/complete.R (stemCompletion): New completion heuristics.
48    
49    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
50    
51            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
52    
53    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
54    
55            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
56            setOldClass(c(..., "list")) works.
57    
58    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
59    
60            * R/transform.R (stemDocument.character): In case input is a
61            simple character just delegate to the default Snowball stemmer.
62    
63    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
64    
65            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
66            data.
67    
68    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
69    
70            * R/doc.R (`Content<-`): Be careful with names attribute.
71    
72    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
73    
74            * R/source.R (DirSource): Improved implementation especially when
75            handling many (> 1M) files.
76    
77    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
78    
79            * R/source.R (getElem.URISource): Use encoding argument.
80    
81    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
82    
83            * R/doc.R (setOldClass): Register S3 document classes to be
84            recognized by S4 methods.
85    
86    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
87    
88            * R/matrix.R (termFreq): Add option to remove punctuation
89            characters.
90    
91    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
92    
93            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
94            merging multiple term-document matrices.
95    
96    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
97    
98            * R/corpus.R (setOldClass): Register S3 corpus classes to be
99            recognized by S4 methods.
100    
101            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
102            that CRAN Mac OS X builds do not fail any longer.
103    
104    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
105    
106            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
107            of RWeka::AlphabeticTokenizer() as default.
108    
109    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
110    
111            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
112            caused words at the beginning or the end of a line not to be removed. Do
113            not delete whitespace anymore.
114    
115    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
116    
117            * R/source.R (DirSource): Default to working directory if no path
118            is specified.
119    
120    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
121    
122            * R/source.R (DirSource): Stop on empty directories.
123    
124    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
125    
126            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
127            named documents.
128    
129    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
130    
131            * R/transform.R (removeWords): Improve regular expressions.
132    
133    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
134    
135            * R/meta.R (DublinCore): Allow lower case tags.
136    
137    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
138    
139            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
140            instead of x$children.
141    
142    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
143    
144            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
145    
146    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
147    
148            * R/: Use S3 instead of S4 class system.
149    
150    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
151    
152            * R/reader.R (readMail): Moved to tm.plugin.mail package.
153    
154    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
155    
156            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
157            postings are basically e-mails with some extra headers.
158    
159    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
160    
161            * R/transform.R: Move convertMboxEml, removeCitation,
162            removeMultipart, and removeSignature to the tm.plugin.mail package
163            since they are mainly utility functions (for handling e-mails) and
164            not very framework specific.
165    
166    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
167    
168            * man/: Fix documentation.
169    
170    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
173            plain text document instead of an XML document for texts of the
174            Reuters-21578 dataset.
175    
176            * R/sparse.R: Removed since the slam package is now available on
177            CRAN.
178    
179            * DESCRIPTION (Depends): Add slam package.
180    
181    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
182    
183            * R/transform.R (stemDoc): Fix character(0) handling.
184    
185    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
186    
187            * R/doc.R (show): Pretty print.
188    
189    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
190    
191            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
192            gracefully.
193    
194    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
195    
196            * R/corpus.R: Make corpus virtual. Implement corpus with standard
197            and permanent storage semantics.
198    
199            * DESCRIPTION: New major release. A *lot* of improvements.
200    
201    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
202    
203            * NAMESPACE: Export some simple_triplet_matrix functions.
204    
205    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
206    
207            * R/weight.R: Adapt tf-idf to new matrix format.
208    
209    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
210    
211            * R/matrix.R: Create two distinct classes for term-document and
212            document-term matrices.
213    
214    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
215    
216            * R/termdocmatrix.R: No longer use Matrix package. This reduces
217            package start-up time significantly.
218    
219    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
220    
221            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
222    
223    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
224    
225            * R/transform.R (tmReduce): Combine multiple maps into one
226            transformation.
227    
228    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
229    
230            * R/weight.R: Remove weightLogical since it does not return a
231            dgCMatrix.
232    
233            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
234            or TermDocumentMatrix instead.
235    
236    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
237    
238            * inst/doc/extensions.Rnw: Finished vignette.
239    
240    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
241    
242            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
243            DocumentTermMatrix representations.
244    
245    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
246    
247            * R/reader.R (readXML): New reader for arbitrary XML files.
248    
249    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
250    
251            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
252            (XMLSource): New XMLSource class for arbitrary XML files.
253            (Source): New slot Vectorized.
254    
255    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
256    
257            * R/reader.R (readTabular): Experimental reader for tabular data
258            structures which can be customized via user-defined mappings.
259    
260            * R/reader.R: Always use UTC time zone.
261    
262            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
263    
264    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
265    
266            * R/reader.R (readDOC): Options can be passed over to antiword.
267    
268            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
269            pdftotext.
270    
271    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
272    
273            * R/source.R (DirSource): Add pattern and ignore.case arguments
274            which are internally passed over to list.files().
275    
276    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
277    
278            * inst/doc/tm.Rnw: Suppress pointless loading message.
279    
280    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
281    
282            * DESCRIPTION: Speed up package loading (via moving packages not
283            strictly necessary for normal operation to Suggests instead of
284            Depends).
285    
286    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
287    
288            * R/reader.R (readNewsgroup): The date format is now configurable.
289    
290    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
291    
292            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
293    
294    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
295    
296            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
297    
298    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
299    
300            * R/source.R (DataframeSource): New source class for data frames.
301    
302            * R/source.R: Fixed non-standard call evaluation.
303    
304    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
305    
306            * R/source.R (URISource): New source class for a single document.
307    
308    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
309    
310            * R/source.R: Refactoring.
311    
312    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
313    
314            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
315            Rmpi installations more gracefully.
316    
317    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
318    
319            * R/source.R (Source): Add Length slot.
320    
321    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
322    
323            * R/AAA.R: Unify duplicated .onLoad function.
324    
325    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
326    
327            * DESCRIPTION (Suggests): Added Rmpi.
328    
329    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
330    
331            * R/source.R (getElem): Fix 'no visible binding' warning.
332    
333            * man/WeightFunction.Rd: Fix signature.
334    
335    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
336    
337            * R/weight.R: Introduce name abbreviations for weighting functions.
338    
339    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
340    
341            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
342    
343            * R/cluster.R: Provide convenience functions for using an MPI
344            cluster.
345    
346            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
347            available.
348    
349            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
350            available.
351    
352    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
353    
354            * R/textdoccol.R (lapply): Removed debug print out.
355    
356    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
357    
358            * R/reader.R (readRCV1): Improved meta data extraction from
359            Reuters Corpus Volume 1 documents.
360    
361    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
362    
363            * R/transform.R: Ensure that all mappings preserve multiline
364            structures.
365    
366    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
367    
368            * R/filter.R: Every filter now has an attribute indicating whether
369            it should be applied at the document level (doclevel).
370    
371            * R/textdoccol.R (tmFilter): Set searchFullText as new default
372            filter.
373    
374    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
375    
376            * R/transform.R (replacePatterns): Replaced removeWords by
377            replacePatterns. Suggested by Christian Buchta.
378    
379            * R/textdoccol.R (inspect): Improved formatting.
380    
381    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
382    
383            * inst/CITATION: Updated JSS article information.
384    
385            * R/textdoccol.R (setAs): Added coerce method from list to
386            corpus.
387    
388            * R/meta.R (meta): Improved meta data handling.
389    
390    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
391    
392            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
393            Christian Buchta.
394    
395            * inst/CITATION: Added template to include JSS article reference.
396    
397    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
398    
399            * R/textdoccol.R (tmMap): Introduced lazy mapping.
400    
401            * R/source.R: Added VectorSource.
402    
403    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
404    
405            * man/: Language codes should be in ISO 639-1 format.
406    
407            * R/textdoccol.R (asPlain): Preserve local meta data.
408    
409    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
410    
411            * R/textdoccol.R (writeCorpus): Function for writing a corpus
412            containing plain text documents to disk.
413    
414    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
415    
416            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
417            always set correctly.
418    
419            * R/textdoccol.R: Set load = TRUE as default for load on demand
420            since in most cases this is the desired behaviour.
421    
422    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
423    
424            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
425    
426            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
427    
428    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
429    
430            * R/meta.R (meta): New function for consistent access to meta data
431            of document collections, repositories, and texts.
432    
433    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
434    
435            * R/: Better support for encodings.
436    
437    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
438    
439            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
440            selection when no reader argument is given.
441    
442    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
443    
444            * R/source.R (CSVSource): Now uses read.csv instead of scan
445            internally.
446    
447    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
448    
449            * R/reader.R (getReaders): Returns available reader functions.
450    
451            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
452            as default.
453    
454    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * R/stopwords.R (stopwords): Shortened code, removed codetools
457            variable warnings.
458    
459            * man/: Documentation for showMeta, added an example for tmMap.
460    
461            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
462            some minor typos fixed.
463    
464    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
465    
466            * R/aobjects.R (showMeta): Added method for pretty printing a
467            text document's meta data.
468    
469    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
470    
471            * R/textdoccol.R (TextDocCol): Better handling of empty
472            arguments.
473    
474            * NAMESPACE: Exported readDOC.
475    
476            * man/completeStems.Rd: Added an example.
477    
478    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
479    
480            * R/stopwords.R (stopwords): Look up .dat files at every
481            call. Allows users to modify stopword .dat files interactively.
482    
483    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
484    
485            * R/termdocmatrix.R (termFreq): Correct processing of empty
486            documents.
487    
488    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
489    
490            * man/: Updated documentation.
491    
492    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
493    
494            * R/complete.R (completeStems): Completes (heuristically) word
495            stems.
496    
497            * R/termdocmatrix.R (TermDocMatrix2): New modular
498            constructor.
499    
500            * NAMESPACE: Exported termFreq.
501    
502    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
503    
504            * R/reader.R (readDOC): Added MS Word reader (using antiword).
505    
506    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * R/weight.R: Weighting functions for TermDocMatrix.
509    
510    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
511    
512            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
513            functions for accessing dimension, column, and row names.
514    
515            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
516    
517    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
520    
521    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
524    
525    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
526    
527            * R/reader.R (readPDF): Removed manual checks for pdftotext and
528            pdfinfo. The system call gives a warning anyway.
529    
530    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
531    
532            * R/textdoccol.R (asPlain): Conversion from
533            StructuredTextDocuments to PlainTextDocuments.
534    
535    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
536    
537            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
538            for accessing term-document matrices.
539    
540            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
541            are installed.
542    
543    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
544    
545            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
546            Christian Buchta.
547    
548    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
549    
550            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
551    
552    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
553    
554            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
555    
556            * R/reader.R (readPDF): Added PDF reader.
557    
558    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
559    
560            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
561    
562            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
563    
564            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
565    
566            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
567    
568    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * R/distmeasure.R (dissimilarity): Replaced dists call from
571            package cba by new dist call from package proxy.
572    
573    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
576    
577    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
578    
579            * R/termdocmatrix.R: require() uses the quietly option to suppress
580            loading messages.
581    
582    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
583    
584            * R/dictionary.R: Added dictionary support.
585    
586    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
587    
588            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
589            documents. This simplifies some functions, e.g., asPlain.
590    
591    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
592    
593            * inst/doc/tm.Rnw: Fixed some typos in vignette.
594    
595    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
596    
597            * R/textdoccol.R (replaceWords): Added method to replace a set of
598            words by a single word. Useful for synonyms.
599    
600    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
601    
602            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
603    
604    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
605    
606            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
607            vectors. Thanks to Ariel Maguyon for his error report.
608            (removeSparseTerms): New function to remove columns from a
609            term-document matrix exceeding a sparse factor.
610    
611    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
612    
613            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
614    
615    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
616    
617            * man/sFilter.Rd: Corrected documentation on statement format (use
618            '==' instead of '=').
619    
620    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
621    
622            * R/aobjects.R (StructuredTextDocument): Inherits from
623            TextDocument.
624    
625    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
626    
627            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
628            on sparse matrices as proposed by Martin Maechler.
629    
630    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
631    
632            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
633            latest \pkg{filehash} version deprecates them.
634    
635    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
636    
637            * R/termdocmatrix.R (textvector): Stemming is now performed before
638            erasing stopwords.
639            (weightMatrix): Adapted to handle sparse matrices.
640            (TermDocMatrix): Sparse matrix is now efficiently built by
641            direct stepwise insertion of row values into it.
642    
643    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
644    
645            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
646            due to ongoing problems. For our purposes the latter is as useful
647            as the replaced package.
648    
649    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
650    
651            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
652    
653            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
654    
655    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
658            languages with available stopwords.
659    
660    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * inst/doc/tm.Rnw: Minor corrections in the vignette.
663    
664    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
665    
666            * DESCRIPTION: Update to version 0.2, since a lot of new features
667            have been integrated.
668    
669            * inst/stopwords: Updated existing stopwords and added stopwords
670            for various other languages.
671    
672    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
673    
674            * man/: Updated documentation.
675    
676            * Work/testDb.R: Script to test database stuff.
677    
678            * R/: Fixed various database-related bugs. Seems to be rather
679            usable now, i.e., consider it alpha status for the time being.
680    
681    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
682    
683            * R/: Fixed some bugs related to database support.
684    
685    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
686    
687            * man/: Added a lot of examples to the manuals.
688    
689    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
690    
691            * man/: Updated parts of the documentation.
692    
693            * R/textdoccol.R (asPlain): Added conversion from newsgroup
694            documents to plain text documents.
695    
696    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
697    
698            * R/textdoccol.R: Finished experimental database support. Not yet
699            intensively tested.
700    
701            * R/source.R: Now each source has a default reader.
702    
703            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
704            class anymore.
705    
706            * R/plaintextdoc.R: Custom show method for plain text documents.
707    
708            * R/aobjects.R: Added a class for structured text documents.
709    
710            * R/reader.R: Replaced remaining \code{parser} occurrences with
711            \code{reader}.
712    
713            * R/textdoccol.R (summary): Indent tags.
714    
715            * R/textdoccol.R (removePunctuation): Transform method to remove
716            punctuation marks.
717    
718    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
719    
720            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
721            using prescindMeta().
722    
723    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
724    
725            * R/textdoccol.R: Improved database support.
726    
727    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
728    
729            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
730    
731            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
732            language code.
733    
734            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
735            into parserControl argument.
736    
737            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
738    
739    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
740    
741            * Work/tmDataSetup.R: The datasets acq and crude can now be
742            created on the fly.
743    
744            * R/stopwords.R: Introduced a function returning the stopwords for
745            a given language (English, German and French at the moment).
746    
747            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
748            otherwise falls back to Snowball package.
749    
750    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
751    
752            * man/dissimilarity-methods.Rd: Make clear that any method offered
753            by "dists" from package "cba" can be used.
754    
755    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
756    
757            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
758            to Kurt's LaTeX suggestion. Removed points and underscores in
759            variable names for consistent naming.
760    
761            * DESCRIPTION: Update to version 0.1-2.
762    
763            * man/TextRepository.Rd: Fixed bug in documentation.
764    
765    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
766    
767            * DESCRIPTION: Update to version 0.1-1.
768    
769    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
770    
771            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
772            wordStem.
773    
774    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
775    
776            * R/: Changes due to Kurt's review.
777    
778    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
779    
780            * R/: Implemented improvements based upon comments by David
781            Meyer.
782    
783    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
784    
785            * inst/doc/: Rewrote vignette.
786    
787            * man/: Improved documentation.
788    
789    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
790    
791            * man/: Updated documentation.
792    
793            * DESCRIPTION: Changed package name to "tm". Updated version to
794            0.1 for first CRAN release.
795    
796            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
797            list archive example.
798    
799            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
800            archive example.
801    
802            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
803            from (several mails per box) mbox format to (single mail per file)
804            eml format.
805    
806    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
807    
808            * data/crude.rda: Rebuilt.
809    
810            * data/acq.rda: Rebuilt.
811    
812            * R/reader.R: Factored out reader and parser methods from
813            textdoccol.R.
814    
815            * R/source.R: Factored out Source methods from aobjects.R and
816            textdoccol.R.
817            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
818            feeds.
819    
820            * R/textdoccol.R (DirSource): Added support for recursive
821            traversal of directories.
822    
823    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
824    
825            * R/textdoccol.R ([[): Loads the document corpus automatically
826            into memory upon access.
827            (tm_transform, tm_filter): Removed several checks whether the
828            document is already loaded ([[ ensures this now).
829            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
830            mailing list archive.
831    
832    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
833    
834            * R/aobjects.R (TextDocument): Is now a virtual class.
835            (Source): Is now a virtual class.
836    
837    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
838    
839            * R/textdoccol.R (c): Support for an arbitrary number of document
840            collections.
841    
842    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
843    
844            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
845            append_meta and remove_meta.
846    
847            * R/textdoccol.R: Removed modify_metadata method.
848    
849            * R/textrepo.R: Removed modify_metadata method.
850    
851            * R/textdoccol.R (remove_meta): Supports removal of document
852            collection metadata and document (= in data frame) metadata.
853    
854    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
855    
856            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
857    
858            * data/crude.rda: Rebuilt.
859    
860            * data/acq.rda: Rebuilt.
861    
862            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
863    
864            * R/textdoccol.R ([): Bug fix for subsetting a document
865            collection's data frame.
866    
867    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
870            to s_filter.
871    
872            * R/textdoccol.R: Local text documents' metadata can now be copied
873            to a document collection's data frame with prescind_meta.
874    
875    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
876    
877            * R/: Text documents' slot metadata is now accessible in s_filter.
878    
879            * R/: Rewrote s_filter function (still has some restrictions).
880    
881    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
882    
883            * R/: Various fixes in handling metadata.
884    
885            * R/: Added update mechanism for text document collections.
886    
887    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
888    
889            * R/: Merging of document collections now creates a binary tree
890            for reconstructing merged document collections.
891    
892            * R/: Redesign of metadata for document collections.
893    
894    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
895    
896            * R/: Messages now use \code{ngettext}.
897    
898    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
899    
900            * R/: Added functions for modifying and removing metadata.
901    
902    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
903    
904            * man/: Updated some documentation.
905    
906            * R/: Corrected some connection issues.
907    
908            * inst/doc: Worked on the vignette.
909    
910    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
911    
912            * inst/: Added texts and started vignette.
913    
914            * R/: Final changes based upon David's comments.
915    
916    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
917    
918            * NAMESPACE: Corrected exports (generic methods need exportMethods
919            directives!).
920    
921    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * R/: Modified the TextDocCol constructor and various parsers. It
924            is now modular and supports various file formats via plugins (see
925            the new "Source" class).
926    
927    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * man/: Revised documentation after previous code changes.
930    
931    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
932    
933            * R/: Remaining changes as discussed with David.
934    
935    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
936    
937            * R/: Some changes as suggested by David. The rest will follow
938            within the next few days.
939    
940    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
941    
942            * man/: Finished documentation.
943    
944    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
945    
946            * man/: Wrote some documentation.
947    
948    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
949    
950            * R/: Further syntactic sugar in form of additional assignment and
951            accessor methods.
952    
953    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
954    
955            * R/: Syntactic sugar in form of "length", "show" and "summary"
956            operators.
957    
958    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
959    
960            * R/: Various updates, mainly to default operators ("[" or "c")
961            and dissimilarities.
962    
963    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
964    
965            * R/: Added similarity functions.
966    
967            * data/: Added English stopwords.
968    
969    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
970    
971            * data/: Examples compiled for new features.
972    
973            * R/: Changes due to new structure.
974    
975            * NAMESPACE: Corrected namespace to reflect new structure.
976    
977            * R/termdocmatrix.R: Adapted for new naming scheme.
978    
979    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
980    
981            * R/textdoccol.R: Adapted code for new class structure. Wrote
982            several transform and filter functions operating on text document
983            collections (alias text document databases).
984    
985            * R/aobjects.R: Adapted class structure with inheritance,
986            repositories and additional meta data. Loading files on demand is
987            now possible.
988    
989    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
990    
991            * R/: Some cosmetic cleanups.
992    
993            * inst/: Removed vignette on clustering. That and much more is now
994            described in the JSS paper on text mining. Based upon that
995            article a more elaborate vignette will be incorporated in the future.
996    
997    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
998    
999            * R/: Updated generic S4 methods to comply with signature changes
1000            in newer versions of R (> 2.3).
1001    
1002    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1003    
1004            * ext/R/importRIS.R: Automatic RIS import is now possible.
1005    
1006    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1007    
1008            * R/textdoccol.R: Added RIS HTML input format.
1009    
1010    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1011    
1012            * R/textdoccol.R: Removed bug that caused invalid text document
1013            collections when handling many input files.
1014    
1015    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1016    
1017            * R/textdoccol.R: Restructured and extended file import
1018            mechanism.
1019    
1020            * inst/doc/clustering.Rnw: Adapted vignette for use with
1021            ReutNews.rda
1022    
1023            * man/ReutNews.Rd: Documentation for ReutNews.rda
1024    
1025            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1026    
1027    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1028    
1029            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1030            clustering facilities of this package.
1031    
1032    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1033    
1034            * R/aobjects.R: Changed package document structure to avoid class
1035            dependency problems.
1036    
1037    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
1040            data set.
1041    
1042            * Finished documentation and reordered directory structure. Now "R
1043            CMD check textmin" works without errors.
1044    
