Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 26, Sat Dec 3 15:20:17 2005 UTC
pkg/ChangeLog revision 1108, Fri Oct 22 18:32:47 2010 UTC
1    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
4            (instead of a list element).
5    
6    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
9            documents by names (fallback to IDs if names are not set).
10    
11    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
14            \code{recursive} now determines whether existing corpus meta data
15            is used.
16    
17    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
18    
19            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
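
A minimal sketch of the replacement named in this entry, using only base R. The sample string and its latin1 marking are assumptions made for illustration; they are not taken from the package.

```r
# A string whose bytes are Latin-1; the byte \xE7 is "c with cedilla".
x <- "fa\xE7ile"
Encoding(x) <- "latin1"   # declare the encoding so R knows how to convert it

# enc2utf8() converts the marked string to UTF-8 and marks the result.
y <- enc2utf8(x)

Encoding(y)               # encoding mark of the converted string
nchar(y, type = "chars")  # number of characters after conversion
```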
20    
21    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
22    
23            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
24            remove terms not occurring in the corpus anymore.
25    
26    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
27    
28            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
29            and Heaps' law.
30    
31    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
32    
33            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
34            provided by a source.
35    
36    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
37    
38            * R/source.R (.Source): Provide document names.
39    
40    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/meta.R (`content_or_meta`): Utility function.
43    
44    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
45    
46            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
47            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
48    
49    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
50    
51            * R/weight.R (weightTfIdf): Added normalization option.
52    
53            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
54            analysis.
55    
56    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
57    
58            * R/score.R (tm_tag_score): Compute a score from the number of
59            tags matching in a document.
60    
61    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
62    
63            * R/complete.R (stemCompletion): New completion heuristics.
64    
65    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
66    
67            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
68    
69    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
70    
71            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
72            setOldClass(c(..., "list")) works.
73    
74    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
75    
76            * R/transform.R (stemDocument.character): In case input is a
77            simple character just delegate to the default Snowball stemmer.
78    
79    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
80    
81            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
82            data.
83    
84    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
85    
86            * R/doc.R (`Content<-`): Be careful with names attribute.
87    
88    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
89    
90            * R/source.R (DirSource): Improved implementation especially when
91            handling many (> 1M) files.
92    
93    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
94    
95            * R/source.R (getElem.URISource): Use encoding argument.
96    
97    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/doc.R (setOldClass): Register S3 document classes to be
100            recognized by S4 methods.
101    
102    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/matrix.R (termFreq): Add option to remove punctuation
105            characters.
106    
107    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
110            merging multiple term-document matrices.
111    
112    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/corpus.R (setOldClass): Register S3 corpus classes to be
115            recognized by S4 methods.
116    
117            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
118            that CRAN Mac OS X builds do not fail any longer.
119    
120    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
121    
122            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
123            of RWeka::AlphabeticTokenizer() as default.
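
The whitespace tokenization this entry refers to can be sketched with base R's scan(). The helper name tokenize_ws and the sample text are hypothetical; the package's actual tokenize function may pass different arguments.

```r
# Hypothetical helper: split text into tokens on runs of whitespace.
tokenize_ws <- function(x) {
  # quote = "" disables quote handling, so apostrophes stay inside tokens;
  # quiet = TRUE suppresses the "Read N items" message.
  scan(text = x, what = "character", quote = "", quiet = TRUE)
}

tokens <- tokenize_ws("Text mining  in R\nsplits on whitespace")
print(tokens)
```

Unlike an alphabetic tokenizer, this approach keeps digits and punctuation attached to their tokens, which is presumably why punctuation removal is a separate option elsewhere in this log.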
124    
125    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
126    
127            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
128            caused words at the beginning or the end of a line not to be removed. Do
129            not delete whitespace anymore.
130    
131    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/source.R (DirSource): Default to working directory if no path
134            is specified.
135    
136    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
137    
138            * R/source.R (DirSource): Stop on empty directories.
139    
140    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
141    
142            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
143            named documents.
144    
145    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
146    
147            * R/transform.R (removeWords): Improve regular expressions.
148    
149    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
150    
151            * R/meta.R (DublinCore): Allow lower case tags.
152    
153    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
154    
155            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
156            instead of x$children.
157    
158    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
159    
160            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
161    
162    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
163    
164            * R/: Use S3 instead of S4 class system.
165    
166    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
167    
168            * R/reader.R (readMail): Moved to tm.plugin.mail package.
169    
170    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
173            postings are basically e-mails with some extra headers.
174    
175    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/transform.R: Move convertMboxEml, removeCitation,
178            removeMultipart, and removeSignature to the tm.plugin.mail package
179            since they are mainly utility functions (for handling e-mails) and
180            not very framework specific.
181    
182    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
183    
184            * man/: Fix documentation.
185    
186    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
187    
188            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
189            plain text document instead of an XML document for texts of the
190            Reuters-21578 dataset.
191    
192            * R/sparse.R: Removed since the slam package is now available on
193            CRAN.
194    
195            * DESCRIPTION (Depends): Add slam package.
196    
197    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
198    
199            * R/transform.R (stemDoc): Fix character(0) handling.
200    
201    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
202    
203            * R/doc.R (show): Pretty print.
204    
205    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
206    
207            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
208            gracefully.
209    
210    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
211    
212            * R/corpus.R: Make corpus virtual. Implement corpus with standard
213            and permanent storage semantics.
214    
215            * DESCRIPTION: New major release. A *lot* of improvements.
216    
217    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
218    
219            * NAMESPACE: Export some simple_triplet_matrix functions.
220    
221    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
222    
223            * R/weight.R: Adapt tf-idf to new matrix format.
224    
225    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
226    
227            * R/matrix.R: Create two distinct classes for term-document and
228            document-term matrices.
229    
230    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
231    
232            * R/termdocmatrix.R: No longer use Matrix package. This reduces
233            package start-up time significantly.
234    
235    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
236    
237            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
238    
239    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
240    
241            * R/transform.R (tmReduce): Combine multiple maps into one
242            transformation.
243    
244    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
245    
246            * R/weight.R: Remove weightLogical since it does not return a
247            dgCMatrix.
248    
249            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
250            or TermDocumentMatrix instead.
251    
252    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
253    
254            * inst/doc/extensions.Rnw: Finished vignette.
255    
256    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
257    
258            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
259            DocumentTermMatrix representations.
260    
261    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
262    
263            * R/reader.R (readXML): New reader for arbitrary XML files.
264    
265    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
266    
267            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
268            (XMLSource): New XMLSource class for arbitrary XML files.
269            (Source): New slot Vectorized.
270    
271    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
272    
273            * R/reader.R (readTabular): Experimental reader for tabular data
274            structures which can be customized via user-defined mappings.
275    
276            * R/reader.R: Always use UTC time zone.
277    
278            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
279    
280    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
281    
282            * R/reader.R (readDOC): Options can be passed over to antiword.
283    
284            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
285            pdftotext.
286    
287    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
288    
289            * R/source.R (DirSource): Add pattern and ignore.case arguments
290            which are internally passed over to list.files().
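
A short sketch of how the two arguments behave in list.files() itself, since the entry says they are passed through to it. The temporary directory and file names are made up for the example.

```r
# Create a throwaway directory with a few empty files (names are illustrative).
corpus_dir <- tempfile("corpus_")
dir.create(corpus_dir)
invisible(file.create(file.path(corpus_dir, c("a.txt", "b.TXT", "notes.md"))))

# pattern is a regular expression; ignore.case = TRUE also matches ".TXT".
txt_files <- list.files(corpus_dir, pattern = "\\.txt$", ignore.case = TRUE)
print(txt_files)

# Clean up the temporary directory.
unlink(corpus_dir, recursive = TRUE)
```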
291    
292    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
293    
294            * inst/doc/tm.Rnw: Suppress pointless loading message.
295    
296    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
297    
298            * DESCRIPTION: Speed up package loading (via moving packages not
299            strictly necessary for normal operation to Suggests instead of
300            Depends).
301    
302    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
303    
304            * R/reader.R (readNewsgroup): The date format is now configurable.
305    
306    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
307    
308            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
309    
310    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
311    
312            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
313    
314    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
315    
316            * R/source.R (DataframeSource): New source class for data frames.
317    
318            * R/source.R: Fixed non-standard call evaluation.
319    
320    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
321    
322            * R/source.R (URISource): New source class for a single document.
323    
324    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
325    
326            * R/source.R: Refactoring.
327    
328    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
329    
330            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
331            Rmpi installations more gracefully.
332    
333    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
334    
335            * R/source.R (Source): Add Length slot.
336    
337    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
338    
339            * R/AAA.R: Unify duplicated .onLoad function.
340    
341    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
342    
343            * DESCRIPTION (Suggests): Added Rmpi.
344    
345    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
346    
347            * R/source.R (getElem): Fix 'no visible binding' warning.
348    
349            * man/WeightFunction.Rd: Fix signature.
350    
351    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
352    
353            * R/weight.R: Introduce name abbreviations for weighting functions.
354    
355    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
356    
357            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
358    
359            * R/cluster.R: Provide convenience functions for using an MPI
360            cluster.
361    
362            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
363            available.
364    
365            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
366            available.
367    
368    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
369    
370            * R/textdoccol.R (lapply): Removed debug print out.
371    
372    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
373    
374            * R/reader.R (readRCV1): Improved meta data extraction from
375            Reuters Corpus Volume 1 documents.
376    
377    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
378    
379            * R/transform.R: Ensure that all mappings preserve multiline
380            structures.
381    
382    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
383    
384            * R/filter.R: Every filter now has an attribute indicating whether
385            it should be applied at the document level (doclevel).
386    
387            * R/textdoccol.R (tmFilter): Set searchFullText as new default
388            filter.
389    
390    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
391    
392            * R/transform.R (replacePatterns): Replaced removeWords by
393            replacePatterns. Suggested by Christian Buchta.
394    
395            * R/textdoccol.R (inspect): Improved formatting.
396    
397    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
398    
399            * inst/CITATION: Updated JSS article information.
400    
401            * R/textdoccol.R (setAs): Added coerce method from list to
402            corpus.
403    
404            * R/meta.R (meta): Improved meta data handling.
405    
406    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
407    
408            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
409            Christian Buchta.
410    
411            * inst/CITATION: Added template to include JSS article reference.
412    
413    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
414    
415            * R/textdoccol.R (tmMap): Introduced lazy mapping.
416    
417            * R/source.R: Added VectorSource.
418    
419    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
420    
421            * man/: Language codes should be in ISO 639-1 format.
422    
423            * R/textdoccol.R (asPlain): Preserve local meta data.
424    
425    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
426    
427            * R/textdoccol.R (writeCorpus): Function for writing a corpus
428            containing plain text documents to disk.
429    
430    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
431    
432            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
433            always set correctly.
434    
435            * R/textdoccol.R: Set load = TRUE as default for load on demand
436            since in most cases this is the wanted behaviour.
437    
438    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
439    
440            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
441    
442            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
443    
444    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
445    
446            * R/meta.R (meta): New function for consistent access to meta data
447            of document collections, repositories, and texts.
448    
449    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
450    
451            * R/: Better support for encodings.
452    
453    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
456            selection when no reader argument is given.
457    
458    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/source.R (CSVSource): Now uses read.csv instead of scan
461            internally.
462    
463    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/reader.R (getReaders): Returns available reader functions.
466    
467            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
468            as default.
469    
470    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
471    
472            * R/stopwords.R (stopwords): Shortened code, removed codetools
473            variable warnings.
474    
475            * man/: Documentation for showMeta, added an example for tmMap.
476    
477            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
478            some minor typos fixed.
479    
480    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
481    
482            * R/aobjects.R (showMeta): Added method for pretty printing a
483            text document's meta data.
484    
485    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
486    
487            * R/textdoccol.R (TextDocCol): Better handling of empty
488            arguments.
489    
490            * NAMESPACE: Exported readDOC.
491    
492            * man/completeStems.Rd: Added an example.
493    
494    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * R/stopwords.R (stopwords): Look up .dat files at every
497            call. Allows users to modify stopword .dat files interactively.
498    
499    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
500    
501            * R/termdocmatrix.R (termFreq): Correct processing of empty
502            documents.
503    
504    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
505    
506            * man/: Updated documentation.
507    
508    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * R/complete.R (completeStems): Completes (heuristically) word
511            stems.
512    
513            * R/termdocmatrix.R (TermDocMatrix2): New modular
514            constructor.
515    
516            * NAMESPACE: Exported termFreq.
517    
518    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
519    
520            * R/reader.R (readDOC): Added MS Word reader (using antiword).
521    
522    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
523    
524            * R/weight.R: Weighting functions for TermDocMatrix.
525    
526    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
529            functions for accessing dimension, column, and row names.
530    
531            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
532    
533    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
534    
535            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
536    
537    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
538    
539            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
540    
541    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
542    
543            * R/reader.R (readPDF): Removed manual checks for pdftotext and
544            pdfinfo. The system call gives a warning anyway.
545    
546    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
547    
548            * R/textdoccol.R (asPlain): Conversion from
549            StructuredTextDocuments to PlainTextDocuments.
550    
551    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
552    
553            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
554            for accessing term-document matrices.
555    
556            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
557            are installed.
558    
559    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
560    
561            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
562            Christian Buchta.
563    
564    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
565    
566            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
567    
568    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
571    
572            * R/reader.R (readPDF): Added PDF reader.
573    
574    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
575    
576            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
577    
578            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
579    
580            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
581    
582            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
583    
584    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
585    
586            * R/distmeasure.R (dissimilarity): Replaced dists call from
587            package cba by new dist call from package proxy.
588    
589    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
590    
591            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
592    
593    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
594    
595            * R/termdocmatrix.R: require() uses the quietly option to suppress
596            loading messages.
597    
598    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
599    
600            * R/dictionary.R: Added dictionary support.
601    
602    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
603    
604            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
605            documents. This simplifies some functions, e.g., asPlain.
606    
607    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
608    
609            * inst/doc/tm.Rnw: Fixed some typos in vignette.
610    
611    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
612    
613            * R/textdoccol.R (replaceWords): Added method to replace a set of
614            words by a single word. Useful for synonyms.
615    
616    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
617    
618            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
619    
620    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
621    
622            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
623            vectors. Thanks to Ariel Maguyon for his error report.
624            (removeSparseTerms): New function to remove columns from a
625            term-document matrix exceeding a sparse factor.
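
The idea behind removing sparse columns can be sketched on a small dense matrix in base R. The helper drop_sparse_terms, the strict comparison against the threshold, and the sample data are all assumptions for illustration; the package's removeSparseTerms may define the cutoff differently and works on its own matrix class.

```r
# Hypothetical helper: keep only the columns (terms) whose share of zero
# entries is below the sparse factor `sparse` (a value between 0 and 1).
drop_sparse_terms <- function(m, sparse) {
  stopifnot(is.matrix(m), sparse > 0, sparse < 1)
  sparsity <- colMeans(m == 0)          # fraction of zero entries per column
  m[, sparsity < sparse, drop = FALSE]  # drop = FALSE keeps a matrix result
}

# Documents in rows, terms in columns (made-up counts).
dtm <- matrix(c(1, 0, 2,
                0, 0, 1,
                0, 0, 3,
                0, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(Docs = paste0("d", 1:4),
                              Terms = c("rare", "once", "common")))

reduced <- drop_sparse_terms(dtm, sparse = 0.7)
print(reduced)
```

Here "rare" and "once" are each zero in three of four documents (sparsity 0.75), so only "common" remains at a factor of 0.7.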
626    
627    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
628    
629            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
630    
631    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * man/sFilter.Rd: Corrected documentation on statement format (use
634            '==' instead of '=').
635    
636    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
637    
638            * R/aobjects.R (StructuredTextDocument): Inherits from
639            TextDocument.
640    
641    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
642    
643            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
644            on sparse matrices as proposed by Martin Maechler.
645    
646    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
647    
648            * R/textdoccol.R: Removed \code{dbDisconnect} calls since last
649            \pkg{filehash} version makes them deprecated.
650    
651    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
652    
653            * R/termdocmatrix.R (textvector): Stemming is now performed before
654            erasing stopwords.
655            (weightMatrix): Adapted to handle sparse matrices.
656            (TermDocMatrix): Sparse matrix is now efficiently built by
657            direct stepwise insertion of row values into it.
658    
659    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
660    
661            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
662            due to ongoing problems. For our purposes the latter is as useful
663            as the replaced package.
664    
665    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
666    
667            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
668    
669            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
670    
671    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
672    
673            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
674            languages with available stopwords.
675    
676    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
677    
678            * inst/doc/tm.Rnw: Minor corrections in the vignette.
679    
680    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
681    
682            * DESCRIPTION: Update to version 0.2, since a lot of new features
683            have been integrated.
684    
685            * inst/stopwords: Updated existing stopwords and added stopwords
686            for various other languages.
687    
688    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
689    
690            * man/: Updated documentation.
691    
692            * Work/testDb.R: Script to test database stuff.
693    
694            * R/: Fixed various database related bugs. Seems to be rather
695            usable now, i.e., consider it alpha status for now.
696    
697    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
698    
699            * R/: Fixed some bugs related to database support.
700    
701    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
702    
703            * man/: Added a lot of examples to the manuals.
704    
705    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
706    
707            * man/: Updated parts of the documentation.
708    
709            * R/textdoccol.R (asPlain): Added conversion from newsgroup
710            documents to plain text documents.
711    
712    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
713    
714            * R/textdoccol.R: Finished experimental database support. Not yet
715            intensively tested.
716    
717            * R/source.R: Now each source has a default reader.
718    
719            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
720            class anymore.
721    
722            * R/plaintextdoc.R: Custom show method for plain text documents.
723    
724            * R/aobjects.R: Added a class for structured text documents.
725    
726            * R/reader.R: Replaced remaining \code{parser} occurrences with
727            \code{reader}.
728    
729            * R/textdoccol.R (summary): Indent tags.
730    
731            * R/textdoccol.R (removePunctuation): Transform method to remove
732            punctuation marks.
733    
734    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
735    
736            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
737            using prescindMeta().
738    
739    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
740    
741            * R/textdoccol.R: Improved database support.
742    
743    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
744    
745            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
746    
747            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
748            language code.
749    
750            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
751            into parserControl argument.
752    
753            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
754    
755    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
756    
757            * Work/tmDataSetup.R: The datasets acq and crude can now be
758            created on the fly.
759    
760            * R/stopwords.R: Introduced a function returning the stopwords for
761            a given language (English, German and French at the moment).
762    
763            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
764            otherwise falls back to Snowball package.
765    
766    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
767    
768            * man/dissimilarity-methods.Rd: Make clear that any method offered
769            by "dists" from package "cba" can be used.
770    
771    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
772    
773            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
774            to Kurt's latex suggestion. Removed points and underscores in
775            variable names for consistent naming.
776    
777            * DESCRIPTION: Update to version 0.1-2.
778    
779            * man/TextRepository.Rd: Fixed bug in documentation.
780    
781    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
782    
783            * DESCRIPTION: Update to version 0.1-1.
784    
785    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
786    
787            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
788            wordStem.
789    
790    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
791    
792            * R/: Changes due to Kurt's review.
793    
794    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
795    
796            * R/: Implemented improvements based upon comments by David
797            Meyer.
798    
799    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
800    
801            * inst/doc/: Rewrote vignette.
802    
803            * man/: Improved documentation.
804    
805    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
806    
807            * man/: Updated documentation.
808    
809            * DESCRIPTION: Changed package name to "tm". Updated version to
810            0.1 for first CRAN release.
811    
812            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
813            list archive example.
814    
815            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
816            archive example.
817    
818            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
819            from (several mails per box) mbox format to (single mail per file)
820            eml format.
821    
822    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
823    
824            * data/crude.rda: Rebuilt.
825    
826            * data/acq.rda: Rebuilt.
827    
828            * R/reader.R: Factored out reader and parser methods from
829            textdoccol.R.
830    
831            * R/source.R: Factored out Source methods from aobjects.R and
832            textdoccol.R.
833            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
834            feeds.
835    
836            * R/textdoccol.R (DirSource): Added support for recursive
837            traversal of directories.
838    
839    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
840    
841            * R/textdoccol.R ([[): Loads the document corpus automatically
842            into memory upon access.
843            (tm_transform, tm_filter): Removed several checks whether the
844            document is already loaded ([[ ensures this now).
845            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
846            mailing list archive.
847    
848    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
849    
850            * R/aobjects.R (TextDocument): Is now a virtual class.
851            (Source): Is now a virtual class.
852    
853    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
854    
855            * R/textdoccol.R (c): Support for an arbitrary number of document
856            collections.
857    
858    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
859    
860            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
861            append_meta and remove_meta.
862    
863            * R/textdoccol.R: Removed modify_metadata method.
864    
865            * R/textrepo.R: Removed modify_metadata method.
866    
867            * R/textdoccol.R (remove_meta): Supports removal of document
868            collection metadata and document (= in data frame) metadata.
869    
870    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
871    
872            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
873    
874            * data/crude.rda: Rebuilt.
875    
876            * data/acq.rda: Rebuilt.
877    
878            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
879    
880            * R/textdoccol.R ([): Bug fix for subsetting a document
881            collection's data frame.
882    
883    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
884    
885            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
886            to s_filter.
887    
888            * R/textdoccol.R: Local text documents' metadata can now be copied
889            to a document collection's data frame with prescind_meta.
890    
891    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
892    
893            * R/: Text documents' slot metadata is now accessible in s_filter.
894    
895            * R/: Rewrote s_filter function (still has some restrictions).
896    
897    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
898    
899            * R/: Various fixes in handling metadata.
900    
901            * R/: Added update mechanism for text document collections.
902    
903    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
904    
905            * R/: Merging of document collections now creates a binary tree
906            for reconstructing merged document collections.
907    
908            * R/: Redesign of metadata for document collections.
909    
910    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
911    
912            * R/: Messages now use \code{ngettext}.
913    
914    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
915    
916            * R/: Added functions for modifying and removing metadata.
917    
918    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
919    
920            * man/: Updated some documentation.
921    
922            * R/: Corrected some connection issues.
923    
924            * inst/doc: Worked on the vignette.
925    
926    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
927    
928            * inst/: Added texts and started vignette.
929    
930            * R/: Final changes based upon David's comments.
931    
932    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
933    
934            * NAMESPACE: Corrected exports (generic methods need exportMethods
935            directives!).
936    
937    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
938    
939            * R/: Modified the TextDocCol constructor and various parsers. It
940            is now modular and supports various file formats via plugins (see
941            the new "Source" class).
942    
943    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
944    
945            * man/: Revised documentation after previous code changes.
946    
947    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
948    
949            * R/: Remaining changes as discussed with David.
950    
951    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
952    
953            * R/: Some changes as suggested by David. The rest will follow
954            within the next few days.
955    
956    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
957    
958            * man/: Finished documentation.
959    
960    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
961    
962            * man/: Wrote some documentation.
963    
964    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
965    
966            * R/: Further syntactic sugar in form of additional assignment and
967            accessor methods.
968    
969    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
970    
971            * R/: Syntactic sugar in form of "length", "show" and "summary"
972            operators.
973    
974    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
975    
976            * R/: Various updates, mainly to default operators ("[" or "c")
977            and dissimilarities.
978    
979    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
980    
981            * R/: Added similarity functions.
982    
983            * data/: Added English stopwords.
984    
985    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
986    
987            * data/: Examples compiled for new features.
988    
989            * R/: Changes due to new structure.
990    
991            * NAMESPACE: Corrected namespace to reflect new structure.
992    
993            * R/termdocmatrix.R: Adapted for new naming scheme.
994    
995    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
996    
997            * R/textdoccol.R: Adapted code for new class structure. Wrote
998            several transform and filter functions operating on text document
999            collections (alias text document databases).
1000    
1001            * R/aobjects.R: Adapted class structure with inheritance,
1002            repositories and additional meta data. Loading files on demand is
1003            now possible.
1004    
1005    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1006    
1007            * R/: Some cosmetic cleanups.
1008    
1009            * inst/: Removed vignette on clustering. That and much more is now
1010            described in the JSS paper on text mining. A more elaborate vignette
1011            based on that article will be incorporated in the future.
1012    
1013    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1014    
1015            * R/: Updated generic S4 methods to comply with signature changes
1016            in newer versions of R (> 2.3).
1017    
1018    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1019    
1020            * ext/R/importRIS.R: Automatic RIS import is now possible.
1021    
1022    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1023    
1024            * R/textdoccol.R: Added RIS HTML input format.
1025    
1026    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1027    
1028            * R/textdoccol.R: Removed bug that caused invalid text document
1029            collections when handling many input files.
1030    
1031    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1032    
1033            * R/textdoccol.R: Restructured and extended file import
1034            mechanism.
1035    
1036            * inst/doc/clustering.Rnw: Adapted vignette for use with
1037            ReutNews.rda.
1038    
1039            * man/ReutNews.Rd: Documentation for ReutNews.rda.
1040    
1041            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1042    
1043    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1044    
1045            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1046            clustering facilities of this package.
1047    
1048    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1049    
1050            * R/aobjects.R: Changed package document structure to avoid class
1051            dependency problems.
1052    
1053    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1054    
1055            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
1056            data set.
1057    
1058            * Finished documentation and reordered directory structure. Now "R
1059            CMD check textmin" works without errors.
1060    
1061    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1062    
1063            * src/: Various splits can now be easily created for the
1064            Reuters21578 data set.
1065    
1066    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1067    
1068            * Updated documentation.
