[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 34, Thu Dec 22 15:18:10 2005 UTC
pkg/ChangeLog revision 1136, Fri May 27 11:50:39 2011 UTC
1    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
4            normalization.
5    
6    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/transform.R (stemDocument.PlainTextDocument): Use language
9            argument.
10    
11    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/source.R: Store strings and connections instead of unevaluated
14            calls.
15    
16    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
17    
18            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
19    
20    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
21    
22            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
23            (instead of a list element).
24    
25    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
26    
27            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
28            documents by names (fallback to IDs if names are not set).
29    
30    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
31    
32            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
33            \code{recursive} now determines whether existing corpus meta data
34            is used.
35    
36    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
37    
38            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
39    
40    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
43            remove terms not occurring in the corpus anymore.
44    
45    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
46    
47            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
48            and Heaps' law.
49    
50    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
51    
52            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
53            provided by a source.
54    
55    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
56    
57            * R/source.R (.Source): Provide document names.
58    
59    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
60    
61            * R/meta.R (`content_or_meta`): Utility function.
62    
63    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
64    
65            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
66            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
67    
68    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
69    
70            * R/weight.R (weightTfIdf): Added normalization option.
71    
72            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
73            analysis.
74    
75    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
76    
77            * R/score.R (tm_tag_score): Compute a score from the number of
78            tags matching in a document.
79    
80    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
81    
82            * R/complete.R (stemCompletion): New completion heuristics.
83    
84    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
85    
86            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
87    
88    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
89    
90            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
91            setOldClass(c(..., "list")) works.
92    
93    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
94    
95            * R/transform.R (stemDocument.character): In case input is a
96            simple character just delegate to the default Snowball stemmer.
97    
98    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
99    
100            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
101            data.
102    
103    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
104    
105            * R/doc.R (`Content<-`): Be careful with names attribute.
106    
107    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
108    
109            * R/source.R (DirSource): Improved implementation especially when
110            handling many (> 1M) files.
111    
112    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/source.R (getElem.URISource): Use encoding argument.
115    
116    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
117    
118            * R/doc.R (setOldClass): Register S3 document classes to be
119            recognized by S4 methods.
120    
121    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
122    
123            * R/matrix.R (termFreq): Add option to remove punctuation
124            characters.
125    
126    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
127    
128            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
129            merging multiple term-document matrices.
130    
131    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/corpus.R (setOldClass): Register S3 corpus classes to be
134            recognized by S4 methods.
135    
136            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
137            that CRAN Mac OS X builds do not fail any longer.
138    
139    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
140    
141            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
142            of RWeka::AlphabeticTokenizer() as default.
143    
144    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
145    
146            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
147            caused words at the beginning or the end of a line not to be removed. Do
148            not delete whitespace anymore.
149    
150    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
151    
152            * R/source.R (DirSource): Default to working directory if no path
153            is specified.
154    
155    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
156    
157            * R/source.R (DirSource): Stop on empty directories.
158    
159    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
160    
161            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
162            named documents.
163    
164    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
165    
166            * R/transform.R (removeWords): Improve regular expressions.
167    
168    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
169    
170            * R/meta.R (DublinCore): Allow lower case tags.
171    
172    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
173    
174            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
175            instead of x$children.
176    
177    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
178    
179            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
180    
181    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
182    
183            * R/: Use S3 instead of S4 class system.
184    
185    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
186    
187            * R/reader.R (readMail): Moved to tm.plugin.mail package.
188    
189    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
190    
191            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
192            postings are basically e-mails with some extra headers.
193    
194    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
195    
196            * R/transform.R: Move convertMboxEml, removeCitation,
197            removeMultipart, and removeSignature to the tm.plugin.mail package
198            since they are mainly utility functions (for handling e-mails) and
199            not very framework specific.
200    
201    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
202    
203            * man/: Fix documentation.
204    
205    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
206    
207            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
208            plain text document instead of an XML document for texts of the
209            Reuters-21578 dataset.
210    
211            * R/sparse.R: Removed since the slam package is now available on
212            CRAN.
213    
214            * DESCRIPTION (Depends): Add slam package.
215    
216    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
217    
218            * R/transform.R (stemDoc): Fix character(0) handling.
219    
220    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
221    
222            * R/doc.R (show): Pretty print.
223    
224    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
225    
226            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
227            gracefully.
228    
229    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
230    
231            * R/corpus.R: Make corpus virtual. Implement corpus with standard
232            and permanent storage semantics.
233    
234            * DESCRIPTION: New major release. A *lot* of improvements.
235    
236    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
237    
238            * NAMESPACE: Export some simple_triplet_matrix functions.
239    
240    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
241    
242            * R/weight.R: Adapt tf-idf to new matrix format.
243    
244    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
245    
246            * R/matrix.R: Create two distinct classes for term-document and
247            document-term matrices.
248    
249    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
250    
251            * R/termdocmatrix.R: No longer use Matrix package. This reduces
252            package start-up time significantly.
253    
254    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
255    
256            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
257    
258    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
259    
260            * R/transform.R (tmReduce): Combine multiple maps into one
261            transformation.
262    
263    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
264    
265            * R/weight.R: Remove weightLogical since it does not return a
266            dgCMatrix.
267    
268            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
269            or TermDocumentMatrix instead.
270    
271    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
272    
273            * inst/doc/extensions.Rnw: Finished vignette.
274    
275    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
276    
277            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
278            DocumentTermMatrix representations.
279    
280    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
281    
282            * R/reader.R (readXML): New reader for arbitrary XML files.
283    
284    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
285    
286            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
287            (XMLSource): New XMLSource class for arbitrary XML files.
288            (Source): New slot Vectorized.
289    
290    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
291    
292            * R/reader.R (readTabular): Experimental reader for tabular data
293            structures which can be customized via user-defined mappings.
294    
295            * R/reader.R: Always use UTC time zone.
296    
297            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
298    
299    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
300    
301            * R/reader.R (readDOC): Options can be passed over to antiword.
302    
303            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
304            pdftotext.
305    
306    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
307    
308            * R/source.R (DirSource): Add pattern and ignore.case arguments
309            which are internally passed over to list.files().
310    
311    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
312    
313            * inst/doc/tm.Rnw: Suppress pointless loading message.
314    
315    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
316    
317            * DESCRIPTION: Speed up package loading (via moving packages not
318            strictly necessary for normal operation to Suggests instead of
319            Depends).
320    
321    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
322    
323            * R/reader.R (readNewsgroup): The date format is now configurable.
324    
325    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
326    
327            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
328    
329    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
330    
331            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
332    
333    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
334    
335            * R/source.R (DataframeSource): New source class for data frames.
336    
337            * R/source.R: Fixed non-standard call evaluation.
338    
339    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
340    
341            * R/source.R (URISource): New source class for a single document.
342    
343    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
344    
345            * R/source.R: Refactoring.
346    
347    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
348    
349            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
350            Rmpi installations more gracefully.
351    
352    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
353    
354            * R/source.R (Source): Add Length slot.
355    
356    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
357    
358            * R/AAA.R: Unify duplicated .onLoad function.
359    
360    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
361    
362            * DESCRIPTION (Suggests): Added Rmpi.
363    
364    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
365    
366            * R/source.R (getElem): Fix 'no visible binding' warning.
367    
368            * man/WeightFunction.Rd: Fix signature.
369    
370    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
371    
372            * R/weight.R: Introduce name abbreviations for weighting functions.
373    
374    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
375    
376            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
377    
378            * R/cluster.R: Provide convenience functions for using an MPI
379            cluster.
380    
381            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
382            available.
383    
384            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
385            available.
386    
387    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
388    
389            * R/textdoccol.R (lapply): Removed debug print out.
390    
391    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
392    
393            * R/reader.R (readRCV1): Improved meta data extraction from
394            Reuters Corpus Volume 1 documents.
395    
396    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
397    
398            * R/transform.R: Ensure that all mappings preserve multiline
399            structures.
400    
401    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
402    
403            * R/filter.R: Every filter now has an attribute indicating whether
404            it should be applied at the document level (doclevel).
405    
406            * R/textdoccol.R (tmFilter): Set searchFullText as new default
407            filter.
408    
409    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
410    
411            * R/transform.R (replacePatterns): Replaced removeWords by
412            replacePatterns. Suggested by Christian Buchta.
413    
414            * R/textdoccol.R (inspect): Improved formatting.
415    
416    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
417    
418            * inst/CITATION: Updated JSS article information.
419    
420            * R/textdoccol.R (setAs): Added coerce method from list to
421            corpus.
422    
423            * R/meta.R (meta): Improved meta data handling.
424    
425    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
426    
427            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
428            Christian Buchta.
429    
430            * inst/CITATION: Added template to include JSS article reference.
431    
432    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
433    
434            * R/textdoccol.R (tmMap): Introduced lazy mapping.
435    
436            * R/source.R: Added VectorSource.
437    
438    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
439    
440            * man/: Language codes should be in ISO 639-1 format.
441    
442            * R/textdoccol.R (asPlain): Preserve local meta data.
443    
444    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
445    
446            * R/textdoccol.R (writeCorpus): Function for writing a corpus
447            containing plain text documents to disk.
448    
449    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
450    
451            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
452            always set correctly.
453    
454            * R/textdoccol.R: Set load = TRUE as default for load on demand
455            since in most cases this is the wanted behaviour.
456    
457    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
458    
459            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
460    
461            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
462    
463    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/meta.R (meta): New function for consistent access to meta data
466            of document collections, repositories, and texts.
467    
468    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * R/: Better support for encodings.
471    
472    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
473    
474            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
475            selection when no reader argument is given.
476    
477    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
478    
479            * R/source.R (CSVSource): Now uses read.csv instead of scan
480            internally.
481    
482    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
483    
484            * R/reader.R (getReaders): Returns available reader functions.
485    
486            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
487            as default.
488    
489    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
490    
491            * R/stopwords.R (stopwords): Shortened code, removed codetools
492            variable warnings.
493    
494            * man/: Documentation for showMeta, added an example for tmMap.
495    
496            * inst/doc/tm.Rnw: Updated vignette, comments on MS Word reader,
497            some minor typos fixed.
498    
499    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
500    
501            * R/aobjects.R (showMeta): Added method for pretty printing a
502            text document's meta data.
503    
504    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
505    
506            * R/textdoccol.R (TextDocCol): Better handling of empty
507            arguments.
508    
509            * NAMESPACE: Exported readDOC.
510    
511            * man/completeStems.Rd: Added an example.
512    
513    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
514    
515            * R/stopwords.R (stopwords): Look up .dat files at every
516            call. Allows users to modify stopword .dat files interactively.
517    
518    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
519    
520            * R/termdocmatrix.R (termFreq): Correct processing of empty
521            documents.
522    
523    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
524    
525            * man/: Updated documentation.
526    
527    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
528    
529            * R/complete.R (completeStems): Completes (heuristically) word
530            stems.
531    
532            * R/termdocmatrix.R (TermDocMatrix2): New modular
533            constructor.
534    
535            * NAMESPACE: Exported termFreq.
536    
537    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
538    
539            * R/reader.R (readDOC): Added MS Word reader (using antiword).
540    
541    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
542    
543            * R/weight.R: Weighting functions for TermDocMatrix.
544    
545    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
546    
547            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
548            functions for accessing dimension, column, and row names.
549    
550            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
551    
552    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
553    
554            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
555    
556    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
557    
558            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
559    
560    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
561    
562            * R/reader.R (readPDF): Removed manual checks for pdftotext and
563            pdfinfo. The system call gives a warning anyway.
564    
565    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
566    
567            * R/textdoccol.R (asPlain): Conversion from
568            StructuredTextDocuments to PlainTextDocuments.
569    
570    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
571    
572            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
573            for accessing term-document matrices.
574    
575            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
576            are installed.
577    
578    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
579    
580            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
581            Christian Buchta.
582    
583    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
584    
585            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
586    
587    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
588    
589            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
590    
591            * R/reader.R (readPDF): Added PDF reader.
592    
593    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
594    
595            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
596    
597            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
598    
599            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
600    
601            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
602    
603    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
604    
605            * R/distmeasure.R (dissimilarity): Replaced dists call from
606            package cba by new dist call from package proxy.
607    
608    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
609    
610            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
611    
612    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
613    
614            * R/termdocmatrix.R: require() uses the quietly option to suppress
615            loading messages.
616    
617    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
618    
619            * R/dictionary.R: Added dictionary support.
620    
621    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
622    
623            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
624            documents. This simplifies some functions, e.g., asPlain.
625    
626    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
627    
628            * inst/doc/tm.Rnw: Fixed some typos in vignette.
629    
630    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
631    
632            * R/textdoccol.R (replaceWords): Added method to replace a set of
633            words by a single word. Useful for synonyms.
634    
635    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
636    
637            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
638    
639    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
640    
641            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
642            vectors. Thanks to Ariel Maguyon for his error report.
643            (removeSparseTerms): New function to remove columns from a
644            term-document matrix exceeding a sparse factor.
645    
646    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
647    
648            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
649    
650    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
651    
652            * man/sFilter.Rd: Corrected documentation on statement format (use
653            '==' instead of '=').
654    
655    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * R/aobjects.R (StructuredTextDocument): Inherits from
658            TextDocument.
659    
660    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
663            on sparse matrices as proposed by Martin Maechler.
664    
665    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
666    
667            * R/textdoccol.R: Removed \code{dbDisconnect} calls since last
668            \pkg{filehash} version makes them deprecated.
669    
670    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * R/termdocmatrix.R (textvector): Stemming is now performed before
673            erasing stopwords.
674            (weightMatrix): Adapted to handle sparse matrices.
675            (TermDocMatrix): Sparse matrix is now efficiently built by
676            direct stepwise insertion of row values into it.
677    
678    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
679    
680            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
681            due to ongoing problems. For our purposes the latter is as useful
682            as the replaced package.
683    
684    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
685    
686            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
687    
688            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
689    
690    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
691    
692            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
693            languages with available stopwords.
694    
695    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
696    
697            * inst/doc/tm.Rnw: Minor corrections in the vignette.
698    
699    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
700    
701            * DESCRIPTION: Update to version 0.2, since a lot of new features
702            have been integrated.
703    
704            * inst/stopwords: Updated existing stopwords and added stopwords
705            for various other languages.
706    
707    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
708    
709            * man/: Updated documentation.
710    
711            * Work/testDb.R: Script to test database stuff.
712    
713            * R/: Fixed various database related bugs. Seems to be rather
714            usable now, i.e., consider it alpha status for now.
715    
716    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
717    
718            * R/: Fixed some bugs related to database support.
719    
720    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
721    
722            * man/: Added a lot of examples to the manuals.
723    
724    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
725    
726            * man/: Updated parts of the documentation.
727    
728            * R/textdoccol.R (asPlain): Added conversion from newsgroup
729            documents to plain text documents.
730    
731    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
732    
733            * R/textdoccol.R: Finished experimental database support. Not yet
734            intensively tested.
735    
736            * R/source.R: Now each source has a default reader.
737    
738            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
739            class anymore.
740    
741            * R/plaintextdoc.R: Custom show method for plain text documents.
742    
743            * R/aobjects.R: Added a class for structured text documents.
744    
745            * R/reader.R: Replaced remaining \code{parser} occurrences with
746            \code{reader}.
747    
748            * R/textdoccol.R (summary): Indent tags.
749    
750            * R/textdoccol.R (removePunctuation): Transform method to remove
751            punctuation marks.
752    
753    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
754    
755            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
756            using prescindMeta().
757    
758    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
759    
760            * R/textdoccol.R: Improved database support.
761    
762    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
763    
764            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
765    
766            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
767            language code.
768    
769            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
770            into parserControl argument.
771    
772            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
773    
774    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
775    
776            * Work/tmDataSetup.R: The datasets acq and crude can now be
777            created on the fly.
778    
779            * R/stopwords.R: Introduced a function returning the stopwords for
780            a given language (English, German and French at the moment).
781    
782            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
783            otherwise falls back to Snowball package.
784    
785    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
786    
787            * man/dissimilarity-methods.Rd: Make clear that any method offered
788            by "dists" from package "cba" can be used.
789    
790    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
791    
792            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
793            to Kurt's LaTeX suggestion. Removed points and underscores in
794            variable names for consistent naming.
795    
796            * DESCRIPTION: Update to version 0.1-2.
797    
798            * man/TextRepository.Rd: Fixed bug in documentation.
799    
800    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
801    
802            * DESCRIPTION: Update to version 0.1-1.
803    
804    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
805    
806            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
807            wordStem.
808    
809    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
810    
811            * R/: Changes due to Kurt's review.
812    
813    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
814    
815            * R/: Implemented improvements based upon comments by David
816            Meyer.
817    
818    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
819    
820            * inst/doc/: Rewrote vignette.
821    
822            * man/: Improved documentation.
823    
824    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
825    
826            * man/: Updated documentation.
827    
828            * DESCRIPTION: Changed package name to "tm". Updated version to
829            0.1 for first CRAN release.
830    
831            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
832            list archive example.
833    
834            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
835            archive example.
836    
837            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
838            from (several mails per box) mbox format to (single mail per file)
839            eml format.
840    
841    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
842    
843            * data/crude.rda: Rebuilt.
844    
845            * data/acq.rda: Rebuilt.
846    
847            * R/reader.R: Factored out reader and parser methods from
848            textdoccol.R.
849    
850            * R/source.R: Factored out Source methods from aobjects.R and
851            textdoccol.R.
852            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
853            feeds.
854    
855            * R/textdoccol.R (DirSource): Added support for recursive
856            traversal of directories.
857    
858    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
859    
860            * R/textdoccol.R ([[): Loads the document corpus automatically
861            into memory upon access.
862            (tm_transform, tm_filter): Removed several checks whether the
863            document is already loaded ([[ ensures this now).
864            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
865            mailing list archive.
866    
867    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * R/aobjects.R (TextDocument): Is now a virtual class.
870            (Source): Is now a virtual class.
871    
872    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
873    
874            * R/textdoccol.R (c): Support for an arbitrary number of document
875            collections.
876    
877    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
878    
879            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
880            append_meta and remove_meta.
881    
882            * R/textdoccol.R: Removed modify_metadata method.
883    
884            * R/textrepo.R: Removed modify_metadata method.
885    
886            * R/textdoccol.R (remove_meta): Supports removal of document
887            collection metadata and document (= in data frame) metadata.
888    
889    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
890    
891            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
892    
893            * data/crude.rda: Rebuilt.
894    
895            * data/acq.rda: Rebuilt.
896    
897            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
898    
899            * R/textdoccol.R ([): Bug fix for subsetting a document
900            collection's data frame.
901    
902    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
903    
904            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
905            to s_filter.
906    
907            * R/textdoccol.R: Local text documents' metadata can now be copied
908            to a document collection's data frame with prescind_meta.
909    
910    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
911    
912            * R/: Text documents' slot metadata is now accessible in s_filter.
913    
914            * R/: Rewrote s_filter function (still has some restrictions).
915    
916    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
917    
918            * R/: Various fixes in handling metadata.
919    
920            * R/: Added update mechanism for text document collections.
921    
922    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
923    
924            * R/: Merging of document collections now creates a binary tree
925            for reconstructing merged document collections.
926    
927            * R/: Redesign of metadata for document collections.
928    
929    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
930    
931            * R/: Messages now use \code{ngettext}.
932    
933    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
934    
935            * R/: Added functions for modifying and removing metadata.
936    
937    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
938    
939            * man/: Updated some documentation.
940    
941            * R/: Corrected some connection issues.
942    
943            * inst/doc: Worked on the vignette.
944    
945    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
946    
947            * inst/: Added texts and started vignette.
948    
949            * R/: Final changes based upon David's comments.
950    
951    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
952    
953            * NAMESPACE: Corrected exports (generic methods need exportMethods
954            directives!).
955    
956    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
957    
958            * R/: Modified the TextDocCol constructor and various parsers. It
959            is now modular and supports various file formats via plugins (see
960            the new "Source" class).
961    
962    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
963    
964            * man/: Revised documentation after previous code changes.
965    
966    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
967    
968            * R/: Remaining changes as discussed with David.
969    
970    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
971    
972            * R/: Some changes as suggested by David. The rest will follow
973            within the next few days.
974    
975    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            * man/: Finished documentation.
978    
979    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
980    
981            * man/: Wrote some documentation.
982    
983    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * R/: Further syntactic sugar in the form of additional assignment and
986            accessor methods.
987    
988    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            * R/: Syntactic sugar in the form of "length", "show" and "summary"
991            operators.
992    
993    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
994    
995            * R/: Various updates, mainly to default operators ("[" or "c")
996            and dissimilarities.
997    
998    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
999    
1000            * R/: Added similarity functions.
1001    
1002            * data/: Added English stopwords.
1003    
1004    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1005    
1006            * data/: Examples compiled for new features.
1007    
1008            * R/: Changes due to new structure.
1009    
1010            * NAMESPACE: Corrected namespace to reflect new structure.
1011    
1012            * R/termdocmatrix.R: Adapted for new naming scheme.
1013    
1014    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1015    
1016            * R/textdoccol.R: Adapted code for new class structure. Wrote
1017            several transform and filter functions operating on text document
1018            collections (alias text document databases).
1019    
1020            * R/aobjects.R: Adapted class structure with inheritance,
1021            repositories and additional meta data. Loading files on demand is
1022            now possible.
1023    
1024    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1025    
1026            * R/: Some cosmetic cleanups.
1027    
1028            * inst/: Removed vignette on clustering. That and much more is now
1029            described in the JSS paper on text mining. Based upon that
1030            article, a more elaborate vignette will be incorporated in the future.
1031    
1032    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1033    
1034            * R/: Updated generic S4 methods to comply with signature changes
1035            in newer versions of R (> 2.3).
1036    
1037    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * ext/R/importRIS.R: Automatic RIS import is now possible.
1040    
1041    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1042    
1043            * R/textdoccol.R: Added RIS HTML input format.
1044    
1045    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1046    
1047            * R/textdoccol.R: Removed bug that caused invalid text document
1048            collections when handling many input files.
1049    
1050    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1051    
1052            * R/textdoccol.R: Restructured and extended file import
1053            mechanism.
1054    
1055            * inst/doc/clustering.Rnw: Adapted vignette for use with
1056            ReutNews.rda.
1057    
1058            * man/ReutNews.Rd: Documentation for ReutNews.rda.
1059    
1060            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1061    
1062    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1063    
1064            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
