[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 34, Thu Dec 22 15:18:10 2005 UTC
pkg/ChangeLog revision 1121, Thu Feb 17 17:13:45 2011 UTC
# Line 1  Line 1 
1    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/transform.R (stemDocument.PlainTextDocument): Use language
4            argument.
5    
6    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/source.R: Store strings and connections instead of unevaluated
9            calls.
10    
11    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
14    
15    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
16    
17            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
18            (instead of a list element).
19    
20    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
21    
22            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
23            documents by names (fallback to IDs if names are not set).
24    
25    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
26    
27            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
28            \code{recursive} now determines whether existing corpus meta data
29            is used.
30    
31    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
32    
33            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
34    
35    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
38            remove terms not occurring in the corpus anymore.
39    
40    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
43            and Heaps' law.
44    
45    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
46    
47            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
48            provided by a source.
49    
50    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
51    
52            * R/source.R (.Source): Provide document names.
53    
54    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
55    
56            * R/meta.R (`content_or_meta`): Utility function.
57    
58    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
59    
60            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
61            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
62    
63    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
64    
65            * R/weight.R (weightTfIdf): Added normalization option.
66    
67            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
68            analysis.
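
The normalization option noted for weightTfIdf above can be illustrated with a small base R sketch. This is not the package's implementation: the helper name `tf_idf`, its `normalize` argument, and the toy matrix are assumptions made for illustration. It assumes that normalization means dividing each term count by the length of its document, and that the inverse document frequency uses a base-2 logarithm.

```r
# Toy term-document matrix: rows are terms, columns are documents.
m <- matrix(c(2, 0, 1,
              1, 1, 0,
              0, 3, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("oil", "price", "trade"), c("d1", "d2", "d3")))

# Hypothetical helper, not the tm implementation.
tf_idf <- function(m, normalize = FALSE) {
  # Term frequency: raw counts, or counts divided by document length.
  tf <- if (normalize) sweep(m, 2, colSums(m), "/") else m
  # Inverse document frequency: log2(number of documents / documents containing the term).
  idf <- log2(ncol(m) / rowSums(m > 0))
  # The idf vector is recycled down the rows, so each term row is scaled by its own idf.
  tf * idf
}

w <- tf_idf(m, normalize = TRUE)
```

With normalization switched on, long documents no longer dominate the weights simply because they contain more tokens.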
69    
70    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
71    
72            * R/score.R (tm_tag_score): Compute a score from the number of
73            tags matching in a document.
74    
75    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
76    
77            * R/complete.R (stemCompletion): New completion heuristics.
78    
79    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
80    
81            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
82    
83    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
84    
85            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
86            setOldClass(c(..., "list")) works.
87    
88    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/transform.R (stemDocument.character): In case input is a
91            simple character just delegate to the default Snowball stemmer.
92    
93    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
94    
95            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
96            data.
97    
98    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
99    
100            * R/doc.R (`Content<-`): Be careful with names attribute.
101    
102    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
103    
104            * R/source.R (DirSource): Improved implementation especially when
105            handling many (> 1M) files.
106    
107    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/source.R (getElem.URISource): Use encoding argument.
110    
111    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
112    
113            * R/doc.R (setOldClass): Register S3 document classes to be
114            recognized by S4 methods.
115    
116    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
117    
118            * R/matrix.R (termFreq): Add option to remove punctuation
119            characters.
120    
121    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
122    
123            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
124            merging multiple term-document matrices.
125    
126    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
127    
128            * R/corpus.R (setOldClass): Register S3 corpus classes to be
129            recognized by S4 methods.
130    
131            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
132            that CRAN Mac OS X builds do not fail any longer.
133    
134    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
135    
136            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
137            of RWeka:AlphabeticTokenizer() as default.
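
As a rough illustration of the scan()-based default described above, the following base R sketch splits a string on whitespace. The helper name `tokenize_ws` is hypothetical, and the `text` argument of scan() assumes a reasonably recent R version; the package's own tokenizer may differ in detail.

```r
# Hypothetical helper: whitespace tokenization via scan().
tokenize_ws <- function(x) {
  # quote = "" keeps apostrophes and quotation marks from being treated as string delimiters.
  scan(text = x, what = "character", quote = "", quiet = TRUE)
}

tokens <- tokenize_ws("Crude oil prices rose sharply.")
```

Note that punctuation stays attached to the neighbouring token, unlike with an alphabetic tokenizer.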
138    
139    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
140    
141            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
142            caused words at the beginning or the end of a line not to be removed. Do
143            not delete whitespace anymore.
144    
145    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
146    
147            * R/source.R (DirSource): Default to working directory if no path
148            is specified.
149    
150    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
151    
152            * R/source.R (DirSource): Stop on empty directories.
153    
154    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
155    
156            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
157            named documents.
158    
159    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
160    
161            * R/transform.R (removeWords): Improve regular expressions.
162    
163    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
164    
165            * R/meta.R (DublinCore): Allow lower case tags.
166    
167    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
168    
169            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
170            instead of x$children.
171    
172    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
173    
174            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
175    
176    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
177    
178            * R/: Use S3 instead of S4 class system.
179    
180    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
181    
182            * R/reader.R (readMail): Moved to tm.plugin.mail package.
183    
184    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
185    
186            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
187            postings are basically e-mails with some extra headers.
188    
189    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
190    
191            * R/transform.R: Move convertMboxEml, removeCitation,
192            removeMultipart, and removeSignature to the tm.plugin.mail package
193            since they are mainly utility functions (for handling e-mails) and
194            not very framework specific.
195    
196    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
197    
198            * man/: Fix documentation.
199    
200    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
201    
202            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
203            plain text document instead of an XML document for texts of the
204            Reuters-21578 dataset.
205    
206            * R/sparse.R: Removed since the slam package is now available on
207            CRAN.
208    
209            * DESCRIPTION (Depends): Add slam package.
210    
211    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
212    
213            * R/transform.R (stemDoc): Fix character(0) handling.
214    
215    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
216    
217            * R/doc.R (show): Pretty print.
218    
219    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
220    
221            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
222            gracefully.
223    
224    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
225    
226            * R/corpus.R: Make corpus virtual. Implement corpus with standard
227            and permanent storage semantics.
228    
229            * DESCRIPTION: New major release. A *lot* of improvements.
230    
231    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
232    
233            * NAMESPACE: Export some simple_triplet_matrix functions.
234    
235    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
236    
237            * R/weight.R: Adapt tf-idf to new matrix format.
238    
239    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
240    
241            * R/matrix.R: Create two distinct classes for term-document and
242            document-term matrices.
243    
244    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
245    
246            * R/termdocmatrix.R: No longer use Matrix package. This reduces
247            package start-up time significantly.
248    
249    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
250    
251            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
252    
253    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
254    
255            * R/transform.R (tmReduce): Combine multiple maps into one
256            transformation.
257    
258    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
259    
260            * R/weight.R: Remove weightLogical since it does not return a
261            dgCMatrix.
262    
263            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
264            or TermDocumentMatrix instead.
265    
266    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
267    
268            * inst/doc/extensions.Rnw: Finished vignette.
269    
270    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
271    
272            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
273            DocumentTermMatrix representations.
274    
275    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
276    
277            * R/reader.R (readXML): New reader for arbitrary XML files.
278    
279    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
280    
281            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
282            (XMLSource): New XMLSource class for arbitrary XML files.
283            (Source): New slot Vectorized.
284    
285    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
286    
287            * R/reader.R (readTabular): Experimental reader for tabular data
288            structures which can be customized via user-defined mappings.
289    
290            * R/reader.R: Always use UTC time zone.
291    
292            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
293    
294    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
295    
296            * R/reader.R (readDOC): Options can be passed over to antiword.
297    
298            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
299            pdftotext.
300    
301    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
302    
303            * R/source.R (DirSource): Add pattern and ignore.case arguments
304            which are internally passed over to list.files().
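
The effect of passing pattern and ignore.case through to list.files() can be sketched in base R alone. The directory and file names below are made up for illustration, and the sketch calls list.files() directly rather than going through DirSource.

```r
# Create a throwaway directory with three empty files (hypothetical names).
dir <- file.path(tempdir(), "dirsource-sketch")
dir.create(dir, showWarnings = FALSE)
file.create(file.path(dir, c("a.txt", "B.TXT", "notes.md")))

# Select only files ending in ".txt", regardless of case.
files <- list.files(dir, pattern = "\\.txt$", ignore.case = TRUE,
                    full.names = TRUE)
```

Here `files` holds the paths of a.txt and B.TXT, while notes.md is filtered out.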
305    
306    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
307    
308            * inst/doc/tm.Rnw: Suppress pointless loading message.
309    
310    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
311    
312            * DESCRIPTION: Speed up package loading (via moving packages not
313            strictly necessary for normal operation to Suggests instead of
314            Depends).
315    
316    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
317    
318            * R/reader.R (readNewsgroup): The date format is now configurable.
319    
320    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
321    
322            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
323    
324    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
325    
326            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
327    
328    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
329    
330            * R/source.R (DataframeSource): New source class for data frames.
331    
332            * R/source.R: Fixed non-standard call evaluation.
333    
334    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
335    
336            * R/source.R (URISource): New source class for a single document.
337    
338    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
339    
340            * R/source.R: Refactoring.
341    
342    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
343    
344            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
345            Rmpi installations more gracefully.
346    
347    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
348    
349            * R/source.R (Source): Add Length slot.
350    
351    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
352    
353            * R/AAA.R: Unify duplicated .onLoad function.
354    
355    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
356    
357            * DESCRIPTION (Suggests): Added Rmpi.
358    
359    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
360    
361            * R/source.R (getElem): Fix 'no visible binding' warning.
362    
363            * man/WeightFunction.Rd: Fix signature.
364    
365    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
366    
367            * R/weight.R: Introduce name abbreviations for weighting functions.
368    
369    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
370    
371            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
372    
373            * R/cluster.R: Provide convenience functions for using a MPI
374            cluster.
375    
376            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
377            available.
378    
379            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
380            available.
381    
382    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
383    
384            * R/textdoccol.R (lapply): Removed debug print out.
385    
386    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
387    
388            * R/reader.R (readRCV1): Improved meta data extraction from
389            Reuters Corpus Volume 1 documents.
390    
391    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
392    
393            * R/transform.R: Ensure that all mappings preserve multiline
394            structures.
395    
396    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
397    
398            * R/filter.R: Every filter now has an attribute indicating whether
399            it should be applied at document level (doclevel).
400    
401            * R/textdoccol.R (tmFilter): Set searchFullText as new default
402            filter.
403    
404    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
405    
406            * R/transform.R (replacePatterns): Replaced removeWords by
407            replacePatterns. Suggested by Christian Buchta.
408    
409            * R/textdoccol.R (inspect): Improved formatting.
410    
411    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
412    
413            * inst/CITATION: Updated JSS article information.
414    
415            * R/textdoccol.R (setAs): Added coerce method from list to
416            corpus.
417    
418            * R/meta.R (meta): Improved meta data handling.
419    
420    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
421    
422            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
423            Christian Buchta.
424    
425            * inst/CITATION: Added template to include JSS article reference.
426    
427    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
428    
429            * R/textdoccol.R (tmMap): Introduced lazy mapping.
430    
431            * R/source.R: Added VectorSource.
432    
433    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
434    
435            * man/: Language codes should be in ISO 639-1 format.
436    
437            * R/textdoccol.R (asPlain): Preserve local meta data.
438    
439    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
440    
441            * R/textdoccol.R (writeCorpus): Function for writing a corpus
442            containing plain text documents to disk.
443    
444    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
445    
446            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
447            always set correctly.
448    
449            * R/textdoccol.R: Set load = TRUE as default for load on demand
450            since in most cases this is the wanted behaviour.
451    
452    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
453    
454            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
455    
456            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
457    
458    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/meta.R (meta): New function for consistent access to meta data
461            of document collections, repositories, and texts.
462    
463    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/: Better support for encodings.
466    
467    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
468    
469            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
470            selection when no reader argument is given.
471    
472    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
473    
474            * R/source.R (CSVSource): Now uses read.csv instead of scan
475            internally.
476    
477    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
478    
479            * R/reader.R (getReaders): Returns available reader functions.
480    
481            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
482            as default.
483    
484    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
485    
486            * R/stopwords.R (stopwords): Shortened code, removed codetools
487            variable warnings.
488    
489            * man/: Documentation for showMeta, added an example for tmMap.
490    
491            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
492            some minor typos fixed.
493    
494    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * R/aobjects.R (showMeta): Added method for pretty printing a
497            text document's meta data.
498    
499    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
500    
501            * R/textdoccol.R (TextDocCol): Better handling of empty
502            arguments.
503    
504            * NAMESPACE: Exported readDOC.
505    
506            * man/completeStems.Rd: Added an example.
507    
508    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * R/stopwords.R (stopwords): Look up .dat files at every
511            call. Allows users to modify stopword .dat files interactively.
512    
513    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
514    
515            * R/termdocmatrix.R (termFreq): Correct processing of empty
516            documents.
517    
518    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
519    
520            * man/: Updated documentation.
521    
522    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
523    
524            * R/complete.R (completeStems): Completes (heuristically) word
525            stems.
526    
527            * R/termdocmatrix.R (TermDocMatrix2): New modular
528            constructor.
529    
530            * NAMESPACE: Exported termFreq.
531    
532    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
533    
534            * R/reader.R (readDOC): Added MS Word reader (using antiword).
535    
536    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
537    
538            * R/weight.R: Weighting functions for TermDocMatrix.
539    
540    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
541    
542            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
543            functions for accessing dimension, column, and row names.
544    
545            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
546    
547    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
548    
549            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
550    
551    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
552    
553            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
554    
555    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
556    
557            * R/reader.R (readPDF): Removed manual checks for pdftotext and
558            pdfinfo. The system call gives a warning anyway.
559    
560    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
561    
562            * R/textdoccol.R (asPlain): Conversion from
563            StructuredTextDocuments to PlainTextDocuments.
564    
565    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
566    
567            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
568            for accessing term-document matrices.
569    
570            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
571            are installed.
572    
573    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
576            Christian Buchta.
577    
578    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
579    
580            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
581    
582    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
583    
584            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
585    
586            * R/reader.R (readPDF): Added PDF reader.
587    
588    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
591    
592            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
593    
594            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
595    
596            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
597    
598    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
599    
600            * R/distmeasure.R (dissimilarity): Replaced dists call from
601            package cba by new dist call from package proxy.
602    
603    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
604    
605            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
606    
607    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
608    
609            * R/termdocmatrix.R: require() uses the quietly option to suppress
610            loading messages.
611    
612    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
613    
614            * R/dictionary.R: Added dictionary support.
615    
616    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
617    
618            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
619            documents. This simplifies some functions, e.g., asPlain.
620    
621    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
622    
623            * inst/doc/tm.Rnw: Fixed some typos in vignette.
624    
625    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
626    
627            * R/textdoccol.R (replaceWords): Added method to replace a set of
628            words by a single word. Useful for synonyms.
629    
630    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
631    
632            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
633    
634    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
635    
636            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
637            vectors. Thanks to Ariel Maguyon for his error report.
638            (removeSparseTerms): New function to remove columns from a
639            term-document matrix exceeding a sparse factor.
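
The idea behind removeSparseTerms can be sketched on a dense matrix in base R. This is an illustration under assumptions, not the package's code: the helper name `drop_sparse_columns` is hypothetical, sparsity is taken to be the share of zero entries in a column, and columns are kept when that share does not exceed the threshold.

```r
# Hypothetical helper, not the tm implementation.
drop_sparse_columns <- function(m, sparse) {
  stopifnot(sparse > 0, sparse < 1)
  # Share of zero entries in each column.
  zero_share <- colMeans(m == 0)
  # Keep columns at or below the threshold; drop = FALSE preserves the matrix shape.
  m[, zero_share <= sparse, drop = FALSE]
}

# Toy matrix: rows are documents, columns are terms.
m <- matrix(c(1, 0, 0,
              2, 0, 1,
              1, 0, 0,
              3, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("d", 1:4), c("oil", "opec", "trade")))

kept <- drop_sparse_columns(m, sparse = 0.6)
```

In this toy matrix only the "oil" column survives, since the other two columns are zero in three of four documents.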
640    
641    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
642    
643            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
644    
645    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
646    
647            * man/sFilter.Rd: Corrected documentation on statement format (use
648            '==' instead of '=').
649    
650    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
651    
652            * R/aobjects.R (StructuredTextDocument): Inherits from
653            TextDocument.
654    
655    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
658            on sparse matrices as proposed by Martin Maechler.
659    
660    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * R/textdoccol.R: Removed \code{dbDisconnect} calls since last
663            \pkg{filehash} version makes them deprecated.
664    
665    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
666    
667            * R/termdocmatrix.R (textvector): Stemming is now performed before
668            erasing stopwords.
669            (weightMatrix): Adapted to handle sparse matrices.
670            (TermDocMatrix): Sparse matrix is now efficiently built by
671            direct stepwise insertion of row values into it.
672    
673    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
674    
675            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
676            due to ongoing problems. For our purposes the latter is as useful
677            as the replaced package.
678    
679    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
680    
681            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
682    
683            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
684    
685    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
686    
687            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
688            languages with available stopwords.
689    
690    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
691    
692            * inst/doc/tm.Rnw: Minor corrections in the vignette.
693    
694    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
695    
696            * DESCRIPTION: Update to version 0.2, since a lot of new features
697            have been integrated.
698    
699            * inst/stopwords: Updated existing stopwords and added stopwords
700            for various other languages.
701    
702    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
703    
704            * man/: Updated documentation.
705    
706            * Work/testDb.R: Script to test database stuff.
707    
708            * R/: Fixed various database related bugs. Seems to be rather
709            useable now, i.e., consider as alpha status for now.
710    
711    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
712    
713            * R/: Fixed some bugs related to database support.
714    
715    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
716    
717            * man/: Added a lot of examples to the manuals.
718    
719    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
720    
721            * man/: Updated parts of the documentation.
722    
723            * R/textdoccol.R (asPlain): Added conversion from newsgroup
724            documents to plain text documents.
725    
726    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
727    
728            * R/textdoccol.R: Finished experimental database support. Not yet
729            intensively tested.
730    
731            * R/source.R: Now each source has a default reader.
732    
733            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
734            class anymore.
735    
736            * R/plaintextdoc.R: Custom show method for plain text documents.
737    
738            * R/aobjects.R: Added a class for structured text documents.
739    
740            * R/reader.R: Replaced remaining \code{parser} occurrences with
741            \code{reader}.
742    
743            * R/textdoccol.R (summary): Indent tags.
744    
745            * R/textdoccol.R (removePunctuation): Transform method to remove
746            punctuation marks.
747    
748    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
749    
750            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
751            using prescindMeta().
752    
753    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
754    
755            * R/textdoccol.R: Improved database support.
756    
757    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
758    
759            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
760    
761            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
762            language code.
763    
764            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
765            into parserControl argument.
766    
767            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
768    
769    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
770    
771            * Work/tmDataSetup.R: The datasets acq and crude can now be
772            created on the fly.
773    
774            * R/stopwords.R: Introduced a function returning the stopwords for
775            a given language (English, German and French at the moment).
776    
777            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
778            otherwise falls back to Snowball package.
779    
780    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
781    
782            * man/dissimilarity-methods.Rd: Make clear that any method offered
783            by "dists" from package "cba" can be used.
784    
785    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
786    
787            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
788            to Kurt's LaTeX suggestion. Removed points and underscores in
789            variable names for consistent naming.
790    
791            * DESCRIPTION: Update to version 0.1-2.
792    
793            * man/TextRepository.Rd: Fixed bug in documentation.
794    
795    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
796    
797            * DESCRIPTION: Update to version 0.1-1.
798    
799    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
800    
801            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
802            wordStem.
803    
804    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
805    
806            * R/: Changes due to Kurt's review.
807    
808    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
809    
810            * R/: Implemented improvements based upon comments by David
811            Meyer.
812    
813    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
814    
815            * inst/doc/: Rewrote vignette.
816    
817            * man/: Improved documentation.
818    
819    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
820    
821            * man/: Updated documentation.
822    
823            * DESCRIPTION: Changed package name to "tm". Updated version to
824            0.1 for first CRAN release.
825    
826            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
827            list archive example.
828    
829            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
830            archive example.
831    
832            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
833            from (several mails per box) mbox format to (single mail per file)
834            eml format.
835    
836    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
837    
838            * data/crude.rda: Rebuilt.
839    
840            * data/acq.rda: Rebuilt.
841    
842            * R/reader.R: Factored out reader and parser methods from
843            textdoccol.R.
844    
845            * R/source.R: Factored out Source methods from aobjects.R and
846            textdoccol.R.
847            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
848            feeds.
849    
850            * R/textdoccol.R (DirSource): Added support for recursive
851            traversal of directories.
852    
853    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
854    
855            * R/textdoccol.R ([[): Loads the document corpus automatically
856            into memory upon access.
857            (tm_transform, tm_filter): Removed several checks whether the
858            document is already loaded ([[ ensures this now).
859            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
860            mailing list archive.
861    
862    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
863    
864            * R/aobjects.R (TextDocument): Is now a virtual class.
865            (Source): Is now a virtual class.
866    
867    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * R/textdoccol.R (c): Support for an arbitrary number of document
870            collections.
871    
872    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
873    
874            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
875            append_meta and remove_meta.
876    
877            * R/textdoccol.R: Removed modify_metadata method.
878    
879            * R/textrepo.R: Removed modify_metadata method.
880    
881            * R/textdoccol.R (remove_meta): Supports removal of document
882            collection metadata and document (= in data frame) metadata.
883    
884    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
885    
886            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
887    
888            * data/crude.rda: Rebuilt.
889    
890            * data/acq.rda: Rebuilt.
891    
892            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
893    
894            * R/textdoccol.R ([): Bug fix for subsetting a document
895            collection's data frame.
896    
897    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
898    
899            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
900            to s_filter.
901    
902            * R/textdoccol.R: Local text documents' metadata can now be copied
903            to a document collection's data frame with prescind_meta.
904    
905    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
906    
907            * R/: Text documents' slot metadata is now accessible in s_filter.
908    
909            * R/: Rewrote s_filter function (still has some restrictions).
910    
911    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
912    
913            * R/: Various fixes in handling metadata.
914    
915            * R/: Added update mechanism for text document collections.
916    
917    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
918    
919            * R/: Merging of document collections now creates a binary tree
920            for reconstructing merged document collections.
921    
922            * R/: Redesign of metadata for document collections.
923    
924    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
925    
926            * R/: Messages now use \code{ngettext}.
927    
928    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
929    
930            * R/: Added functions for modifying and removing metadata.
931    
932    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
933    
934            * man/: Updated some documentation.
935    
936            * R/: Corrected some connection issues.
937    
938            * inst/doc: Worked on the vignette.
939    
940    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
941    
942            * inst/: Added texts and started vignette.
943    
944            * R/: Final changes based upon David's comments.
945    
946    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
947    
948            * NAMESPACE: Corrected exports (generic methods need exportMethods
949            directives!).
950    
951    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
952    
953            * R/: Modified the TextDocCol constructor and various parsers. It
954            is now modular and supports various file formats via plugins (see
955            the new "Source" class).
956    
957    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
958    
959            * man/: Revised documentation after previous code changes.
960    
961    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
962    
963            * R/: Remaining changes as discussed with David.
964    
965    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
966    
967            * R/: Some changes as suggested by David. The rest will follow
968            within the next few days.
969    
970    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
971    
972            * man/: Finished documentation.
973    
974    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
975    
976            * man/: Wrote some documentation.
977    
978    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            * R/: Further syntactic sugar in the form of additional assignment and
981            accessor methods.
982    
983    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * R/: Syntactic sugar in the form of "length", "show" and "summary"
986            operators.
987    
988    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            * R/: Various updates, mainly to default operators ("[" or "c")
991            and dissimilarities.
992    
993    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
994    
995            * R/: Added similarity functions.
996    
997            * data/: Added English stopwords.
998    
999    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1000    
1001            * data/: Examples compiled for new features.
1002    
1003            * R/: Changes due to new structure.
1004    
1005            * NAMESPACE: Corrected namespace to reflect new structure.
1006    
1007            * R/termdocmatrix.R: Adapted for new naming scheme.
1008    
1009    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1010    
1011            * R/textdoccol.R: Adapted code for new class structure. Wrote
1012            several transform and filter functions operating on text document
1013            collections (alias text document databases).
1014    
1015            * R/aobjects.R: Adapted class structure with inheritance,
1016            repositories and additional meta data. Loading files on demand is
1017            now possible.
1018    
1019    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1020    
1021            * R/: Some cosmetic cleanups.
1022    
1023            * inst/: Removed vignette on clustering. That and much more is now
1024            described in the JSS paper on text mining. Based upon that
1025            article a more elaborate vignette will be incorporated in the future.
1026    
1027    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1028    
1029            * R/: Updated generic S4 methods to comply with signature changes
1030            in newer versions of R (> 2.3).
1031    
1032    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1033    
1034            * ext/R/importRIS.R: Automatic RIS import is now possible.
1035    
1036    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1037    
1038            * R/textdoccol.R: Added RIS HTML input format.
1039    
1040    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1041    
1042            * R/textdoccol.R: Removed bug that caused invalid text document
1043            collections when handling many input files.
1044    
1045    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1046    
1047            * R/textdoccol.R: Restructured and extended file import
1048            mechanism.
1049    
1050            * inst/doc/clustering.Rnw: Adapted vignette for use with
1051            ReutNews.rda.
1052    
1053            * man/ReutNews.Rd: Documentation for ReutNews.rda.
1054    
1055            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1056    
1057    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1058    
1059            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
