Diff of /pkg/ChangeLog

Old: trunk/R/trunk/ChangeLog, revision 34, Thu Dec 22 15:18:10 2005 UTC
New: pkg/ChangeLog, revision 1139, Wed Aug 24 15:21:02 2011 UTC
1    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/stopwords.R (stopwords): Raise an error if no stopwords are
4            available for requested language. Suggested by Derek M Jones.
5    
6    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
9            normalization.
10    
11    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/transform.R (stemDocument.PlainTextDocument): Use language
14            argument.
15    
16    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
17    
18            * R/source.R: Store strings and connections instead of unevaluated
19            calls.
20    
21    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
22    
23            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
24    
25    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
26    
27            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
28            (instead of a list element).
29    
30    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
31    
32            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
33            documents by names (fallback to IDs if names are not set).
34    
35    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
38            \code{recursive} now determines whether existing corpus meta data
39            is used.
40    
41    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
42    
43            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
44    
45    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
46    
47            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
48            remove terms not occurring in the corpus anymore.
49    
50    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
51    
52            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
53            and Heaps' law.
54    
55    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
56    
57            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
58            provided by a source.
59    
60    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
61    
62            * R/source.R (.Source): Provide document names.
63    
64    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
65    
66            * R/meta.R (`content_or_meta`): Utility function.
67    
68    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
69    
70            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
71            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
72    
73    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
74    
75            * R/weight.R (weightTfIdf): Added normalization option.
76    
77            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
78            analysis.
79    
80    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
81    
82            * R/score.R (tm_tag_score): Compute a score from the number of
83            tags matching in a document.
84    
85    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
86    
87            * R/complete.R (stemCompletion): New completion heuristics.
88    
89    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
90    
91            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
92    
93    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
94    
95            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
96            setOldClass(c(..., "list")) works.
97    
98    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
99    
100            * R/transform.R (stemDocument.character): In case input is a
101            simple character just delegate to the default Snowball stemmer.
102    
103    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
104    
105            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
106            data.
107    
108    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
109    
110            * R/doc.R (`Content<-`): Be careful with names attribute.
111    
112    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
113    
114            * R/source.R (DirSource): Improved implementation especially when
115            handling many (> 1M) files.
116    
117    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
118    
119            * R/source.R (getElem.URISource): Use encoding argument.
120    
121    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
122    
123            * R/doc.R (setOldClass): Register S3 document classes to be
124            recognized by S4 methods.
125    
126    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
127    
128            * R/matrix.R (termFreq): Add option to remove punctuation
129            characters.
130    
131    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
134            merging multiple term-document matrices.
135    
136    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
137    
138            * R/corpus.R (setOldClass): Register S3 corpus classes to be
139            recognized by S4 methods.
140    
141            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
142            that CRAN Mac OS X builds do not fail any longer.
143    
144    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
145    
146            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
147            of RWeka:AlphabeticTokenizer() as default.
148    
149    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
150    
151            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
152            caused words at the beginning or the end of a line not to be removed. Do
153            not delete whitespace anymore.
154    
155    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
156    
157            * R/source.R (DirSource): Default to working directory if no path
158            is specified.
159    
160    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
161    
162            * R/source.R (DirSource): Stop on empty directories.
163    
164    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
165    
166            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
167            named documents.
168    
169    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
170    
171            * R/transform.R (removeWords): Improve regular expressions.
172    
173    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
174    
175            * R/meta.R (DublinCore): Allow lower case tags.
176    
177    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
178    
179            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
180            instead of x$children.
181    
182    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
183    
184            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
185    
186    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
187    
188            * R/: Use S3 instead of S4 class system.
189    
190    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
191    
192            * R/reader.R (readMail): Moved to tm.plugin.mail package.
193    
194    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
195    
196            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
197            postings are basically e-mails with some extra headers.
198    
199    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
200    
201            * R/transform.R: Move convertMboxEml, removeCitation,
202            removeMultipart, and removeSignature to the tm.plugin.mail package
203            since they are mainly utility functions (for handling e-mails) and
204            not very framework specific.
205    
206    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
207    
208            * man/: Fix documentation.
209    
210    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
211    
212            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
213            plain text document instead of an XML document for texts of the
214            Reuters-21578 dataset.
215    
216            * R/sparse.R: Removed since the slam package is now available on
217            CRAN.
218    
219            * DESCRIPTION (Depends): Add slam package.
220    
221    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
222    
223            * R/transform.R (stemDoc): Fix character(0) handling.
224    
225    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
226    
227            * R/doc.R (show): Pretty print.
228    
229    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
230    
231            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
232            gracefully.
233    
234    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
235    
236            * R/corpus.R: Make corpus virtual. Implement corpus with standard
237            and permanent storage semantics.
238    
239            * DESCRIPTION: New major release. A *lot* of improvements.
240    
241    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
242    
243            * NAMESPACE: Export some simple_triplet_matrix functions.
244    
245    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
246    
247            * R/weight.R: Adapt tf-idf to new matrix format.
248    
249    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
250    
251            * R/matrix.R: Create two distinct classes for term-document and
252            document-term matrices.
253    
254    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
255    
256            * R/termdocmatrix.R: No longer use Matrix package. This reduces
257            package start-up time significantly.
258    
259    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
260    
261            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
262    
263    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
264    
265            * R/transform.R (tmReduce): Combine multiple maps into one
266            transformation.
267    
268    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
269    
270            * R/weight.R: Remove weightLogical since it does not return a
271            dgCMatrix.
272    
273            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
274            or TermDocumentMatrix instead.
275    
276    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
277    
278            * inst/doc/extensions.Rnw: Finished vignette.
279    
280    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
281    
282            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
283            DocumentTermMatrix representations.
284    
285    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
286    
287            * R/reader.R (readXML): New reader for arbitrary XML files.
288    
289    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
290    
291            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
292            (XMLSource): New XMLSource class for arbitrary XML files.
293            (Source): New slot Vectorized.
294    
295    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
296    
297            * R/reader.R (readTabular): Experimental reader for tabular data
298            structures which can be customized via user-defined mappings.
299    
300            * R/reader.R: Always use UTC time zone.
301    
302            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
303    
304    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
305    
306            * R/reader.R (readDOC): Options can be passed over to antiword.
307    
308            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
309            pdftotext.
310    
311    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
312    
313            * R/source.R (DirSource): Add pattern and ignore.case arguments
314            which are internally passed over to list.files().
315    
316    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
317    
318            * inst/doc/tm.Rnw: Suppress pointless loading message.
319    
320    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
321    
322            * DESCRIPTION: Speed up package loading (via moving packages not
323            strictly necessary for normal operation to Suggests instead of
324            Depends).
325    
326    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
327    
328            * R/reader.R (readNewsgroup): The date format is now configurable.
329    
330    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
331    
332            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
333    
334    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
335    
336            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
337    
338    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
339    
340            * R/source.R (DataframeSource): New source class for data frames.
341    
342            * R/source.R: Fixed non-standard call evaluation.
343    
344    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
345    
346            * R/source.R (URISource): New source class for a single document.
347    
348    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
349    
350            * R/source.R: Refactoring.
351    
352    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
353    
354            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
355            Rmpi installations more gracefully.
356    
357    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
358    
359            * R/source.R (Source): Add Length slot.
360    
361    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
362    
363            * R/AAA.R: Unify duplicated .onLoad function.
364    
365    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
366    
367            * DESCRIPTION (Suggests): Added Rmpi.
368    
369    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
370    
371            * R/source.R (getElem): Fix 'no visible binding' warning.
372    
373            * man/WeightFunction.Rd: Fix signature.
374    
375    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
376    
377            * R/weight.R: Introduce name abbreviations for weighting functions.
378    
379    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
380    
381            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
382    
383            * R/cluster.R: Provide convenience functions for using a MPI
384            cluster.
385    
386            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
387            available.
388    
389            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
390            available.
391    
392    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
393    
394            * R/textdoccol.R (lapply): Removed debug print out.
395    
396    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
397    
398            * R/reader.R (readRCV1): Improved meta data extraction from
399            Reuters Corpus Volume 1 documents.
400    
401    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
402    
403            * R/transform.R: Ensure that all mappings preserve multiline
404            structures.
405    
406    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
407    
408            * R/filter.R: Every filter now has an attribute indicating whether
409            it should be applied at the document level (doclevel).
410    
411            * R/textdoccol.R (tmFilter): Set searchFullText as new default
412            filter.
413    
414    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
415    
416            * R/transform.R (replacePatterns): Replaced removeWords by
417            replacePatterns. Suggested by Christian Buchta.
418    
419            * R/textdoccol.R (inspect): Improved formatting.
420    
421    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
422    
423            * inst/CITATION: Updated JSS article information.
424    
425            * R/textdoccol.R (setAs): Added coerce method from list to
426            corpus.
427    
428            * R/meta.R (meta): Improved meta data handling.
429    
430    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
431    
432            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
433            Christian Buchta.
434    
435            * inst/CITATION: Added template to include JSS article reference.
436    
437    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
438    
439            * R/textdoccol.R (tmMap): Introduced lazy mapping.
440    
441            * R/source.R: Added VectorSource.
442    
443    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
444    
445            * man/: Language codes should be in ISO 639-1 format.
446    
447            * R/textdoccol.R (asPlain): Preserve local meta data.
448    
449    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
450    
451            * R/textdoccol.R (writeCorpus): Function for writing a corpus
452            containing plain text documents to disk.
453    
454    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
457            always set correctly.
458    
459            * R/textdoccol.R: Set load = TRUE as default for load on demand
460            since in most cases this is the wanted behaviour.
461    
462    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
463    
464            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
465    
466            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
467    
468    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * R/meta.R (meta): New function for consistent access to meta data
471            of document collections, repositories, and texts.
472    
473    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/: Better support for encodings.
476    
477    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
478    
479            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
480            selection when no reader argument is given.
481    
482    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
483    
484            * R/source.R (CSVSource): Now uses read.csv instead of scan
485            internally.
486    
487    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
488    
489            * R/reader.R (getReaders): Returns available reader functions.
490    
491            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
492            as default.
493    
494    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * R/stopwords.R (stopwords): Shortened code, removed codetools
497            variable warnings.
498    
499            * man/: Documentation for showMeta, added an example for tmMap.
500    
501            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
502            some minor typos fixed.
503    
504    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
505    
506            * R/aobjects.R (showMeta): Added method for pretty printing a
507            text document's meta data.
508    
509    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
510    
511            * R/textdoccol.R (TextDocCol): Better handling of empty
512            arguments.
513    
514            * NAMESPACE: Exported readDOC.
515    
516            * man/completeStems.Rd: Added an example.
517    
518    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
519    
520            * R/stopwords.R (stopwords): Look up .dat files at every
521            call. Allows users to modify stopword .dat files interactively.
522    
523    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
524    
525            * R/termdocmatrix.R (termFreq): Correct processing of empty
526            documents.
527    
528    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
529    
530            * man/: Updated documentation.
531    
532    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
533    
534            * R/complete.R (completeStems): Completes (heuristically) word
535            stems.
536    
537            * R/termdocmatrix.R (TermDocMatrix2): New modular
538            constructor.
539    
540            * NAMESPACE: Exported termFreq.
541    
542    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
543    
544            * R/reader.R (readDOC): Added MS Word reader (using antiword).
545    
546    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
547    
548            * R/weight.R: Weighting functions for TermDocMatrix.
549    
550    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
551    
552            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
553            functions for accessing dimension, column, and row names.
554    
555            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
556    
557    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
558    
559            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
560    
561    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
562    
563            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
564    
565    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
566    
567            * R/reader.R (readPDF): Removed manual checks for pdftotext and
568            pdfinfo. The system call gives a warning anyway.
569    
570    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
571    
572            * R/textdoccol.R (asPlain): Conversion from
573            StructuredTextDocuments to PlainTextDocuments.
574    
575    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
576    
577            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
578            for accessing term-document matrices.
579    
580            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
581            are installed.
582    
583    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
584    
585            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
586            Christian Buchta.
587    
588    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
591    
592    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
593    
594            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
595    
596            * R/reader.R (readPDF): Added PDF reader.
597    
598    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
599    
600            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
601    
602            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
603    
604            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
605    
606            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
607    
608    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
609    
610            * R/distmeasure.R (dissimilarity): Replaced dists call from
611            package cba by new dist call from package proxy.
612    
613    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
614    
615            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
616    
617    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
618    
619            * R/termdocmatrix.R: require() uses the quietly option to suppress
620            loading messages.
621    
622    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
623    
624            * R/dictionary.R: Added dictionary support.
625    
626    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
627    
628            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
629            documents. This simplifies some functions, e.g., asPlain.
630    
631    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * inst/doc/tm.Rnw: Fixed some typos in vignette.
634    
635    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
636    
637            * R/textdoccol.R (replaceWords): Added method to replace a set of
638            words by a single word. Useful for synonyms.
639    
640    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
641    
642            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
643    
644    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
645    
646            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
647            vectors. Thanks to Ariel Maguyon for his error report.
648            (removeSparseTerms): New function to remove columns from a
649            term-document matrix exceeding a sparse factor.
650    
651    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
652    
653            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
654    
655    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * man/sFilter.Rd: Corrected documentation on statement format (use
658            '==' instead of '=').
659    
660    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * R/aobjects.R (StructuredTextDocument): Inherits from
663            TextDocument.
664    
665    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
666    
667            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
668            on sparse matrices as proposed by Martin Maechler.
669    
670    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
673            latest \pkg{filehash} version deprecates them.
674    
675    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
676    
677            * R/termdocmatrix.R (textvector): Stemming is now performed before
678            erasing stopwords.
679            (weightMatrix): Adapted to handle sparse matrices.
680            (TermDocMatrix): Sparse matrix is now efficiently built by
681            direct stepwise insertion of row values into it.
682    
683    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
684    
685            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
686            due to ongoing problems. For our purposes the latter is as useful
687            as the replaced package.
688    
689    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
690    
691            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
692    
693            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
694    
695    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
696    
697            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
698            languages with available stopwords.
699    
700    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
701    
702            * inst/doc/tm.Rnw: Minor corrections in the vignette.
703    
704    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
705    
706            * DESCRIPTION: Update to version 0.2, since a lot of new features
707            have been integrated.
708    
709            * inst/stopwords: Updated existing stopwords and added stopwords
710            for various other languages.
711    
712    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
713    
714            * man/: Updated documentation.
715    
716            * Work/testDb.R: Script to test database stuff.
717    
718            * R/: Fixed various database related bugs. Seems to be rather
719            useable now, i.e., consider as alpha status for now.
720    
721    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
722    
723            * R/: Fixed some bugs related to database support.
724    
725    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
726    
727            * man/: Added a lot of examples to the manuals.
728    
729    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
730    
731            * man/: Updated parts of the documentation.
732    
733            * R/textdoccol.R (asPlain): Added conversion from newsgroup
734            documents to plain text documents.
735    
736    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
737    
738            * R/textdoccol.R: Finished experimental database support. Not yet
739            intensively tested.
740    
741            * R/source.R: Now each source has a default reader.
742    
743            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
744            class anymore.
745    
746            * R/plaintextdoc.R: Custom show method for plain text documents.
747    
748            * R/aobjects.R: Added a class for structured text documents.
749    
750            * R/reader.R: Replaced remaining \code{parser} occurrences with
751            \code{reader}.
752    
753            * R/textdoccol.R (summary): Indent tags.
754    
755            * R/textdoccol.R (removePunctuation): Transform method to remove
756            punctuation marks.
757    
758    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
759    
760            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
761            using prescindMeta().
762    
763    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
764    
765            * R/textdoccol.R: Improved database support.
766    
767    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
768    
769            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
770    
771            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
772            language code.
773    
774            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
775            into parserControl argument.
776    
777            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
778    
779    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
780    
781            * Work/tmDataSetup.R: The datasets acq and crude can now be
782            created on the fly.
783    
784            * R/stopwords.R: Introduced a function returning the stopwords for
785            a given language (English, German and French at the moment).
786    
787            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
788            otherwise falls back to Snowball package.
789    
790    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
791    
792            * man/dissimilarity-methods.Rd: Make clear that any method offered
793            by "dists" from package "cba" can be used.
794    
795    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
796    
797            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
798            to Kurt's latex suggestion. Removed points and underscores in
799            variable names for consistent naming.
800    
801            * DESCRIPTION: Update to version 0.1-2.
802    
803            * man/TextRepository.Rd: Fixed bug in documentation.
804    
805    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
806    
807            * DESCRIPTION: Update to version 0.1-1.
808    
809    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
810    
811            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
812            wordStem.
813    
814    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
815    
816            * R/: Changes due to Kurt's review.
817    
818    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
819    
820            * R/: Implemented improvements based upon comments by David
821            Meyer.
822    
823    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
824    
825            * inst/doc/: Rewrote vignette.
826    
827            * man/: Improved documentation.
828    
829    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
830    
831            * man/: Updated documentation.
832    
833            * DESCRIPTION: Changed package name to "tm". Updated version to
834            0.1 for first CRAN release.
835    
836            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
837            list archive example.
838    
839            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
840            archive example.
841    
842            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
843            from (several mails per box) mbox format to (single mail per file)
844            eml format.
845    
846    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
847    
848            * data/crude.rda: Rebuilt.
849    
850            * data/acq.rda: Rebuilt.
851    
852            * R/reader.R: Factored out reader and parser methods from
853            textdoccol.R.
854    
855            * R/source.R: Factored out Source methods from aobjects.R and
856            textdoccol.R.
857            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
858            feeds.
859    
860            * R/textdoccol.R (DirSource): Added support for recursive
861            traversal of directories.
862    
863    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
864    
865            * R/textdoccol.R ([[): Loads the document corpus automatically
866            into memory upon access.
867            (tm_transform, tm_filter): Removed several checks whether the
868            document is already loaded ([[ ensures this now).
869            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
870            mailing list archive.
871    
872    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
873    
874            * R/aobjects.R (TextDocument): Is now a virtual class.
875            (Source): Is now a virtual class.
876    
877    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
878    
879            * R/textdoccol.R (c): Support for an arbitrary number of document
880            collections.
881    
882    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
883    
884            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
885            append_meta and remove_meta.
886    
887            * R/textdoccol.R: Removed modify_metadata method.
888    
889            * R/textrepo.R: Removed modify_metadata method.
890    
891            * R/textdoccol.R (remove_meta): Supports removal of document
892            collection metadata and document (= in data frame) metadata.
893    
894    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
895    
896            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
897    
898            * data/crude.rda: Rebuilt.
899    
900            * data/acq.rda: Rebuilt.
901    
902            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
903    
904            * R/textdoccol.R ([): Bug fix for subsetting a document
905            collection's data frame.
906    
907    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
908    
909            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
910            to s_filter.
911    
912            * R/textdoccol.R: Local text documents' metadata can now be copied
913            to a document collection's data frame with prescind_meta.
914    
915    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
916    
917            * R/: Text documents' slot metadata is now accessible in s_filter.
918    
919            * R/: Rewrote s_filter function (still has some restrictions).
920    
921    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * R/: Various fixes in handling metadata.
924    
925            * R/: Added update mechanism for text document collections.
926    
927    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * R/: Merging of document collections now creates a binary tree
930            for reconstructing merged document collections.
931    
932            * R/: Redesign of metadata for document collections.
933    
934    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
935    
936            * R/: Messages now use \code{ngettext}.
937    
938    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
939    
940            * R/: Added functions for modifying and removing metadata.
941    
942    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
943    
944            * man/: Updated some documentation.
945    
946            * R/: Corrected some connection issues.
947    
948            * inst/doc: Worked on the vignette.
949    
950    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
951    
952            * inst/: Added texts and started vignette.
953    
954            * R/: Final changes based upon David's comments.
955    
956    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
957    
958            * NAMESPACE: Corrected exports (generic methods need exportMethods
959            directives!).
960    
961    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
962    
963            * R/: Modified the TextDocCol constructor and various parsers. It
964            is now modular and supports various file formats via plugins (see
965            the new "Source" class).
966    
967    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
968    
969            * man/: Revised documentation after previous code changes.
970    
971    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
972    
973            * R/: Remaining changes as discussed with David.
974    
975    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            * R/: Some changes as suggested by David. The rest will follow
978            within the next few days.
979    
980    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
981    
982            * man/: Finished documentation.
983    
984    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
985    
986            * man/: Wrote some documentation.
987    
988    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            * R/: Further syntactic sugar in form of additional assignment and
991            accessor methods.
992    
993    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
994    
995            * R/: Syntactic sugar in form of "length", "show" and "summary"
996            operators.
997    
998    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
999    
1000            * R/: Various updates, mainly to default operators ("[" or "c")
1001            and dissimilarities.
1002    
1003    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1004    
1005            * R/: Added similarity functions.
1006    
1007            * data/: Added English stopwords.
1008    
1009    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1010    
1011            * data/: Examples compiled for new features.
1012    
1013            * R/: Changes due to new structure.
1014    
1015            * NAMESPACE: Corrected namespace to reflect new structure.
1016    
1017            * R/termdocmatrix.R: Adapted for new naming scheme.
1018    
1019    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1020    
1021            * R/textdoccol.R: Adapted code for new class structure. Wrote
1022            several transform and filter functions operating on text document
1023            collections (alias text document databases).
1024    
1025            * R/aobjects.R: Adapted class structure with inheritance,
1026            repositories and additional meta data. Loading files on demand is
1027            now possible.
1028    
1029    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1030    
1031            * R/: Some cosmetic cleanups.
1032    
1033            * inst/: Removed vignette on clustering. That and much more is now
1034            described in the JSS paper on text mining. Based upon that
1035            article, an elaborate vignette will be incorporated in the future.
1036    
1037    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * R/: Updated generic S4 methods to comply with signature changes
1040            in newer versions of R (> 2.3).
1041    
1042    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * ext/R/importRIS.R: Automatic RIS import is now possible.
1045    
1046    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1047    
1048            * R/textdoccol.R: Added RIS HTML input format.
1049    
1050    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1051    
1052            * R/textdoccol.R: Removed bug that caused invalid text document
1053            collections when handling many input files.
1054    
1055    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1056    
1057            * R/textdoccol.R: Restructured and extended file import
1058            mechanism.
1059    
1060            * inst/doc/clustering.Rnw: Adapted vignette for use with
1061            ReutNews.rda
1062    
1063            * man/ReutNews.Rd: Documentation for ReutNews.rda
1064    
1065            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1066    
1067    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1068    
1069            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
