Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC
pkg/ChangeLog revision 1147, Tue Oct 11 15:41:49 2011 UTC
1    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
4            can be accessed via '[' and not '[['.
5    
6    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/stopwords.R (stopwords): Raise an error if no stopwords are
9            available for requested language. Suggested by Derek M Jones.
10    
11    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
14            normalization.
15    
16    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
17    
18            * R/transform.R (stemDocument.PlainTextDocument): Use language
19            argument.
20    
21    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
22    
23            * R/source.R: Store strings and connections instead of unevaluated
24            calls.
25    
26    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
27    
28            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
29    
30    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
31    
32            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
33            (instead of a list element).
34    
35    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
38            documents by names (fallback to IDs if names are not set).
39    
40    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
43            \code{recursive} now determines whether existing corpus meta data
44            is used.
45    
46    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
47    
48            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
49    
50    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
51    
52            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
53            remove terms not occurring in the corpus anymore.
54    
55    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
56    
57            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
58            and Heaps' law.
59    
60    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
61    
62            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
63            provided by a source.
64    
65    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
66    
67            * R/source.R (.Source): Provide document names.
68    
69    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
70    
71            * R/meta.R (`content_or_meta`): Utility function.
72    
73    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
74    
75            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
76            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
77    
78    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
79    
80            * R/weight.R (weightTfIdf): Added normalization option.
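
A minimal sketch of tf-idf weighting with an optional normalization step, to illustrate what such an option typically does. The helper name tf_idf, the dense matrix input, and the exact formula (relative term frequency times log2 of inverse document frequency) are assumptions for illustration and are not taken from R/weight.R.

```r
# Hypothetical helper: tf-idf on a dense term-document matrix (terms in rows,
# documents in columns). Not the package's implementation.
tf_idf <- function(m, normalize = TRUE) {
  # Optionally turn raw counts into relative frequencies per document
  if (normalize) m <- sweep(m, 2, colSums(m), "/")
  # Inverse document frequency: log2(number of documents / documents containing the term)
  idf <- log2(ncol(m) / rowSums(m > 0))
  # idf has one entry per row, so each term's row is scaled by its own idf
  m * idf
}

counts <- matrix(c(2, 0, 1, 1), nrow = 2,
                 dimnames = list(c("oil", "price"), c("d1", "d2")))
weights <- tf_idf(counts)
print(weights)
```

With normalization, long documents no longer dominate simply because they contain more tokens. The sketch assumes that every document has at least one term and that every term occurs in at least one document.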
81    
82            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
83            analysis.
84    
85    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
86    
87            * R/score.R (tm_tag_score): Compute a score from the number of
88            tags matching in a document.
89    
90    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
91    
92            * R/complete.R (stemCompletion): New completion heuristics.
93    
94    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
95    
96            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
97    
98    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
99    
100            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
101            setOldClass(c(..., "list")) works.
102    
103    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
104    
105            * R/transform.R (stemDocument.character): In case input is a
106            simple character just delegate to the default Snowball stemmer.
107    
108    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
109    
110            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
111            data.
112    
113    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
114    
115            * R/doc.R (`Content<-`): Be careful with names attribute.
116    
117    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
118    
119            * R/source.R (DirSource): Improved implementation especially when
120            handling many (> 1M) files.
121    
122    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
123    
124            * R/source.R (getElem.URISource): Use encoding argument.
125    
126    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
127    
128            * R/doc.R (setOldClass): Register S3 document classes to be
129            recognized by S4 methods.
130    
131    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/matrix.R (termFreq): Add option to remove punctuation
134            characters.
135    
136    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
137    
138            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
139            merging multiple term-document matrices.
140    
141    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
142    
143            * R/corpus.R (setOldClass): Register S3 corpus classes to be
144            recognized by S4 methods.
145    
146            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
147            that CRAN Mac OS X builds do not fail any longer.
148    
149    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
150    
151            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
152            of RWeka::AlphabeticTokenizer() as default.
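
A minimal sketch of whitespace tokenization with base R's scan(), in the spirit of the entry above. The helper name scan_tokens and the quote and quiet arguments are assumptions for illustration; the entry only states that scan(..., what = "character") is used.

```r
# Hypothetical helper: split text on whitespace using base R's scan().
scan_tokens <- function(x) {
  # quote = "" treats apostrophes and double quotes as ordinary characters;
  # quiet = TRUE suppresses the "Read N items" message
  scan(text = x, what = "character", quote = "", quiet = TRUE)
}

tokens <- scan_tokens("Oil prices rose; OPEC didn't comment.")
print(tokens)
```

Unlike an alphabetic tokenizer, this approach keeps punctuation attached to the neighbouring word, so punctuation has to be removed in a separate step if that is wanted.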
153    
154    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
155    
156            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
157            caused words at the beginning or the end of a line not to be removed. Do
158            not delete whitespace anymore.
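
A minimal sketch of word removal based on word boundaries, which matches words at the beginning and end of a line and leaves whitespace untouched. The helper name remove_words and the regular expression are assumptions for illustration, not the code in R/transform.R.

```r
# Hypothetical helper: delete whole words and keep all whitespace in place.
remove_words <- function(x, words) {
  # \\b matches at word boundaries, including the start and the end of a line
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

cleaned <- remove_words("the cat sat on the mat the", c("the", "on"))
print(cleaned)
```

The sketch assumes that the words contain no regular expression metacharacters; such words would need to be escaped first.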
159    
160    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
161    
162            * R/source.R (DirSource): Default to working directory if no path
163            is specified.
164    
165    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
166    
167            * R/source.R (DirSource): Stop on empty directories.
168    
169    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
170    
171            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
172            named documents.
173    
174    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
175    
176            * R/transform.R (removeWords): Improve regular expressions.
177    
178    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
179    
180            * R/meta.R (DublinCore): Allow lower case tags.
181    
182    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
183    
184            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
185            instead of x$children.
186    
187    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
188    
189            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
190    
191    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
192    
193            * R/: Use S3 instead of S4 class system.
194    
195    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
196    
197            * R/reader.R (readMail): Moved to tm.plugin.mail package.
198    
199    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
200    
201            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
202            postings are basically e-mails with some extra headers.
203    
204    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
205    
206            * R/transform.R: Move convertMboxEml, removeCitation,
207            removeMultipart, and removeSignature to the tm.plugin.mail package
208            since they are mainly utility functions (for handling e-mails) and
209            not very framework specific.
210    
211    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
212    
213            * man/: Fix documentation.
214    
215    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
216    
217            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
218            plain text document instead of an XML document for texts of the
219            Reuters-21578 dataset.
220    
221            * R/sparse.R: Removed since the slam package is now available on
222            CRAN.
223    
224            * DESCRIPTION (Depends): Add slam package.
225    
226    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
227    
228            * R/transform.R (stemDoc): Fix character(0) handling.
229    
230    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
231    
232            * R/doc.R (show): Pretty print.
233    
234    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
235    
236            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
237            gracefully.
238    
239    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
240    
241            * R/corpus.R: Make corpus virtual. Implement corpus with standard
242            and permanent storage semantics.
243    
244            * DESCRIPTION: New major release. A *lot* of improvements.
245    
246    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
247    
248            * NAMESPACE: Export some simple_triplet_matrix functions.
249    
250    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
251    
252            * R/weight.R: Adapt tf-idf to new matrix format.
253    
254    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
255    
256            * R/matrix.R: Create two distinct classes for term-document and
257            document-term matrices.
258    
259    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
260    
261            * R/termdocmatrix.R: No longer use Matrix package. This reduces
262            package start-up time significantly.
263    
264    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
265    
266            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
267    
268    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
269    
270            * R/transform.R (tmReduce): Combine multiple maps into one
271            transformation.
272    
273    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
274    
275            * R/weight.R: Remove weightLogical since it does not return a
276            dgCMatrix.
277    
278            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
279            or TermDocumentMatrix instead.
280    
281    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
282    
283            * inst/doc/extensions.Rnw: Finished vignette.
284    
285    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
286    
287            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
288            DocumentTermMatrix representations.
289    
290    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
291    
292            * R/reader.R (readXML): New reader for arbitrary XML files.
293    
294    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
295    
296            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
297            (XMLSource): New XMLSource class for arbitrary XML files.
298            (Source): New slot Vectorized.
299    
300    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
301    
302            * R/reader.R (readTabular): Experimental reader for tabular data
303            structures which can be customized via user-defined mappings.
304    
305            * R/reader.R: Always use UTC time zone.
306    
307            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
308    
309    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
310    
311            * R/reader.R (readDOC): Options can be passed over to antiword.
312    
313            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
314            pdftotext.
315    
316    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
317    
318            * R/source.R (DirSource): Add pattern and ignore.case arguments
319            which are internally passed over to list.files().
320    
321    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
322    
323            * inst/doc/tm.Rnw: Suppress pointless loading message.
324    
325    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
326    
327            * DESCRIPTION: Speed up package loading (via moving packages not
328            strictly necessary for normal operation to Suggests instead of
329            Depends).
330    
331    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
332    
333            * R/reader.R (readNewsgroup): The date format is now configurable.
334    
335    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
336    
337            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
338    
339    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
340    
341            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
342    
343    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
344    
345            * R/source.R (DataframeSource): New source class for data frames.
346    
347            * R/source.R: Fixed non-standard call evaluation.
348    
349    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
350    
351            * R/source.R (URISource): New source class for a single document.
352    
353    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
354    
355            * R/source.R: Refactoring.
356    
357    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
358    
359            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
360            Rmpi installations more gracefully.
361    
362    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
363    
364            * R/source.R (Source): Add Length slot.
365    
366    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
367    
368            * R/AAA.R: Unify duplicated .onLoad function.
369    
370    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
371    
372            * DESCRIPTION (Suggests): Added Rmpi.
373    
374    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
375    
376            * R/source.R (getElem): Fix 'no visible binding' warning.
377    
378            * man/WeightFunction.Rd: Fix signature.
379    
380    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
381    
382            * R/weight.R: Introduce name abbreviations for weighting functions.
383    
384    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
385    
386            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
387    
388            * R/cluster.R: Provide convenience functions for using a MPI
389            cluster.
390    
391            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
392            available.
393    
394            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
395            available.
396    
397    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
398    
399            * R/textdoccol.R (lapply): Removed debug print out.
400    
401    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
402    
403            * R/reader.R (readRCV1): Improved meta data extraction from
404            Reuters Corpus Volume 1 documents.
405    
406    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
407    
408            * R/transform.R: Ensure that all mappings preserve multiline
409            structures.
410    
411    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
412    
413            * R/filter.R: Every filter now has an attribute indicating whether
414            it should be applied at document level (doclevel).
415    
416            * R/textdoccol.R (tmFilter): Set searchFullText as new default
417            filter.
418    
419    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
420    
421            * R/transform.R (replacePatterns): Replaced removeWords by
422            replacePatterns. Suggested by Christian Buchta.
423    
424            * R/textdoccol.R (inspect): Improved formatting.
425    
426    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
427    
428            * inst/CITATION: Updated JSS article information.
429    
430            * R/textdoccol.R (setAs): Added coerce method from list to
431            corpus.
432    
433            * R/meta.R (meta): Improved meta data handling.
434    
435    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
436    
437            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
438            Christian Buchta.
439    
440            * inst/CITATION: Added template to include JSS article reference.
441    
442    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
443    
444            * R/textdoccol.R (tmMap): Introduced lazy mapping.
445    
446            * R/source.R: Added VectorSource.
447    
448    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
449    
450            * man/: Language codes should be in ISO 639-1 format.
451    
452            * R/textdoccol.R (asPlain): Preserve local meta data.
453    
454    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * R/textdoccol.R (writeCorpus): Function for writing a corpus
457            containing plain text documents to disk.
458    
459    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
460    
461            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
462            always set correctly.
463    
464            * R/textdoccol.R: Set load = TRUE as default for load on demand
465            since in most cases this is the wanted behaviour.
466    
467    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
468    
469            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
470    
471            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
472    
473    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/meta.R (meta): New function for consistent access to meta data
476            of document collections, repositories, and texts.
477    
478    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
479    
480            * R/: Better support for encodings.
481    
482    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
483    
484            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
485            selection when no reader argument is given.
486    
487    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
488    
489            * R/source.R (CSVSource): Now uses read.csv instead of scan
490            internally.
491    
492    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
493    
494            * R/reader.R (getReaders): Returns available reader functions.
495    
496            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
497            as default.
498    
499    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
500    
501            * R/stopwords.R (stopwords): Shortened code, removed codetools
502            variable warnings.
503    
504            * man/: Documentation for showMeta, added an example for tmMap.
505    
506            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
507            some minor typos fixed.
508    
509    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
510    
511            * R/aobjects.R (showMeta): Added method for pretty printing a
512            text document's meta data.
513    
514    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
515    
516            * R/textdoccol.R (TextDocCol): Better handling of empty
517            arguments.
518    
519            * NAMESPACE: Exported readDOC.
520    
521            * man/completeStems.Rd: Added an example.
522    
523    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
524    
525            * R/stopwords.R (stopwords): Look up .dat files at every
526            call. Allows users to modify stopword .dat files interactively.
527    
528    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
529    
530            * R/termdocmatrix.R (termFreq): Correct processing of empty
531            documents.
532    
533    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
534    
535            * man/: Updated documentation.
536    
537    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
538    
539            * R/complete.R (completeStems): Completes (heuristically) word
540            stems.
541    
542            * R/termdocmatrix.R (TermDocMatrix2): New modular
543            constructor.
544    
545            * NAMESPACE: Exported termFreq.
546    
547    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
548    
549            * R/reader.R (readDOC): Added MS Word reader (using antiword).
550    
551    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
552    
553            * R/weight.R: Weighting functions for TermDocMatrix.
554    
555    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
556    
557            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
558            functions for accessing dimension, column, and row names.
559    
560            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
561    
562    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
563    
564            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
565    
566    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
567    
568            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
569    
570    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
571    
572            * R/reader.R (readPDF): Removed manual checks for pdftotext and
573            pdfinfo. The system call gives a warning anyway.
574    
575    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
576    
577            * R/textdoccol.R (asPlain): Conversion from
578            StructuredTextDocuments to PlainTextDocuments.
579    
580    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
581    
582            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
583            for accessing term-document matrices.
584    
585            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
586            are installed.
587    
588    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
591            Christian Buchta.
592    
593    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
594    
595            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
596    
597    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
598    
599            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
600    
601            * R/reader.R (readPDF): Added PDF reader.
602    
603    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
604    
605            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
606    
607            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
608    
609            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
610    
611            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
612    
613    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
614    
615            * R/distmeasure.R (dissimilarity): Replaced dists call from
616            package cba by new dist call from package proxy.
617    
618    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
619    
620            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
621    
622    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
623    
624            * R/termdocmatrix.R: require() uses the quietly option to suppress
625            loading messages.
626    
627    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
628    
629            * R/dictionary.R: Added dictionary support.
630    
631    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
634            documents. This simplifies some functions, e.g., asPlain.
635    
636    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
637    
638            * inst/doc/tm.Rnw: Fixed some typos in vignette.
639    
640    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
641    
642            * R/textdoccol.R (replaceWords): Added method to replace a set of
643            words by a single word. Useful for synonyms.
644    
645    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
646    
647            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
648    
649    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
650    
651            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
652            vectors. Thanks to Ariel Maguyon for his error report.
653            (removeSparseTerms): New function to remove columns from a
654            term-document matrix exceeding a sparse factor.
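
A minimal sketch of dropping columns by sparsity, following the wording of the entry above. The helper name drop_sparse_columns, the dense matrix input, and the handling of columns that sit exactly at the threshold are assumptions for illustration.

```r
# Hypothetical helper: drop columns whose share of zero entries exceeds `sparse`.
drop_sparse_columns <- function(m, sparse) {
  # Fraction of zero entries in each column
  zero_share <- colMeans(m == 0)
  # drop = FALSE keeps the result a matrix even if a single column survives
  m[, zero_share <= sparse, drop = FALSE]
}

m <- matrix(c(1, 0, 0, 0,
              1, 1, 1, 0,
              1, 1, 1, 1), nrow = 4,
            dimnames = list(NULL, c("rare", "common", "everywhere")))
dense <- drop_sparse_columns(m, sparse = 0.5)
print(colnames(dense))
```

A lower sparse factor keeps fewer columns, which reduces the size of the matrix at the cost of discarding infrequent entries.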
655    
656    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
657    
658            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
659    
660    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * man/sFilter.Rd: Corrected documentation on statement format (use
663            '==' instead of '=').
664    
665    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
666    
667            * R/aobjects.R (StructuredTextDocument): Inherits from
668            TextDocument.
669    
670    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
673            on sparse matrices as proposed by Martin Maechler.
674    
675    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
676    
677            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
678            \pkg{filehash} version deprecates them.
679    
680    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
681    
682            * R/termdocmatrix.R (textvector): Stemming is now performed before
683            erasing stopwords.
684            (weightMatrix): Adapted to handle sparse matrices.
685            (TermDocMatrix): Sparse matrix is now efficiently built by
686            direct stepwise insertion of row values into it.
687    
688    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
689    
690            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
691            due to ongoing problems. For our purposes the latter is as useful
692            as the replaced package.
693    
694    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
695    
696            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
697    
698            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
699    
700    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
701    
702            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
703            languages with available stopwords.
704    
705    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
706    
707            * inst/doc/tm.Rnw: Minor corrections in the vignette.
708    
709    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
710    
711            * DESCRIPTION: Update to version 0.2, since a lot of new features
712            have been integrated.
713    
714            * inst/stopwords: Updated existing stopwords and added stopwords
715            for various other languages.
716    
717    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
718    
719            * man/: Updated documentation.
720    
721            * Work/testDb.R: Script to test database stuff.
722    
723            * R/: Fixed various database related bugs. Seems to be rather
724            useable now, i.e., consider as alpha status for now.
725    
726    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
727    
728            * R/: Fixed some bugs related to database support.
729    
730    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
731    
732            * man/: Added a lot of examples to the manuals.
733    
734    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
735    
736            * man/: Updated parts of the documentation.
737    
738            * R/textdoccol.R (asPlain): Added conversion from newsgroup
739            documents to plain text documents.
740    
741    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
742    
743            * R/textdoccol.R: Finished experimental database support. Not yet
744            intensively tested.
745    
746            * R/source.R: Now each source has a default reader.
747    
748            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
749            class anymore.
750    
751            * R/plaintextdoc.R: Custom show method for plain text documents.
752    
753            * R/aobjects.R: Added a class for structured text documents.
754    
755            * R/reader.R: Replaced remaining \code{parser} occurrences with
756            \code{reader}.
757    
758            * R/textdoccol.R (summary): Indent tags.
759    
760            * R/textdoccol.R (removePunctuation): Transform method to remove
761            punctuation marks.
762    
763    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
764    
765            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
766            using prescindMeta().
767    
768    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
769    
770            * R/textdoccol.R: Improved database support.
771    
772    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
773    
774            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
775    
776            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
777            language code.
778    
779            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
780            into parserControl argument.
781    
782            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
783    
784    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
785    
786            * Work/tmDataSetup.R: The datasets acq and crude can now be
787            created on the fly.
788    
789            * R/stopwords.R: Introduced a function returning the stopwords for
790            a given language (English, German and French at the moment).
791    
792            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
793            otherwise falls back to Snowball package.
794    
795    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
796    
797            * man/dissimilarity-methods.Rd: Make clear that any method offered
798            by "dists" from package "cba" can be used.
799    
800    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
801    
802            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
803            to Kurt's latex suggestion. Removed points and underscores in
804            variable names for consistent naming.
805    
806            * DESCRIPTION: Update to version 0.1-2.
807    
808            * man/TextRepository.Rd: Fixed bug in documentation.
809    
810    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
811    
812            * DESCRIPTION: Update to version 0.1-1.
813    
814    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
815    
816            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
817            wordStem.
818    
819    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
820    
821            * R/: Changes due to Kurt's review.
822    
823    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
824    
825            * R/: Implemented improvements based upon comments by David
826            Meyer.
827    
828    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
829    
830            * inst/doc/: Rewrote vignette.
831    
832            * man/: Improved documentation.
833    
834    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
835    
836            * man/: Updated documentation.
837    
838            * DESCRIPTION: Changed package name to "tm". Updated version to
839            0.1 for first CRAN release.
840    
841            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
842            list archive example.
843    
844            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
845            archive example.
846    
847            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
848            from (several mails per box) mbox format to (single mail per file)
849            eml format.
850    
851    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
852    
853            * data/crude.rda: Rebuilt.
854    
855            * data/acq.rda: Rebuilt.
856    
857            * R/reader.R: Factored out reader and parser methods from
858            textdoccol.R.
859    
860            * R/source.R: Factored out Source methods from aobjects.R and
861            textdoccol.R.
862            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
863            feeds.
864    
865            * R/textdoccol.R (DirSource): Added support for recursive
866            traversal of directories.
867    
868    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
869    
870            * R/textdoccol.R ([[): Loads the document corpus automatically
871            into memory upon access.
872            (tm_transform, tm_filter): Removed several checks whether the
873            document is already loaded ([[ ensures this now).
874            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
875            mailing list archive.
876    
877    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
878    
879            * R/aobjects.R (TextDocument): Is now a virtual class.
880            (Source): Is now a virtual class.
881    
882    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
883    
884            * R/textdoccol.R (c): Support for an arbitrary number of document
885            collections.
886    
887    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
888    
889            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
890            append_meta and remove_meta.
891    
892            * R/textdoccol.R: Removed modify_metadata method.
893    
894            * R/textrepo.R: Removed modify_metadata method.
895    
896            * R/textdoccol.R (remove_meta): Supports removal of document
897            collection metadata and document (= in data frame) metadata.
898    
899    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
900    
901            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
902    
903            * data/crude.rda: Rebuilt.
904    
905            * data/acq.rda: Rebuilt.
906    
907            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
908    
909            * R/textdoccol.R ([): Bug fix for subsetting a document
910            collection's data frame.
911    
912    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
913    
914            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
915            to s_filter.
916    
917            * R/textdoccol.R: Local text documents' metadata can now be copied
918            to a document collection's data frame with prescind_meta.
919    
920    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
921    
922            * R/: Text documents' slot metadata is now accessible in s_filter.
923    
924            * R/: Rewrote s_filter function (still has some restrictions).
925    
926    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
927    
928            * R/: Various fixes in handling metadata.
929    
930            * R/: Added update mechanism for text document collections.
931    
932    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
933    
934            * R/: Merging of document collections now creates a binary tree
935            for reconstructing merged document collections.
936    
937            * R/: Redesign of metadata for document collections.
938    
939    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
940    
941            * R/: Messages now use \code{ngettext}.
942    
943    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
944    
945            * R/: Added functions for modifying and removing metadata.
946    
947    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
948    
949            * man/: Updated some documentation.
950    
951            * R/: Corrected some connection issues.
952    
953            * inst/doc: Worked on the vignette.
954    
955    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
956    
957            * inst/: Added texts and started vignette.
958    
959            * R/: Final changes based upon David's comments.
960    
961    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
962    
963            * NAMESPACE: Corrected exports (generic methods need exportMethods
964            directives!).
965    
966    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
967    
968            * R/: Modified the TextDocCol constructor and various parsers. It
969            is now modular and supports various file formats via plugins (see
970            the new "Source" class).
971    
972    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
973    
974            * man/: Revised documentation after previous code changes.
975    
976    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
977    
978            * R/: Remaining changes as discussed with David.
979    
980    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
981    
982            * R/: Some changes as suggested by David. The rest will follow
983            within the next few days.
984    
985    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
986    
987            * man/: Finished documentation.
988    
989    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
990    
991            * man/: Wrote some documentation.
992    
993    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
994    
995            * R/: Further syntactic sugar in form of additional assignment and
996            accessor methods.
997    
998    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
999    
1000            * R/: Syntactic sugar in form of "length", "show" and "summary"
1001            operators.
1002    
1003    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1004    
1005            * R/: Diverse updates. Mainly on default operators ("[" or "c")
1006            and dissimilarities.
1007    
1008    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1009    
1010            * R/: Added similarity functions.
1011    
1012            * data/: Added English stopwords.
1013    
1014    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1015    
1016            * data/: Examples compiled for new features.
1017    
1018            * R/: Changes due to new structure.
1019    
1020            * NAMESPACE: Corrected namespace to reflect new structure.
1021    
1022            * R/termdocmatrix.R: Adapted for new naming scheme.
1023    
1024    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1025    
1026            * R/textdoccol.R: Adapted code for new class structure. Wrote
1027            several transform and filter functions operating on text document
1028            collections (alias text document databases).
1029    
1030            * R/aobjects.R: Adapted class structure with inheritance,
1031            repositories and additional meta data. Loading files on demand is
1032            now possible.
1033    
1034    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1035    
1036            * R/: Some cosmetic cleanups.
1037    
1038            * inst/: Removed vignette on clustering. That and much more is now
1039            described in the JSS paper on text mining. Based upon that
1040            article an elaborated vignette will be incorporated in the future.
1041    
1042    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * R/: Updated generic S4 methods to comply with signature changes
1045            in newer versions of R (> 2.3).
1046    
1047    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1048    
1049            * ext/R/importRIS.R: Automatic RIS import is now possible.
1050    
1051    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1052    
1053            * R/textdoccol.R: Added RIS HTML input format.
1054    
1055    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1056    
1057            * R/textdoccol.R: Removed bug that caused invalid text document
1058            collections when handling many input files.
1059    
1060    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1061    
1062            * R/textdoccol.R: Restructured and extended file import
1063            mechanism.
1064    
1065            * inst/doc/clustering.Rnw: Adapted vignette for use with
1066            ReutNews.rda
1067    
1068            * man/ReutNews.Rd: Documentation for ReutNews.rda
1069    
1070            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1071    
1072    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1073    
1074            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1075            clustering facilities of this package.
1076    
1077    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1078    
1079            * R/aobjects.R: Changed package document structure to avoid class
1080            dependency problems.
1081    
1082    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1083    
1084            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
1085            data set.
1086    
1087            *  Finished documentation and reordered directory structure. Now "R
1088            CMD check textmin" works without errors.
1089    
1090    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1091    
1092            * src/: Various splits can now be easily created for the
1093            Reuters21578 data set.
1094    
1095    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1096    
1097            *  Updated documentation.
1098    
1099    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1100    
1101            *  Wrote R documentation for some classes and methods.
1102    
1103    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1104    
1105            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
1106            files. See the questionnaire data/Umfrage.csv for such an example.
1107            We are now able to import files in Reuters-21578 XML format.
1108    
1109            *  Changed class interfaces in various files. Weighting of the text
1110            matrix is now possible.
1111    
1112    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1113    
1114            * R/textdoccol.R: One can build term-document matrices if
1115            necessary (with buildTDM(...)) and fill the field tdm from a text
1116            document collection with it.
1117    
1118            * R/textmatrix.R: Wrote S4 class for term-document matrices.
1119    
1120    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1121    
1122            * R/textdoccol.R: We now can read in a whole XML file with several
1123            news items.
1124    
1125    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1126    
1127            * R/textdoccol.R: Set up an S4 class for a collection of text
