[tm] Diff of /pkg/ChangeLog
trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
pkg/ChangeLog revision 1114, Fri Nov 26 14:05:54 2010 UTC
1    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
4    
5    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
6    
7            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
8            (instead of a list element).
9    
10    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
11    
12            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
13            documents by names (fallback to IDs if names are not set).
14    
15    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
16    
17            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
18            \code{recursive} now determines whether existing corpus meta data
19            is used.
20    
21    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
22    
23            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
24    
25    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
26    
27            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
28            remove terms not occurring in the corpus anymore.
29    
30    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
31    
32            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
33            and Heaps' law.
34    
35    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
38            provided by a source.
39    
40    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/source.R (.Source): Provide document names.
43    
44    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
45    
46            * R/meta.R (`content_or_meta`): Utility function.
47    
48    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
49    
50            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
51            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
52    
53    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
54    
55            * R/weight.R (weightTfIdf): Added normalization option.
56    
57            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
58            analysis.
59    
60    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
61    
62            * R/score.R (tm_tag_score): Compute a score from the number of
63            tags matching in a document.
64    
65    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
66    
67            * R/complete.R (stemCompletion): New completion heuristics.
68    
69    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
70    
71            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
72    
73    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
74    
75            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
76            setOldClass(c(..., "list")) works.
77    
78    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
79    
80            * R/transform.R (stemDocument.character): In case input is a
81            simple character just delegate to the default Snowball stemmer.
82    
83    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
84    
85            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
86            data.
87    
88    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/doc.R (`Content<-`): Be careful with names attribute.
91    
92    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
93    
94            * R/source.R (DirSource): Improved implementation especially when
95            handling many (> 1M) files.
96    
97    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/source.R (getElem.URISource): Use encoding argument.
100    
101    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
102    
103            * R/doc.R (setOldClass): Register S3 document classes to be
104            recognized by S4 methods.
105    
106    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
107    
108            * R/matrix.R (termFreq): Add option to remove punctuation
109            characters.
110    
111    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
112    
113            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
114            merging multiple term-document matrices.
115    
116    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
117    
118            * R/corpus.R (setOldClass): Register S3 corpus classes to be
119            recognized by S4 methods.
120    
121            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
122            that CRAN Mac OS X builds do not fail any longer.
123    
124    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
125    
126            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
127            of RWeka::AlphabeticTokenizer() as default.
128    
129    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
130    
131            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
132            caused words at the beginning or the end of a line not to be removed. Do
133            not delete whitespace anymore.
134    
135    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
136    
137            * R/source.R (DirSource): Default to working directory if no path
138            is specified.
139    
140    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
141    
142            * R/source.R (DirSource): Stop on empty directories.
143    
144    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
145    
146            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
147            named documents.
148    
149    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
150    
151            * R/transform.R (removeWords): Improve regular expressions.
152    
153    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
154    
155            * R/meta.R (DublinCore): Allow lower case tags.
156    
157    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
158    
159            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
160            instead of x$children.
161    
162    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
163    
164            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
165    
166    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
167    
168            * R/: Use S3 instead of S4 class system.
169    
170    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/reader.R (readMail): Moved to tm.plugin.mail package.
173    
174    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
175    
176            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
177            postings are basically e-mails with some extra headers.
178    
179    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
180    
181            * R/transform.R: Move convertMboxEml, removeCitation,
182            removeMultipart, and removeSignature to the tm.plugin.mail package
183            since they are mainly utility functions (for handling e-mails) and
184            not very framework specific.
185    
186    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
187    
188            * man/: Fix documentation.
189    
190    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
191    
192            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
193            plain text document instead of an XML document for texts of the
194            Reuters-21578 dataset.
195    
196            * R/sparse.R: Removed since the slam package is now available on
197            CRAN.
198    
199            * DESCRIPTION (Depends): Add slam package.
200    
201    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
202    
203            * R/transform.R (stemDoc): Fix character(0) handling.
204    
205    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
206    
207            * R/doc.R (show): Pretty print.
208    
209    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
210    
211            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
212            gracefully.
213    
214    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
215    
216            * R/corpus.R: Make corpus virtual. Implement corpus with standard
217            and permanent storage semantics.
218    
219            * DESCRIPTION: New major release. A *lot* of improvements.
220    
221    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
222    
223            * NAMESPACE: Export some simple_triplet_matrix functions.
224    
225    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
226    
227            * R/weight.R: Adapt tf-idf to new matrix format.
228    
229    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
230    
231            * R/matrix.R: Create two distinct classes for term-document and
232            document-term matrices.
233    
234    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
235    
236            * R/termdocmatrix.R: No longer use Matrix package. This reduces
237            package start-up time significantly.
238    
239    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
240    
241            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
242    
243    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
244    
245            * R/transform.R (tmReduce): Combine multiple maps into one
246            transformation.
247    
248    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
249    
250            * R/weight.R: Remove weightLogical since it does not return a
251            dgCMatrix.
252    
253            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
254            or TermDocumentMatrix instead.
255    
256    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
257    
258            * inst/doc/extensions.Rnw: Finished vignette.
259    
260    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
261    
262            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
263            DocumentTermMatrix representations.
264    
265    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
266    
267            * R/reader.R (readXML): New reader for arbitrary XML files.
268    
269    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
270    
271            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
272            (XMLSource): New XMLSource class for arbitrary XML files.
273            (Source): New slot Vectorized.
274    
275    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
276    
277            * R/reader.R (readTabular): Experimental reader for tabular data
278            structures which can be customized via user-defined mappings.
279    
280            * R/reader.R: Always use UTC time zone.
281    
282            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
283    
284    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
285    
286            * R/reader.R (readDOC): Options can be passed over to antiword.
287    
288            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
289            pdftotext.
290    
291    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
292    
293            * R/source.R (DirSource): Add pattern and ignore.case arguments
294            which are internally passed over to list.files().
295    
296    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
297    
298            * inst/doc/tm.Rnw: Suppress pointless loading message.
299    
300    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
301    
302            * DESCRIPTION: Speed up package loading (via moving packages not
303            strictly necessary for normal operation to Suggests instead of
304            Depends).
305    
306    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
307    
308            * R/reader.R (readNewsgroup): The date format is now configurable.
309    
310    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
311    
312            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
313    
314    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
315    
316            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
317    
318    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
319    
320            * R/source.R (DataframeSource): New source class for data frames.
321    
322            * R/source.R: Fixed non-standard call evaluation.
323    
324    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
325    
326            * R/source.R (URISource): New source class for a single document.
327    
328    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
329    
330            * R/source.R: Refactoring.
331    
332    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
333    
334            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
335            Rmpi installations more gracefully.
336    
337    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
338    
339            * R/source.R (Source): Add Length slot.
340    
341    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
342    
343            * R/AAA.R: Unify duplicated .onLoad function.
344    
345    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
346    
347            * DESCRIPTION (Suggests): Added Rmpi.
348    
349    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
350    
351            * R/source.R (getElem): Fix 'no visible binding' warning.
352    
353            * man/WeightFunction.Rd: Fix signature.
354    
355    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
356    
357            * R/weight.R: Introduce name abbreviations for weighting functions.
358    
359    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
360    
361            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
362    
363            * R/cluster.R: Provide convenience functions for using an MPI
364            cluster.
365    
366            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
367            available.
368    
369            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
370            available.
371    
372    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
373    
374            * R/textdoccol.R (lapply): Removed debug print out.
375    
376    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
377    
378            * R/reader.R (readRCV1): Improved meta data extraction from
379            Reuters Corpus Volume 1 documents.
380    
381    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
382    
383            * R/transform.R: Ensure that all mappings preserve multiline
384            structures.
385    
386    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
387    
388            * R/filter.R: Every filter now has an attribute indicating whether
389            it should be applied at the document level (doclevel).
390    
391            * R/textdoccol.R (tmFilter): Set searchFullText as new default
392            filter.
393    
394    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
395    
396            * R/transform.R (replacePatterns): Replaced removeWords by
397            replacePatterns. Suggested by Christian Buchta.
398    
399            * R/textdoccol.R (inspect): Improved formatting.
400    
401    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
402    
403            * inst/CITATION: Updated JSS article information.
404    
405            * R/textdoccol.R (setAs): Added coerce method from list to
406            corpus.
407    
408            * R/meta.R (meta): Improved meta data handling.
409    
410    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
411    
412            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
413            Christian Buchta.
414    
415            * inst/CITATION: Added template to include JSS article reference.
416    
417    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
418    
419            * R/textdoccol.R (tmMap): Introduced lazy mapping.
420    
421            * R/source.R: Added VectorSource.
422    
423    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
424    
425            * man/: Language codes should be in ISO 639-1 format.
426    
427            * R/textdoccol.R (asPlain): Preserve local meta data.
428    
429    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
430    
431            * R/textdoccol.R (writeCorpus): Function for writing a corpus
432            containing plain text documents to disk.
433    
434    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
435    
436            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
437            always set correctly.
438    
439            * R/textdoccol.R: Set load = TRUE as default for load on demand
440            since in most cases this is the wanted behaviour.
441    
442    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
443    
444            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
445    
446            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
447    
448    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
449    
450            * R/meta.R (meta): New function for consistent access to meta data
451            of document collections, repositories, and texts.
452    
453    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/: Better support for encodings.
456    
457    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
458    
459            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
460            selection when no reader argument is given.
461    
462    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
463    
464            * R/source.R (CSVSource): Now uses read.csv instead of scan
465            internally.
466    
467    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
468    
469            * R/reader.R (getReaders): Returns available reader functions.
470    
471            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
472            as default.
473    
474    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
475    
476            * R/stopwords.R (stopwords): Shortened code, removed codetools
477            variable warnings.
478    
479            * man/: Documentation for showMeta, added an example for tmMap.
480    
481            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
482            some minor typos fixed.
483    
484    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
485    
486            * R/aobjects.R (showMeta): Added method for pretty printing a
487            text document's meta data.
488    
489    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
490    
491            * R/textdoccol.R (TextDocCol): Better handling of empty
492            arguments.
493    
494            * NAMESPACE: Exported readDOC.
495    
496            * man/completeStems.Rd: Added an example.
497    
498    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
499    
500            * R/stopwords.R (stopwords): Look up .dat files at every
501            call. Allows users to modify stopword .dat files interactively.
502    
503    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
504    
505            * R/termdocmatrix.R (termFreq): Correct processing of empty
506            documents.
507    
508    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * man/: Updated documentation.
511    
512    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
513    
514            * R/complete.R (completeStems): Completes (heuristically) word
515            stems.
516    
517            * R/termdocmatrix.R (TermDocMatrix2): New modular
518            constructor.
519    
520            * NAMESPACE: Exported termFreq.
521    
522    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
523    
524            * R/reader.R (readDOC): Added MS Word reader (using antiword).
525    
526    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * R/weight.R: Weighting functions for TermDocMatrix.
529    
530    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
531    
532            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
533            functions for accessing dimension, column, and row names.
534    
535            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
536    
537    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
538    
539            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
540    
541    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
542    
543            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
544    
545    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
546    
547            * R/reader.R (readPDF): Removed manual checks for pdftotext and
548            pdfinfo. The system call gives a warning anyway.
549    
550    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
551    
552            * R/textdoccol.R (asPlain): Conversion from
553            StructuredTextDocuments to PlainTextDocuments.
554    
555    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
556    
557            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
558            for accessing term-document matrices.
559    
560            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
561            are installed.
562    
563    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
564    
565            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
566            Christian Buchta.
567    
568    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
571    
572    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
573    
574            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
575    
576            * R/reader.R (readPDF): Added PDF reader.
577    
578    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
579    
580            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
581    
582            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
583    
584            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
585    
586            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
587    
588    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * R/distmeasure.R (dissimilarity): Replaced dists call from
591            package cba by new dist call from package proxy.
592    
593    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
594    
595            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
596    
597    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
598    
599            * R/termdocmatrix.R: require() uses the quietly option to suppress
600            loading messages.
601    
602    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
603    
604            * R/dictionary.R: Added dictionary support.
605    
606    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
607    
608            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
609            documents. This simplifies some functions, e.g., asPlain.
610    
611    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
612    
613            * inst/doc/tm.Rnw: Fixed some typos in vignette.
614    
615    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
616    
617            * R/textdoccol.R (replaceWords): Added method to replace a set of
618            words by a single word. Useful for synonyms.
619    
620    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
621    
622            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
623    
624    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
625    
626            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
627            vectors. Thanks to Ariel Maguyon for his error report.
628            (removeSparseTerms): New function to remove columns from a
629            term-document matrix exceeding a sparse factor.
630    
631    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
634    
635    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
636    
637            * man/sFilter.Rd: Corrected documentation on statement format (use
638            '==' instead of '=').
639    
640    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
641    
642            * R/aobjects.R (StructuredTextDocument): Inherits from
643            TextDocument.
644    
645    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
646    
647            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
648            on sparse matrices as proposed by Martin Maechler.
649    
650    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
651    
652            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
653            \pkg{filehash} version deprecates them.
654    
655    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * R/termdocmatrix.R (textvector): Stemming is now performed before
658            erasing stopwords.
659            (weightMatrix): Adapted to handle sparse matrices.
660            (TermDocMatrix): Sparse matrix is now efficiently built by
661            direct stepwise insertion of row values into it.
662    
663    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
664    
665            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
666            due to ongoing problems. For our purposes the latter is as useful
667            as the replaced package.
668    
669    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
670    
671            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
672    
673            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
674    
675    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
676    
677            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
678            languages with available stopwords.
679    
680    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
681    
682            * inst/doc/tm.Rnw: Minor corrections in the vignette.
683    
684    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
685    
686            * DESCRIPTION: Update to version 0.2, since a lot of new features
687            have been integrated.
688    
689            * inst/stopwords: Updated existing stopwords and added stopwords
690            for various other languages.
691    
692    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
693    
694            * man/: Updated documentation.
695    
696            * Work/testDb.R: Script to test database stuff.
697    
698            * R/: Fixed various database related bugs. Seems to be rather
699            usable now, i.e., consider it alpha status for now.
700    
701    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
702    
703            * R/: Fixed some bugs related to database support.
704    
705    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
706    
707            * man/: Added a lot of examples to the manuals.
708    
709    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
710    
711            * man/: Updated parts of the documentation.
712    
713            * R/textdoccol.R (asPlain): Added conversion from newsgroup
714            documents to plain text documents.
715    
716    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
717    
718            * R/textdoccol.R: Finished experimental database support. Not yet
719            intensively tested.
720    
721            * R/source.R: Now each source has a default reader.
722    
723            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
724            class anymore.
725    
726            * R/plaintextdoc.R: Custom show method for plain text documents.
727    
728            * R/aobjects.R: Added a class for structured text documents.
729    
730            * R/reader.R: Replaced remaining \code{parser} occurrences with
731            \code{reader}.
732    
733            * R/textdoccol.R (summary): Indent tags.
734    
735            * R/textdoccol.R (removePunctuation): Transform method to remove
736            punctuation marks.
737    
738    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
739    
740            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
741            using prescindMeta().
742    
743    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
744    
745            * R/textdoccol.R: Improved database support.
746    
747    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
748    
749            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
750    
751            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
752            language code.
753    
754            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
755            into parserControl argument.
756    
757            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
758    
759    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
760    
761            * Work/tmDataSetup.R: The datasets acq and crude can now be
762            created on the fly.
763    
764            * R/stopwords.R: Introduced a function returning the stopwords for
765            a given language (English, German and French at the moment).
766    
767            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
768            otherwise falls back to Snowball package.
769    
770    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
771    
772            * man/dissimilarity-methods.Rd: Make clear that any method offered
773            by "dists" from package "cba" can be used.
774    
775    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
776    
777            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
778            to Kurt's LaTeX suggestion. Removed points and underscores in
779            variable names for consistent naming.
780    
781            * DESCRIPTION: Update to version 0.1-2.
782    
783            * man/TextRepository.Rd: Fixed bug in documentation.
784    
785    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
786    
787            * DESCRIPTION: Update to version 0.1-1.
788    
789    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
790    
791            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
792            wordStem.
793    
794    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
795    
796            * R/: Changes due to Kurt's review.
797    
798    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
799    
800            * R/: Implemented improvements based upon comments by David
801            Meyer.
802    
803    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
804    
805            * inst/doc/: Rewrote vignette.
806    
807            * man/: Improved documentation.
808    
809    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
810    
811            * man/: Updated documentation.
812    
813            * DESCRIPTION: Changed package name to "tm". Updated version to
814            0.1 for first CRAN release.
815    
816            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
817            list archive example.
818    
819            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
820            archive example.
821    
822            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
823            from (several mails per box) mbox format to (single mail per file)
824            eml format.
825    
826    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
827    
828            * data/crude.rda: Rebuilt.
829    
830            * data/acq.rda: Rebuilt.
831    
832            * R/reader.R: Factored out reader and parser methods from
833            textdoccol.R.
834    
835            * R/source.R: Factored out Source methods from aobjects.R and
836            textdoccol.R.
837            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
838            feeds.
839    
840            * R/textdoccol.R (DirSource): Added support for recursive
841            traversal of directories.
842    
843    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
844    
845            * R/textdoccol.R ([[): Loads the document corpus automatically
846            into memory upon access.
847            (tm_transform, tm_filter): Removed several checks whether the
848            document is already loaded ([[ ensures this now).
849            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
850            mailing list archive.
851    
852    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
853    
854            * R/aobjects.R (TextDocument): Is now a virtual class.
855            (Source): Is now a virtual class.
856    
857    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
858    
859            * R/textdoccol.R (c): Support for an arbitrary number of document
860            collections.
861    
862    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
863    
864            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
865            append_meta and remove_meta.
866    
867            * R/textdoccol.R: Removed modify_metadata method.
868    
869            * R/textrepo.R: Removed modify_metadata method.
870    
871            * R/textdoccol.R (remove_meta): Supports removal of document
872            collection metadata and document (= in data frame) metadata.
873    
874    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
875    
876            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
877    
878            * data/crude.rda: Rebuilt.
879    
880            * data/acq.rda: Rebuilt.
881    
882            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
883    
884            * R/textdoccol.R ([): Bug fix for subsetting a document
885            collection's data frame.
886    
887    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
888    
889            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
890            to s_filter.
891    
892            * R/textdoccol.R: Local text documents' metadata can now be copied
893            to a document collection's data frame with prescind_meta.
894    
895    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
896    
897            * R/: Text documents' slot metadata is now accessible in s_filter.
898    
899            * R/: Rewrote s_filter function (still has some restrictions).
900    
901    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
902    
903            * R/: Various fixes in handling metadata.
904    
905            * R/: Added update mechanism for text document collections.
906    
907    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
908    
909            * R/: Merging of document collections now creates a binary tree
910            for reconstructing merged document collections.
911    
912            * R/: Redesign of metadata for document collections.
913    
914    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
915    
916            * R/: Messages now use \code{ngettext}.
917    
918    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
919    
920            * R/: Added functions for modifying and removing metadata.
921    
922    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
923    
924            * man/: Updated some documentation.
925    
926            * R/: Corrected some connection issues.
927    
928            * inst/doc: Worked on the vignette.
929    
930    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
931    
932            * inst/: Added texts and started vignette.
933    
934            * R/: Final changes based upon David's comments.
935    
936    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
937    
938            * NAMESPACE: Corrected exports (generic methods need exportMethods
939            directives!).
940    
941    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
942    
943            * R/: Modified the TextDocCol constructor and various parsers. It
944            is now modular and supports various file formats via plugins (see
945            the new "Source" class).
946    
947    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
948    
949            * man/: Revised documentation after previous code changes.
950    
951    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
952    
953            * R/: Remaining changes as discussed with David.
954    
955    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
956    
957            * R/: Some changes as suggested by David. The rest will follow
958            within the next few days.
959    
960    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
961    
962            * man/: Finished documentation.
963    
964    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
965    
966            * man/: Wrote some documentation.
967    
968    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
969    
970            * R/: Further syntactic sugar in the form of additional assignment and
971            accessor methods.
972    
973    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
974    
975            * R/: Syntactic sugar in the form of "length", "show" and "summary"
976            operators.
977    
978    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            * R/: Various updates, mainly to default operators ("[" or "c")
981            and dissimilarities.
982    
983    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * R/: Added similarity functions.
986    
987            * data/: Added English stopwords.
988    
989    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
990    
991            * data/: Examples compiled for new features.
992    
993            * R/: Changes due to new structure.
994    
995            * NAMESPACE: Corrected namespace to reflect new structure.
996    
997            * R/termdocmatrix.R: Adapted for new naming scheme.
998    
999    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1000    
1001            * R/textdoccol.R: Adapted code for new class structure. Wrote
1002            several transform and filter functions operating on text document
1003            collections (alias text document databases).
1004    
1005            * R/aobjects.R: Adapted class structure with inheritance,
1006            repositories and additional meta data. Loading files on demand is
1007            now possible.
1008    
1009    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1010    
1011            * R/: Some cosmetic cleanups.
1012    
1013            * inst/: Removed vignette on clustering. That and much more is now
1014            described in the JSS paper on text mining. A more elaborate
1015            vignette based on that article will be incorporated in the future.
1016    
1017    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1018    
1019            * R/: Updated generic S4 methods to comply with signature changes
1020            in newer versions of R (> 2.3).
1021    
1022    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1023    
1024            * ext/R/importRIS.R: Automatic RIS import is now possible.
1025    
1026    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1027    
1028            * R/textdoccol.R: Added RIS HTML input format.
1029    
1030    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1031    
1032            * R/textdoccol.R: Removed bug that caused invalid text document
1033            collections when handling many input files.
1034    
1035    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1036    
1037            * R/textdoccol.R: Restructured and extended file import
1038            mechanism.
1039    
1040            * inst/doc/clustering.Rnw: Adapted vignette for use with
1041            ReutNews.rda
1042    
1043            * man/ReutNews.Rd: Documentation for ReutNews.rda
1044    
1045            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1046    
1047    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1048    
1049            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1050            clustering facilities of this package.
1051    
1052    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1053    
1054            * R/aobjects.R: Changed package document structure to avoid class
1055            dependency problems.
1056    
1057    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1058    
1059            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
1060            data set.
1061    
1062            * Finished documentation and reordered directory structure. Now "R
1063            CMD check textmin" works without errors.
1064    
