[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 34, Thu Dec 22 15:18:10 2005 UTC
pkg/ChangeLog revision 1070, Tue May 18 08:58:22 2010 UTC

2010-05-18  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
        provided by a source.

2010-04-09  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (.Source): Provide document names.

2010-04-07  Ingo Feinerer  <feinerer@logic.at>

        * R/meta.R (`content_or_meta`): Utility function.

2010-03-19  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
        TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.

2010-03-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R (weightTfIdf): Added normalization option.

        * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
        analysis.

2010-02-25  Ingo Feinerer  <feinerer@logic.at>

        * R/score.R (tm_tag_score): Compute a score from the number of
        tags matching in a document.

2010-02-18  Ingo Feinerer  <feinerer@logic.at>

        * R/complete.R (stemCompletion): New completion heuristics.

2010-02-17  Ingo Feinerer  <feinerer@logic.at>

        * R/plot.R (plot.TermDocumentMatrix): Memory improvements.

2010-02-06  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
        setOldClass(c(..., "list")) works.

2010-01-22  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (stemDocument.character): If the input is a
        plain character vector, just delegate to the default Snowball
        stemmer.

2010-01-15  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XML, readRCV1): Extract more meta
        data.

2010-01-12  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (`Content<-`): Be careful with the names attribute.

2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>

        * R/source.R (DirSource): Improved implementation, especially when
        handling many (> 1M) files.

2009-12-22  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (getElem.URISource): Use encoding argument.

2009-12-11  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (setOldClass): Register S3 document classes to be
        recognized by S4 methods.

2009-11-25  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (termFreq): Add option to remove punctuation
        characters.

2009-11-19  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (c.TermDocumentMatrix): Added combine method for
        merging multiple term-document matrices.

2009-11-17  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (setOldClass): Register S3 corpus classes to be
        recognized by S4 methods.

        * man/plot.Rd: Use \dontrun{} in the \examples{} section in the
        hope that CRAN Mac OS X builds no longer fail.

2009-11-15  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (tokenize): Use scan(..., what = "character") instead
        of RWeka::AlphabeticTokenizer() as default.

2009-11-14  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removeWords.PlainTextDocument): Fix bug which
        prevented words at the beginning or the end of a line from being
        removed. Do not delete whitespace anymore.

2009-11-12  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Default to working directory if no path
        is specified.

2009-11-11  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Stop on empty directories.

2009-11-07  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
        named documents.

2009-10-21  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removeWords): Improve regular expressions.

2009-10-19  Ingo Feinerer  <feinerer@logic.at>

        * R/meta.R (DublinCore): Allow lower case tags.

2009-10-09  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
        instead of x$children.

2009-09-15  Ingo Feinerer  <feinerer@logic.at>

        * R/preprocess.R (preprocessReut21578XML): Fix generated file names.

2009-09-06  Ingo Feinerer  <feinerer@logic.at>

        * R/: Use S3 instead of S4 class system.

2009-08-11  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readMail): Moved to tm.plugin.mail package.

2009-07-04  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
        postings are basically e-mails with some extra headers.

2009-07-03  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R: Move convertMboxEml, removeCitation,
        removeMultipart, and removeSignature to the tm.plugin.mail package
        since they are mainly utility functions (for handling e-mails) and
        not very framework-specific.

2009-06-28  Ingo Feinerer  <feinerer@logic.at>

        * man/: Fix documentation.

2009-06-26  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XMLasPlain): New reader which returns a
        plain text document instead of an XML document for texts of the
        Reuters-21578 dataset.

        * R/sparse.R: Removed since the slam package is now available on
        CRAN.

        * DESCRIPTION (Depends): Add slam package.

2009-06-17  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (stemDoc): Fix character(0) handling.

2009-06-12  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (show): Pretty print.

2009-05-27  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
        gracefully.

2009-05-13  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R: Make corpus virtual. Implement corpus with standard
        and permanent storage semantics.

        * DESCRIPTION: New major release. A *lot* of improvements.

2009-05-04  Ingo Feinerer  <feinerer@logic.at>

        * NAMESPACE: Export some simple_triplet_matrix functions.

2009-04-28  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Adapt tf-idf to new matrix format.

2009-04-27  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R: Create two distinct classes for term-document and
        document-term matrices.

2009-04-26  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R: No longer use the Matrix package. This reduces
        package start-up time significantly.

2009-04-11  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/tm.Rnw: Fix code/documentation mismatch.

2009-04-04  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (tmReduce): Combine multiple maps into one
        transformation.

2009-04-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Remove weightLogical since it does not return a
        dgCMatrix.

        * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
        or TermDocumentMatrix instead.

2009-03-28  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/extensions.Rnw: Finished vignette.

2009-03-27  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
        DocumentTermMatrix representations.

2009-03-23  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readXML): New reader for arbitrary XML files.

2009-03-22  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (CSVSource): Defunct (use DataframeSource instead).
        (XMLSource): New XMLSource class for arbitrary XML files.
        (Source): New slot Vectorized.

2009-03-21  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readTabular): Experimental reader for tabular data
        structures which can be customized via user-defined mappings.

        * R/reader.R: Always use UTC time zone.

        * R/AAA.R (.onLoad): No longer try to start an MPI cluster.

2009-03-20  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readDOC): Options can be passed on to antiword.

        * R/reader.R (readPDF): Options can be passed on to pdfinfo and
        pdftotext.

2009-03-10  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Add pattern and ignore.case arguments
        which are internally passed on to list.files().

2009-03-02  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/tm.Rnw: Suppress pointless loading message.

2009-01-29  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION: Speed up package loading by moving packages not
        strictly necessary for normal operation from Depends to
        Suggests.

2009-01-08  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readNewsgroup): The date format is now configurable.

2008-12-20  Ingo Feinerer  <feinerer@logic.at>

        * R/preprocess.R (convertMboxEml): Fix off-by-one error.

2008-12-16  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R (TermDocMatrix): Sort row indices.

2008-12-06  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DataframeSource): New source class for data frames.

        * R/source.R: Fixed non-standard call evaluation.

2008-11-29  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (URISource): New source class for a single document.

2008-11-27  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R: Refactoring.

2008-11-25  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
        Rmpi installations more gracefully.

2008-11-08  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (Source): Add Length slot.

2008-11-06  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R: Unify duplicated .onLoad function.

2008-11-03  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION (Suggests): Added Rmpi.

2008-11-02  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (getElem): Fix 'no visible binding' warning.

        * man/WeightFunction.Rd: Fix signature.

2008-08-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Introduce name abbreviations for weighting functions.

2008-07-24  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.

        * R/cluster.R: Provide convenience functions for using an MPI
        cluster.

        * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
        available.

        * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
        available.

2008-07-17  Ingo Feinerer  <feinerer@logic.at>

        * R/textdoccol.R (lapply): Removed debug printout.

2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readRCV1): Improved meta data extraction from
        Reuters Corpus Volume 1 documents.

2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/transform.R: Ensure that all mappings preserve multiline
        structures.

2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/filter.R: Every filter now has an attribute (doclevel)
        indicating whether it should be applied at the document level.

        * R/textdoccol.R (tmFilter): Set searchFullText as new default
        filter.

2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/transform.R (replacePatterns): Replaced removeWords with
        replacePatterns. Suggested by Christian Buchta.

        * R/textdoccol.R (inspect): Improved formatting.

2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/CITATION: Updated JSS article information.

        * R/textdoccol.R (setAs): Added coerce method from list to
        corpus.

        * R/meta.R (meta): Improved meta data handling.

2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (materialize, tmMap): Improvements suggested by
        Christian Buchta.

        * inst/CITATION: Added template to include JSS article reference.

2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (tmMap): Introduced lazy mapping.

        * R/source.R: Added VectorSource.

2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Language codes should be in ISO 639-1 format.

        * R/textdoccol.R (asPlain): Preserve local meta data.

2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (writeCorpus): Function for writing a corpus
        containing plain text documents to disk.

2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
        always set correctly.

        * R/textdoccol.R: Set load = TRUE as the default for load on
        demand since in most cases this is the desired behaviour.

2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Renamed TextDocCol to Corpus, and Corpus to Content.

        * DESCRIPTION: Updated Version to 0.3 due to core name changes.

2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/meta.R (meta): New function for consistent access to meta data
        of document collections, repositories, and texts.

2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Better support for encodings.

2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
        selection when no reader argument is given.

2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/source.R (CSVSource): Now uses read.csv instead of scan
        internally.

2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (getReaders): Returns available reader functions.

        * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
        as default.

2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Shortened code, removed codetools
        variable warnings.

        * man/: Documentation for showMeta, added an example for tmMap.

        * inst/doc/tm.Rnw: Updated vignette (comments on MS Word reader),
        fixed some minor typos.

2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (showMeta): Added method for pretty printing a
        text document's meta data.

2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Better handling of empty
        arguments.

        * NAMESPACE: Exported readDOC.

        * man/completeStems.Rd: Added an example.

2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Look up .dat files on every
        call. Allows users to modify stopword .dat files interactively.

2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (termFreq): Correct processing of empty
        documents.

2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/complete.R (completeStems): Completes (heuristically) word
        stems.

        * R/termdocmatrix.R (TermDocMatrix2): New modular
        constructor.

        * NAMESPACE: Exported termFreq.

2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readDOC): Added MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/weight.R: Weighting functions for TermDocMatrix.

2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
        functions for accessing dimension, column, and row names.

        * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/removePunctuation.Rd: Added documentation. Function also
        exported to NAMESPACE.

2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/fungen.R: Use S4 class for function generators instead of S3
        attributes.

2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readPDF): Removed manual checks for pdftotext and
        pdfinfo. The system call gives a warning anyway.

2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (asPlain): Conversion from
        StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
        for accessing term-document matrices.

        * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
        are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
        Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML,
        preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readHTML): Added very simple HTML reader to obtain
        StructuredTextDocuments.

        * R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Moved proxy from Depends to Imports to avoid name
        clashes.

        * inst/stopwords/english.dat: Added the term "yes" to stopwords.

        * R/termdocmatrix.R (dim): dim function for TermDocMatrix.

        * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/distmeasure.R (dissimilarity): Replaced dists call from
        package cba by new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: require() uses the quietly option to suppress
        loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
        documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed some typos in vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (replaceWords): Added method to replace a set of
        words with a single word. Useful for synonyms.

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TermDocMatrix.Rd: Fixed documentation on Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Small fix for dealing with empty
        vectors. Thanks to Ariel Maguyon for his error report.
        (removeSparseTerms): New function to remove columns from a
        term-document matrix exceeding a sparse factor.

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/tmUpdate.Rd: Corrected documentation on readerControl
        parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/sFilter.Rd: Corrected documentation on statement format (use
        '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (StructuredTextDocument): Inherits from
        TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
        on sparse matrices as proposed by Martin Maechler.

2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
        latest \pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Stemming is now performed before
        erasing stopwords.
        (weightMatrix): Adapted to handle sparse matrices.
        (TermDocMatrix): Sparse matrix is now efficiently built by
        direct stepwise insertion of row values into it.

2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
        due to ongoing problems. For our purposes the latter is as useful
        as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TextDocCol.Rd: Replaced \code{readPlain} with
        \code{object@DefaultReader}.

        * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
        languages with available stopwords.

2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.2, since a lot of new features
        have been integrated.

        * inst/stopwords: Updated existing stopwords and added stopwords
        for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * Work/testDb.R: Script to test database stuff.

        * R/: Fixed various database-related bugs. Seems to be rather
        usable now, i.e., consider it alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Added a lot of examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated parts of the documentation.

        * R/textdoccol.R (asPlain): Added conversion from newsgroup
        documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Finished experimental database support. Not yet
        intensively tested.

        * R/source.R: Now each source has a default reader.

        * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
        class anymore.

        * R/plaintextdoc.R: Custom show method for plain text documents.

        * R/aobjects.R: Added a class for structured text documents.

        * R/reader.R: Replaced remaining \code{parser} occurrences with
        \code{reader}.

        * R/textdoccol.R (summary): Indent tags.

        * R/textdoccol.R (removePunctuation): Transform method to remove
        punctuation marks.

2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (sFilter): Simplified sFilter significantly by
        using prescindMeta().

2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

        * R/resolve.R (resolveISOcode): Extracts the language from an ISO
        language code.

        * R/textdoccol.R (TextDocCol): Refactored several parser arguments
        into parserControl argument.

        * R/aobjects.R (TextDocument): Introduced the "Language" slot.

2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * Work/tmDataSetup.R: The datasets acq and crude can now be
        created on the fly.

        * R/stopwords.R: Introduced a function returning the stopwords for
        a given language (English, German and French at the moment).

        * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
        otherwise falls back to the Snowball package.

2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/dissimilarity-methods.Rd: Make clear that any method offered
        by "dists" from package "cba" can be used.

2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes bug according
        to Kurt's LaTeX suggestion. Removed points and underscores in
        variable names for consistent naming.

        * DESCRIPTION: Update to version 0.1-2.

        * man/TextRepository.Rd: Fixed bug in documentation.

751    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
752    
753            * DESCRIPTION: Update to version 0.1-1.
754    
755    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
756    
757            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
758            wordStem.
759    
760    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
761    
762            * R/: Changes due to Kurt's review.
763    
764    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
765    
766            * R/: Implemented improvements based upon comments by David
767            Meyer.
768    
769    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
770    
771            * inst/doc/: Rewrote vignette.
772    
773            * man/: Improved documentation.
774    
775    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
776    
777            * man/: Updated documentation.
778    
779            * DESCRIPTION: Changed package name to "tm". Updated version to
780            0.1 for first CRAN release.
781    
782            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
783            list archive example.
784    
785            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
786            archive example.
787    
788            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
789            from (several mails per box) mbox format to (single mail per file)
790            eml format.
791    
792    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
793    
794            * data/crude.rda: Rebuilt.
795    
796            * data/acq.rda: Rebuilt.
797    
798            * R/reader.R: Factored out reader and parser methods from
799            textdoccol.R.
800    
801            * R/source.R: Factored out Source methods from aobjects.R and
802            textdoccol.R.
803            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
804            feeds.
805    
806            * R/textdoccol.R (DirSource): Added support for recursive
807            traversal of directories.
808    
809    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
810    
811            * R/textdoccol.R ([[): Loads the document corpus automatically
812            into memory upon access.
813            (tm_transform, tm_filter): Removed several checks whether the
814            document is already loaded ([[ ensures this now).
815            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
816            mailing list archive.
817    
818    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
819    
820            * R/aobjects.R (TextDocument): Is now a virtual class.
821            (Source): Is now a virtual class.
822    
823    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
824    
825            * R/textdoccol.R (c): Support for an arbitrary number of document
826            collections.
827    
828    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
829    
830            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
831            append_meta and remove_meta.
832    
833            * R/textdoccol.R: Removed modify_metadata method.
834    
835            * R/textrepo.R: Removed modify_metadata method.
836    
837            * R/textdoccol.R (remove_meta): Supports removal of document
838            collection metadata and document (= in data frame) metadata.
839    
840    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
841    
842            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
843    
844            * data/crude.rda: Rebuilt.
845    
846            * data/acq.rda: Rebuilt.
847    
848            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
849    
850            * R/textdoccol.R ([): Bug fix for subsetting a document
851            collection's data frame.
852    
853    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
854    
855            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
856            to s_filter.
857    
858            * R/textdoccol.R: Local text documents' metadata can now be copied
859            to a document collection's data frame with prescind_meta.
860    
861    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
862    
863            * R/: Text documents' slot metadata is now accessible in s_filter.
864    
865            * R/: Rewrote the s_filter function (it still has some restrictions).
866    
867    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * R/: Various fixes in handling metadata.
870    
871            * R/: Added update mechanism for text document collections.
872    
873    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
874    
875            * R/: Merging of document collections now creates a binary tree
876            for reconstructing merged document collections.
877    
878            * R/: Redesign of metadata for document collections.
879    
880    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
881    
882            * R/: Messages now use \code{ngettext}.
883    
884    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
885    
886            * R/: Added functions for modifying and removing metadata.
887    
888    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
889    
890            * man/: Updated some documentation.
891    
892            * R/: Corrected some connection issues.
893    
894            * inst/doc: Worked on the vignette.
895    
896    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
897    
898            * inst/: Added texts and started vignette.
899    
900            * R/: Final changes based upon David's comments.
901    
902    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
903    
904            * NAMESPACE: Corrected exports (generic methods need exportMethods
905            directives!).
906    
907    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
908    
909            * R/: Modified the TextDocCol constructor and various parsers. It
910            is now modular and supports various file formats via plugins (see
911            the new "Source" class).
912    
913    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
914    
915            * man/: Revised documentation after previous code changes.
916    
917    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
918    
919            * R/: Remaining changes as discussed with David.
920    
921    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * R/: Some changes as suggested by David. The rest will follow
924            in the next few days.
925    
926    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
927    
928            * man/: Finished documentation.
929    
930    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
931    
932            * man/: Wrote some documentation.
933    
934    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
935    
936            * R/: Further syntactic sugar in the form of additional assignment and
937            accessor methods.
938    
939    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
940    
941            * R/: Syntactic sugar in the form of "length", "show" and "summary"
942            operators.
943    
944    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
945    
946            * R/: Various updates, mainly to default operators ("[" or "c")
947            and dissimilarities.
948    
949    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
950    
951            * R/: Added similarity functions.
952    
953            * data/: Added English stopwords.
954    
955    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
956    
957            * data/: Compiled examples for new features.
958    
959            * R/: Changes due to new structure.
960    
961            * NAMESPACE: Corrected namespace to reflect new structure.
962    
963            * R/termdocmatrix.R: Adapted for new naming scheme.
964    
965    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
966    
967            * R/textdoccol.R: Adapted code for new class structure. Wrote
968            several transform and filter functions operating on text document
969            collections (also known as text document databases).
970    
971            * R/aobjects.R: Adapted class structure with inheritance,
972            repositories and additional metadata. Loading files on demand is
973            now possible.
974    
975    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            * R/: Some cosmetic cleanups.
978    
979            * inst/: Removed vignette on clustering. That and much more is now
980            described in the JSS paper on text mining. Based upon that
981            article, a more elaborate vignette will be incorporated in the future.
982    
983    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * R/: Updated generic S4 methods to comply with signature changes
986            in newer versions of R (> 2.3).
987    
988    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            * ext/R/importRIS.R: Automatic RIS import is now possible.
991    
992    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
993    
994            * R/textdoccol.R: Added RIS HTML input format.
995    
996    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
997    
998            * R/textdoccol.R: Fixed a bug that caused invalid text document
999            collections when handling many input files.
1000    
1001    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1002    
1003            * R/textdoccol.R: Restructured and extended the file import
1004            mechanism.
1005    
1006            * inst/doc/clustering.Rnw: Adapted vignette for use with
1007            ReutNews.rda.
1008    
1009            * man/ReutNews.Rd: Documentation for ReutNews.rda.
1010    
1011            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1012    
1013    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1014    
1015            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
