[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC
pkg/ChangeLog revision 1029, Tue Dec 22 13:40:25 2009 UTC
1    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/source.R (getElem.URISource): Use encoding argument.
4    
5    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
6    
7            * R/doc.R (setOldClass): Register S3 document classes to be
8            recognized by S4 methods.
9    
10    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
11    
12            * R/matrix.R (termFreq): Add option to remove punctuation
13            characters.
14    
15    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
16    
17            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
18            merging multiple term-document matrices.
19    
20    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
21    
22            * R/corpus.R (setOldClass): Register S3 corpus classes to be
23            recognized by S4 methods.
24    
25            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
26            that CRAN Mac OS X builds do not fail any longer.
27    
28    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
29    
30            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
31            of RWeka::AlphabeticTokenizer() as default.
32    
33    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
34    
35            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
36            caused words at the beginning or the end of a line not to be removed. Do
37            not delete whitespace anymore.
38    
39    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
40    
41            * R/source.R (DirSource): Default to working directory if no path
42            is specified.
43    
44    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
45    
46            * R/source.R (DirSource): Stop on empty directories.
47    
48    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
49    
50            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
51            named documents.
52    
53    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
54    
55            * R/transform.R (removeWords): Improve regular expressions.
56    
57    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
58    
59            * R/meta.R (DublinCore): Allow lower case tags.
60    
61    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
62    
63            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
64            instead of x$children.
65    
66    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
67    
68            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
69    
70    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
71    
72            * R/: Use S3 instead of S4 class system.
73    
74    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
75    
76            * R/reader.R (readMail): Moved to tm.plugin.mail package.
77    
78    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
79    
80            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
81            postings are basically e-mails with some extra headers.
82    
83    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
84    
85            * R/transform.R: Move convertMboxEml, removeCitation,
86            removeMultipart, and removeSignature to the tm.plugin.mail package
87            since they are mainly utility functions (for handling e-mails) and
88            not very framework specific.
89    
90    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
91    
92            * man/: Fix documentation.
93    
94    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
95    
96            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
97            plain text document instead of an XML document for texts of the
98            Reuters-21578 dataset.
99    
100            * R/sparse.R: Removed since the slam package is now available on
101            CRAN.
102    
103            * DESCRIPTION (Depends): Add slam package.
104    
105    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
106    
107            * R/transform.R (stemDoc): Fix character(0) handling.
108    
109    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
110    
111            * R/doc.R (show): Pretty print.
112    
113    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
114    
115            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
116            gracefully.
117    
118    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
119    
120            * R/corpus.R: Make corpus virtual. Implement corpus with standard
121            and permanent storage semantics.
122    
123            * DESCRIPTION: New major release. A *lot* of improvements.
124    
125    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
126    
127            * NAMESPACE: Export some simple_triplet_matrix functions.
128    
129    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
130    
131            * R/weight.R: Adapt tf-idf to new matrix format.
132    
133    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
134    
135            * R/matrix.R: Create two distinct classes for term-document and
136            document-term matrices.
137    
138    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
139    
140            * R/termdocmatrix.R: No longer use Matrix package. This reduces
141            package start-up time significantly.
142    
143    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
144    
145            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
146    
147    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
148    
149            * R/transform.R (tmReduce): Combine multiple maps into one
150            transformation.
151    
152    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
153    
154            * R/weight.R: Remove weightLogical since it does not return a
155            dgCMatrix.
156    
157            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
158            or TermDocumentMatrix instead.
159    
160    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
161    
162            * inst/doc/extensions.Rnw: Finished vignette.
163    
164    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
165    
166            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
167            DocumentTermMatrix representations.
168    
169    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
170    
171            * R/reader.R (readXML): New reader for arbitrary XML files.
172    
173    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
174    
175            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
176            (XMLSource): New XMLSource class for arbitrary XML files.
177            (Source): New slot Vectorized.
178    
179    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
180    
181            * R/reader.R (readTabular): Experimental reader for tabular data
182            structures which can be customized via user-defined mappings.
183    
184            * R/reader.R: Always use UTC time zone.
185    
186            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
187    
188    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
189    
190            * R/reader.R (readDOC): Options can be passed over to antiword.
191    
192            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
193            pdftotext.
194    
195    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
196    
197            * R/source.R (DirSource): Add pattern and ignore.case arguments
198            which are internally passed over to list.files().
199    
200    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
201    
202            * inst/doc/tm.Rnw: Suppress pointless loading message.
203    
204    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
205    
206            * DESCRIPTION: Speed up package loading (via moving packages not
207            strictly necessary for normal operation to Suggests instead of
208            Depends).
209    
210    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
211    
212            * R/reader.R (readNewsgroup): The date format is now configurable.
213    
214    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
215    
216            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
217    
218    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
219    
220            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
221    
222    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
223    
224            * R/source.R (DataframeSource): New source class for data frames.
225    
226            * R/source.R: Fixed non-standard call evaluation.
227    
228    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
229    
230            * R/source.R (URISource): New source class for a single document.
231    
232    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
233    
234            * R/source.R: Refactoring.
235    
236    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
237    
238            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
239            Rmpi installations more gracefully.
240    
241    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
242    
243            * R/source.R (Source): Add Length slot.
244    
245    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
246    
247            * R/AAA.R: Unify duplicated .onLoad function.
248    
249    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
250    
251            * DESCRIPTION (Suggests): Added Rmpi.
252    
253    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
254    
255            * R/source.R (getElem): Fix 'no visible binding' warning.
256    
257            * man/WeightFunction.Rd: Fix signature.
258    
259    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
260    
261            * R/weight.R: Introduce name abbreviations for weighting functions.
262    
263    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
264    
265            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
266    
267            * R/cluster.R: Provide convenience functions for using a MPI
268            cluster.
269    
270            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
271            available.
272    
273            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
274            available.
275    
276    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
277    
278            * R/textdoccol.R (lapply): Removed debug print out.
279    
280    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
281    
282            * R/reader.R (readRCV1): Improved meta data extraction from
283            Reuters Corpus Volume 1 documents.
284    
285    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
286    
287            * R/transform.R: Ensure that all mappings preserve multiline
288            structures.
289    
290    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
291    
292            * R/filter.R: Every filter now has an attribute indicating whether
293            it should be applied at document level (doclevel).
294    
295            * R/textdoccol.R (tmFilter): Set searchFullText as new default
296            filter.
297    
298    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
299    
300            * R/transform.R (replacePatterns): Replaced removeWords by
301            replacePatterns. Suggested by Christian Buchta.
302    
303            * R/textdoccol.R (inspect): Improved formatting.
304    
305    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
306    
307            * inst/CITATION: Updated JSS article information.
308    
309            * R/textdoccol.R (setAs): Added coerce method from list to
310            corpus.
311    
312            * R/meta.R (meta): Improved meta data handling.
313    
314    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
315    
316            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
317            Christian Buchta.
318    
319            * inst/CITATION: Added template to include JSS article reference.
320    
321    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
322    
323            * R/textdoccol.R (tmMap): Introduced lazy mapping.
324    
325            * R/source.R: Added VectorSource.
326    
327    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
328    
329            * man/: Language codes should be in ISO 639-1 format.
330    
331            * R/textdoccol.R (asPlain): Preserve local meta data.
332    
333    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
334    
335            * R/textdoccol.R (writeCorpus): Function for writing a corpus
336            containing plain text documents to disk.
337    
338    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
339    
340            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
341            always set correctly.
342    
343            * R/textdoccol.R: Set load = TRUE as default for load on demand
344            since in most cases this is the desired behaviour.
345    
346    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
347    
348            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
349    
350            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
351    
352    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
353    
354            * R/meta.R (meta): New function for consistent access to meta data
355            of document collections, repositories, and texts.
356    
357    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
358    
359            * R/: Better support for encodings.
360    
361    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
362    
363            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
364            selection when no reader argument is given.
365    
366    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
367    
368            * R/source.R (CSVSource): Now uses read.csv instead of scan
369            internally.
370    
371    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
372    
373            * R/reader.R (getReaders): Returns available reader functions.
374    
375            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
376            as default.
377    
378    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
379    
380            * R/stopwords.R (stopwords): Shortened code, removed codetools
381            variable warnings.
382    
383            * man/: Documentation for showMeta, added an example for tmMap.
384    
385            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
386            some minor typos fixed.
387    
388    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
389    
390            * R/aobjects.R (showMeta): Added method for pretty printing a
391            text document's meta data.
392    
393    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
394    
395            * R/textdoccol.R (TextDocCol): Better handling of empty
396            arguments.
397    
398            * NAMESPACE: Exported readDOC.
399    
400            * man/completeStems.Rd: Added an example.
401    
402    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
403    
404            * R/stopwords.R (stopwords): Look up .dat files at every
405            call. Allows users to modify stopword .dat files interactively.
406    
407    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
408    
409            * R/termdocmatrix.R (termFreq): Correct processing of empty
410            documents.
411    
412    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
413    
414            * man/: Updated documentation.
415    
416    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
417    
418            * R/complete.R (completeStems): Completes (heuristically) word
419            stems.
420    
421            * R/termdocmatrix.R (TermDocMatrix2): New modular
422            constructor.
423    
424            * NAMESPACE: Exported termFreq.
425    
426    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
427    
428            * R/reader.R (readDOC): Added MS Word reader (using antiword).
429    
430    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
431    
432            * R/weight.R: Weighting functions for TermDocMatrix.
433    
434    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
435    
436            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
437            functions for accessing dimension, column, and row names.
438    
439            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
440    
441    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
442    
443            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
444    
445    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
446    
447            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
448    
449    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
450    
451            * R/reader.R (readPDF): Removed manual checks for pdftotext and
452            pdfinfo. The system call gives a warning anyway.
453    
454    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * R/textdoccol.R (asPlain): Conversion from
457            StructuredTextDocuments to PlainTextDocuments.
458    
459    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
460    
461            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
462            for accessing term-document matrices.
463    
464            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
465            are installed.
466    
467    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
468    
469            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
470            Christian Buchta.
471    
472    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
473    
474            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
475    
476    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
477    
478            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
479    
480            * R/reader.R (readPDF): Added PDF reader.
481    
482    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
483    
484            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
485    
486            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
487    
488            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
489    
490            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
491    
492    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
493    
494            * R/distmeasure.R (dissimilarity): Replaced dists call from
495            package cba by new dist call from package proxy.
496    
497    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
498    
499            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
500    
501    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
502    
503            * R/termdocmatrix.R: require() uses the quietly option to suppress
504            loading messages.
505    
506    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * R/dictionary.R: Added dictionary support.
509    
510    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
511    
512            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
513            documents. This simplifies some functions, e.g., asPlain.
514    
515    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
516    
517            * inst/doc/tm.Rnw: Fixed some typos in vignette.
518    
519    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
520    
521            * R/textdoccol.R (replaceWords): Added method to replace a set of
522            words by a single word. Useful for synonyms.
523    
524    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
525    
526            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
527    
528    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
529    
530            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
531            vectors. Thanks to Ariel Maguyon for his error report.
532            (removeSparseTerms): New function to remove columns from a
533            term-document matrix exceeding a sparse factor.
534    
535    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
536    
537            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
538    
539    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
540    
541            * man/sFilter.Rd: Corrected documentation on statement format (use
542            '==' instead of '=').
543    
544    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
545    
546            * R/aobjects.R (StructuredTextDocument): Inherits from
547            TextDocument.
548    
549    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
550    
551            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
552            on sparse matrices as proposed by Martin Maechler.
553    
554    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
557            \pkg{filehash} version deprecates them.
558    
559    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
560    
561            * R/termdocmatrix.R (textvector): Stemming is now performed before
562            erasing stopwords.
563            (weightMatrix): Adapted to handle sparse matrices.
564            (TermDocMatrix): Sparse matrix is now efficiently built by
565            direct stepwise insertion of row values into it.
566    
567    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
568    
569            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
570            due to ongoing problems. For our purposes the latter is as useful
571            as the replaced package.
572    
573    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
576    
577            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
578    
579    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
580    
581            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
582            languages with available stopwords.
583    
584    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
585    
586            * inst/doc/tm.Rnw: Minor corrections in the vignette.
587    
588    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * DESCRIPTION: Update to version 0.2, since a lot of new features
591            have been integrated.
592    
593            * inst/stopwords: Updated existing stopwords and added stopwords
594            for various other languages.
595    
596    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
597    
598            * man/: Updated documentation.
599    
600            * Work/testDb.R: Script to test database stuff.
601    
602            * R/: Fixed various database related bugs. Seems to be rather
603            usable now, i.e., consider it alpha status for now.
604    
605    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
606    
607            * R/: Fixed some bugs related to database support.
608    
609    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
610    
611            * man/: Added a lot of examples to the manuals.
612    
613    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
614    
615            * man/: Updated parts of the documentation.
616    
617            * R/textdoccol.R (asPlain): Added conversion from newsgroup
618            documents to plain text documents.
619    
620    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
621    
622            * R/textdoccol.R: Finished experimental database support. Not yet
623            intensively tested.
624    
625            * R/source.R: Now each source has a default reader.
626    
627            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
628            class anymore.
629    
630            * R/plaintextdoc.R: Custom show method for plain text documents.
631    
632            * R/aobjects.R: Added a class for structured text documents.
633    
634            * R/reader.R: Replaced remaining \code{parser} occurrences with
635            \code{reader}.
636    
637            * R/textdoccol.R (summary): Indent tags.
638    
639            * R/textdoccol.R (removePunctuation): Transform method to remove
640            punctuation marks.
641    
642    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
643    
644            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
645            using prescindMeta().
646    
647    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
648    
649            * R/textdoccol.R: Improved database support.
650    
651    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
652    
653            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
654    
655            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
656            language code.
657    
658            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
659            into parserControl argument.
660    
661            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
662    
663    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
664    
665            * Work/tmDataSetup.R: The datasets acq and crude can now be
666            created on the fly.
667    
668            * R/stopwords.R: Introduced a function returning the stopwords for
669            a given language (English, German and French at the moment).
670    
671            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
672            otherwise falls back to Snowball package.
673    
674    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
675    
676            * man/dissimilarity-methods.Rd: Make clear that any method offered
677            by "dists" from package "cba" can be used.
678    
679    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
680    
681            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
682            to Kurt's latex suggestion. Removed points and underscores in
683            variable names for consistent naming.
684    
685            * DESCRIPTION: Update to version 0.1-2.
686    
687            * man/TextRepository.Rd: Fixed bug in documentation.
688    
689    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
690    
691            * DESCRIPTION: Update to version 0.1-1.
692    
693    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
694    
695            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
696            wordStem.
697    
698    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
699    
700            * R/: Changes due to Kurt's review.
701    
702    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
703    
704            * R/: Implemented improvements based upon comments by David
705            Meyer.
706    
707    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
708    
709            * inst/doc/: Rewrote vignette.
710    
711            * man/: Improved documentation.
712    
713    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
714    
715            * man/: Updated documentation.
716    
717            * DESCRIPTION: Changed package name to "tm". Updated version to
718            0.1 for first CRAN release.
719    
720            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
721            list archive example.
722    
723            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
724            archive example.
725    
726            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
727            from (several mails per box) mbox format to (single mail per file)
728            eml format.
729    
730    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
731    
732            * data/crude.rda: Rebuilt.
733    
734            * data/acq.rda: Rebuilt.
735    
736            * R/reader.R: Factored out reader and parser methods from
737            textdoccol.R.
738    
739            * R/source.R: Factored out Source methods from aobjects.R and
740            textdoccol.R.
741            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
742            feeds.
743    
744            * R/textdoccol.R (DirSource): Added support for recursive
745            traversal of directories.
746    
747    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
748    
749            * R/textdoccol.R ([[): Loads the document corpus automatically
750            into memory upon access.
751            (tm_transform, tm_filter): Removed several checks whether the
752            document is already loaded ([[ ensures this now).
753            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
754            mailing list archive.
755    
756    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
757    
758            * R/aobjects.R (TextDocument): Is now a virtual class.
759            (Source): Is now a virtual class.
760    
761    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * R/textdoccol.R (c): Support for an arbitrary number of document
764            collections.
765    
766    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
767    
768            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
769            append_meta and remove_meta.
770    
771            * R/textdoccol.R: Removed modify_metadata method.
772    
773            * R/textrepo.R: Removed modify_metadata method.
774    
775            * R/textdoccol.R (remove_meta): Supports removal of document
776            collection metadata and document (= in data frame) metadata.
777    
778    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
779    
780            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
781    
782            * data/crude.rda: Rebuilt.
783    
784            * data/acq.rda: Rebuilt.
785    
786            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
787    
788            * R/textdoccol.R ([): Bug fix for subsetting a document
789            collection's data frame.
790    
791    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
792    
793            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
794            to s_filter.
795    
796            * R/textdoccol.R: Local text documents' metadata can now be copied
797            to a document collection's data frame with prescind_meta.
798    
799    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
800    
801            * R/: Text documents' slot metadata is now accessible in s_filter.
802    
803            * R/: Rewrote s_filter function (has still some restrictions).
804    
805    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
806    
807            * R/: Various fixes in handling metadata.
808    
809            * R/: Added update mechanism for text document collections.
810    
811    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
812    
813            * R/: Merging of document collections now creates a binary tree
814            for reconstructing merged document collections.
815    
816            * R/: Redesign of metadata for document collections.
817    
818    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
819    
820            * R/: Messages now use \code{ngettext}.
821    
822    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
823    
824            * R/: Added functions for modifying and removing metadata.
825    
826    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
827    
828            * man/: Updated some documentation.
829    
830            * R/: Corrected some connection issues.
831    
832            * inst/doc: Worked on the vignette.
833    
834    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
835    
836            * inst/: Added texts and started vignette.
837    
838            * R/: Final changes based upon David's comments.
839    
840    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
841    
842            * NAMESPACE: Corrected exports (generic methods need exportMethods
843            directives!).
844    
845    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
846    
847            * R/: Modified the TextDocCol constructor and various parsers. It
848            is now modular and supports various file formats via plugins (see
849            the new "Source" class).
850    
851    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
852    
853            * man/: Revised documentation after previous code changes.
854    
855    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
856    
857            * R/: Remaining changes as discussed with David.
858    
859    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
860    
861            * R/: Some changes as suggested by David. The rest will follow
862            within the next few days.
863    
864    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
865    
866            * man/: Finished documentation.
867    
868    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
869    
870            * man/: Wrote some documentation.
871    
872    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
873    
874            * R/: Further syntactic sugar in the form of additional assignment and
875            accessor methods.
876    
877    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
878    
879            * R/: Syntactic sugar in the form of "length", "show" and "summary"
880            operators.
881    
882    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
883    
884            * R/: Various updates, mainly to default operators ("[" or "c")
885            and dissimilarities.
886    
887    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
888    
889            * R/: Added similarity functions.
890    
891            * data/: Added English stopwords.
892    
893    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
894    
895            * data/: Examples compiled for new features.
896    
897            * R/: Changes due to new structure.
898    
899            * NAMESPACE: Corrected namespace to reflect new structure.
900    
901            * R/termdocmatrix.R: Adapted for new naming scheme.
902    
903    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
904    
905            * R/textdoccol.R: Adapted code for new class structure. Wrote
906            several transform and filter functions operating on text document
907            collections (alias text document databases).
908    
909            * R/aobjects.R: Adapted class structure with inheritance,
910            repositories and additional meta data. Loading files on demand is
911            now possible.
912    
913    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
914    
915            * R/: Some cosmetic cleanups.
916    
917            * inst/: Removed vignette on clustering. That and much more is now
918            described in the JSS paper on text mining. Based upon that
919            article, a more elaborate vignette will be incorporated in the future.
920    
921    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * R/: Updated generic S4 methods to comply with signature changes
924            in newer versions of R (> 2.3).
925    
926    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
927    
928            * ext/R/importRIS.R: Automatic RIS import is now possible.
929    
930    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
931    
932            * R/textdoccol.R: Added RIS HTML input format.
933    
934    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
935    
936            * R/textdoccol.R: Removed bug that caused invalid text document
937            collections when handling many input files.
938    
939    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
940    
941            * R/textdoccol.R: Restructured and extended file import
942            mechanism.
943    
944            * inst/doc/clustering.Rnw: Adapted vignette for use with
945            ReutNews.rda
946    
947            * man/ReutNews.Rd: Documentation for ReutNews.rda
948    
949            * data/ReutNews.rda: A tiny Reuters21578 example data set.
950    
951    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
952    
953            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
954            clustering facilities of this package.
955    
956    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
957    
958            * R/aobjects.R: Changed package document structure to avoid class
959            dependency problems.
960    
961    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
962    
963            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
964            data set.
965    
966            *  Finished documentation and reordered directory structure. Now "R
967            CMD check textmin" works without errors.
968    
969    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
970    
971            * src/: Various splits can now be easily created for the
972            Reuters21578 data set.
973    
974    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
975    
976            *  Updated documentation
977    
978    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            *  Wrote R documentation for some classes and methods.
981    
982    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
983    
984            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
985            files. See the questionnaire data/Umfrage.csv for such an example.
986            We are now able to import files in Reuters-21578 XML format.
987    
988            *  Changed class interfaces in various files. Weighting of the text
989            matrix is now possible.
990    
991    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
992    
993            * R/textdoccol.R: One can build term-document matrices if
994            necessary (with buildTDM(...)) and fill the field tdm from a text
995            document collection with it.
996    
997            * R/textmatrix.R: Wrote S4 class for term-document matrices.
998    
999    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1000    
1001            * R/textdoccol.R: We can now read in a whole XML file with several
1002            news items.
1003    
1004  2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1005    
1006          * R/textdoccol.R: Set up an S4 class for a collection of text
