
[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC
pkg/ChangeLog revision 1063, Fri Apr 9 10:36:39 2010 UTC

1    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/source.R (.Source): Provide document names.
4    
5    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
6    
7            * R/meta.R (`content_or_meta`): Utility function.
8    
9    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
10    
11            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
12            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
13    
14    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
15    
16            * R/weight.R (weightTfIdf): Added normalization option.
17    
18            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
19            analysis.
20    
21    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
22    
23            * R/score.R (tm_tag_score): Compute a score from the number of
24            tags matching in a document.
25    
26    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
27    
28            * R/complete.R (stemCompletion): New completion heuristics.
29    
30    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
31    
32            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
33    
34    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
35    
36            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
37            setOldClass(c(..., "list")) works.
38    
39    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
40    
41            * R/transform.R (stemDocument.character): In case input is a
42            simple character just delegate to the default Snowball stemmer.
43    
44    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
45    
46            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
47            data.
48    
49    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
50    
51            * R/doc.R (`Content<-`): Be careful with names attribute.
52    
53    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
54    
55            * R/source.R (DirSource): Improved implementation especially when
56            handling many (> 1M) files.
57    
58    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
59    
60            * R/source.R (getElem.URISource): Use encoding argument.
61    
62    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
63    
64            * R/doc.R (setOldClass): Register S3 document classes to be
65            recognized by S4 methods.
66    
67    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
68    
69            * R/matrix.R (termFreq): Add option to remove punctuation
70            characters.
71    
72    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
73    
74            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
75            merging multiple term-document matrices.
76    
77    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
78    
79            * R/corpus.R (setOldClass): Register S3 corpus classes to be
80            recognized by S4 methods.
81    
82            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
83            that CRAN Mac OS X builds do not fail any longer.
84    
85    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
86    
87            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
88            of RWeka::AlphabeticTokenizer() as default.
89    
90    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
91    
92            * R/transform.R (removeWords.PlainTextDocument): Fix bug that
93            prevented words at the beginning or end of a line from being
94            removed. No longer delete whitespace.
95    
96    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
97    
98            * R/source.R (DirSource): Default to working directory if no path
99            is specified.
100    
101    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
102    
103            * R/source.R (DirSource): Stop on empty directories.
104    
105    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
106    
107            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
108            named documents.
109    
110    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
111    
112            * R/transform.R (removeWords): Improve regular expressions.
113    
114    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
115    
116            * R/meta.R (DublinCore): Allow lower case tags.
117    
118    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
119    
120            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
121            instead of x$children.
122    
123    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
124    
125            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
126    
127    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
128    
129            * R/: Use S3 instead of S4 class system.
130    
131    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/reader.R (readMail): Moved to tm.plugin.mail package.
134    
135    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
136    
137            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
138            postings are basically e-mails with some extra headers.
139    
140    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
141    
142            * R/transform.R: Move convertMboxEml, removeCitation,
143            removeMultipart, and removeSignature to the tm.plugin.mail package
144            since they are mainly utility functions (for handling e-mails) and
145            not very framework specific.
146    
147    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
148    
149            * man/: Fix documentation.
150    
151    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
152    
153            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
154            plain text document instead of an XML document for texts of the
155            Reuters-21578 dataset.
156    
157            * R/sparse.R: Removed since the slam package is now available on
158            CRAN.
159    
160            * DESCRIPTION (Depends): Add slam package.
161    
162    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
163    
164            * R/transform.R (stemDoc): Fix character(0) handling.
165    
166    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
167    
168            * R/doc.R (show): Pretty print.
169    
170    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
173            gracefully.
174    
175    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/corpus.R: Make corpus virtual. Implement corpus with standard
178            and permanent storage semantics.
179    
180            * DESCRIPTION: New major release. A *lot* of improvements.
181    
182    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
183    
184            * NAMESPACE: Export some simple_triplet_matrix functions.
185    
186    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
187    
188            * R/weight.R: Adapt tf-idf to new matrix format.
189    
190    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
191    
192            * R/matrix.R: Create two distinct classes for term-document and
193            document-term matrices.
194    
195    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
196    
197            * R/termdocmatrix.R: No longer use Matrix package. This reduces
198            package start-up time significantly.
199    
200    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
201    
202            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
203    
204    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
205    
206            * R/transform.R (tmReduce): Combine multiple maps into one
207            transformation.
208    
209    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
210    
211            * R/weight.R: Remove weightLogical since it does not return a
212            dgCMatrix.
213    
214            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
215            or TermDocumentMatrix instead.
216    
217    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
218    
219            * inst/doc/extensions.Rnw: Finished vignette.
220    
221    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
222    
223            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
224            DocumentTermMatrix representations.
225    
226    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
227    
228            * R/reader.R (readXML): New reader for arbitrary XML files.
229    
230    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
231    
232            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
233            (XMLSource): New XMLSource class for arbitrary XML files.
234            (Source): New slot Vectorized.
235    
236    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
237    
238            * R/reader.R (readTabular): Experimental reader for tabular data
239            structures which can be customized via user-defined mappings.
240    
241            * R/reader.R: Always use UTC time zone.
242    
243            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
244    
245    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
246    
247            * R/reader.R (readDOC): Options can be passed over to antiword.
248    
249            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
250            pdftotext.
251    
252    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
253    
254            * R/source.R (DirSource): Add pattern and ignore.case arguments
255            which are internally passed over to list.files().
256    
257    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
258    
259            * inst/doc/tm.Rnw: Suppress pointless loading message.
260    
261    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
262    
263            * DESCRIPTION: Speed up package loading (via moving packages not
264            strictly necessary for normal operation to Suggests instead of
265            Depends).
266    
267    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
268    
269            * R/reader.R (readNewsgroup): The date format is now configurable.
270    
271    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
272    
273            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
274    
275    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
276    
277            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
278    
279    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
280    
281            * R/source.R (DataframeSource): New source class for data frames.
282    
283            * R/source.R: Fixed non-standard call evaluation.
284    
285    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
286    
287            * R/source.R (URISource): New source class for a single document.
288    
289    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
290    
291            * R/source.R: Refactoring.
292    
293    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
294    
295            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
296            Rmpi installations more gracefully.
297    
298    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
299    
300            * R/source.R (Source): Add Length slot.
301    
302    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
303    
304            * R/AAA.R: Unify duplicated .onLoad function.
305    
306    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
307    
308            * DESCRIPTION (Suggests): Added Rmpi.
309    
310    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
311    
312            * R/source.R (getElem): Fix 'no visible binding' warning.
313    
314            * man/WeightFunction.Rd: Fix signature.
315    
316    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
317    
318            * R/weight.R: Introduce name abbreviations for weighting functions.
319    
320    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
321    
322            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
323    
324            * R/cluster.R: Provide convenience functions for using a MPI
325            cluster.
326    
327            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
328            available.
329    
330            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
331            available.
332    
333    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
334    
335            * R/textdoccol.R (lapply): Removed debug print out.
336    
337    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
338    
339            * R/reader.R (readRCV1): Improved meta data extraction from
340            Reuters Corpus Volume 1 documents.
341    
342    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
343    
344            * R/transform.R: Ensure that all mappings preserve multiline
345            structures.
346    
347    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
348    
349            * R/filter.R: Every filter now has an attribute (doclevel)
350            indicating whether it should be applied at the document level.
351    
352            * R/textdoccol.R (tmFilter): Set searchFullText as new default
353            filter.
354    
355    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
356    
357            * R/transform.R (replacePatterns): Replaced removeWords by
358            replacePatterns. Suggested by Christian Buchta.
359    
360            * R/textdoccol.R (inspect): Improved formatting.
361    
362    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
363    
364            * inst/CITATION: Updated JSS article information.
365    
366            * R/textdoccol.R (setAs): Added coerce method from list to
367            corpus.
368    
369            * R/meta.R (meta): Improved meta data handling.
370    
371    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
372    
373            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
374            Christian Buchta.
375    
376            * inst/CITATION: Added template to include JSS article reference.
377    
378    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
379    
380            * R/textdoccol.R (tmMap): Introduced lazy mapping.
381    
382            * R/source.R: Added VectorSource.
383    
384    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
385    
386            * man/: Language codes should be in ISO 639-1 format.
387    
388            * R/textdoccol.R (asPlain): Preserve local meta data.
389    
390    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
391    
392            * R/textdoccol.R (writeCorpus): Function for writing a corpus
393            containing plain text documents to disk.
394    
395    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
396    
397            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
398            always set correctly.
399    
400            * R/textdoccol.R: Set load = TRUE as default for load on demand
401            since in most cases this is the wanted behaviour.
402    
403    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
404    
405            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
406    
407            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
408    
409    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
410    
411            * R/meta.R (meta): New function for consistent access to meta data
412            of document collections, repositories, and texts.
413    
414    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
415    
416            * R/: Better support for encodings.
417    
418    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
419    
420            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
421            selection when no reader argument is given.
422    
423    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
424    
425            * R/source.R (CSVSource): Now uses read.csv instead of scan
426            internally.
427    
428    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
429    
430            * R/reader.R (getReaders): Returns available reader functions.
431    
432            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
433            as default.
434    
435    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
436    
437            * R/stopwords.R (stopwords): Shortened code, removed codetools
438            variable warnings.
439    
440            * man/: Documentation for showMeta, added an example for tmMap.
441    
442            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
443            some minor typos fixed.
444    
445    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
446    
447            * R/aobjects.R (showMeta): Added method for pretty printing a
448            text document's meta data.
449    
450    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
451    
452            * R/textdoccol.R (TextDocCol): Better handling of empty
453            arguments.
454    
455            * NAMESPACE: Exported readDOC.
456    
457            * man/completeStems.Rd: Added an example.
458    
459    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
460    
461            * R/stopwords.R (stopwords): Look up .dat files at every
462            call. Allows users to modify stopword .dat files interactively.
463    
464    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
465    
466            * R/termdocmatrix.R (termFreq): Correct processing of empty
467            documents.
468    
469    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
470    
471            * man/: Updated documentation.
472    
473    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/complete.R (completeStems): Completes (heuristically) word
476            stems.
477    
478            * R/termdocmatrix.R (TermDocMatrix2): New modular
479            constructor.
480    
481            * NAMESPACE: Exported termFreq.
482    
483    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
484    
485            * R/reader.R (readDOC): Added MS Word reader (using antiword).
486    
487    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
488    
489            * R/weight.R: Weighting functions for TermDocMatrix.
490    
491    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
492    
493            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
494            functions for accessing dimension, column, and row names.
495    
496            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
497    
498    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
499    
500            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
501    
502    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
503    
504            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
505    
506    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * R/reader.R (readPDF): Removed manual checks for pdftotext and
509            pdfinfo. The system call gives a warning anyway.
510    
511    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
512    
513            * R/textdoccol.R (asPlain): Conversion from
514            StructuredTextDocuments to PlainTextDocuments.
515    
516    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
519            for accessing term-document matrices.
520    
521            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
522            are installed.
523    
524    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
525    
526            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
527            Christian Buchta.
528    
529    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
530    
531            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
532    
533    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
534    
535            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
536    
537            * R/reader.R (readPDF): Added PDF reader.
538    
539    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
540    
541            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
542    
543            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
544    
545            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
546    
547            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
548    
549    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
550    
551            * R/distmeasure.R (dissimilarity): Replaced dists call from
552            package cba by new dist call from package proxy.
553    
554    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
557    
558    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
559    
560            * R/termdocmatrix.R: require() uses the quietly option to suppress
561            loading messages.
562    
563    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
564    
565            * R/dictionary.R: Added dictionary support.
566    
567    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
568    
569            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
570            documents. This simplifies some functions, e.g., asPlain.
571    
572    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
573    
574            * inst/doc/tm.Rnw: Fixed some typos in vignette.
575    
576    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
577    
578            * R/textdoccol.R (replaceWords): Added method to replace a set of
579            words by a single word. Useful for synonyms.
580    
581    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
582    
583            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
584    
585    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
586    
587            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
588            vectors. Thanks to Ariel Maguyon for his error report.
589            (removeSparseTerms): New function to remove columns from a
590            term-document matrix exceeding a sparse factor.
591    
592    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
593    
594            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
595    
596    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
597    
598            * man/sFilter.Rd: Corrected documentation on statement format (use
599            '==' instead of '=').
600    
601    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
602    
603            * R/aobjects.R (StructuredTextDocument): Inherits from
604            TextDocument.
605    
606    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
607    
608            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
609            on sparse matrices as proposed by Martin Maechler.
610    
611    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
612    
613            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
614            latest \pkg{filehash} version deprecates them.
615    
616    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
617    
618            * R/termdocmatrix.R (textvector): Stemming is now performed before
619            erasing stopwords.
620            (weightMatrix): Adapted to handle sparse matrices.
621            (TermDocMatrix): Sparse matrix is now efficiently built by
622            direct stepwise insertion of row values into it.
623    
624    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
625    
626            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
627            due to ongoing problems. For our purposes the latter is as useful
628            as the replaced package.
629    
630    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
631    
632            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
633    
634            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
635    
636    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
637    
638            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
639            languages with available stopwords.
640    
641    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
642    
643            * inst/doc/tm.Rnw: Minor corrections in the vignette.
644    
645    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
646    
647            * DESCRIPTION: Update to version 0.2, since a lot of new features
648            have been integrated.
649    
650            * inst/stopwords: Updated existing stopwords and added stopwords
651            for various other languages.
652    
653    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
654    
655            * man/: Updated documentation.
656    
657            * Work/testDb.R: Script to test database stuff.
658    
659            * R/: Fixed various database related bugs. Seems to be rather
660            usable now; consider it alpha status for the time being.
661    
662    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
663    
664            * R/: Fixed some bugs related to database support.
665    
666    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
667    
668            * man/: Added a lot of examples to the manuals.
669    
670    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * man/: Updated parts of the documentation.
673    
674            * R/textdoccol.R (asPlain): Added conversion from newsgroup
675            documents to plain text documents.
676    
677    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
678    
679            * R/textdoccol.R: Finished experimental database support. Not yet
680            intensively tested.
681    
682            * R/source.R: Now each source has a default reader.
683    
684            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
685            class anymore.
686    
687            * R/plaintextdoc.R: Custom show method for plain text documents.
688    
689            * R/aobjects.R: Added a class for structured text documents.
690    
691            * R/reader.R: Replaced remaining \code{parser} occurrences with
692            \code{reader}.
693    
694            * R/textdoccol.R (summary): Indent tags.
695    
696            * R/textdoccol.R (removePunctuation): Transform method to remove
697            punctuation marks.
698    
699    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
700    
701            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
702            using prescindMeta().
703    
704    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
705    
706            * R/textdoccol.R: Improved database support.
707    
708    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
709    
710            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
711    
712            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
713            language code.
714    
715            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
716            into parserControl argument.
717    
718            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
719    
720    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
721    
722            * Work/tmDataSetup.R: The datasets acq and crude can now be
723            created on the fly.
724    
725            * R/stopwords.R: Introduced a function returning the stopwords for
726            a given language (English, German, and French at the moment).
727    
728            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
729            otherwise falls back to Snowball package.
730    
731    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
732    
733            * man/dissimilarity-methods.Rd: Make clear that any method offered
734            by "dists" from package "cba" can be used.
735    
736    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
737    
738            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
739            to Kurt's LaTeX suggestion. Removed points and underscores in
740            variable names for consistent naming.
741    
742            * DESCRIPTION: Update to version 0.1-2.
743    
744            * man/TextRepository.Rd: Fixed bug in documentation.
745    
746    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
747    
748            * DESCRIPTION: Update to version 0.1-1.
749    
750    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
751    
752            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
753            wordStem.
754    
755    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
756    
757            * R/: Changes due to Kurt's review.
758    
759    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
760    
761            * R/: Implemented improvements based upon comments by David
762            Meyer.
763    
764    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
765    
766            * inst/doc/: Rewrote vignette.
767    
768            * man/: Improved documentation.
769    
770    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
771    
772            * man/: Updated documentation.
773    
774            * DESCRIPTION: Changed package name to "tm". Updated version to
775            0.1 for first CRAN release.
776    
777            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
778            list archive example.
779    
780            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
781            archive example.
782    
783            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
784            from (several mails per box) mbox format to (single mail per file)
785            eml format.
786    
787    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
788    
789            * data/crude.rda: Rebuilt.
790    
791            * data/acq.rda: Rebuilt.
792    
793            * R/reader.R: Factored out reader and parser methods from
794            textdoccol.R.
795    
796            * R/source.R: Factored out Source methods from aobjects.R and
797            textdoccol.R.
798            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
799            feeds.
800    
801            * R/textdoccol.R (DirSource): Added support for recursive
802            traversal of directories.
803    
804    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
805    
806            * R/textdoccol.R ([[): Loads the document corpus automatically
807            into memory upon access.
808            (tm_transform, tm_filter): Removed several checks whether the
809            document is already loaded ([[ ensures this now).
810            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
811            mailing list archive.
812    
813    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
814    
815            * R/aobjects.R (TextDocument): Is now a virtual class.
816            (Source): Is now a virtual class.
817    
818    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
819    
820            * R/textdoccol.R (c): Support for an arbitrary number of document
821            collections.
822    
823    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
824    
825            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
826            append_meta and remove_meta.
827    
828            * R/textdoccol.R: Removed modify_metadata method.
829    
830            * R/textrepo.R: Removed modify_metadata method.
831    
832            * R/textdoccol.R (remove_meta): Supports removal of document
833            collection metadata and document (= in data frame) metadata.
834    
835    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
836    
837            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
838    
839            * data/crude.rda: Rebuilt.
840    
841            * data/acq.rda: Rebuilt.
842    
843            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
844    
845            * R/textdoccol.R ([): Bug fix for subsetting a document
846            collection's data frame.
847    
848    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
849    
850            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
851            to s_filter.
852    
853            * R/textdoccol.R: Local text documents' metadata can now be copied
854            to a document collection's data frame with prescind_meta.
855    
856    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
857    
858            * R/: Text documents' slot metadata is now accessible in s_filter.
859    
860            * R/: Rewrote s_filter function (still has some restrictions).
861    
862    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
863    
864            * R/: Various fixes in handling metadata.
865    
866            * R/: Added update mechanism for text document collections.
867    
868    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
869    
870            * R/: Merging of document collections now creates a binary tree
871            for reconstructing merged document collections.
872    
873            * R/: Redesign of metadata for document collections.
874    
875    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
876    
877            * R/: Messages now use \code{ngettext}.
878    
879    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
880    
881            * R/: Added functions for modifying and removing metadata.
882    
883    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
884    
885            * man/: Updated some documentation.
886    
887            * R/: Corrected some connection issues.
888    
889            * inst/doc: Worked on the vignette.
890    
891    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
892    
893            * inst/: Added texts and started vignette.
894    
895            * R/: Final changes based upon David's comments.
896    
897    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
898    
899            * NAMESPACE: Corrected exports (generic methods need exportMethods
900            directives!).
901    
902    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
903    
904            * R/: Modified the TextDocCol constructor and various parsers. It
905            is now modular and supports various file formats via plugins (see
906            the new "Source" class).
907    
908    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
909    
910            * man/: Revised documentation after previous code changes.
911    
912    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
913    
914            * R/: Remaining changes as discussed with David.
915    
916    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
917    
918            * R/: Some changes as suggested by David. The rest will follow
919            within the next few days.
920    
921    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * man/: Finished documentation.
924    
925    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
926    
927            * man/: Wrote some documentation.
928    
929    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
930    
931            * R/: Further syntactic sugar in the form of additional assignment and
932            accessor methods.
933    
934    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
935    
936            * R/: Syntactic sugar in the form of "length", "show" and "summary"
937            operators.
938    
939    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
940    
941            * R/: Various updates, mainly to default operators ("[" or "c")
942            and dissimilarities.
943    
944    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
945    
946            * R/: Added similarity functions.
947    
948            * data/: Added English stopwords.
949    
950    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
951    
952            * data/: Examples compiled for new features.
953    
954            * R/: Changes due to new structure.
955    
956            * NAMESPACE: Corrected namespace to reflect new structure.
957    
958            * R/termdocmatrix.R: Adapted for new naming scheme.
959    
960    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
961    
962            * R/textdoccol.R: Adapted code for new class structure. Wrote
963            several transform and filter functions operating on text document
964            collections (alias text document databases).
965    
966            * R/aobjects.R: Adapted class structure with inheritance,
967            repositories and additional meta data. Loading files on demand is
968            now possible.
969    
970    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
971    
972            * R/: Some cosmetic cleanups.
973    
974            * inst/: Removed vignette on clustering. That and much more is now
975            described in the JSS paper on text mining. Based upon that
976            article, a more elaborate vignette will be incorporated in the future.
977    
978    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            * R/: Updated generic S4 methods to comply with signature changes
981            in newer versions of R (> 2.3).
982    
983    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * ext/R/importRIS.R: Automatic RIS import is now possible.
986    
987    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
988    
989            * R/textdoccol.R: Added RIS HTML input format.
990    
991    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
992    
993            * R/textdoccol.R: Removed bug that caused invalid text document
994            collections when handling many input files.
995    
996    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
997    
998            * R/textdoccol.R: Restructured and extended file import
999            mechanism.
1000    
1001            * inst/doc/clustering.Rnw: Adapted vignette for use with
1002            ReutNews.rda
1003    
1004            * man/ReutNews.Rd: Documentation for ReutNews.rda
1005    
1006            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1007    
1008    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1009    
1010            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1011            clustering facilities of this package.
1012    
1013    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1014    
1015            * R/aobjects.R: Changed package document structure to avoid class
1016            dependency problems.
1017    
1018    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1019    
1020            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
1021            data set.
1022    
1023            *  Finished documentation and reordered directory structure. Now "R
1024            CMD check textmin" works without errors.
1025    
1026    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1027    
1028            * src/: Various splits can now be easily created for the
1029            Reuters21578 data set.
1030    
1031    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1032    
1033            *  Updated documentation.
1034    
1035    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1036    
1037            *  Wrote R documentation for some classes and methods.
1038    
1039    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1040    
1041            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
1042            files. See the questionnaire data/Umfrage.csv for such an example.
1043            We are now able to import files in Reuters-21578 XML format.
1044    
1045            *  Changed class interfaces in various files. Weighting of the text
1046            matrix is now possible.
1047    
1048    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1049    
1050            * R/textdoccol.R: One can build term-document matrices if
1051            necessary (with buildTDM(...)) and fill the field tdm from a text
1052            document collection with it.
1053    
1054            * R/textmatrix.R: Wrote S4 class for term-document matrices.
1055    
1056    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1057    
1058            * R/textdoccol.R: We now can read in a whole XML file with several
1059            news items.
1060    
1061    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1062    
1063            * R/textdoccol.R: Set up an S4 class for a collection of text
