[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 37, Wed Jan 11 17:49:17 2006 UTC
pkg/ChangeLog revision 1061, Fri Mar 19 11:41:37 2010 UTC
1    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
4            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
5    
6    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/weight.R (weightTfIdf): Added normalization option.
9    
10            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
11            analysis.
12    
13    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
14    
15            * R/score.R (tm_tag_score): Compute a score from the number of
16            tags matching in a document.
17    
18    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
19    
20            * R/complete.R (stemCompletion): New completion heuristics.
21    
22    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
23    
24            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
25    
26    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
27    
28            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
29            setOldClass(c(..., "list")) works.
30    
31    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
32    
33            * R/transform.R (stemDocument.character): In case input is a
34            simple character just delegate to the default Snowball stemmer.
35    
36    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
37    
38            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
39            data.
40    
41    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
42    
43            * R/doc.R (`Content<-`): Be careful with names attribute.
44    
45    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
46    
47            * R/source.R (DirSource): Improved implementation especially when
48            handling many (> 1M) files.
49    
50    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
51    
52            * R/source.R (getElem.URISource): Use encoding argument.
53    
54    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
55    
56            * R/doc.R (setOldClass): Register S3 document classes to be
57            recognized by S4 methods.
58    
59    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
60    
61            * R/matrix.R (termFreq): Add option to remove punctuation
62            characters.
63    
64    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
65    
66            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
67            merging multiple term-document matrices.
68    
69    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
70    
71            * R/corpus.R (setOldClass): Register S3 corpus classes to be
72            recognized by S4 methods.
73    
74            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
75            that CRAN Mac OS X builds do not fail any longer.
76    
77    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
78    
79            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
80            of RWeka::AlphabeticTokenizer() as default.
81    
82    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
83    
84            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
85            caused words at the beginning or the end of a line not to be removed. Do
86            not delete whitespace anymore.
87    
88    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/source.R (DirSource): Default to working directory if no path
91            is specified.
92    
93    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
94    
95            * R/source.R (DirSource): Stop on empty directories.
96    
97    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
100            named documents.
101    
102    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/transform.R (removeWords): Improve regular expressions.
105    
106    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
107    
108            * R/meta.R (DublinCore): Allow lower case tags.
109    
110    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
111    
112            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
113            instead of x$children.
114    
115    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
116    
117            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
118    
119    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
120    
121            * R/: Use S3 instead of S4 class system.
122    
123    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
124    
125            * R/reader.R (readMail): Moved to tm.plugin.mail package.
126    
127    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
128    
129            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
130            postings are basically e-mails with some extra headers.
131    
132    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
133    
134            * R/transform.R: Move convertMboxEml, removeCitation,
135            removeMultipart, and removeSignature to the tm.plugin.mail package
136            since they are mainly utility functions (for handling e-mails) and
137            not very framework specific.
138    
139    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
140    
141            * man/: Fix documentation.
142    
143    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
144    
145            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
146            plain text document instead of an XML document for texts of the
147            Reuters-21578 dataset.
148    
149            * R/sparse.R: Removed since the slam package is now available on
150            CRAN.
151    
152            * DESCRIPTION (Depends): Add slam package.
153    
154    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
155    
156            * R/transform.R (stemDoc): Fix character(0) handling.
157    
158    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
159    
160            * R/doc.R (show): Pretty print.
161    
162    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
163    
164            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
165            gracefully.
166    
167    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
168    
169            * R/corpus.R: Make corpus virtual. Implement corpus with standard
170            and permanent storage semantics.
171    
172            * DESCRIPTION: New major release. A *lot* of improvements.
173    
174    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
175    
176            * NAMESPACE: Export some simple_triplet_matrix functions.
177    
178    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
179    
180            * R/weight.R: Adapt tf-idf to new matrix format.
181    
182    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
183    
184            * R/matrix.R: Create two distinct classes for term-document and
185            document-term matrices.
186    
187    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
188    
189            * R/termdocmatrix.R: No longer use Matrix package. This reduces
190            package start-up time significantly.
191    
192    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
193    
194            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
195    
196    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
197    
198            * R/transform.R (tmReduce): Combine multiple maps into one
199            transformation.
200    
201    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
202    
203            * R/weight.R: Remove weightLogical since it does not return a
204            dgCMatrix.
205    
206            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
207            or TermDocumentMatrix instead.
208    
209    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
210    
211            * inst/doc/extensions.Rnw: Finished vignette.
212    
213    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
214    
215            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
216            DocumentTermMatrix representations.
217    
218    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
219    
220            * R/reader.R (readXML): New reader for arbitrary XML files.
221    
222    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
223    
224            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
225            (XMLSource): New XMLSource class for arbitrary XML files.
226            (Source): New slot Vectorized.
227    
228    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
229    
230            * R/reader.R (readTabular): Experimental reader for tabular data
231            structures which can be customized via user-defined mappings.
232    
233            * R/reader.R: Always use UTC time zone.
234    
235            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
236    
237    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
238    
239            * R/reader.R (readDOC): Options can be passed over to antiword.
240    
241            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
242            pdftotext.
243    
244    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
245    
246            * R/source.R (DirSource): Add pattern and ignore.case arguments
247            which are internally passed over to list.files().
248    
249    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
250    
251            * inst/doc/tm.Rnw: Suppress pointless loading message.
252    
253    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
254    
255            * DESCRIPTION: Speed up package loading (via moving packages not
256            strictly necessary for normal operation to Suggests instead of
257            Depends).
258    
259    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
260    
261            * R/reader.R (readNewsgroup): The date format is now configurable.
262    
263    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
264    
265            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
266    
267    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
268    
269            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
270    
271    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
272    
273            * R/source.R (DataframeSource): New source class for data frames.
274    
275            * R/source.R: Fixed non-standard call evaluation.
276    
277    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
278    
279            * R/source.R (URISource): New source class for a single document.
280    
281    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
282    
283            * R/source.R: Refactoring.
284    
285    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
286    
287            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
288            Rmpi installations more gracefully.
289    
290    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
291    
292            * R/source.R (Source): Add Length slot.
293    
294    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
295    
296            * R/AAA.R: Unify duplicated .onLoad function.
297    
298    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
299    
300            * DESCRIPTION (Suggests): Added Rmpi.
301    
302    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
303    
304            * R/source.R (getElem): Fix 'no visible binding' warning.
305    
306            * man/WeightFunction.Rd: Fix signature.
307    
308    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
309    
310            * R/weight.R: Introduce name abbreviations for weighting functions.
311    
312    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
313    
314            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
315    
316            * R/cluster.R: Provide convenience functions for using an MPI
317            cluster.
318    
319            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
320            available.
321    
322            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
323            available.
324    
325    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
326    
327            * R/textdoccol.R (lapply): Removed debug print out.
328    
329    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
330    
331            * R/reader.R (readRCV1): Improved meta data extraction from
332            Reuters Corpus Volume 1 documents.
333    
334    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
335    
336            * R/transform.R: Ensure that all mappings preserve multiline
337            structures.
338    
339    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
340    
341            * R/filter.R: Every filter now has an attribute indicating whether
342            it should be applied at document level (doclevel).
343    
344            * R/textdoccol.R (tmFilter): Set searchFullText as new default
345            filter.
346    
347    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
348    
349            * R/transform.R (replacePatterns): Replaced removeWords by
350            replacePatterns. Suggested by Christian Buchta.
351    
352            * R/textdoccol.R (inspect): Improved formatting.
353    
354    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
355    
356            * inst/CITATION: Updated JSS article information.
357    
358            * R/textdoccol.R (setAs): Added coerce method from list to
359            corpus.
360    
361            * R/meta.R (meta): Improved meta data handling.
362    
363    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
364    
365            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
366            Christian Buchta.
367    
368            * inst/CITATION: Added template to include JSS article reference.
369    
370    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
371    
372            * R/textdoccol.R (tmMap): Introduced lazy mapping.
373    
374            * R/source.R: Added VectorSource.
375    
376    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
377    
378            * man/: Language codes should be in ISO 639-1 format.
379    
380            * R/textdoccol.R (asPlain): Preserve local meta data.
381    
382    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
383    
384            * R/textdoccol.R (writeCorpus): Function for writing a corpus
385            containing plain text documents to disk.
386    
387    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
388    
389            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
390            always set correctly.
391    
392            * R/textdoccol.R: Set load = TRUE as default for load on demand
393            since in most cases this is the wanted behaviour.
394    
395    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
396    
397            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
398    
399            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
400    
401    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
402    
403            * R/meta.R (meta): New function for consistent access to meta data
404            of document collections, repositories, and texts.
405    
406    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
407    
408            * R/: Better support for encodings.
409    
410    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
411    
412            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
413            selection when no reader argument is given.
414    
415    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
416    
417            * R/source.R (CSVSource): Now uses read.csv instead of scan
418            internally.
419    
420    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
421    
422            * R/reader.R (getReaders): Returns available reader functions.
423    
424            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
425            as default.
426    
427    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
428    
429            * R/stopwords.R (stopwords): Shortened code, removed codetools
430            variable warnings.
431    
432            * man/: Documentation for showMeta, added an example for tmMap.
433    
434            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
435            some minor typos fixed.
436    
437    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
438    
439            * R/aobjects.R (showMeta): Added method for pretty printing a
440            text document's meta data.
441    
442    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
443    
444            * R/textdoccol.R (TextDocCol): Better handling of empty
445            arguments.
446    
447            * NAMESPACE: Exported readDOC.
448    
449            * man/completeStems.Rd: Added an example.
450    
451    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
452    
453            * R/stopwords.R (stopwords): Look up .dat files at every
454            call. Allows users to modify stopword .dat files interactively.
455    
456    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
457    
458            * R/termdocmatrix.R (termFreq): Correct processing of empty
459            documents.
460    
461    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
462    
463            * man/: Updated documentation.
464    
465    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
466    
467            * R/complete.R (completeStems): Completes (heuristically) word
468            stems.
469    
470            * R/termdocmatrix.R (TermDocMatrix2): New modular
471            constructor.
472    
473            * NAMESPACE: Exported termFreq.
474    
475    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
476    
477            * R/reader.R (readDOC): Added MS Word reader (using antiword).
478    
479    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
480    
481            * R/weight.R: Weighting functions for TermDocMatrix.
482    
483    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
484    
485            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
486            functions for accessing dimension, column, and row names.
487    
488            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
489    
490    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
491    
492            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
493    
494    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
497    
498    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
499    
500            * R/reader.R (readPDF): Removed manual checks for pdftotext and
501            pdfinfo. The system call gives a warning anyway.
502    
503    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
504    
505            * R/textdoccol.R (asPlain): Conversion from
506            StructuredTextDocuments to PlainTextDocuments.
507    
508    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
511            for accessing term-document matrices.
512    
513            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
514            are installed.
515    
516    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
519            Christian Buchta.
520    
521    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
524    
525    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
526    
527            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
528    
529            * R/reader.R (readPDF): Added PDF reader.
530    
531    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
532    
533            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
534    
535            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
536    
537            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
538    
539            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
540    
541    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
542    
543            * R/distmeasure.R (dissimilarity): Replaced dists call from
544            package cba by new dist call from package proxy.
545    
546    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
547    
548            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
549    
550    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
551    
552            * R/termdocmatrix.R: require() uses the quietly option to suppress
553            loading messages.
554    
555    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
556    
557            * R/dictionary.R: Added dictionary support.
558    
559    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
560    
561            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
562            documents. This simplifies some functions, e.g., asPlain.
563    
564    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
565    
566            * inst/doc/tm.Rnw: Fixed some typos in vignette.
567    
568    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * R/textdoccol.R (replaceWords): Added method to replace a set of
571            words by a single word. Useful for synonyms.
572    
573    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
576    
577    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
578    
579            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
580            vectors. Thanks to Ariel Maguyon for his error report.
581            (removeSparseTerms): New function to remove columns from a
582            term-document matrix exceeding a sparse factor.
583    
584    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
585    
586            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
587    
588    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * man/sFilter.Rd: Corrected documentation on statement format (use
591            '==' instead of '=').
592    
593    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
594    
595            * R/aobjects.R (StructuredTextDocument): Inherits from
596            TextDocument.
597    
598    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
599    
600            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
601            on sparse matrices as proposed by Martin Maechler.
602    
603    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
604    
605            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
606            \pkg{filehash} version deprecates them.
607    
608    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
609    
610            * R/termdocmatrix.R (textvector): Stemming is now performed before
611            erasing stopwords.
612            (weightMatrix): Adapted to handle sparse matrices.
613            (TermDocMatrix): Sparse matrix is now efficiently built by
614            direct stepwise insertion of row values into it.
615    
616    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
617    
618            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
619            due to ongoing problems. For our purposes the latter is as useful
620            as the replaced package.
621    
622    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
623    
624            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
625    
626            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
627    
628    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
629    
630            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
631            languages with available stopwords.
632    
633    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
634    
635            * inst/doc/tm.Rnw: Minor corrections in the vignette.
636    
637    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
638    
639            * DESCRIPTION: Update to version 0.2, since a lot of new features
640            have been integrated.
641    
642            * inst/stopwords: Updated existing stopwords and added stopwords
643            for various other languages.
644    
645    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
646    
647            * man/: Updated documentation.
648    
649            * Work/testDb.R: Script to test database stuff.
650    
651            * R/: Fixed various database related bugs. Seems to be rather
652            useable now, i.e., consider as alpha status for now.
653    
654    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
655    
656            * R/: Fixed some bugs related to database support.
657    
658    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
659    
660            * man/: Added a lot of examples to the manuals.
661    
662    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
663    
664            * man/: Updated parts of the documentation.
665    
666            * R/textdoccol.R (asPlain): Added conversion from newsgroup
667            documents to plain text documents.
668    
669    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
670    
671            * R/textdoccol.R: Finished experimental database support. Not yet
672            intensively tested.
673    
674            * R/source.R: Now each source has a default reader.
675    
676            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
677            class anymore.
678    
679            * R/plaintextdoc.R: Custom show method for plain text documents.
680    
681            * R/aobjects.R: Added a class for structured text documents.
682    
683            * R/reader.R: Replaced remaining \code{parser} occurrences with
684            \code{reader}.
685    
686            * R/textdoccol.R (summary): Indent tags.
687    
688            * R/textdoccol.R (removePunctuation): Transform method to remove
689            punctuation marks.
690    
691    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
692    
693            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
694            using prescindMeta().
695    
696    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
697    
698            * R/textdoccol.R: Improved database support.
699    
700    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
701    
702            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
703    
704            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
705            language code.
706    
707            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
708            into parserControl argument.
709    
710            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
711    
712    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
713    
714            * Work/tmDataSetup.R: The datasets acq and crude can now be
715            created on the fly.
716    
717            * R/stopwords.R: Introduced a function returning the stopwords for
718            a given language (English, German and French at the moment).
719    
720            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
721            otherwise falls back to Snowball package.
722    
723    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
724    
725            * man/dissimilarity-methods.Rd: Make clear that any method offered
726            by "dists" from package "cba" can be used.
727    
728    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
729    
730            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
731            to Kurt's LaTeX suggestion. Removed points and underscores in
732            variable names for consistent naming.
733    
734            * DESCRIPTION: Update to version 0.1-2.
735    
736            * man/TextRepository.Rd: Fixed bug in documentation.
737    
738    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
739    
740            * DESCRIPTION: Update to version 0.1-1.
741    
742    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
743    
744            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
745            wordStem.
746    
747    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
748    
749            * R/: Changes due to Kurt's review.
750    
751    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
752    
753            * R/: Implemented improvements based upon comments by David
754            Meyer.
755    
756    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
757    
758            * inst/doc/: Rewrote vignette.
759    
760            * man/: Improved documentation.
761    
762    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
763    
764            * man/: Updated documentation.
765    
766            * DESCRIPTION: Changed package name to "tm". Updated version to
767            0.1 for first CRAN release.
768    
769            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
770            list archive example.
771    
772            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
773            archive example.
774    
775            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
776            from (several mails per box) mbox format to (single mail per file)
777            eml format.
778    
779    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
780    
781            * data/crude.rda: Rebuilt.
782    
783            * data/acq.rda: Rebuilt.
784    
785            * R/reader.R: Factored out reader and parser methods from
786            textdoccol.R.
787    
788            * R/source.R: Factored out Source methods from aobjects.R and
789            textdoccol.R.
790            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
791            feeds.
792    
793            * R/textdoccol.R (DirSource): Added support for recursive
794            traversal of directories.
795    
796    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
797    
798            * R/textdoccol.R ([[): Loads the document corpus automatically
799            into memory upon access.
800            (tm_transform, tm_filter): Removed several checks whether the
801            document is already loaded ([[ ensures this now).
802            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
803            mailing list archive.
804    
805    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
806    
807            * R/aobjects.R (TextDocument): Is now a virtual class.
808            (Source): Is now a virtual class.
809    
810    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
811    
812            * R/textdoccol.R (c): Support for an arbitrary number of document
813            collections.
814    
815    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
816    
817            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
818            append_meta and remove_meta.
819    
820            * R/textdoccol.R: Removed modify_metadata method.
821    
822            * R/textrepo.R: Removed modify_metadata method.
823    
824            * R/textdoccol.R (remove_meta): Supports removal of document
825            collection metadata and document (= in data frame) metadata.
826    
827    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
828    
829            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
830    
831            * data/crude.rda: Rebuilt.
832    
833            * data/acq.rda: Rebuilt.
834    
835            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
836    
837            * R/textdoccol.R ([): Bug fix for subsetting a document
838            collection's data frame.
839    
840    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
841    
842            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
843            to s_filter.
844    
845            * R/textdoccol.R: Local text documents' metadata can now be copied
846            to a document collection's data frame with prescind_meta.
847    
848    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
849    
850            * R/: Text documents' slot metadata is now accessible in s_filter.
851    
852            * R/: Rewrote s_filter function (still has some restrictions).
853    
854    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
855    
856            * R/: Various fixes in handling metadata.
857    
858            * R/: Added update mechanism for text document collections.
859    
860    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
861    
862            * R/: Merging of document collections now creates a binary tree
863            for reconstructing merged document collections.
864    
865            * R/: Redesign of metadata for document collections.
866    
867    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * R/: Messages now use \code{ngettext}.
870    
871    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
872    
873            * R/: Added functions for modifying and removing metadata.
874    
875    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
876    
877            * man/: Updated some documentation.
878    
879            * R/: Corrected some connection issues.
880    
881            * inst/doc: Worked on the vignette.
882    
883    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
884    
885            * inst/: Added texts and started vignette.
886    
887            * R/: Final changes based upon David's comments.
888    
889    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
890    
891            * NAMESPACE: Corrected exports (generic methods need exportMethods
892            directives!).
893    
894    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
895    
896            * R/: Modified the TextDocCol constructor and various parsers. It
897            is now modular and supports various file formats via plugins (see
898            the new "Source" class).
899    
900    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
901    
902            * man/: Revised documentation after previous code changes.
903    
904    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
905    
906            * R/: Remaining changes as discussed with David.
907    
908    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
909    
910            * R/: Some changes as suggested by David. The rest will follow
911            within the next few days.
912    
913    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
914    
915            * man/: Finished documentation.
916    
917    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
918    
919            * man/: Wrote some documentation.
920    
921    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * R/: Further syntactic sugar in form of additional assignment and
924            accessor methods.
925    
926    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
927    
928            * R/: Syntactic sugar in form of "length", "show" and "summary"
929            operators.
930    
931    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
932    
933            * R/: Various updates, mainly to default operators ("[" or "c")
934            and dissimilarities.
935    
936    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
937    
938            * R/: Added similarity functions.
939    
940            * data/: Added English stopwords.
941    
942    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
943    
944            * data/: Examples compiled for new features.
945    
946            * R/: Changes due to new structure.
947    
948            * NAMESPACE: Corrected namespace to reflect new structure.
949    
950            * R/termdocmatrix.R: Adapted for new naming scheme.
951    
952    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
953    
954            * R/textdoccol.R: Adapted code for new class structure. Wrote
955            several transform and filter functions operating on text document
956            collections (alias text document databases).
957    
958            * R/aobjects.R: Adapted class structure with inheritance,
959            repositories and additional meta data. Loading files on demand is
960            now possible.
961    
962    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
963    
964            * R/: Some cosmetic cleanups.
965    
966            * inst/: Removed vignette on clustering. That and much more is now
967            described in the JSS paper on text mining. Based upon that
968            article a more elaborate vignette will be incorporated in the future.
969    
970    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
971    
972            * R/: Updated generic S4 methods to comply with signature changes
973            in newer versions of R (> 2.3).
974    
975    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            * ext/R/importRIS.R: Automatic RIS import is now possible.
978    
979    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
980    
981            * R/textdoccol.R: Added RIS HTML input format.
982    
983    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * R/textdoccol.R: Removed bug that caused invalid text document
986            collections when handling many input files.
987    
988    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
989    
990            * R/textdoccol.R: Restructured and extended file import
