Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 37, Wed Jan 11 17:49:17 2006 UTC
pkg/ChangeLog revision 1080, Thu Jun 17 13:47:05 2010 UTC
1    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
4            remove terms not occurring in the corpus anymore.
5    
6    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
9            and Heaps' law.
10    
11    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
14            provided by a source.
15    
16    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
17    
18            * R/source.R (.Source): Provide document names.
19    
20    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
21    
22            * R/meta.R (`content_or_meta`): Utility function.
23    
24    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
25    
26            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
27            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
28    
29    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
30    
31            * R/weight.R (weightTfIdf): Added normalization option.
32    
33            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
34            analysis.
35    
36    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
37    
38            * R/score.R (tm_tag_score): Compute a score from the number of
39            tags matching in a document.
40    
41    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
42    
43            * R/complete.R (stemCompletion): New completion heuristics.
44    
45    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
46    
47            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
48    
49    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
50    
51            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
52            setOldClass(c(..., "list")) works.
53    
54    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
55    
56            * R/transform.R (stemDocument.character): In case input is a
57            simple character just delegate to the default Snowball stemmer.
58    
59    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
60    
61            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
62            data.
63    
64    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
65    
66            * R/doc.R (`Content<-`): Be careful with names attribute.
67    
68    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
69    
70            * R/source.R (DirSource): Improved implementation especially when
71            handling many (> 1M) files.
72    
73    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
74    
75            * R/source.R (getElem.URISource): Use encoding argument.
76    
77    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
78    
79            * R/doc.R (setOldClass): Register S3 document classes to be
80            recognized by S4 methods.
81    
82    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
83    
84            * R/matrix.R (termFreq): Add option to remove punctuation
85            characters.
86    
87    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
88    
89            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
90            merging multiple term-document matrices.
91    
92    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
93    
94            * R/corpus.R (setOldClass): Register S3 corpus classes to be
95            recognized by S4 methods.
96    
97            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
98            that CRAN Mac OS X builds do not fail any longer.
99    
100    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
101    
102            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
103            of RWeka::AlphabeticTokenizer() as default.
104    
105    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
106    
107            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
108            caused words at the beginning or the end of a line not to be removed. Do
109            not delete whitespace anymore.
110    
111    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
112    
113            * R/source.R (DirSource): Default to working directory if no path
114            is specified.
115    
116    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
117    
118            * R/source.R (DirSource): Stop on empty directories.
119    
120    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
121    
122            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
123            named documents.
124    
125    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
126    
127            * R/transform.R (removeWords): Improve regular expressions.
128    
129    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
130    
131            * R/meta.R (DublinCore): Allow lower case tags.
132    
133    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
134    
135            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
136            instead of x$children.
137    
138    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
139    
140            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
141    
142    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
143    
144            * R/: Use S3 instead of S4 class system.
145    
146    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
147    
148            * R/reader.R (readMail): Moved to tm.plugin.mail package.
149    
150    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
151    
152            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
153            postings are basically e-mails with some extra headers.
154    
155    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
156    
157            * R/transform.R: Move convertMboxEml, removeCitation,
158            removeMultipart, and removeSignature to the tm.plugin.mail package
159            since they are mainly utility functions (for handling e-mails) and
160            not very framework specific.
161    
162    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
163    
164            * man/: Fix documentation.
165    
166    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
167    
168            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
169            plain text document instead of an XML document for texts of the
170            Reuters-21578 dataset.
171    
172            * R/sparse.R: Removed since the slam package is now available on
173            CRAN.
174    
175            * DESCRIPTION (Depends): Add slam package.
176    
177    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
178    
179            * R/transform.R (stemDoc): Fix character(0) handling.
180    
181    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
182    
183            * R/doc.R (show): Pretty print.
184    
185    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
186    
187            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
188            gracefully.
189    
190    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
191    
192            * R/corpus.R: Make corpus virtual. Implement corpus with standard
193            and permanent storage semantics.
194    
195            * DESCRIPTION: New major release. A *lot* of improvements.
196    
197    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
198    
199            * NAMESPACE: Export some simple_triplet_matrix functions.
200    
201    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
202    
203            * R/weight.R: Adapt tf-idf to new matrix format.
204    
205    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
206    
207            * R/matrix.R: Create two distinct classes for term-document and
208            document-term matrices.
209    
210    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
211    
212            * R/termdocmatrix.R: No longer use Matrix package. This reduces
213            package start-up time significantly.
214    
215    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
216    
217            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
218    
219    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
220    
221            * R/transform.R (tmReduce): Combine multiple maps into one
222            transformation.
223    
224    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
225    
226            * R/weight.R: Remove weightLogical since it does not return a
227            dgCMatrix.
228    
229            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
230            or TermDocumentMatrix instead.
231    
232    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
233    
234            * inst/doc/extensions.Rnw: Finished vignette.
235    
236    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
237    
238            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
239            DocumentTermMatrix representations.
240    
241    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
242    
243            * R/reader.R (readXML): New reader for arbitrary XML files.
244    
245    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
246    
247            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
248            (XMLSource): New XMLSource class for arbitrary XML files.
249            (Source): New slot Vectorized.
250    
251    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
252    
253            * R/reader.R (readTabular): Experimental reader for tabular data
254            structures which can be customized via user-defined mappings.
255    
256            * R/reader.R: Always use UTC time zone.
257    
258            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
259    
260    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
261    
262            * R/reader.R (readDOC): Options can be passed over to antiword.
263    
264            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
265            pdftotext.
266    
267    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
268    
269            * R/source.R (DirSource): Add pattern and ignore.case arguments
270            which are internally passed over to list.files().
271    
272    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
273    
274            * inst/doc/tm.Rnw: Suppress pointless loading message.
275    
276    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
277    
278            * DESCRIPTION: Speed up package loading (via moving packages not
279            strictly necessary for normal operation to Suggests instead of
280            Depends).
281    
282    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
283    
284            * R/reader.R (readNewsgroup): The date format is now configurable.
285    
286    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
287    
288            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
289    
290    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
291    
292            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
293    
294    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
295    
296            * R/source.R (DataframeSource): New source class for data frames.
297    
298            * R/source.R: Fixed non-standard call evaluation.
299    
300    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
301    
302            * R/source.R (URISource): New source class for a single document.
303    
304    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
305    
306            * R/source.R: Refactoring.
307    
308    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
309    
310            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
311            Rmpi installations more gracefully.
312    
313    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
314    
315            * R/source.R (Source): Add Length slot.
316    
317    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
318    
319            * R/AAA.R: Unify duplicated .onLoad function.
320    
321    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
322    
323            * DESCRIPTION (Suggests): Added Rmpi.
324    
325    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
326    
327            * R/source.R (getElem): Fix 'no visible binding' warning.
328    
329            * man/WeightFunction.Rd: Fix signature.
330    
331    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
332    
333            * R/weight.R: Introduce name abbreviations for weighting functions.
334    
335    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
336    
337            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
338    
339            * R/cluster.R: Provide convenience functions for using an MPI
340            cluster.
341    
342            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
343            available.
344    
345            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
346            available.
347    
348    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
349    
350            * R/textdoccol.R (lapply): Removed debug print out.
351    
352    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
353    
354            * R/reader.R (readRCV1): Improved meta data extraction from
355            Reuters Corpus Volume 1 documents.
356    
357    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
358    
359            * R/transform.R: Ensure that all mappings preserve multiline
360            structures.
361    
362    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
363    
364            * R/filter.R: Every filter now has an attribute indicating whether
365            it should be applied at the document level (doclevel).
366    
367            * R/textdoccol.R (tmFilter): Set searchFullText as new default
368            filter.
369    
370    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
371    
372            * R/transform.R (replacePatterns): Replaced removeWords by
373            replacePatterns. Suggested by Christian Buchta.
374    
375            * R/textdoccol.R (inspect): Improved formatting.
376    
377    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
378    
379            * inst/CITATION: Updated JSS article information.
380    
381            * R/textdoccol.R (setAs): Added coerce method from list to
382            corpus.
383    
384            * R/meta.R (meta): Improved meta data handling.
385    
386    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
387    
388            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
389            Christian Buchta.
390    
391            * inst/CITATION: Added template to include JSS article reference.
392    
393    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
394    
395            * R/textdoccol.R (tmMap): Introduced lazy mapping.
396    
397            * R/source.R: Added VectorSource.
398    
399    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
400    
401            * man/: Language codes should be in ISO 639-1 format.
402    
403            * R/textdoccol.R (asPlain): Preserve local meta data.
404    
405    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
406    
407            * R/textdoccol.R (writeCorpus): Function for writing a corpus
408            containing plain text documents to disk.
409    
410    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
411    
412            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
413            always set correctly.
414    
415            * R/textdoccol.R: Set load = TRUE as default for load on demand
416            since in most cases this is the wanted behaviour.
417    
418    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
419    
420            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
421    
422            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
423    
424    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
425    
426            * R/meta.R (meta): New function for consistent access to meta data
427            of document collections, repositories, and texts.
428    
429    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
430    
431            * R/: Better support for encodings.
432    
433    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
434    
435            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
436            selection when no reader argument is given.
437    
438    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
439    
440            * R/source.R (CSVSource): Now uses read.csv instead of scan
441            internally.
442    
443    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
444    
445            * R/reader.R (getReaders): Returns available reader functions.
446    
447            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
448            as default.
449    
450    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
451    
452            * R/stopwords.R (stopwords): Shortened code, removed codetools
453            variable warnings.
454    
455            * man/: Documentation for showMeta, added an example for tmMap.
456    
457            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
458            some minor typos fixed.
459    
460    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
461    
462            * R/aobjects.R (showMeta): Added method for pretty printing a
463            text document's meta data.
464    
465    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
466    
467            * R/textdoccol.R (TextDocCol): Better handling of empty
468            arguments.
469    
470            * NAMESPACE: Exported readDOC.
471    
472            * man/completeStems.Rd: Added an example.
473    
474    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
475    
476            * R/stopwords.R (stopwords): Look up .dat files at every
477            call. Allows users to modify stopword .dat files interactively.
478    
479    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
480    
481            * R/termdocmatrix.R (termFreq): Correct processing of empty
482            documents.
483    
484    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
485    
486            * man/: Updated documentation.
487    
488    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
489    
490            * R/complete.R (completeStems): Completes (heuristically) word
491            stems.
492    
493            * R/termdocmatrix.R (TermDocMatrix2): New modular
494            constructor.
495    
496            * NAMESPACE: Exported termFreq.
497    
498    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
499    
500            * R/reader.R (readDOC): Added MS Word reader (using antiword).
501    
502    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
503    
504            * R/weight.R: Weighting functions for TermDocMatrix.
505    
506    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
509            functions for accessing dimension, column, and row names.
510    
511            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
512    
513    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
514    
515            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
516    
517    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
520    
521    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * R/reader.R (readPDF): Removed manual checks for pdftotext and
524            pdfinfo. The system call gives a warning anyway.
525    
526    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * R/textdoccol.R (asPlain): Conversion from
529            StructuredTextDocuments to PlainTextDocuments.
530    
531    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
532    
533            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
534            for accessing term-document matrices.
535    
536            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
537            are installed.
538    
539    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
540    
541            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
542            Christian Buchta.
543    
544    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
545    
546            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
547    
548    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
549    
550            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
551    
552            * R/reader.R (readPDF): Added PDF reader.
553    
554    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
557    
558            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
559    
560            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
561    
562            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
563    
564    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
565    
566            * R/distmeasure.R (dissimilarity): Replaced dists call from
567            package cba by new dist call from package proxy.
568    
569    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
570    
571            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
572    
573    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
574    
575            * R/termdocmatrix.R: require() uses the quietly option to suppress
576            loading messages.
577    
578    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
579    
580            * R/dictionary.R: Added dictionary support.
581    
582    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
583    
584            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
585            documents. This simplifies some functions, e.g., asPlain.
586    
587    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
588    
589            * inst/doc/tm.Rnw: Fixed some typos in vignette.
590    
591    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
592    
593            * R/textdoccol.R (replaceWords): Added method to replace a set of
594            words by a single word. Useful for synonyms.
595    
596    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
597    
598            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
599    
600    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
601    
602            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
603            vectors. Thanks to Ariel Maguyon for his error report.
604            (removeSparseTerms): New function to remove columns from a
605            term-document matrix exceeding a sparse factor.
606    
607    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
608    
609            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
610    
611    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
612    
613            * man/sFilter.Rd: Corrected documentation on statement format (use
614            '==' instead of '=').
615    
616    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
617    
618            * R/aobjects.R (StructuredTextDocument): Inherits from
619            TextDocument.
620    
621    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
622    
623            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
624            on sparse matrices as proposed by Martin Maechler.
625    
626    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
627    
628            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
629            \pkg{filehash} version deprecates them.
630    
631    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * R/termdocmatrix.R (textvector): Stemming is now performed before
634            erasing stopwords.
635            (weightMatrix): Adapted to handle sparse matrices.
636            (TermDocMatrix): Sparse matrix is now efficiently built by
637            direct stepwise insertion of row values into it.
638    
639    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
640    
641            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
642            due to ongoing problems. For our purposes the latter is as useful
643            as the replaced package.
644    
645    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
646    
647            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
648    
649            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
650    
651    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
652    
653            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
654            languages with available stopwords.
655    
656    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
657    
658            * inst/doc/tm.Rnw: Minor corrections in the vignette.
659    
660    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * DESCRIPTION: Update to version 0.2, since a lot of new features
663            have been integrated.
664    
665            * inst/stopwords: Updated existing stopwords and added stopwords
666            for various other languages.
667    
668    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
669    
670            * man/: Updated documentation.
671    
672            * Work/testDb.R: Script to test database stuff.
673    
674            * R/: Fixed various database-related bugs. Seems to be rather
675            usable now, i.e., consider it alpha status for now.
676    
677    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
678    
679            * R/: Fixed some bugs related to database support.
680    
681    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
682    
683            * man/: Added a lot of examples to the manuals.
684    
685    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
686    
687            * man/: Updated parts of the documentation.
688    
689            * R/textdoccol.R (asPlain): Added conversion from newsgroup
690            documents to plain text documents.
691    
692    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
693    
694            * R/textdoccol.R: Finished experimental database support. Not yet
695            intensively tested.
696    
697            * R/source.R: Now each source has a default reader.
698    
699            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
700            class anymore.
701    
702            * R/plaintextdoc.R: Custom show method for plain text documents.
703    
704            * R/aobjects.R: Added a class for structured text documents.
705    
706            * R/reader.R: Replaced remaining \code{parser} occurrences with
707            \code{reader}.
708    
709            * R/textdoccol.R (summary): Indent tags.
710    
711            * R/textdoccol.R (removePunctuation): Transform method to remove
712            punctuation marks.
713    
714    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
715    
716            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
717            using prescindMeta().
718    
719    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
720    
721            * R/textdoccol.R: Improved database support.
722    
723    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
724    
725            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
726    
727            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
728            language code.
729    
730            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
731            into parserControl argument.
732    
733            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
734    
735    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
736    
737            * Work/tmDataSetup.R: The datasets acq and crude can now be
738            created on the fly.
739    
740            * R/stopwords.R: Introduced a function returning the stopwords for
741            a given language (English, German and French at the moment).
742    
743            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
744            otherwise falls back to Snowball package.
745    
746    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
747    
748            * man/dissimilarity-methods.Rd: Make clear that any method offered
749            by "dists" from package "cba" can be used.
750    
751    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
752    
753            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
754            to Kurt's latex suggestion. Removed points and underscores in
755            variable names for consistent naming.
756    
757            * DESCRIPTION: Update to version 0.1-2.
758    
759            * man/TextRepository.Rd: Fixed bug in documentation.
760    
761    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * DESCRIPTION: Update to version 0.1-1.
764    
765    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
766    
767            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
768            wordStem.
769    
770    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
771    
772            * R/: Changes due to Kurt's review.
773    
774    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
775    
776            * R/: Implemented improvements based upon comments by David
777            Meyer.
778    
779    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
780    
781            * inst/doc/: Rewrote vignette.
782    
783            * man/: Improved documentation.
784    
785    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
786    
787            * man/: Updated documentation.
788    
789            * DESCRIPTION: Changed package name to "tm". Updated version to
790            0.1 for first CRAN release.
791    
792            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
793            list archive example.
794    
795            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
796            archive example.
797    
798            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
799            from (several mails per box) mbox format to (single mail per file)
800            eml format.
801    
802    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
803    
804            * data/crude.rda: Rebuilt.
805    
806            * data/acq.rda: Rebuilt.
807    
808            * R/reader.R: Factored out reader and parser methods from
809            textdoccol.R.
810    
811            * R/source.R: Factored out Source methods from aobjects.R and
812            textdoccol.R.
813            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
814            feeds.
815    
816            * R/textdoccol.R (DirSource): Added support for recursive
817            traversal of directories.
818    
819    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
820    
821            * R/textdoccol.R ([[): Loads the document corpus automatically
822            into memory upon access.
823            (tm_transform, tm_filter): Removed several checks whether the
824            document is already loaded ([[ ensures this now).
825            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
826            mailing list archive.
827    
828    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
829    
830            * R/aobjects.R (TextDocument): Is now a virtual class.
831            (Source): Is now a virtual class.
832    
833    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
834    
835            * R/textdoccol.R (c): Support for an arbitrary number of document
836            collections.
837    
838    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
839    
840            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
841            append_meta and remove_meta.
842    
843            * R/textdoccol.R: Removed modify_metadata method.
844    
845            * R/textrepo.R: Removed modify_metadata method.
846    
847            * R/textdoccol.R (remove_meta): Supports removal of document
848            collection metadata and document (= in data frame) metadata.
849    
850    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
851    
852            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
853    
854            * data/crude.rda: Rebuilt.
855    
856            * data/acq.rda: Rebuilt.
857    
858            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
859    
860            * R/textdoccol.R ([): Bug fix for subsetting a document
861            collection's data frame.
862    
863    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
864    
865            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
866            to s_filter.
867    
868            * R/textdoccol.R: Local text documents' metadata can now be copied
869            to a document collection's data frame with prescind_meta.
870    
871    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
872    
873            * R/: Text documents' slot metadata is now accessible in s_filter.
874    
875            * R/: Rewrote s_filter function (still has some restrictions).
876    
877    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
878    
879            * R/: Various fixes in handling metadata.
880    
881            * R/: Added update mechanism for text document collections.
882    
883    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
884    
885            * R/: Merging of document collections now creates a binary tree
886            for reconstructing merged document collections.
887    
888            * R/: Redesign of metadata for document collections.
889    
890    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
891    
892            * R/: Messages now use \code{ngettext}.
893    
894    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
895    
896            * R/: Added functions for modifying and removing metadata.
897    
898    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
899    
900            * man/: Updated some documentation.
901    
902            * R/: Corrected some connection issues.
903    
904            * inst/doc: Worked on the vignette.
905    
906    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
907    
908            * inst/: Added texts and started vignette.
909    
910            * R/: Final changes based upon David's comments.
911    
912    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
913    
914            * NAMESPACE: Corrected exports (generic methods need exportMethods
915            directives!).
916    
917    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
918    
919            * R/: Modified the TextDocCol constructor and various parsers. It
920            is now modular and supports various file formats via plugins (see
921            the new "Source" class).
922    
923    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
924    
925            * man/: Revised documentation after previous code changes.
926    
927    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * R/: Remaining changes as discussed with David.
930    
931    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
932    
933            * R/: Some changes as suggested by David. The rest will follow
934            within the next few days.
935    
936    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
937    
938            * man/: Finished documentation.
939    
940    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
941    
942            * man/: Wrote some documentation.
943    
944    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
945    
946            * R/: Further syntactic sugar in the form of additional assignment
947            and accessor methods.
948    
949    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
950    
951            * R/: Syntactic sugar in the form of "length", "show" and "summary"
952            operators.
953    
954    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
955    
956            * R/: Diverse updates, mainly to default operators ("[" or "c")
957            and dissimilarities.
958    
959    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
960    
961            * R/: Added similarity functions.
962    
963            * data/: Added English stopwords.
964    
965    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
966    
967            * data/: Examples compiled for new features.
968    
969            * R/: Changes due to new structure.
970    
971            * NAMESPACE: Corrected namespace to reflect new structure.
972    
973            * R/termdocmatrix.R: Adapted for new naming scheme.
974    
975    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            * R/textdoccol.R: Adapted code for new class structure. Wrote
978            several transform and filter functions operating on text document
979            collections (alias text document databases).
980    
981            * R/aobjects.R: Adapted class structure with inheritance,
982            repositories and additional meta data. Loading files on demand is
983            now possible.
984    
985    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
986    
987            * R/: Some cosmetic cleanups.
988    
989            * inst/: Removed vignette on clustering. That and much more is now
990            described in the JSS paper on text mining. Based upon that
991            article, a more elaborate vignette will be incorporated in the future.
992    
993    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
994    
995            * R/: Updated generic S4 methods to comply with signature changes
996            in newer versions of R (> 2.3).
997    
998    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
999    
1000            * ext/R/importRIS.R: Automatic RIS import is now possible.
1001    
1002    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1003    
1004            * R/textdoccol.R: Added RIS HTML input format.
1005    
1006    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1007    
1008            * R/textdoccol.R: Removed bug that caused invalid text document
1009            collections when handling many input files.
1010    
1011    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1012    
1013            * R/textdoccol.R: Restructured and extended file import
