Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC
pkg/ChangeLog revision 1166, Sat Dec 17 10:32:05 2011 UTC
1    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
2    
3            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
4    
5    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
6    
7            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
8    
9    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
10    
11            * R/transform.R (removePunctuation): Add option to preserve
12            intra-word dashes.
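
The effect of such an option can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the tm implementation; the function name, the parameter name, and the regular expressions are assumptions made for this example.

```python
import re
import string

# Character class matching any ASCII punctuation character.
_PUNCT = re.compile("[%s]" % re.escape(string.punctuation))


def remove_punctuation(text, preserve_intra_word_dashes=False):
    """Strip ASCII punctuation; optionally keep dashes between word characters.

    Hypothetical helper for illustration only.
    """
    if not preserve_intra_word_dashes:
        return _PUNCT.sub("", text)
    # Temporarily replace dashes that sit between two word characters
    # with a placeholder that is not punctuation, strip everything else,
    # then restore the protected dashes.
    placeholder = "\x00"
    protected = re.sub(r"(?<=\w)-(?=\w)", placeholder, text)
    stripped = _PUNCT.sub("", protected)
    return stripped.replace(placeholder, "-")
```

With the option enabled, a compound such as "well-known" keeps its dash, while free-standing dashes and all other punctuation marks are removed.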
13    
14    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
15    
16            * R/matrix.R (termFreq): Allow reordering of control option
17            processing.
18    
19    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
20    
21            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
22            pdfinfo tool.
23    
24            * inst/stopwords/SMART.dat: Add SMART information retrieval system
25            stopwords (which are also used by the MC toolkit).
26    
27            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
28            restrict how often a term may appear in each document (generalizes
29            \code{minDocFreq}). Similarly, the local option \code{wordLengths}
30            sets word length bounds (generalizes \code{minWordLength}).
31    
32            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
33            \code{bounds$global} for restricting how often a term is allowed
34            to appear in different documents.
35    
36            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
37            local options delegated internally to termFreq() and global
38            options which are processed by the term-document matrix
39            constructor itself.
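
The split between local and global options in the three R/matrix.R entries above can be sketched as follows. This is a Python illustration under stated assumptions, not the tm code: the function names, argument names, and default bounds are hypothetical, and documents are assumed to be already tokenized.

```python
import math
from collections import Counter


def term_freq(tokens, local_bounds=(1, math.inf), word_lengths=(1, math.inf)):
    """Per-document term frequencies (local options only).

    word_lengths bounds the length of each term; local_bounds bounds how
    often a term may occur within this one document.
    """
    lo_len, hi_len = word_lengths
    counts = Counter(t for t in tokens if lo_len <= len(t) <= hi_len)
    lo, hi = local_bounds
    return {t: n for t, n in counts.items() if lo <= n <= hi}


def term_document_matrix(docs, global_bounds=(1, math.inf), **local_options):
    """Term-document matrix as a dict mapping term -> list of per-document counts.

    Local options are delegated to term_freq(); global_bounds is handled
    here and bounds the number of documents a term appears in.
    """
    per_doc = [term_freq(tokens, **local_options) for tokens in docs]
    doc_freq = Counter(t for tf in per_doc for t in tf)
    lo, hi = global_bounds
    keep = sorted(t for t, df in doc_freq.items() if lo <= df <= hi)
    return {t: [tf.get(t, 0) for tf in per_doc] for t in keep}
```

In this sketch a term that survives the local bounds in only one document is still dropped when the global lower bound is 2, which is the kind of filtering the global option describes.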
40    
41    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
42    
43            * man/getTokenizers.Rd: Document getTokenizers().
44    
45            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
46    
47    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
48    
49            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
50    
51            * man/combine.Rd: Document c.term_frequency().
52    
53    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
54    
55            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
56            can be accessed via '[' and not '[['.
57    
58    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
59    
60            * R/stopwords.R (stopwords): Raise an error if no stopwords are
61            available for requested language. Suggested by Derek M Jones.
62    
63    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
64    
65            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
66            normalization.
67    
68    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
69    
70            * R/transform.R (stemDocument.PlainTextDocument): Use language
71            argument.
72    
73    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
74    
75            * R/source.R: Store strings and connections instead of unevaluated
76            calls.
77    
78    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
79    
80            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
81    
82    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
83    
84            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
85            (instead of a list element).
86    
87    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
88    
89            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
90            documents by names (fallback to IDs if names are not set).
91    
92    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
93    
94            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
95            \code{recursive} now determines whether existing corpus meta data
96            is used.
97    
98    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
99    
100            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
101    
102    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
105            remove terms not occurring in the corpus anymore.
106    
107    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
110            and Heaps' law.
111    
112    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
115            provided by a source.
116    
117    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
118    
119            * R/source.R (.Source): Provide document names.
120    
121    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
122    
123            * R/meta.R (`content_or_meta`): Utility function.
124    
125    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
126    
127            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
128            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
129    
130    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
131    
132            * R/weight.R (weightTfIdf): Added normalization option.
133    
134            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
135            analysis.
136    
137    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
138    
139            * R/score.R (tm_tag_score): Compute a score from the number of
140            tags matching in a document.
141    
142    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
143    
144            * R/complete.R (stemCompletion): New completion heuristics.
145    
146    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
147    
148            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
149    
150    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
151    
152            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
153            setOldClass(c(..., "list")) works.
154    
155    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
156    
157            * R/transform.R (stemDocument.character): In case input is a
158            simple character just delegate to the default Snowball stemmer.
159    
160    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
161    
162            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
163            data.
164    
165    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
166    
167            * R/doc.R (`Content<-`): Be careful with names attribute.
168    
169    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
170    
171            * R/source.R (DirSource): Improved implementation especially when
172            handling many (> 1M) files.
173    
174    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
175    
176            * R/source.R (getElem.URISource): Use encoding argument.
177    
178    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
179    
180            * R/doc.R (setOldClass): Register S3 document classes to be
181            recognized by S4 methods.
182    
183    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
184    
185            * R/matrix.R (termFreq): Add option to remove punctuation
186            characters.
187    
188    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
189    
190            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
191            merging multiple term-document matrices.
192    
193    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
194    
195            * R/corpus.R (setOldClass): Register S3 corpus classes to be
196            recognized by S4 methods.
197    
198            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
199            that CRAN Mac OS X builds do not fail any longer.
200    
201    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
202    
203            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
204            of RWeka::AlphabeticTokenizer() as default.
205    
206    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
207    
208            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
209            caused words at the beginning or the end of a line not to be removed. Do
210            not delete whitespace anymore.
211    
212    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
213    
214            * R/source.R (DirSource): Default to working directory if no path
215            is specified.
216    
217    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
218    
219            * R/source.R (DirSource): Stop on empty directories.
220    
221    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
222    
223            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
224            named documents.
225    
226    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
227    
228            * R/transform.R (removeWords): Improve regular expressions.
229    
230    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
231    
232            * R/meta.R (DublinCore): Allow lower case tags.
233    
234    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
235    
236            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
237            instead of x$children.
238    
239    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
240    
241            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
242    
243    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
244    
245            * R/: Use S3 instead of S4 class system.
246    
247    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
248    
249            * R/reader.R (readMail): Moved to tm.plugin.mail package.
250    
251    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
252    
253            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
254            postings are basically e-mails with some extra headers.
255    
256    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
257    
258            * R/transform.R: Move convertMboxEml, removeCitation,
259            removeMultipart, and removeSignature to the tm.plugin.mail package
260            since they are mainly utility functions (for handling e-mails) and
261            not very framework specific.
262    
263    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
264    
265            * man/: Fix documentation.
266    
267    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
268    
269            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
270            plain text document instead of an XML document for texts of the
271            Reuters-21578 dataset.
272    
273            * R/sparse.R: Removed since the slam package is now available on
274            CRAN.
275    
276            * DESCRIPTION (Depends): Add slam package.
277    
278    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
279    
280            * R/transform.R (stemDoc): Fix character(0) handling.
281    
282    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
283    
284            * R/doc.R (show): Pretty print.
285    
286    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
287    
288            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
289            gracefully.
290    
291    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
292    
293            * R/corpus.R: Make corpus virtual. Implement corpus with standard
294            and permanent storage semantics.
295    
296            * DESCRIPTION: New major release. A *lot* of improvements.
297    
298    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
299    
300            * NAMESPACE: Export some simple_triplet_matrix functions.
301    
302    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
303    
304            * R/weight.R: Adapt tf-idf to new matrix format.
305    
306    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
307    
308            * R/matrix.R: Create two distinct classes for term-document and
309            document-term matrices.
310    
311    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
312    
313            * R/termdocmatrix.R: No longer use Matrix package. This reduces
314            package start-up time significantly.
315    
316    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
317    
318            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
319    
320    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
321    
322            * R/transform.R (tmReduce): Combine multiple maps into one
323            transformation.
324    
325    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
326    
327            * R/weight.R: Remove weightLogical since it does not return a
328            dgCMatrix.
329    
330            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
331            or TermDocumentMatrix instead.
332    
333    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
334    
335            * inst/doc/extensions.Rnw: Finished vignette.
336    
337    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
338    
339            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
340            DocumentTermMatrix representations.
341    
342    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
343    
344            * R/reader.R (readXML): New reader for arbitrary XML files.
345    
346    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
347    
348            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
349            (XMLSource): New XMLSource class for arbitrary XML files.
350            (Source): New slot Vectorized.
351    
352    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
353    
354            * R/reader.R (readTabular): Experimental reader for tabular data
355            structures which can be customized via user-defined mappings.
356    
357            * R/reader.R: Always use UTC time zone.
358    
359            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
360    
361    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
362    
363            * R/reader.R (readDOC): Options can be passed over to antiword.
364    
365            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
366            pdftotext.
367    
368    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
369    
370            * R/source.R (DirSource): Add pattern and ignore.case arguments
371            which are internally passed over to list.files().
372    
373    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
374    
375            * inst/doc/tm.Rnw: Suppress pointless loading message.
376    
377    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
378    
379            * DESCRIPTION: Speed up package loading (via moving packages not
380            strictly necessary for normal operation to Suggests instead of
381            Depends).
382    
383    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
384    
385            * R/reader.R (readNewsgroup): The date format is now configurable.
386    
387    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
388    
389            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
390    
391    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
392    
393            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
394    
395    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
396    
397            * R/source.R (DataframeSource): New source class for data frames.
398    
399            * R/source.R: Fixed non-standard call evaluation.
400    
401    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
402    
403            * R/source.R (URISource): New source class for a single document.
404    
405    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
406    
407            * R/source.R: Refactoring.
408    
409    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
410    
411            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
412            Rmpi installations more gracefully.
413    
414    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
415    
416            * R/source.R (Source): Add Length slot.
417    
418    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
419    
420            * R/AAA.R: Unify duplicated .onLoad function.
421    
422    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
423    
424            * DESCRIPTION (Suggests): Added Rmpi.
425    
426    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
427    
428            * R/source.R (getElem): Fix 'no visible binding' warning.
429    
430            * man/WeightFunction.Rd: Fix signature.
431    
432    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
433    
434            * R/weight.R: Introduce name abbreviations for weighting functions.
435    
436    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
437    
438            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
439    
440            * R/cluster.R: Provide convenience functions for using a MPI
441            cluster.
442    
443            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
444            available.
445    
446            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
447            available.
448    
449    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
450    
451            * R/textdoccol.R (lapply): Removed debug print out.
452    
453    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/reader.R (readRCV1): Improved meta data extraction from
456            Reuters Corpus Volume 1 documents.
457    
458    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/transform.R: Ensure that all mappings preserve multiline
461            structures.
462    
463    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/filter.R: Every filter now has an attribute indicating whether
466            it should be applied at document level (doclevel).
467    
468            * R/textdoccol.R (tmFilter): Set searchFullText as new default
469            filter.
470    
471    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
472    
473            * R/transform.R (replacePatterns): Replaced removeWords by
474            replacePatterns. Suggested by Christian Buchta.
475    
476            * R/textdoccol.R (inspect): Improved formatting.
477    
478    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
479    
480            * inst/CITATION: Updated JSS article information.
481    
482            * R/textdoccol.R (setAs): Added coerce method from list to
483            corpus.
484    
485            * R/meta.R (meta): Improved meta data handling.
486    
487    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
488    
489            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
490            Christian Buchta.
491    
492            * inst/CITATION: Added template to include JSS article reference.
493    
494    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * R/textdoccol.R (tmMap): Introduced lazy mapping.
497    
498            * R/source.R: Added VectorSource.
499    
500    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
501    
502            * man/: Language codes should be in ISO 639-1 format.
503    
504            * R/textdoccol.R (asPlain): Preserve local meta data.
505    
506    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * R/textdoccol.R (writeCorpus): Function for writing a corpus
509            containing plain text documents to disk.
510    
511    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
512    
513            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
514            always set correctly.
515    
516            * R/textdoccol.R: Set load = TRUE as default for load on demand
517            since in most cases this is the wanted behaviour.
518    
519    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
520    
521            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
522    
523            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
524    
525    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
526    
527            * R/meta.R (meta): New function for consistent access to meta data
528            of document collections, repositories, and texts.
529    
530    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
531    
532            * R/: Better support for encodings.
533    
534    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
535    
536            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
537            selection when no reader argument is given.
538    
539    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
540    
541            * R/source.R (CSVSource): Now uses read.csv instead of scan
542            internally.
543    
544    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
545    
546            * R/reader.R (getReaders): Returns available reader functions.
547    
548            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
549            as default.
550    
551    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
552    
553            * R/stopwords.R (stopwords): Shortened code, removed codetools
554            variable warnings.
555    
556            * man/: Documentation for showMeta, added an example for tmMap.
557    
558            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
559            some minor typos fixed.
560    
561    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
562    
563            * R/aobjects.R (showMeta): Added method for pretty printing a
564            text document's meta data.
565    
566    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
567    
568            * R/textdoccol.R (TextDocCol): Better handling of empty
569            arguments.
570    
571            * NAMESPACE: Exported readDOC.
572    
573            * man/completeStems.Rd: Added an example.
574    
575    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
576    
577            * R/stopwords.R (stopwords): Look up .dat files at every
578            call. Allows users to modify stopword .dat files interactively.
579    
580    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
581    
582            * R/termdocmatrix.R (termFreq): Correct processing of empty
583            documents.
584    
585    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
586    
587            * man/: Updated documentation.
588    
589    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
590    
591            * R/complete.R (completeStems): Completes (heuristically) word
592            stems.
593    
594            * R/termdocmatrix.R (TermDocMatrix2): New modular
595            constructor.
596    
597            * NAMESPACE: Exported termFreq.
598    
599    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
600    
601            * R/reader.R (readDOC): Added MS Word reader (using antiword).
602    
603    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
604    
605            * R/weight.R: Weighting functions for TermDocMatrix.
606    
607    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
608    
609            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
610            functions for accessing dimension, column, and row names.
611    
612            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
613    
614    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
615    
616            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
617    
618    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
619    
620            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
621    
622    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
623    
624            * R/reader.R (readPDF): Removed manual checks for pdftotext and
625            pdfinfo. The system call gives a warning anyway.
626    
627    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
628    
629            * R/textdoccol.R (asPlain): Conversion from
630            StructuredTextDocuments to PlainTextDocuments.
631    
632    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
633    
634            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
635            for accessing term-document matrices.
636    
637            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
638            are installed.
639    
640    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
641    
642            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
643            Christian Buchta.
644    
645    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
646    
647            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
648    
649    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
650    
651            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
652    
653            * R/reader.R (readPDF): Added PDF reader.
654    
655    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
658    
659            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
660    
661            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
662    
663            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
664    
665    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
666    
667            * R/distmeasure.R (dissimilarity): Replaced dists call from
668            package cba by new dist call from package proxy.
669    
670    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
673    
674    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
675    
676            * R/termdocmatrix.R: require() uses the quietly option to suppress
677            loading messages.
678    
679    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
680    
681            * R/dictionary.R: Added dictionary support.
682    
683    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
684    
685            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
686            documents. This simplifies some functions, e.g., asPlain.
687    
688    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
689    
690            * inst/doc/tm.Rnw: Fixed some typos in vignette.
691    
692    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
693    
694            * R/textdoccol.R (replaceWords): Added method to replace a set of
695            words by a single word. Useful for synonyms.
696    
697    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
698    
699            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
700    
701    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
702    
703            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
704            vectors. Thanks to Ariel Maguyon for his error report.
705            (removeSparseTerms): New function to remove columns from a
706            term-document matrix exceeding a sparse factor.
707    
708    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
709    
710            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
711    
712    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
713    
714            * man/sFilter.Rd: Corrected documentation on statement format (use
715            '==' instead of '=').
716    
717    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
718    
719            * R/aobjects.R (StructuredTextDocument): Inherits from
720            TextDocument.
721    
722    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
723    
724            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
725            on sparse matrices as proposed by Martin Maechler.
726    
727    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
728    
729            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
730            \pkg{filehash} version deprecates them.
731    
732    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
733    
734            * R/termdocmatrix.R (textvector): Stemming is now performed before
735            erasing stopwords.
736            (weightMatrix): Adapted to handle sparse matrices.
737            (TermDocMatrix): Sparse matrix is now efficiently built by
738            direct stepwise insertion of row values into it.
739    
740    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
741    
742            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
743            due to ongoing problems. For our purposes the latter is as useful
744            as the replaced package.
745    
746    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
747    
748            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
749    
750            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
751    
752    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
753    
754            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
755            languages with available stopwords.
756    
757    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
758    
759            * inst/doc/tm.Rnw: Minor corrections in the vignette.
760    
761    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * DESCRIPTION: Update to version 0.2, since a lot of new features
764            have been integrated.
765    
766            * inst/stopwords: Updated existing stopwords and added stopwords
767            for various other languages.
768    
769    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
770    
771            * man/: Updated documentation.
772    
773            * Work/testDb.R: Script to test database stuff.
774    
775            * R/: Fixed various database related bugs. Seems to be rather
776            useable now, i.e., consider as alpha status for now.
777    
778    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
779    
780            * R/: Fixed some bugs related to database support.
781    
782    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
783    
784            * man/: Added a lot of examples to the manuals.
785    
786    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
787    
788            * man/: Updated parts of the documentation.
789    
790            * R/textdoccol.R (asPlain): Added conversion from newsgroup
791            documents to plain text documents.
792    
793    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
794    
795            * R/textdoccol.R: Finished experimental database support. Not yet
796            intensively tested.
797    
798            * R/source.R: Now each source has a default reader.
799    
800            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
801            class anymore.
802    
803            * R/plaintextdoc.R: Custom show method for plain text documents.
804    
805            * R/aobjects.R: Added a class for structured text documents.
806    
807            * R/reader.R: Replaced remaining \code{parser} occurrences with
808            \code{reader}.
809    
810            * R/textdoccol.R (summary): Indent tags.
811    
812            * R/textdoccol.R (removePunctuation): Transform method to remove
813            punctuation marks.
814    
815    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
816    
817            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
818            using prescindMeta().
819    
820    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
821    
822            * R/textdoccol.R: Improved database support.
823    
824    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
825    
826            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
827    
828            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
829            language code.
830    
831            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
832            into parserControl argument.
833    
834            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
835    
836    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
837    
838            * Work/tmDataSetup.R: The datasets acq and crude can now be
839            created on the fly.
840    
841            * R/stopwords.R: Introduced a function returning the stopwords for
842            a given language (English, German and French at the moment).
843    
844            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
845            otherwise falls back to Snowball package.
846    
847    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
848    
849            * man/dissimilarity-methods.Rd: Make clear that any method offered
850            by "dists" from package "cba" can be used.
851    
852    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
853    
854            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
855            to Kurt's latex suggestion. Removed points and underscores in
856            variable names for consistent naming.
857    
858            * DESCRIPTION: Update to version 0.1-2.
859    
860            * man/TextRepository.Rd: Fixed bug in documentation.
861    
862    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
863    
864            * DESCRIPTION: Update to version 0.1-1.
865    
866    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
867    
868            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
869            wordStem.
870    
871    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
872    
873            * R/: Changes due to Kurt's review.
874    
875    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
876    
877            * R/: Implemented improvements based upon comments by David
878            Meyer.
879    
880    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
881    
882            * inst/doc/: Rewrote vignette.
883    
884            * man/: Improved documentation.
885    
886    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
887    
888            * man/: Updated documentation.
889    
890            * DESCRIPTION: Changed package name to "tm". Updated version to
891            0.1 for first CRAN release.
892    
893            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
894            list archive example.
895    
896            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
897            archive example.
898    
899            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
900            from (several mails per box) mbox format to (single mail per file)
901            eml format.
902    
903    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
904    
905            * data/crude.rda: Rebuilt.
906    
907            * data/acq.rda: Rebuilt.
908    
909            * R/reader.R: Factored out reader and parser methods from
910            textdoccol.R.
911    
912            * R/source.R: Factored out Source methods from aobjects.R and
913            textdoccol.R.
914            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
915            feeds.
916    
917            * R/textdoccol.R (DirSource): Added support for recursive
918            traversal of directories.
919    
920    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
921    
922            * R/textdoccol.R ([[): Loads the document corpus automatically
923            into memory upon access.
924            (tm_transform, tm_filter): Removed several checks whether the
925            document is already loaded ([[ ensures this now).
926            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
927            mailing list archive.
928    
929    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
930    
931            * R/aobjects.R (TextDocument): Is now a virtual class.
932            (Source): Is now a virtual class.
933    
934    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
935    
936            * R/textdoccol.R (c): Support for an arbitrary number of document
937            collections.
938    
939    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
940    
941            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
942            append_meta and remove_meta.
943    
944            * R/textdoccol.R: Removed modify_metadata method.
945    
946            * R/textrepo.R: Removed modify_metadata method.
947    
948            * R/textdoccol.R (remove_meta): Supports removal of document
949            collection metadata and document (= in data frame) metadata.
950    
951    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
952    
953            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
954    
955            * data/crude.rda: Rebuilt.
956    
957            * data/acq.rda: Rebuilt.
958    
959            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
960    
961            * R/textdoccol.R ([): Bug fix for subsetting a document
962            collection's data frame.
963    
964    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
965    
966            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
967            to s_filter.
968    
969            * R/textdoccol.R: Local text documents' metadata can now be copied
970            to a document collection's data frame with prescind_meta.
971    
972    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
973    
974            * R/: Text documents' slot metadata is now accessible in s_filter.
975    
976            * R/: Rewrote s_filter function (still has some restrictions).
977    
978    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            * R/: Various fixes in handling metadata.
981    
982            * R/: Added update mechanism for text document collections.
983    
984    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
985    
986            * R/: Merging of document collections now creates a binary tree
987            for reconstructing merged document collections.
988    
989            * R/: Redesign of metadata for document collections.
990    
991    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
992    
993            * R/: Messages now use \code{ngettext}.
994    
995    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
996    
997            * R/: Added functions for modifying and removing metadata.
998    
999    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1000    
1001            * man/: Updated some documentation.
1002    
1003            * R/: Corrected some connection issues.
1004    
1005            * inst/doc: Worked on the vignette.
1006    
1007    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1008    
1009            * inst/: Added texts and started vignette.
1010    
1011            * R/: Final changes based upon David's comments.
1012    
1013    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1014    
1015            * NAMESPACE: Corrected exports (generic methods need exportMethods
1016            directives!).
1017    
1018    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1019    
1020            * R/: Modified the TextDocCol constructor and various parsers. It
1021            is now modular and supports various file formats via plugins (see
1022            the new "Source" class).
1023    
1024    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1025    
1026            * man/: Revised documentation after previous code changes.
1027    
1028    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1029    
1030            * R/: Remaining changes as discussed with David.
1031    
1032    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1033    
1034            * R/: Some changes as suggested by David. The rest will follow
1035            within the next days.
1036    
1037    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * man/: Finished documentation.
1040    
1041    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1042    
1043            * man/: Wrote some documentation.
1044    
1045    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1046    
1047            * R/: Further syntactic sugar in form of additional assignment and
1048            accessor methods.
1049    
1050    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1051    
1052            * R/: Syntactic sugar in form of "length", "show" and "summary"
1053            operators.
1054    
1055    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1056    
1057            * R/: Diverse updates. Mainly on default operators ("[" or "c")
1058            and dissimilarities.
1059    
1060    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1061    
1062            * R/: Added similarity functions.
1063    
1064            * data/: Added English stopwords.
1065    
1066    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1067    
1068            * data/: Examples compiled for new features
1069    
1070            * R/: Changes due to new structure.
1071    
1072            * NAMESPACE: Corrected namespace to reflect new structure.
1073    
1074            * R/termdocmatrix.R: Adapted for new naming scheme.
1075    
1076    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1077    
1078            * R/textdoccol.R: Adapted code for new class structure. Wrote
1079            several transform and filter functions operating on text document
1080            collections (alias text document databases).
1081    
1082            * R/aobjects.R: Adapted class structure with inheritance,
1083            repositories and additional meta data. Loading files on demand is
1084            now possible.
1085    
1086    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1087    
1088            * R/: Some cosmetic cleanups.
1089    
1090            * inst/: Removed vignette on clustering. That and much more is now
1091            described in the JSS paper on text mining. Based upon that
1092            article an elaborated vignette will be incorporated in the future.
1093    
1094    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1095    
1096            * R/: Updated generic S4 methods to comply with signature changes
1097            in newer versions of R (> 2.3)
1098    
1099    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1100    
1101            * ext/R/importRIS.R: Automatic RIS import is now possible.
1102    
1103    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1104    
1105            * R/textdoccol.R: Added RIS HTML input format.
1106    
1107    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1108    
1109            * R/textdoccol.R: Removed bug that caused invalid text document
1110            collections when handling many input files.
1111    
1112    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1113    
1114            * R/textdoccol.R: Restructured and extended file import
1115            mechanism.
1116    
1117            * inst/doc/clustering.Rnw: Adapted vignette for use with
1118            ReutNews.rda
1119    
1120            * man/ReutNews.Rd: Documentation for ReutNews.rda
1121    
1122            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1123    
1124    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1125    
1126            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1127            clustering facilities of this package.
1128    
1129    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1130    
1131            * R/aobjects.R: Changed package document structure to avoid class
1132            dependency problems.
1133    
1134    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1135    
1136            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
1137            data set.
1138    
1139            *  Finished documentation and reordered directory structure. Now "R
1140            CMD check textmin" works without errors.
1141    
1142    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1143    
1144            * src/: Various splits can now be easily created for the
1145            Reuters21578 data set.
1146    
1147    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1148    
1149            *  Updated documentation
1150    
1151    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1152    
1153            *  Wrote R documentation for some classes and methods.
1154    
1155    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1156    
1157            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
1158            files. See the questionnaire data/Umfrage.csv for such an example.
1159            We are now able to import files in Reuters-21578 XML format.
1160    
1161            *  Changed class interfaces in various files. Weighting of the text
1162            matrix is now possible.
1163    
1164    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1165    
1166            * R/textdoccol.R: One can build term-document matrices if
1167            necessary (with buildTDM(...)) and fill the field tdm from a text
1168            document collection with it.
1169    
1170            * R/textmatrix.R: Wrote S4 class for term-document matrices.
1171    
1172    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1173    
1174            * R/textdoccol.R: We now can read in a whole XML file with several
1175            news items.
1176    
1177    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1178    
1179            * R/textdoccol.R: Set up an S4 class for a collection of text
