[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC pkg/ChangeLog revision 1151, Thu Nov 17 14:21:49 2011 UTC
# Line 1  Line 1 
1    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/matrix (termFreq): Allow local option \code{bounds$local} to
4            restrict how often a term may appear in each document (generalizes
5            \code{minDocFreq}). Similarly the local option \code{wordLenghts}
6            for word length bounds (generalizes \code{minWordLength}).
7    
8            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
9            \code{bounds$global} for restricting how often a term is allowed
10            to appear in different documents.
11    
12            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
13            local options delegated internally to termFreq() and global
14            options which are processed by the term-document matrix
15            constructor itself.
16    
17    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
18    
19            * man/getTokenizers.Rd: Document getTokenizers().
20    
21            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
22    
23    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
24    
25            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
26    
27            * man/combine.Rd: Document c.term_frequency().
28    
29    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
30    
31            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
32            can be accessed via '[' and not '[['.
33    
34    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
35    
36            * R/stopwords.R (stopwords): Raise an error if no stopwords are
37            available for requested language. Suggested by Derek M Jones.
38    
39    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
40    
41            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
42            normalization.
43    
44    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
45    
46            * R/transform.R (stemDocument.PlainTextDocument): Use language
47            argument.
48    
49    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
50    
51            * R/source.R: Store strings and connections instead of unevaluated
52            calls.
53    
54    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
55    
56            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
57    
58    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
59    
60            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
61            (instead of a list element).
62    
63    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
64    
65            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus'): Access individual
66            documents by names (fallback to IDs if names are not set).
67    
68    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
69    
70            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
71            \code{recursive} now determines whether existing corpus meta data
72            is used.
73    
74    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
75    
76            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
77    
78    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
79    
80            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
81            remove terms not occurring in the corpus anymore.
82    
83    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
84    
85            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
86            and Heaps' law.
87    
88    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
91            provided by a source.
92    
93    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
94    
95            * R/source.R (.Source): Provide document names.
96    
97    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/meta.R (`content_or_meta`): Utility function.
100    
101    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
102    
103            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
104            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
105    
106    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
107    
108            * R/weight.R (weightTfIdf): Added normalization option.
109    
110            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
111            analysis.
112    
113    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
114    
115            * R/score.R (tm_tag_score): Compute a score from the number of
116            tags matching in a document.
117    
118    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
119    
120            * R/complete.R (stemCompletion): New completion heuristics.
121    
122    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
123    
124            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
125    
126    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
127    
128            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
129            setOldClass(c(..., "list")) works.
130    
131    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/transform.R (stemDocument.character): In case input is a
134            simple character just delegate to the default Snowball stemmer.
135    
136    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
137    
138            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
139            data.
140    
141    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
142    
143            * R/doc.R (`Content<-`): Be careful with names attribute.
144    
145    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
146    
147            * R/source.R (DirSource): Improved implementation especially when
148            handling many (> 1M) files.
149    
150    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
151    
152            * R/source.R (getElem.URISource): Use encoding argument.
153    
154    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
155    
156            * R/doc.R (setOldClass): Register S3 document classes to be
157            recognized by S4 methods.
158    
159    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
160    
161            * R/matrix.R (termFreq): Add option to remove punctuation
162            characters.
163    
164    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
165    
166            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
167            merging multiple term-document matrices.
168    
169    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
170    
171            * R/corpus.R (setOldClass): Register S3 corpus classes to be
172            recognized by S4 methods.
173    
174            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
175            that CRAN Mac OS X builds do not fail any longer.
176    
177    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
178    
179            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
180            of RWeka:AlphabeticTokenizer() as default.
181    
182    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
183    
184            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
185            caused words at the beginning or the end of a line not to be removed. Do
186            not delete whitespace anymore.
187    
188    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
189    
190            * R/source.R (DirSource): Default to working directory if no path
191            is specified.
192    
193    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
194    
195            * R/source.R (DirSource): Stop on empty directories.
196    
197    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
198    
199            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
200            named documents.
201    
202    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
203    
204            * R/transform.R (removeWords): Improve regular expressions.
205    
206    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
207    
208            * R/meta.R (DublinCore): Allow lower case tags.
209    
210    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
211    
212            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
213            instead of x$children.
214    
215    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
216    
217            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
218    
219    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
220    
221            * R/: Use S3 instead of S4 class system.
222    
223    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
224    
225            * R/reader.R (readMail): Moved to tm.plugin.mail package.
226    
227    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
228    
229            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
230            postings are basically e-mails with some extra headers.
231    
232    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
233    
234            * R/transform.R: Move convertMboxEml, removeCitation,
235            removeMultipart, and removeSignature to the tm.plugin.mail package
236            since they are mainly utility functions (for handling e-mails) and
237            not very framework specific.
238    
239    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
240    
241            * man/: Fix documentation.
242    
243    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
244    
245            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
246            plain text document instead of an XML document for texts of the
247            Reuters-21578 dataset.
248    
249            * R/sparse.R: Removed since the slam package is now available on
250            CRAN.
251    
252            * DESCRIPTION (Depends): Add slam package.
253    
254    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
255    
256            * R/transform.R (stemDoc): Fix character(0) handling.
257    
258    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
259    
260            * R/doc.R (show): Pretty print.
261    
262    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
263    
264            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
265            gracefully.
266    
267    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
268    
269            * R/corpus.R: Make corpus virtual. Implement corpus with standard
270            and permanent storage semantics.
271    
272            * DESCRIPTION: New major release. A *lot* of improvements.
273    
274    2009-05-04   Ingo Feinerer <feinerer@logic.at>
275    
276            * NAMESPACE: Export some simple_triplet_matrix functions.
277    
278    2009-04-28   Ingo Feinerer <feinerer@logic.at>
279    
280            * R/weight.R: Adapt tf-idf to new matrix format.
281    
282    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
283    
284            * R/matrix.R: Create two distinct classes for term-document and
285            document-term matrices.
286    
287    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
288    
289            * R/termdocmatrix.R: No longer use Matrix package. This reduces
290            package start-up time significantly.
291    
292    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
293    
294            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
295    
296    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
297    
298            * R/transform.R (tmReduce): Combine multiple maps into one
299            transformation.
300    
301    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
302    
303            * R/weight.R: Remove weightLogical since it does not return a
304            dgCMatrix.
305    
306            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
307            or TermDocumentMatrix instead.
308    
309    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
310    
311            * inst/doc/extensions.Rnw: Finished vignette.
312    
313    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
314    
315            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
316            DocumentTermMatrix representations.
317    
318    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
319    
320            * R/reader.R (readXML): New reader for arbitrary XML files.
321    
322    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
323    
324            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
325            (XMLSource): New XMLSource class for arbitrary XML files.
326            (Source): New slot Vectorized.
327    
328    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
329    
330            * R/reader.R (readTabular): Experimental reader for tabular data
331            structures which can be customized via user-defined mappings.
332    
333            * R/reader.R: Always use UTC time zone.
334    
335            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
336    
337    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
338    
339            * R/reader.R (readDOC): Options can be passed over to antiword.
340    
341            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
342            pdftotext.
343    
344    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
345    
346            * R/source.R (DirSource): Add pattern and ignore.case arguments
347            which are internally passed over to list.files().
348    
349    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
350    
351            * inst/doc/tm.Rnw: Suppress pointless loading message.
352    
353    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
354    
355            * DESCRIPTION: Speed up package loading (via moving packages not
356            strictly necessary for normal operation to Suggests instead of
357            Depends).
358    
359    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
360    
361            * R/reader.R (readNewsgroup): The date format is now configurable.
362    
363    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
364    
365            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
366    
367    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
368    
369            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
370    
371    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
372    
373            * R/source.R (DataframeSource): New source class for data frames.
374    
375            * R/source.R: Fixed non-standard call evaluation.
376    
377    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
378    
379            * R/source.R (URISource): New source class for a single document.
380    
381    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
382    
383            * R/source.R: Refactoring.
384    
385    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
386    
387            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
388            Rmpi installations more gracefully.
389    
390    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
391    
392            * R/source.R (Source): Add Length slot.
393    
394    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
395    
396            * R/AAA.R: Unify duplicated .onLoad function.
397    
398    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
399    
400            * DESCRIPTION (Suggests): Added Rmpi.
401    
402    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
403    
404            * R/source.R (getElem): Fix 'no visible binding' warning.
405    
406            * man/WeightFunction.Rd: Fix signature.
407    
408    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
409    
410            * R/weight.R: Introduce name abbreviations for weighting functions.
411    
412    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
413    
414            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
415    
416            * R/cluster.R: Provide convenience functions for using a MPI
417            cluster.
418    
419            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
420            available.
421    
422            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
423            available.
424    
425    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
426    
427            * R/textdoccol.R (lapply): Removed debug print out.
428    
429    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
430    
431            * R/reader.R (readRCV1): Improved meta data extraction from
432            Reuters Corpus Volume 1 documents.
433    
434    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
435    
436            * R/transform.R: Ensure that all mappings preserve multiline
437            structures.
438    
439    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
440    
441            * R/filter.R: Every filter has now an attribute indicating whether
442            it sould be applied to document level (doclevel).
443    
444            * R/textdoccol.R (tmFilter): Set searchFullText as new default
445            filter.
446    
447    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
448    
449            * R/transform.R (replacePatterns): Replaced removeWords by
450            replacePatterns. Suggested by Christian Buchta.
451    
452            * R/textdoccol.R (inspect): Improved formatting.
453    
454    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
455    
456            * inst/CITATION: Updated JSS article information.
457    
458            * R/textdoccol.R (setAs): Added coerce method from list to
459            corpus.
460    
461            * R/meta.R (meta): Improved meta data handling.
462    
463    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
464    
465            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
466            Christian Buchta.
467    
468            * inst/CITATION: Added template to include JSS article reference.
469    
470    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
471    
472            * R/textdoccol.R (tmMap): Introduced lazy mapping.
473    
474            * R/source.R: Added VectorSource.
475    
476    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
477    
478            * man/: Language codes should be in ISO 639-1 format.
479    
480            * R/textdoccol.R (asPlain): Preserve local meta data.
481    
482    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
483    
484            * R/textdoccol.R (writeCorpus): Function for writing a corpus
485            containing plain text documents to disk.
486    
487    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
488    
489            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
490            always set correctly.
491    
492            * R/textdoccol.R: Set load = TRUE as default for load on demand
493            since in most cases this is the wanted behaviour.
494    
495    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
496    
497            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
498    
499            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
500    
501    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
502    
503            * R/meta.R (meta): New function for consistent access to meta data
504            of document collections, repositories, and texts.
505    
506    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
507    
508            * R/: Better support for encodings.
509    
510    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
511    
512            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
513            selection when no reader argument is given.
514    
515    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
516    
517            * R/source.R (CSVSource): Now uses read.csv instead of scan
518            internally.
519    
520    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
521    
522            * R/reader.R (getReaders): Returns available reader functions.
523    
524            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
525            as default.
526    
527    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
528    
529            * R/stopwords.R (stopwords): Shortened code, removed codetools
530            variable warnings.
531    
532            * man/: Documentation for showMeta, added an example for tmMap.
533    
534            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
535            some minor typos fixed.
536    
537    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
538    
539            * R/aobjects.R (showMeta): Added method for pretty printing a
540            text document's meta data.
541    
542    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
543    
544            * R/textdoccol.R (TextDocCol): Better handling of empty
545            arguments.
546    
547            * NAMESPACE: Exported readDOC.
548    
549            * man/completeStems.Rd: Added an example.
550    
551    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
552    
553            * R/stopwords.R (stopwords): Look up .dat files at every
554            call. Allows users to modify stopword .dat files interactively.
555    
556    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
557    
558            * R/termdocmatrix.R (termFreq): Correct processing of empty
559            documents.
560    
561    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
562    
563            * man/: Updated documentation.
564    
565    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
566    
567            * R/complete.R (completeStems): Completes (heuristically) word
568            stems.
569    
570            * R/termdocmatrix.R (TermDocMatrix2): New modular
571            constructor.
572    
573            * NAMESPACE: Exported termFreq.
574    
575    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
576    
577            * R/reader.R (readDOC): Added MS Word reader (using antiword).
578    
579    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
580    
581            * R/weight.R: Weighting functions for TermDocMatrix.
582    
583    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
584    
585            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
586            functions for accessing dimension, column, and row names.
587    
588            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
589    
590    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
591    
592            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
593    
594    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
595    
596            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
597    
598    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
599    
600            * R/reader.R (readPDF): Removed manual checks for pdftotext and
601            pdfinfo. The system call gives a warning anyway.
602    
603    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
604    
605            * R/textdoccol.R (asPlain): Conversion from
606            StructuredTextDocuments to PlainTextDocuments.
607    
608    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
609    
610            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
611            for accessing term-document matrices.
612    
613            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
614            are installed.
615    
616    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
617    
618            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
619            Christian Buchta.
620    
621    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
622    
623            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
624    
625    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
626    
627            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
628    
629            * R/reader.R (readPDF): Added PDF reader.
630    
631    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
634    
635            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
636    
637            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
638    
639            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
640    
641    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
642    
643            * R/distmeasure.R (dissimilarity): Replaced dists call from
644            package cba by new dist call from package proxy.
645    
646    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
647    
648            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
649    
650    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
651    
652            * R/termdocmatrix.R: require() uses the quietly option to suppress
653            loading messages.
654    
655    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * R/dictionary.R: Added dictionary support.
658    
659    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
660    
661            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
662            documents. This simplifies some functions, e.g., asPlain.
663    
664    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
665    
666            * inst/doc/tm.Rnw: Fixed some typos in vignette.
667    
668    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
669    
670            * R/textdoccol.R (replaceWords): Added method to replace a set of
671            words by a single word. Useful for synonyms.
672    
673    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
674    
675            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
676    
677    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
678    
679            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
680            vectors. Thanks to Ariel Maguyon for his error report.
681            (removeSparseTerms): New function to remove columns from a
682            term-document matrix exceeding a sparse factor.
683    
684    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
685    
686            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
687    
688    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
689    
690            * man/sFilter.Rd: Corrected documentation on statement format (use
691            '==' instead of '=').
692    
693    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
694    
695            * R/aobjects.R (StructuredTextDocument): Inherits from
696            TextDocument.
697    
698    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
699    
700            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
701            on sparse matrices as proposed by Martin Maechler.
702    
703    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
704    
705            * R/textdoccol.R: Removed \code{dbDisconnect} calls since last
706            \pkg{filehash} version makes them deprecated.
707    
708    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
709    
710            * R/termdocmatrix.R (textvector): Stemming is now performed before
711            erasing stopwords.
712            (weightMatrix): Adapted to handle sparse matrices.
713            (TermDocMatrix): Sparse matrix is now efficiently built by
714            direct stepwise insertion of row values into it.
715    
716    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
717    
718            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
719            due to ongoing problems. For our purposes the latter is as useful
720            as the replaced package.
721    
722    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
723    
724            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
725    
726            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
727    
728    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
729    
730            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
731            languages with available stopwords.
732    
733    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
734    
735            * inst/doc/tm.Rnw: Minor corrections in the vignette.
736    
737    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
738    
739            * DESCRIPTION: Update to version 0.2, since a lot of new features
740            have been integrated.
741    
742            * inst/stopwords: Updated existing stopwords and added stopwords
743            for various other languages.
744    
745    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
746    
747            * man/: Updated documentation.
748    
749            * Work/testDb.R: Script to test database stuff.
750    
751            * R/: Fixed various database related bugs. Seems to be rather
752            useable now, i.e., consider as alpha status for now.
753    
754    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
755    
756            * R/: Fixed some bugs related to database support.
757    
758    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
759    
760            * man/: Added a lot of examples to the manuals.
761    
762    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
763    
764            * man/: Updated parts of the documentation.
765    
766            * R/textdoccol.R (asPlain): Added conversion from newsgroup
767            documents to plain text documents.
768    
769    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
770    
771            * R/textdoccol.R: Finished experimental database support. Not yet
772            intensively tested.
773    
774            * R/source.R: Now each source has a default reader.
775    
776            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
777            class anymore.
778    
779            * R/plaintextdoc.R: Custom show method for plain text documents.
780    
781            * R/aobjects.R: Added a class for structured text documents.
782    
783            * R/reader.R: Replaced remaining \code{parser} occurrences with
784            \code{reader}.
785    
786            * R/textdoccol.R (summary): Indent tags.
787    
788            * R/textdoccol.R (removePunctuation): Transform method to remove
789            punctuation marks.
790    
791    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
792    
793            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
794            using prescindMeta().
795    
796    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
797    
798            * R/textdoccol.R: Improved database support.
799    
800    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
801    
802            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
803    
804            * R/resolve.R (resolveISOcode): Extracts the language from a ISO
805            language code.
806    
807            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
808            into parserControl argument.
809    
810            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
811    
812    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
813    
814            * Work/tmDataSetup.R: The datasets acq and crude can now be
815            created on the fly.
816    
817            * R/stopwords.R: Introduced a function returning the stopwords for
818            a given language (English, German and French at the moment).
819    
820            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
821            otherwise falls back to Snowball package.
822    
823    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
824    
825            * man/dissimilarity-methods.Rd: Make clear that any method offered
826            by "dists" from package "cba" can be used.
827    
828    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
829    
830            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
831            to Kurt's LaTeX suggestion. Removed points and underscores in
832            variable names for consistent naming.
833    
834            * DESCRIPTION: Update to version 0.1-2.
835    
836            * man/TextRepository.Rd: Fixed bug in documentation.
837    
838    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
839    
840            * DESCRIPTION: Update to version 0.1-1.
841    
842    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
843    
844            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
845            wordStem.
846    
847    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
848    
849            * R/: Changes due to Kurt's review.
850    
851    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
852    
853            * R/: Implemented improvements based upon comments by David
854            Meyer.
855    
856    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
857    
858            * inst/doc/: Rewrote vignette.
859    
860            * man/: Improved documentation.
861    
862    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
863    
864            * man/: Updated documentation.
865    
866            * DESCRIPTION: Changed package name to "tm". Updated version to
867            0.1 for first CRAN release.
868    
869            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
870            list archive example.
871    
872            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
873            archive example.
874    
875            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
876            from (several mails per box) mbox format to (single mail per file)
877            eml format.
878    
879    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
880    
881            * data/crude.rda: Rebuilt.
882    
883            * data/acq.rda: Rebuilt.
884    
885            * R/reader.R: Factored out reader and parser methods from
886            textdoccol.R.
887    
888            * R/source.R: Factored out Source methods from aobjects.R and
889            textdoccol.R.
890            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
891            feeds.
892    
893            * R/textdoccol.R (DirSource): Added support for recursive
894            traversal of directories.
895    
896    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
897    
898            * R/textdoccol.R ([[): Loads the document corpus automatically
899            into memory upon access.
900            (tm_transform, tm_filter): Removed several checks whether the
901            document is already loaded ([[ ensures this now).
902            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
903            mailing list archive.
904    
905    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
906    
907            * R/aobjects.R (TextDocument): Is now a virtual class.
908            (Source): Is now a virtual class.
909    
910    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
911    
912            * R/textdoccol.R (c): Support for an arbitrary number of document
913            collections.
914    
915    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
916    
917            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
918            append_meta and remove_meta.
919    
920            * R/textdoccol.R: Removed modify_metadata method.
921    
922            * R/textrepo.R: Removed modify_metadata method.
923    
924            * R/textdoccol.R (remove_meta): Supports removal of document
925            collection metadata and document (= in data frame) metadata.
926    
927    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
930    
931            * data/crude.rda: Rebuilt.
932    
933            * data/acq.rda: Rebuilt.
934    
935            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
936    
937            * R/textdoccol.R ([): Bug fix for subsetting a document
938            collection's data frame.
939    
940    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
941    
942            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
943            to s_filter.
944    
945            * R/textdoccol.R: Local text documents' metadata can now be copied
946            to a document collection's data frame with prescind_meta.
947    
948    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
949    
950            * R/: Text documents' slot metadata is now accessible in s_filter.
951    
952            * R/: Rewrote s_filter function (still has some restrictions).
953    
954    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
955    
956            * R/: Various fixes in handling metadata.
957    
958            * R/: Added update mechanism for text document collections.
959    
960    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
961    
962            * R/: Merging of document collections now creates a binary tree
963            for reconstructing merged document collections.
964    
965            * R/: Redesign of metadata for document collections.
966    
967    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
968    
969            * R/: Messages now use \code{ngettext}.
970    
971    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
972    
973            * R/: Added functions for modifying and removing metadata.
974    
975    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            * man/: Updated some documentation.
978    
979            * R/: Corrected some connection issues.
980    
981            * inst/doc: Worked on the vignette.
982    
983    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * inst/: Added texts and started vignette.
986    
987            * R/: Final changes based upon David's comments.
988    
989    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
990    
991            * NAMESPACE: Corrected exports (generic methods need exportMethods
992            directives!).
993    
994    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
995    
996            * R/: Modified the TextDocCol constructor and various parsers. It
997            is now modular and supports various file formats via plugins (see
998            the new "Source" class).
999    
1000    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1001    
1002            * man/: Revised documentation after previous code changes.
1003    
1004    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1005    
1006            * R/: Remaining changes as discussed with David.
1007    
1008    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1009    
1010            * R/: Some changes as suggested by David. The rest will follow
1011            within the next few days.
1012    
1013    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1014    
1015            * man/: Finished documentation.
1016    
1017    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1018    
1019            * man/: Wrote some documentation.
1020    
1021    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1022    
1023            * R/: Further syntactic sugar in form of additional assignment and
1024            accessor methods.
1025    
1026    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1027    
1028            * R/: Syntactic sugar in form of "length", "show" and "summary"
1029            operators.
1030    
1031    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1032    
1033            * R/: Various updates, mainly to default operators ("[" or "c")
1034            and dissimilarities.
1035    
1036    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1037    
1038            * R/: Added similarity functions.
1039    
1040            * data/: Added English stopwords.
1041    
1042    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * data/: Examples compiled for new features.
1045    
1046            * R/: Changes due to new structure.
1047    
1048            * NAMESPACE: Corrected namespace to reflect new structure.
1049    
1050            * R/termdocmatrix.R: Adapted for new naming scheme.
1051    
1052    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1053    
1054            * R/textdoccol.R: Adapted code for new class structure. Wrote
1055            several transform and filter functions operating on text document
1056            collections (alias text document databases).
1057    
1058            * R/aobjects.R: Adapted class structure with inheritance,
1059            repositories and additional meta data. Loading files on demand is
1060            now possible.
1061    
1062    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1063    
1064            * R/: Some cosmetic cleanups.
1065    
1066            * inst/: Removed vignette on clustering. That and much more is now
1067            described in the JSS paper on text mining. Based upon that
1068            article, an elaborate vignette will be incorporated in the future.
1069    
1070    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1071    
1072            * R/: Updated generic S4 methods to comply with signature changes
1073            in newer versions of R (> 2.3).
1074    
1075    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1076    
1077            * ext/R/importRIS.R: Automatic RIS import is now possible.
1078    
1079    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1080    
1081            * R/textdoccol.R: Added RIS HTML input format.
1082    
1083    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1084    
1085            * R/textdoccol.R: Removed bug that caused invalid text document
1086            collections when handling many input files.
1087    
1088    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1089    
1090            * R/textdoccol.R: Restructured and extended file import
1091            mechanism.
1092    
1093            * inst/doc/clustering.Rnw: Adapted vignette for use with
1094            ReutNews.rda.
1095    
1096            * man/ReutNews.Rd: Documentation for ReutNews.rda.
1097    
1098            * data/ReutNews.rda: A tiny Reuters-21578 example data set.
1099    
1100    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1101    
1102            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1103            clustering facilities of this package.
1104    
1105    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1106    
1107            * R/aobjects.R: Changed package document structure to avoid class
1108            dependency problems.
1109    
1110    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1111    
1112            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
1113            data set.
1114    
1115            *  Finished documentation and reordered directory structure. Now "R
1116            CMD check textmin" works without errors.
1117    
1118    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1119    
1120            * src/: Various splits can now be easily created for the
1121            Reuters-21578 data set.
1122    
1123    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1124    
1125            *  Updated documentation.
1126    
1127    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1128    
1129            *  Wrote R documentation for some classes and methods.
1130    
1131    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1132    
1133            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
1134            files. See the questionnaire data/Umfrage.csv for such an example.
1135            We are now able to import files in Reuters-21578 XML format.
1136    
1137            *  Changed class interfaces in various files. Weighting of the text
1138            matrix is now possible.
1139    
1140    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1141    
1142            * R/textdoccol.R: One can build term-document matrices if
1143            necessary (with buildTDM(...)) and fill the field tdm from a text
1144            document collection with it.
1145    
1146            * R/textmatrix.R: Wrote S4 class for term-document matrices.
1147    
1148    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1149    
1150            * R/textdoccol.R: We can now read in a whole XML file with several
1151            news items.
1152    
1153    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1154    
1155            * R/textdoccol.R: Set up an S4 class for a collection of text
