[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 37, Wed Jan 11 17:49:17 2006 UTC
pkg/ChangeLog revision 1234, Thu Jul 25 17:45:00 2013 UTC
1    2013-07-25  Ingo Feinerer <feinerer@logic.at>
2    
3            * R/complete.R (stemCompletion): Report NA instead of error when no
4            completion can be found by the prevalent heuristic. Suggested by Hugh
5            Devlin.
6    
7    2013-07-10  Ingo Feinerer <feinerer@logic.at>
8    
9            * R/reader.R (readPDF): Use tm:::pdfinfo() (which needs the pdfinfo
10            command line tool) instead of tools:::pdf_info().
11    
12    2013-04-11  Ingo Feinerer <feinerer@logic.at>
13    
14            * R/transform.R (removeWords): Use PCRE UCP to use Unicode properties
15            to determine character types.
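
The effect of the PCRE UCP flag on word removal can be sketched in base R. This is only an illustration under stated assumptions: drop_words is a hypothetical helper, not tm's removeWords, and it assumes the words contain no regular expression metacharacters.

```r
# Hypothetical helper (not tm's removeWords): delete whole words using a
# PCRE pattern. The (*UCP) prefix asks PCRE to use Unicode properties when
# deciding what counts as a word character for \b.
drop_words <- function(x, words) {
  pattern <- sprintf("(*UCP)\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

# Only the words are deleted; the surrounding whitespace is left in place.
drop_words("the cat and the hat", c("the", "and"))
```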
16    
17    2012-12-14  Ingo Feinerer <feinerer@logic.at>
18    
19            * R/matrix.R (TermDocumentMatrix): Ensure dimnames of type character
20            when generating a simple_triplet_matrix. Reported by Arho Suominen.
21    
22    2012-12-10  Ingo Feinerer <feinerer@logic.at>
23    
24            * man/tm_reduce.Rd: Document right to left folding order. Adapt
25            example as well. Suggested by Mark Rosenstein.
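
The right-to-left folding order mentioned above can be illustrated with base R's Reduce(). This is a minimal sketch that does not call tm itself; fold_right_to_left and the two toy transformations are hypothetical names chosen for the example.

```r
# Hypothetical helper (not part of tm): apply a list of functions to x,
# folding from the right, so the LAST function in the list runs first.
fold_right_to_left <- function(x, funs) {
  Reduce(function(f, acc) f(acc), funs, x, right = TRUE)
}

# Two toy transformations whose order of application matters.
a_to_b <- function(s) gsub("a", "b", s, fixed = TRUE)
b_to_c <- function(s) gsub("b", "c", s, fixed = TRUE)

# b_to_c runs first ("ab" -> "ac"), then a_to_b ("ac" -> "bc").
fold_right_to_left("ab", list(a_to_b, b_to_c))
```

A left-to-right fold of the same list would instead yield "cc", which is why the folding order is worth documenting.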
26    
27    2012-12-04  Ingo Feinerer <feinerer@logic.at>
28    
29            * R/filter.R (sFilter): Avoid attach() and simplify.
30    
31    2012-11-02  Ingo Feinerer <feinerer@logic.at>
32    
33            * R/doc.R (.TextDocument): Use casts to ensure data types and to avoid
34            removal of attributes.
35    
36    2012-10-03  Ingo Feinerer  <feinerer@logic.at>
37    
38            * R/weight.R (weightTfIdf, weightSMART): Gracefully handle empty
39            columns and rows (avoids blow-up due to NaN values). Suggested by Jaap
40            Frölich.
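
Why empty rows and columns blow up, and one way to guard against it, can be sketched on a small dense matrix. The helper safe_tfidf below is hypothetical and is not tm's weightTfIdf; it assumes terms in rows, documents in columns, and a log2-based inverse document frequency.

```r
# Hypothetical helper (not tm's weightTfIdf): tf-idf on a dense
# term-document matrix (terms in rows, documents in columns).
safe_tfidf <- function(m) {
  doc_len  <- colSums(m)       # number of tokens per document
  doc_freq <- rowSums(m > 0)   # number of documents containing each term
  # Guard the divisors: an empty document would give 0/0 = NaN, and an
  # unused term would give 0 * Inf = NaN.
  tf  <- sweep(m, 2, ifelse(doc_len > 0, doc_len, 1), "/")
  idf <- ifelse(doc_freq > 0, log2(ncol(m) / doc_freq), 0)
  tf * idf                     # idf is recycled along the rows (terms)
}

m <- matrix(c(2, 0, 0,
              1, 0, 0,
              0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(Terms = c("apple", "pear", "plum"),
                            Docs  = c("d1", "d2", "d3")))
w <- safe_tfidf(m)
anyNA(w)   # FALSE: no NaN despite the empty row and the empty columns
```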
41    
42    2012-07-27  Ingo Feinerer  <feinerer@logic.at>
43    
44            * R/transform.R (removeWords): Allow longer stopword lists.
45    
46    2012-01-31  Ingo Feinerer  <feinerer@logic.at>
47    
48            * R/reader.R (readXML): Readers can now set the document language
49            themselves.
50    
51    2012-01-14  Ingo Feinerer  <feinerer@logic.at>
52    
53            * R/source.R (XMLSource, getElem.XMLSource): Simplifications as
54            proposed by Milan Bouchet-Valat.
55    
56    2012-01-11  Ingo Feinerer  <feinerer@logic.at>
57    
58            * R/matrix.R (termFreq): Fix processing of user provided
59            stopwords. Reported by Bettina Grün.
60    
61    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
62    
63            * R/matrix.R (termFreq): Fix invalid handling of
64            control$wordLengths[1]. Reported by Steven C. Bagley.
65    
66    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
67    
68            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
69    
70    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
71    
72            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
73    
74    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
75    
76            * R/transform.R (removePunctuation): Add option to preserve
77            intra-word dashes.
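
One way to preserve intra-word dashes while stripping other punctuation is to protect them with a placeholder first. The sketch below uses base R only; strip_punct_keep_dashes is a hypothetical helper and may differ from what removePunctuation actually does.

```r
# Hypothetical helper (not tm's removePunctuation): strip punctuation but
# keep a dash that sits directly between two word characters.
strip_punct_keep_dashes <- function(x) {
  x <- gsub("(\\w)-(\\w)", "\\1\001\\2", x)  # protect intra-word dashes
  x <- gsub("[[:punct:]]+", "", x)           # drop remaining punctuation
  gsub("\001", "-", x, fixed = TRUE)         # restore the protected dashes
}

# The dash in "well-known" survives; the free-standing "--" and "!" do not.
strip_punct_keep_dashes("A well-known example -- really!")
```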
78    
79    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
80    
81            * R/matrix.R (termFreq): Allow reordering of control option
82            processing.
83    
84    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
85    
86            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
87            pdfinfo tool.
88    
89            * inst/stopwords/SMART.dat: Add SMART information retrieval system
90            stopwords (which are also used by the MC toolkit).
91    
92            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
93            restrict how often a term may appear in each document (generalizes
94            \code{minDocFreq}). Similarly, the local option \code{wordLengths}
95            sets word length bounds (generalizes \code{minWordLength}).
96    
97            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
98            \code{bounds$global} for restricting how often a term is allowed
99            to appear in different documents.
100    
101            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
102            local options delegated internally to termFreq() and global
103            options which are processed by the term-document matrix
104            constructor itself.
105    
106    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
107    
108            * man/getTokenizers.Rd: Document getTokenizers().
109    
110            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
111    
112    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
113    
114            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
115    
116            * man/combine.Rd: Document c.term_frequency().
117    
118    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
119    
120            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
121            can be accessed via '[' and not '[['.
122    
123    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
124    
125            * R/stopwords.R (stopwords): Raise an error if no stopwords are
126            available for the requested language. Suggested by Derek M Jones.
127    
128    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
129    
130            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
131            normalization.
132    
133    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
134    
135            * R/transform.R (stemDocument.PlainTextDocument): Use language
136            argument.
137    
138    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
139    
140            * R/source.R: Store strings and connections instead of unevaluated
141            calls.
142    
143    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
144    
145            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
146    
147    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
148    
149            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
150            (instead of a list element).
151    
152    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
153    
154            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
155            documents by names (fallback to IDs if names are not set).
156    
157    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
158    
159            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
160            \code{recursive} now determines whether existing corpus meta data
161            is used.
162    
163    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
164    
165            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
166    
167    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
168    
169            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
170            remove terms not occurring in the corpus anymore.
171    
172    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
173    
174            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
175            and Heaps' law.
176    
177    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
178    
179            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
180            provided by a source.
181    
182    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
183    
184            * R/source.R (.Source): Provide document names.
185    
186    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
187    
188            * R/meta.R (`content_or_meta`): Utility function.
189    
190    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
191    
192            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
193            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
194    
195    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
196    
197            * R/weight.R (weightTfIdf): Added normalization option.
198    
199            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
200            analysis.
201    
202    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
203    
204            * R/score.R (tm_tag_score): Compute a score from the number of
205            tags matching in a document.
206    
207    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
208    
209            * R/complete.R (stemCompletion): New completion heuristics.
210    
211    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
212    
213            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
214    
215    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
216    
217            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
218            setOldClass(c(..., "list")) works.
219    
220    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
221    
222            * R/transform.R (stemDocument.character): In case input is a
223            simple character just delegate to the default Snowball stemmer.
224    
225    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
226    
227            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
228            data.
229    
230    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
231    
232            * R/doc.R (`Content<-`): Be careful with names attribute.
233    
234    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
235    
236            * R/source.R (DirSource): Improved implementation especially when
237            handling many (> 1M) files.
238    
239    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
240    
241            * R/source.R (getElem.URISource): Use encoding argument.
242    
243    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
244    
245            * R/doc.R (setOldClass): Register S3 document classes to be
246            recognized by S4 methods.
247    
248    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
249    
250            * R/matrix.R (termFreq): Add option to remove punctuation
251            characters.
252    
253    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
254    
255            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
256            merging multiple term-document matrices.
257    
258    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
259    
260            * R/corpus.R (setOldClass): Register S3 corpus classes to be
261            recognized by S4 methods.
262    
263            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
264            that CRAN Mac OS X builds do not fail any longer.
265    
266    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
267    
268            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
269            of RWeka:AlphabeticTokenizer() as default.
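
Whitespace tokenization with scan() can be tried directly in base R. The wrapper name tokenize_ws is hypothetical, and the quote = "" and quiet = TRUE arguments are assumptions added here so that quotes are treated as ordinary characters and no read summary is printed.

```r
# Hypothetical helper: split a string on runs of whitespace using scan().
tokenize_ws <- function(x) {
  scan(text = x, what = "character", quote = "", quiet = TRUE)
}

# Repeated spaces and tabs act as a single separator.
tokenize_ws("The quick  brown\tfox")
```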
270    
271    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
272    
273            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
274            caused words at the beginning or the end of a line not to be removed. Do
275            not delete whitespace anymore.
276    
277    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
278    
279            * R/source.R (DirSource): Default to working directory if no path
280            is specified.
281    
282    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
283    
284            * R/source.R (DirSource): Stop on empty directories.
285    
286    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
287    
288            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
289            named documents.
290    
291    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
292    
293            * R/transform.R (removeWords): Improve regular expressions.
294    
295    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
296    
297            * R/meta.R (DublinCore): Allow lower case tags.
298    
299    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
300    
301            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
302            instead of x$children.
303    
304    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
305    
306            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
307    
308    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
309    
310            * R/: Use S3 instead of S4 class system.
311    
312    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
313    
314            * R/reader.R (readMail): Moved to tm.plugin.mail package.
315    
316    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
317    
318            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
319            postings are basically e-mails with some extra headers.
320    
321    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
322    
323            * R/transform.R: Move convertMboxEml, removeCitation,
324            removeMultipart, and removeSignature to the tm.plugin.mail package
325            since they are mainly utility functions (for handling e-mails) and
326            not very framework specific.
327    
328    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
329    
330            * man/: Fix documentation.
331    
332    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
333    
334            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
335            plain text document instead of an XML document for texts of the
336            Reuters-21578 dataset.
337    
338            * R/sparse.R: Removed since the slam package is now available on
339            CRAN.
340    
341            * DESCRIPTION (Depends): Add slam package.
342    
343    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
344    
345            * R/transform.R (stemDoc): Fix character(0) handling.
346    
347    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
348    
349            * R/doc.R (show): Pretty print.
350    
351    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
352    
353            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
354            gracefully.
355    
356    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
357    
358            * R/corpus.R: Make corpus virtual. Implement corpus with standard
359            and permanent storage semantics.
360    
361            * DESCRIPTION: New major release. A *lot* of improvements.
362    
363    2009-05-04   Ingo Feinerer <feinerer@logic.at>
364    
365            * NAMESPACE: Export some simple_triplet_matrix functions.
366    
367    2009-04-28   Ingo Feinerer <feinerer@logic.at>
368    
369            * R/weight.R: Adapt tf-idf to new matrix format.
370    
371    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
372    
373            * R/matrix.R: Create two distinct classes for term-document and
374            document-term matrices.
375    
376    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
377    
378            * R/termdocmatrix.R: No longer use Matrix package. This reduces
379            package start-up time significantly.
380    
381    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
382    
383            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
384    
385    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
386    
387            * R/transform.R (tmReduce): Combine multiple maps into one
388            transformation.
389    
390    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
391    
392            * R/weight.R: Remove weightLogical since it does not return a
393            dgCMatrix.
394    
395            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
396            or TermDocumentMatrix instead.
397    
398    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
399    
400            * inst/doc/extensions.Rnw: Finished vignette.
401    
402    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
403    
404            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
405            DocumentTermMatrix representations.
406    
407    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
408    
409            * R/reader.R (readXML): New reader for arbitrary XML files.
410    
411    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
412    
413            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
414            (XMLSource): New XMLSource class for arbitrary XML files.
415            (Source): New slot Vectorized.
416    
417    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
418    
419            * R/reader.R (readTabular): Experimental reader for tabular data
420            structures which can be customized via user-defined mappings.
421    
422            * R/reader.R: Always use UTC time zone.
423    
424            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
425    
426    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
427    
428            * R/reader.R (readDOC): Options can be passed over to antiword.
429    
430            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
431            pdftotext.
432    
433    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
434    
435            * R/source.R (DirSource): Add pattern and ignore.case arguments
436            which are internally passed over to list.files().
437    
438    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
439    
440            * inst/doc/tm.Rnw: Suppress pointless loading message.
441    
442    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
443    
444            * DESCRIPTION: Speed up package loading (via moving packages not
445            strictly necessary for normal operation to Suggests instead of
446            Depends).
447    
448    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
449    
450            * R/reader.R (readNewsgroup): The date format is now configurable.
451    
452    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
453    
454            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
455    
456    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
457    
458            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
459    
460    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
461    
462            * R/source.R (DataframeSource): New source class for data frames.
463    
464            * R/source.R: Fixed non-standard call evaluation.
465    
466    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
467    
468            * R/source.R (URISource): New source class for a single document.
469    
470    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
471    
472            * R/source.R: Refactoring.
473    
474    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
475    
476            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
477            Rmpi installations more gracefully.
478    
479    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
480    
481            * R/source.R (Source): Add Length slot.
482    
483    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
484    
485            * R/AAA.R: Unify duplicated .onLoad function.
486    
487    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
488    
489            * DESCRIPTION (Suggests): Added Rmpi.
490    
491    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
492    
493            * R/source.R (getElem): Fix 'no visible binding' warning.
494    
495            * man/WeightFunction.Rd: Fix signature.
496    
497    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
498    
499            * R/weight.R: Introduce name abbreviations for weighting functions.
500    
501    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
502    
503            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
504    
505            * R/cluster.R: Provide convenience functions for using a MPI
506            cluster.
507    
508            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
509            available.
510    
511            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
512            available.
513    
514    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
515    
516            * R/textdoccol.R (lapply): Removed debug print out.
517    
518    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
519    
520            * R/reader.R (readRCV1): Improved meta data extraction from
521            Reuters Corpus Volume 1 documents.
522    
523    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
524    
525            * R/transform.R: Ensure that all mappings preserve multiline
526            structures.
527    
528    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
529    
530            * R/filter.R: Every filter now has an attribute indicating whether
531            it should be applied at the document level (doclevel).
532    
533            * R/textdoccol.R (tmFilter): Set searchFullText as new default
534            filter.
535    
536    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
537    
538            * R/transform.R (replacePatterns): Replaced removeWords by
539            replacePatterns. Suggested by Christian Buchta.
540    
541            * R/textdoccol.R (inspect): Improved formatting.
542    
543    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
544    
545            * inst/CITATION: Updated JSS article information.
546    
547            * R/textdoccol.R (setAs): Added coerce method from list to
548            corpus.
549    
550            * R/meta.R (meta): Improved meta data handling.
551    
552    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
553    
554            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
555            Christian Buchta.
556    
557            * inst/CITATION: Added template to include JSS article reference.
558    
559    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
560    
561            * R/textdoccol.R (tmMap): Introduced lazy mapping.
562    
563            * R/source.R: Added VectorSource.
564    
565    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
566    
567            * man/: Language codes should be in ISO 639-1 format.
568    
569            * R/textdoccol.R (asPlain): Preserve local meta data.
570    
571    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
572    
573            * R/textdoccol.R (writeCorpus): Function for writing a corpus
574            containing plain text documents to disk.
575    
576    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
577    
578            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
579            always set correctly.
580    
581            * R/textdoccol.R: Set load = TRUE as default for load on demand
582            since in most cases this is the wanted behaviour.
583    
584    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
585    
586            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
587    
588            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
589    
590    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
591    
592            * R/meta.R (meta): New function for consistent access to meta data
593            of document collections, repositories, and texts.
594    
595    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
596    
597            * R/: Better support for encodings.
598    
599    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
600    
601            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
602            selection when no reader argument is given.
603    
604    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
605    
606            * R/source.R (CSVSource): Now uses read.csv instead of scan
607            internally.
608    
609    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
610    
611            * R/reader.R (getReaders): Returns available reader functions.
612    
613            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
614            as default.
615    
616    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
617    
618            * R/stopwords.R (stopwords): Shortened code, removed codetools
619            variable warnings.
620    
621            * man/: Documentation for showMeta, added an example for tmMap.
622    
623            * inst/doc/tm.Rnw: Updated vignette, comments on MS Word reader,
624            some minor typos fixed.
625    
626    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
627    
628            * R/aobjects.R (showMeta): Added method for pretty printing a
629            text document's meta data.
630    
631    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * R/textdoccol.R (TextDocCol): Better handling of empty
634            arguments.
635    
636            * NAMESPACE: Exported readDOC.
637    
638            * man/completeStems.Rd: Added an example.
639    
640    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
641    
642            * R/stopwords.R (stopwords): Look up .dat files at every
643            call. Allows users to modify stopword .dat files interactively.
644    
645    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
646    
647            * R/termdocmatrix.R (termFreq): Correct processing of empty
648            documents.
649    
650    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
651    
652            * man/: Updated documentation.
653    
654    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
655    
656            * R/complete.R (completeStems): Completes (heuristically) word
657            stems.
658    
659            * R/termdocmatrix.R (TermDocMatrix2): New modular
660            constructor.
661    
662            * NAMESPACE: Exported termFreq.
663    
664    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
665    
666            * R/reader.R (readDOC): Added MS Word reader (using antiword).
667    
668    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
669    
670            * R/weight.R: Weighting functions for TermDocMatrix.
671    
672    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
673    
674            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
675            functions for accessing dimension, column, and row names.
676    
677            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
678    
679    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
680    
681            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
682    
683    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
684    
685            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
686    
687    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
688    
689            * R/reader.R (readPDF): Removed manual checks for pdftotext and
690            pdfinfo. The system call gives a warning anyway.
691    
692    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
693    
694            * R/textdoccol.R (asPlain): Conversion from
695            StructuredTextDocuments to PlainTextDocuments.
696    
697    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
698    
699            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
700            for accessing term-document matrices.
701    
702            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
703            are installed.
704    
705    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
706    
707            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
708            Christian Buchta.
709    
710    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
711    
712            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
713    
714    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
715    
716            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
717    
718            * R/reader.R (readPDF): Added PDF reader.
719    
720    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
721    
722            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
723    
724            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
725    
726            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
727    
728            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
729    
730    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
731    
732            * R/distmeasure.R (dissimilarity): Replaced dists call from
733            package cba by new dist call from package proxy.
734    
735    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
736    
737            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
738    
739    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
740    
741            * R/termdocmatrix.R: require() uses the quietly option to suppress
742            loading messages.
743    
744    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
745    
746            * R/dictionary.R: Added dictionary support.
747    
748    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
749    
750            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
751            documents. This simplifies some functions, e.g., asPlain.
752    
753    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
754    
755            * inst/doc/tm.Rnw: Fixed some typos in vignette.
756    
757    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
758    
759            * R/textdoccol.R (replaceWords): Added method to replace a set of
760            words by a single word. Useful for synonyms.
761    
762    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
763    
764            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
765    
766    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
767    
768            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
769            vectors. Thanks to Ariel Maguyon for his error report.
770            (removeSparseTerms): New function to remove columns from a
771            term-document matrix exceeding a sparse factor.
772    
773    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
774    
775            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
776    
777    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
778    
779            * man/sFilter.Rd: Corrected documentation on statement format (use
780            '==' instead of '=').
781    
782    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
783    
784            * R/aobjects.R (StructuredTextDocument): Inherits from
785            TextDocument.
786    
787    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
788    
789            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
790            on sparse matrices as proposed by Martin Maechler.
791    
792    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
793    
794            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
795            \pkg{filehash} version deprecates them.
796    
797    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
798    
799            * R/termdocmatrix.R (textvector): Stemming is now performed before
800            erasing stopwords.
801            (weightMatrix): Adapted to handle sparse matrices.
802            (TermDocMatrix): Sparse matrix is now efficiently built by
803            direct stepwise insertion of row values into it.
804    
805    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
806    
807            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
808            due to ongoing problems. For our purposes the latter is as useful
809            as the replaced package.
810    
811    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
812    
813            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
814    
815            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
816    
817    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
818    
819            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
820            languages with available stopwords.
821    
822    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
823    
824            * inst/doc/tm.Rnw: Minor corrections in the vignette.
825    
826    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
827    
828            * DESCRIPTION: Update to version 0.2, since a lot of new features
829            have been integrated.
830    
831            * inst/stopwords: Updated existing stopwords and added stopwords
832            for various other languages.
833    
834    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
835    
836            * man/: Updated documentation.
837    
838            * Work/testDb.R: Script to test database stuff.
839    
840            * R/: Fixed various database-related bugs. Seems to be rather
841            usable now, i.e., consider it alpha status for now.
842    
843    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
844    
845            * R/: Fixed some bugs related to database support.
846    
847    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
848    
849            * man/: Added a lot of examples to the manuals.
850    
851    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
852    
853            * man/: Updated parts of the documentation.
854    
855            * R/textdoccol.R (asPlain): Added conversion from newsgroup
856            documents to plain text documents.
857    
858    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
859    
860            * R/textdoccol.R: Finished experimental database support. Not yet
861            intensively tested.
862    
863            * R/source.R: Now each source has a default reader.
864    
865            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
866            class anymore.
867    
868            * R/plaintextdoc.R: Custom show method for plain text documents.
869    
870            * R/aobjects.R: Added a class for structured text documents.
871    
872            * R/reader.R: Replaced remaining \code{parser} occurrences with
873            \code{reader}.
874    
875            * R/textdoccol.R (summary): Indent tags.
876    
877            * R/textdoccol.R (removePunctuation): Transform method to remove
878            punctuation marks.
879    
880    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
881    
882            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
883            using prescindMeta().
884    
885    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
886    
887            * R/textdoccol.R: Improved database support.
888    
889    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
890    
891            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
892    
893            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
894            language code.
895    
896            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
897            into parserControl argument.
898    
899            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
900    
901    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
902    
903            * Work/tmDataSetup.R: The datasets acq and crude can now be
904            created on the fly.
905    
906            * R/stopwords.R: Introduced a function returning the stopwords for
907            a given language (English, German and French at the moment).
908    
909            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
910            otherwise falls back to Snowball package.
911    
912    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
913    
914            * man/dissimilarity-methods.Rd: Make clear that any method offered
915            by "dists" from package "cba" can be used.
916    
917    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
918    
919            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
920            to Kurt's LaTeX suggestion. Removed points and underscores in
921            variable names for consistent naming.
922    
923            * DESCRIPTION: Update to version 0.1-2.
924    
925            * man/TextRepository.Rd: Fixed bug in documentation.
926    
927    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * DESCRIPTION: Update to version 0.1-1.
930    
931    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
932    
933            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
934            wordStem.
935    
936    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
937    
938            * R/: Changes due to Kurt's review.
939    
940    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
941    
942            * R/: Implemented improvements based upon comments by David
943            Meyer.
944    
945    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
946    
947            * inst/doc/: Rewrote vignette.
948    
949            * man/: Improved documentation.
950    
951    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
952    
953            * man/: Updated documentation.
954    
955            * DESCRIPTION: Changed package name to "tm". Updated version to
956            0.1 for first CRAN release.
957    
958            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
959            list archive example.
960    
961            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
962            archive example.
963    
964            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
965            from (several mails per box) mbox format to (single mail per file)
966            eml format.
967    
968    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
969    
970            * data/crude.rda: Rebuilt.
971    
972            * data/acq.rda: Rebuilt.
973    
974            * R/reader.R: Factored out reader and parser methods from
975            textdoccol.R.
976    
977            * R/source.R: Factored out Source methods from aobjects.R and
978            textdoccol.R.
979            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
980            feeds.
981    
982            * R/textdoccol.R (DirSource): Added support for recursive
983            traversal of directories.
984    
985    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
986    
987            * R/textdoccol.R ([[): Loads the document corpus automatically
988            into memory upon access.
989            (tm_transform, tm_filter): Removed several checks whether the
990            document is already loaded ([[ ensures this now).
991            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
992            mailing list archive.
993    
994    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
995    
996            * R/aobjects.R (TextDocument): Is now a virtual class.
997            (Source): Is now a virtual class.
998    
999    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1000    
1001            * R/textdoccol.R (c): Support for an arbitrary number of document
1002            collections.
1003    
1004    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1005    
1006            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
1007            append_meta and remove_meta.
1008    
1009            * R/textdoccol.R: Removed modify_metadata method.
1010    
1011            * R/textrepo.R: Removed modify_metadata method.
1012    
1013            * R/textdoccol.R (remove_meta): Supports removal of document
1014            collection metadata and document (= in data frame) metadata.
1015    
1016    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1017    
1018            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
1019    
1020            * data/crude.rda: Rebuilt.
1021    
1022            * data/acq.rda: Rebuilt.
1023    
1024            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
1025    
1026            * R/textdoccol.R ([): Bug fix for subsetting a document
1027            collection's data frame.
1028    
1029    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1030    
1031            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
1032            to s_filter.
1033    
1034            * R/textdoccol.R: Local text documents' metadata can now be copied
1035            to a document collection's data frame with prescind_meta.
1036    
1037    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * R/: Text documents' slot metadata is now accessible in s_filter.
1040    
1041            * R/: Rewrote s_filter function (still has some restrictions).
1042    
1043    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1044    
1045            * R/: Various fixes in handling metadata.
1046    
1047            * R/: Added update mechanism for text document collections.
1048    
1049    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1050    
1051            * R/: Merging of document collections now creates a binary tree
1052            for reconstructing merged document collections.
1053    
1054            * R/: Redesign of metadata for document collections.
1055    
1056    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1057    
1058            * R/: Messages now use \code{ngettext}.
1059    
1060    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1061    
1062            * R/: Added functions for modifying and removing metadata.
1063    
1064    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1065    
1066            * man/: Updated some documentation.
1067    
1068            * R/: Corrected some connection issues.
1069    
1070            * inst/doc: Worked on the vignette.
1071    
1072    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1073    
1074            * inst/: Added texts and started vignette.
1075    
1076            * R/: Final changes based upon David's comments.
1077    
1078    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1079    
1080            * NAMESPACE: Corrected exports (generic methods need exportMethods
1081            directives!).
1082    
1083    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1084    
1085            * R/: Modified the TextDocCol constructor and various parsers. It
1086            is now modular and supports various file formats via plugins (see
1087            the new "Source" class).
1088    
1089    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1090    
1091            * man/: Revised documentation after previous code changes.
1092    
1093    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1094    
1095            * R/: Remaining changes as discussed with David.
1096    
1097    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1098    
1099            * R/: Some changes as suggested by David. The rest will follow
1100            within the next few days.
1101    
1102    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1103    
1104            * man/: Finished documentation.
1105    
1106    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1107    
1108            * man/: Wrote some documentation.
1109    
1110    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1111    
1112            * R/: Further syntactic sugar in form of additional assignment and
1113            accessor methods.
1114    
1115    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1116    
1117            * R/: Syntactic sugar in form of "length", "show" and "summary"
1118            operators.
1119    
1120    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1121    
1122            * R/: Diverse updates. Mainly on default operators ("[" or "c")
1123            and dissimilarities.
1124    
1125    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1126    
1127            * R/: Added similarity functions.
1128    
1129            * data/: Added English stopwords.
1130    
1131    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1132    
1133            * data/: Examples compiled for new features.
1134    
1135            * R/: Changes due to new structure.
1136    
1137            * NAMESPACE: Corrected namespace to reflect new structure.
1138    
1139            * R/termdocmatrix.R: Adapted for new naming scheme.
1140    
1141    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1142    
1143            * R/textdoccol.R: Adapted code for new class structure. Wrote
1144            several transform and filter functions operating on text document
1145            collections (alias text document databases).
1146    
1147            * R/aobjects.R: Adapted class structure with inheritance,
1148            repositories and additional meta data. Loading files on demand is
1149            now possible.
1150    
1151    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1152    
1153            * R/: Some cosmetic cleanups.
1154    
1155            * inst/: Removed vignette on clustering. That and much more is now
1156            described in the JSS paper on text mining. Based upon that
1157            article, a more elaborate vignette will be incorporated in the future.
1158    
1159    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1160    
1161            * R/: Updated generic S4 methods to comply with signature changes
1162            in newer versions of R (> 2.3).
1163    
1164    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1165    
1166            * ext/R/importRIS.R: Automatic RIS import is now possible.
1167    
1168    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1169    
1170            * R/textdoccol.R: Added RIS HTML input format.
1171    
1172    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1173    
1174            * R/textdoccol.R: Removed bug that caused invalid text document
1175            collections when handling many input files.
1176    
1177    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1178    
1179            * R/textdoccol.R: Restructured and extended file import
