[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC
pkg/ChangeLog revision 1155, Thu Nov 17 16:53:26 2011 UTC
# Line 1  Line 1 
1    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
4            pdfinfo tool.
5    
6            * inst/stopwords/SMART.dat: Add SMART information retrieval system
7            stopwords (which are also used by the MC toolkit).
8    
9            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
10            restrict how often a term may appear in each document (generalizes
11            \code{minDocFreq}). Similarly, add the local option \code{wordLengths}
12            for word length bounds (generalizes \code{minWordLength}).
13    
14            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
15            \code{bounds$global} for restricting how often a term is allowed
16            to appear in different documents.
17    
18            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
19            local options delegated internally to termFreq() and global
20            options which are processed by the term-document matrix
21            constructor itself.
22    
23    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
24    
25            * man/getTokenizers.Rd: Document getTokenizers().
26    
27            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
28    
29    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
30    
31            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
32    
33            * man/combine.Rd: Document c.term_frequency().
34    
35    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
38            can be accessed via '[' and not '[['.
39    
40    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
41    
42            * R/stopwords.R (stopwords): Raise an error if no stopwords are
43            available for the requested language. Suggested by Derek M Jones.
44    
45    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
46    
47            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
48            normalization.
49    
50    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
51    
52            * R/transform.R (stemDocument.PlainTextDocument): Use language
53            argument.
54    
55    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
56    
57            * R/source.R: Store strings and connections instead of unevaluated
58            calls.
59    
60    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
61    
62            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
63    
64    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
65    
66            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
67            (instead of a list element).
68    
69    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
70    
71            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
72            documents by names (fallback to IDs if names are not set).
73    
74    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
75    
76            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
77            \code{recursive} now determines whether existing corpus meta data
78            is used.
79    
80    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
81    
82            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
83    
84    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
85    
86            * R/matrix.R (TermDocumentMatrix): If a dictionary is given, no
87            longer remove terms that do not occur in the corpus.
88    
89    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
90    
91            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
92            and Heaps' law.
93    
94    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
95    
96            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
97            provided by a source.
98    
99    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
100    
101            * R/source.R (.Source): Provide document names.
102    
103    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
104    
105            * R/meta.R (`content_or_meta`): Utility function.
106    
107    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
110            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
111    
112    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
113    
114            * R/weight.R (weightTfIdf): Added normalization option.
115    
116            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
117            analysis.
118    
119    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
120    
121            * R/score.R (tm_tag_score): Compute a score from the number of
122            tags matching in a document.
123    
124    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
125    
126            * R/complete.R (stemCompletion): New completion heuristics.
127    
128    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
129    
130            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
131    
132    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
133    
134            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
135            setOldClass(c(..., "list")) works.
136    
137    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
138    
139            * R/transform.R (stemDocument.character): In case input is a
140            simple character just delegate to the default Snowball stemmer.
141    
142    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
143    
144            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
145            data.
146    
147    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
148    
149            * R/doc.R (`Content<-`): Be careful with names attribute.
150    
151    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
152    
153            * R/source.R (DirSource): Improved implementation especially when
154            handling many (> 1M) files.
155    
156    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
157    
158            * R/source.R (getElem.URISource): Use encoding argument.
159    
160    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
161    
162            * R/doc.R (setOldClass): Register S3 document classes to be
163            recognized by S4 methods.
164    
165    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
166    
167            * R/matrix.R (termFreq): Add option to remove punctuation
168            characters.
169    
170    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
173            merging multiple term-document matrices.
174    
175    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/corpus.R (setOldClass): Register S3 corpus classes to be
178            recognized by S4 methods.
179    
180            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
181            that CRAN Mac OS X builds do not fail any longer.
182    
183    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
184    
185            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
186            of RWeka::AlphabeticTokenizer() as default.
187    
188    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
189    
190            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
191            caused words at the beginning or the end of a line not to be removed. Do
192            not delete whitespace anymore.
193    
194    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
195    
196            * R/source.R (DirSource): Default to working directory if no path
197            is specified.
198    
199    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
200    
201            * R/source.R (DirSource): Stop on empty directories.
202    
203    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
204    
205            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
206            named documents.
207    
208    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
209    
210            * R/transform.R (removeWords): Improve regular expressions.
211    
212    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
213    
214            * R/meta.R (DublinCore): Allow lower case tags.
215    
216    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
217    
218            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
219            instead of x$children.
220    
221    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
222    
223            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
224    
225    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
226    
227            * R/: Use S3 instead of S4 class system.
228    
229    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
230    
231            * R/reader.R (readMail): Moved to tm.plugin.mail package.
232    
233    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
234    
235            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
236            postings are basically e-mails with some extra headers.
237    
238    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
239    
240            * R/transform.R: Move convertMboxEml, removeCitation,
241            removeMultipart, and removeSignature to the tm.plugin.mail package
242            since they are mainly utility functions (for handling e-mails) and
243            not very framework specific.
244    
245    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
246    
247            * man/: Fix documentation.
248    
249    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
250    
251            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
252            plain text document instead of an XML document for texts of the
253            Reuters-21578 dataset.
254    
255            * R/sparse.R: Removed since the slam package is now available on
256            CRAN.
257    
258            * DESCRIPTION (Depends): Add slam package.
259    
260    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
261    
262            * R/transform.R (stemDoc): Fix character(0) handling.
263    
264    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
265    
266            * R/doc.R (show): Pretty print.
267    
268    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
269    
270            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
271            gracefully.
272    
273    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
274    
275            * R/corpus.R: Make corpus virtual. Implement corpus with standard
276            and permanent storage semantics.
277    
278            * DESCRIPTION: New major release. A *lot* of improvements.
279    
280    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
281    
282            * NAMESPACE: Export some simple_triplet_matrix functions.
283    
284    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
285    
286            * R/weight.R: Adapt tf-idf to new matrix format.
287    
288    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
289    
290            * R/matrix.R: Create two distinct classes for term-document and
291            document-term matrices.
292    
293    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
294    
295            * R/termdocmatrix.R: No longer use Matrix package. This reduces
296            package start-up time significantly.
297    
298    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
299    
300            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
301    
302    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
303    
304            * R/transform.R (tmReduce): Combine multiple maps into one
305            transformation.
306    
307    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
308    
309            * R/weight.R: Remove weightLogical since it does not return a
310            dgCMatrix.
311    
312            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
313            or TermDocumentMatrix instead.
314    
315    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
316    
317            * inst/doc/extensions.Rnw: Finished vignette.
318    
319    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
320    
321            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
322            DocumentTermMatrix representations.
323    
324    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
325    
326            * R/reader.R (readXML): New reader for arbitrary XML files.
327    
328    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
329    
330            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
331            (XMLSource): New XMLSource class for arbitrary XML files.
332            (Source): New slot Vectorized.
333    
334    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
335    
336            * R/reader.R (readTabular): Experimental reader for tabular data
337            structures which can be customized via user-defined mappings.
338    
339            * R/reader.R: Always use UTC time zone.
340    
341            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
342    
343    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
344    
345            * R/reader.R (readDOC): Options can be passed over to antiword.
346    
347            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
348            pdftotext.
349    
350    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
351    
352            * R/source.R (DirSource): Add pattern and ignore.case arguments
353            which are internally passed over to list.files().
354    
355    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
356    
357            * inst/doc/tm.Rnw: Suppress pointless loading message.
358    
359    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
360    
361            * DESCRIPTION: Speed up package loading (via moving packages not
362            strictly necessary for normal operation to Suggests instead of
363            Depends).
364    
365    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
366    
367            * R/reader.R (readNewsgroup): The date format is now configurable.
368    
369    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
370    
371            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
372    
373    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
374    
375            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
376    
377    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
378    
379            * R/source.R (DataframeSource): New source class for data frames.
380    
381            * R/source.R: Fixed non-standard call evaluation.
382    
383    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
384    
385            * R/source.R (URISource): New source class for a single document.
386    
387    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
388    
389            * R/source.R: Refactoring.
390    
391    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
392    
393            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
394            Rmpi installations more gracefully.
395    
396    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
397    
398            * R/source.R (Source): Add Length slot.
399    
400    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
401    
402            * R/AAA.R: Unify duplicated .onLoad function.
403    
404    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
405    
406            * DESCRIPTION (Suggests): Added Rmpi.
407    
408    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
409    
410            * R/source.R (getElem): Fix 'no visible binding' warning.
411    
412            * man/WeightFunction.Rd: Fix signature.
413    
414    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
415    
416            * R/weight.R: Introduce name abbreviations for weighting functions.
417    
418    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
419    
420            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
421    
422            * R/cluster.R: Provide convenience functions for using an MPI
423            cluster.
424    
425            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
426            available.
427    
428            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
429            available.
430    
431    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
432    
433            * R/textdoccol.R (lapply): Removed debug print out.
434    
435    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
436    
437            * R/reader.R (readRCV1): Improved meta data extraction from
438            Reuters Corpus Volume 1 documents.
439    
440    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
441    
442            * R/transform.R: Ensure that all mappings preserve multiline
443            structures.
444    
445    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
446    
447            * R/filter.R: Every filter now has an attribute indicating whether
448            it should be applied at the document level (doclevel).
449    
450            * R/textdoccol.R (tmFilter): Set searchFullText as new default
451            filter.
452    
453    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/transform.R (replacePatterns): Replaced removeWords by
456            replacePatterns. Suggested by Christian Buchta.
457    
458            * R/textdoccol.R (inspect): Improved formatting.
459    
460    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
461    
462            * inst/CITATION: Updated JSS article information.
463    
464            * R/textdoccol.R (setAs): Added coerce method from list to
465            corpus.
466    
467            * R/meta.R (meta): Improved meta data handling.
468    
469    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
470    
471            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
472            Christian Buchta.
473    
474            * inst/CITATION: Added template to include JSS article reference.
475    
476    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
477    
478            * R/textdoccol.R (tmMap): Introduced lazy mapping.
479    
480            * R/source.R: Added VectorSource.
481    
482    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
483    
484            * man/: Language codes should be in ISO 639-1 format.
485    
486            * R/textdoccol.R (asPlain): Preserve local meta data.
487    
488    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
489    
490            * R/textdoccol.R (writeCorpus): Function for writing a corpus
491            containing plain text documents to disk.
492    
493    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
494    
495            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
496            always set correctly.
497    
498            * R/textdoccol.R: Set load = TRUE as default for load on demand
499            since in most cases this is the wanted behaviour.
500    
501    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
502    
503            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
504    
505            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
506    
507    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
508    
509            * R/meta.R (meta): New function for consistent access to meta data
510            of document collections, repositories, and texts.
511    
512    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
513    
514            * R/: Better support for encodings.
515    
516    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
519            selection when no reader argument is given.
520    
521    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * R/source.R (CSVSource): Now uses read.csv instead of scan
524            internally.
525    
526    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * R/reader.R (getReaders): Returns available reader functions.
529    
530            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
531            as default.
532    
533    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
534    
535            * R/stopwords.R (stopwords): Shortened code, removed codetools
536            variable warnings.
537    
538            * man/: Documentation for showMeta, added an example for tmMap.
539    
540            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
541            some minor typos fixed.
542    
543    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
544    
545            * R/aobjects.R (showMeta): Added method for pretty printing a
546            text document's meta data.
547    
548    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
549    
550            * R/textdoccol.R (TextDocCol): Better handling of empty
551            arguments.
552    
553            * NAMESPACE: Exported readDOC.
554    
555            * man/completeStems.Rd: Added an example.
556    
557    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
558    
559            * R/stopwords.R (stopwords): Look up .dat files at every
560            call. Allows users to modify stopword .dat files interactively.
561    
562    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
563    
564            * R/termdocmatrix.R (termFreq): Correct processing of empty
565            documents.
566    
567    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
568    
569            * man/: Updated documentation.
570    
571    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
572    
573            * R/complete.R (completeStems): Completes (heuristically) word
574            stems.
575    
576            * R/termdocmatrix.R (TermDocMatrix2): New modular
577            constructor.
578    
579            * NAMESPACE: Exported termFreq.
580    
581    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
582    
583            * R/reader.R (readDOC): Added MS Word reader (using antiword).
584    
585    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
586    
587            * R/weight.R: Weighting functions for TermDocMatrix.
588    
589    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
590    
591            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
592            functions for accessing dimension, column, and row names.
593    
594            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
595    
596    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
597    
598            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
599    
600    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
601    
602            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
603    
604    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
605    
606            * R/reader.R (readPDF): Removed manual checks for pdftotext and
607            pdfinfo. The system call gives a warning anyway.
608    
609    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
610    
611            * R/textdoccol.R (asPlain): Conversion from
612            StructuredTextDocuments to PlainTextDocuments.
613    
614    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
615    
616            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
617            for accessing term-document matrices.
618    
619            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
620            are installed.
621    
622    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
623    
624            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
625            Christian Buchta.
626    
627    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
628    
629            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
630    
631    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
634    
635            * R/reader.R (readPDF): Added PDF reader.
636    
637    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
638    
639            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
640    
641            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
642    
643            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
644    
645            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
646    
647    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
648    
649            * R/distmeasure.R (dissimilarity): Replaced dists call from
650            package cba by new dist call from package proxy.
651    
652    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
653    
654            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
655    
656    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
657    
658            * R/termdocmatrix.R: require() uses the quietly option to suppress
659            loading messages.
660    
661    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
662    
663            * R/dictionary.R: Added dictionary support.
664    
665    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
666    
667            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
668            documents. This simplifies some functions, e.g., asPlain.
669    
670    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * inst/doc/tm.Rnw: Fixed some typos in vignette.
673    
674    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
675    
676            * R/textdoccol.R (replaceWords): Added method to replace a set of
677            words by a single word. Useful for synonyms.
678    
679    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
680    
681            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
682    
683    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
684    
685            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
686            vectors. Thanks to Ariel Maguyon for his error report.
687            (removeSparseTerms): New function to remove columns from a
688            term-document matrix exceeding a sparse factor.
689    
690    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
691    
692            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
693    
694    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
695    
696            * man/sFilter.Rd: Corrected documentation on statement format (use
697            '==' instead of '=').
698    
699    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
700    
701            * R/aobjects.R (StructuredTextDocument): Inherits from
702            TextDocument.
703    
704    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
705    
706            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
707            on sparse matrices as proposed by Martin Maechler.
708    
709    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
710    
711            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
712            \pkg{filehash} version deprecates them.
713    
714    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
715    
716            * R/termdocmatrix.R (textvector): Stemming is now performed before
717            erasing stopwords.
718            (weightMatrix): Adapted to handle sparse matrices.
719            (TermDocMatrix): Sparse matrix is now efficiently built by
720            direct stepwise insertion of row values into it.
721    
722    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
723    
724            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
725            due to ongoing problems. For our purposes the latter is as useful
726            as the replaced package.
727    
728    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
729    
730            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
731    
732            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
733    
734    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
735    
736            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
737            languages with available stopwords.
738    
739    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
740    
741            * inst/doc/tm.Rnw: Minor corrections in the vignette.
742    
743    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
744    
745            * DESCRIPTION: Update to version 0.2, since a lot of new features
746            have been integrated.
747    
748            * inst/stopwords: Updated existing stopwords and added stopwords
749            for various other languages.
750    
751    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
752    
753            * man/: Updated documentation.
754    
755            * Work/testDb.R: Script to test database stuff.
756    
757            * R/: Fixed various database related bugs. Seems to be rather
758            usable now, i.e., consider it alpha status for now.
759    
760    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
761    
762            * R/: Fixed some bugs related to database support.
763    
764    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
765    
766            * man/: Added a lot of examples to the manuals.
767    
768    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
769    
770            * man/: Updated parts of the documentation.
771    
772            * R/textdoccol.R (asPlain): Added conversion from newsgroup
773            documents to plain text documents.
774    
775    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
776    
777            * R/textdoccol.R: Finished experimental database support. Not yet
778            intensively tested.
779    
780            * R/source.R: Now each source has a default reader.
781    
782            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
783            class anymore.
784    
785            * R/plaintextdoc.R: Custom show method for plain text documents.
786    
787            * R/aobjects.R: Added a class for structured text documents.
788    
789            * R/reader.R: Replaced remaining \code{parser} occurrences with
790            \code{reader}.
791    
792            * R/textdoccol.R (summary): Indent tags.
793    
794            * R/textdoccol.R (removePunctuation): Transform method to remove
795            punctuation marks.
796    
797    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
798    
799            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
800            using prescindMeta().
801    
802    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
803    
804            * R/textdoccol.R: Improved database support.
805    
806    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
807    
808            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
809    
810            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
811            language code.
812    
813            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
814            into parserControl argument.
815    
816            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
817    
818    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
819    
820            * Work/tmDataSetup.R: The datasets acq and crude can now be
821            created on the fly.
822    
823            * R/stopwords.R: Introduced a function returning the stopwords for
824            a given language (English, German and French at the moment)
825    
826            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
827            otherwise falls back to Snowball package.
828    
829    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
830    
831            * man/dissimilarity-methods.Rd: Make clear that any method offered
832            by "dists" from package "cba" can be used.
833    
834    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
835    
836            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
837            to Kurt's latex suggestion. Removed points and underscores in
838            variable names for consistent naming.
839    
840            * DESCRIPTION: Update to version 0.1-2.
841    
842            * man/TextRepository.Rd: Fixed bug in documentation.
843    
844    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
845    
846            * DESCRIPTION: Update to version 0.1-1.
847    
848    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
849    
850            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
851            wordStem.
852    
853    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
854    
855            * R/: Changes due to Kurt's review.
856    
857    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
858    
859            * R/: Implemented improvements based upon comments by David
860            Meyer.
861    
862    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
863    
864            * inst/doc/: Rewrote vignette.
865    
866            * man/: Improved documentation.
867    
868    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
869    
870            * man/: Updated documentation.
871    
872            * DESCRIPTION: Changed package name to "tm". Updated version to
873            0.1 for first CRAN release.
874    
875            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
876            list archive example.
877    
878            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
879            archive example.
880    
881            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
882            from (several mails per box) mbox format to (single mail per file)
883            eml format.
884    
885    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
886    
887            * data/crude.rda: Rebuilt.
888    
889            * data/acq.rda: Rebuilt.
890    
891            * R/reader.R: Factored out reader and parser methods from
892            textdoccol.R.
893    
894            * R/source.R: Factored out Source methods from aobjects.R and
895            textdoccol.R.
896            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
897            feeds.
898    
899            * R/textdoccol.R (DirSource): Added support for recursive
900            traversal of directories.
901    
902    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
903    
904            * R/textdoccol.R ([[): Loads the document corpus automatically
905            into memory upon access.
906            (tm_transform, tm_filter): Removed several checks whether the
907            document is already loaded ([[ ensures this now).
908            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
909            mailing list archive.
910    
911    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
912    
913            * R/aobjects.R (TextDocument): Is now a virtual class.
914            (Source): Is now a virtual class.
915    
916    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
917    
918            * R/textdoccol.R (c): Support for an arbitrary number of document
919            collections.
920    
921    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
924            append_meta and remove_meta.
925    
926            * R/textdoccol.R: Removed modify_metadata method.
927    
928            * R/textrepo.R: Removed modify_metadata method.
929    
930            * R/textdoccol.R (remove_meta): Supports removal of document
931            collection metadata and document (= in data frame) metadata.
932    
933    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
934    
935            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
936    
937            * data/crude.rda: Rebuilt.
938    
939            * data/acq.rda: Rebuilt.
940    
941            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
942    
943            * R/textdoccol.R ([): Bug fix for subsetting a document
944            collection's data frame.
945    
946    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
947    
948            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
949            to s_filter.
950    
951            * R/textdoccol.R: Local text documents' metadata can now be copied
952            to a document collection's data frame with prescind_meta.
953    
954    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
955    
956            * R/: Text documents' slot metadata is now accessible in s_filter.
957    
958            * R/: Rewrote s_filter function (still has some restrictions).
959    
960    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
961    
962            * R/: Various fixes in handling metadata.
963    
964            * R/: Added update mechanism for text document collections.
965    
966    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
967    
968            * R/: Merging of document collections now creates a binary tree
969            for reconstructing merged document collections.
970    
971            * R/: Redesign of metadata for document collections.
972    
973    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
974    
975            * R/: Messages now use \code{ngettext}.
976    
977    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
978    
979            * R/: Added functions for modifying and removing metadata.
980    
981    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
982    
983            * man/: Updated some documentation.
984    
985            * R/: Corrected some connection issues.
986    
987            * inst/doc: Worked on the vignette.
988    
989    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
990    
991            * inst/: Added texts and started vignette.
992    
993            * R/: Final changes based upon David's comments.
994    
995    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
996    
997            * NAMESPACE: Corrected exports (generic methods need exportMethods
998            directives!).
999    
1000    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1001    
1002            * R/: Modified the TextDocCol constructor and various parsers. It
1003            is now modular and supports various file formats via plugins (see
1004            the new "Source" class).
1005    
1006    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1007    
1008            * man/: Revised documentation after previous code changes.
1009    
1010    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1011    
1012            * R/: Remaining changes as discussed with David.
1013    
1014    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1015    
1016            * R/: Some changes as suggested by David. The rest will follow
1017            within the next days.
1018    
1019    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1020    
1021            * man/: Finished documentation.
1022    
1023    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1024    
1025            * man/: Wrote some documentation.
1026    
1027    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1028    
1029            * R/: Further syntactic sugar in form of additional assignment and
1030            accessor methods.
1031    
1032    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1033    
1034            * R/: Syntactic sugar in form of "length", "show" and "summary"
1035            operators.
1036    
1037    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * R/: Diverse updates. Mainly on default operators ("[" or "c")
1040            and dissimilarities.
1041    
1042    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * R/: Added similarity functions.
1045    
1046            * data/: Added English stopwords.
1047    
1048    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1049    
1050            * data/: Examples compiled for new features
1051    
1052            * R/: Changes due to new structure.
1053    
1054            * NAMESPACE: Corrected namespace to reflect new structure.
1055    
1056            * R/termdocmatrix.R: Adapted for new naming scheme.
1057    
1058    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1059    
1060            * R/textdoccol.R: Adapted code for new class structure. Wrote
1061            several transform and filter functions operating on text document
1062            collections (alias text document databases).
1063    
1064            * R/aobjects.R: Adapted class structure with inheritance,
1065            repositories and additional meta data. Loading files on demand is
1066            now possible.
1067    
1068    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1069    
1070            * R/: Some cosmetic cleanups.
1071    
1072            * inst/: Removed vignette on clustering. That and much more is now
1073            described in the JSS paper on text mining. Based upon that
1074            article an elaborated vignette will be incorporated in the future.
1075    
1076    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1077    
1078            * R/: Updated generic S4 methods to comply with signature changes
1079            in newer versions of R (> 2.3)
1080    
1081    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1082    
1083            * ext/R/importRIS.R: Automatic RIS import is now possible.
1084    
1085    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1086    
1087            * R/textdoccol.R: Added RIS HTML input format.
1088    
1089    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1090    
1091            * R/textdoccol.R: Removed bug that caused invalid text document
1092            collections when handling many input files.
1093    
1094    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1095    
1096            * R/textdoccol.R: Restructured and extended file import
1097            mechanism.
1098    
1099            * inst/doc/clustering.Rnw: Adapted vignette for use with
1100            ReutNews.rda
1101    
1102            * man/ReutNews.Rd: Documentation for ReutNews.rda
1103    
1104            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1105    
1106    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1107    
1108            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1109            clustering facilities of this package.
1110    
1111    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1112    
1113            * R/aobjects.R: Changed package document structure to avoid class
1114            dependency problems.
1115    
1116    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1117    
1118            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
1119            data set.
1120    
1121            *  Finished documentation and reordered directory structure. Now "R
1122            CMD check textmin" works without errors.
1123    
1124    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1125    
1126            * src/: Various splits can now be easily created for the
1127            Reuters21578 data set.
1128    
1129    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1130    
1131            *  Updated documentation
1132    
1133    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1134    
1135            *  Wrote R documentation for some classes and methods.
1136    
1137    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1138    
1139            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
1140            files. See the questionnaire data/Umfrage.csv for such an example.
1141            We are now able to import files in Reuters-21578 XML format.
1142    
1143            *  Changed class interfaces in various files. Weighting of the text
1144            matrix is now possible.
1145    
1146    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1147    
1148            * R/textdoccol.R: One can build term-document matrices if
1149            necessary (with buildTDM(...)) and fill the field tdm from a text
1150            document collection with it.
1151    
1152            * R/textmatrix.R: Wrote S4 class for term-document matrices.
1153    
1154    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1155    
1156            * R/textdoccol.R: We now can read in a whole XML file with several
1157            news items.
1158    
1159    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1160    
1161            * R/textdoccol.R: Set up an S4 class for a collection of text
