[tm] Diff of /pkg/ChangeLog

Revisions compared: trunk/R/trunk/ChangeLog revision 17 (Sat Nov 5 14:47:12 2005 UTC)
and pkg/ChangeLog revision 1161 (Wed Dec 7 06:10:32 2011 UTC).

2011-12-07  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removePunctuation): Add option to preserve
        intra-word dashes.
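
        For illustration only, here is a minimal Python sketch of the idea
        (this is not tm's R code). Each dash with a word character on both
        sides is replaced by a placeholder, the remaining ASCII punctuation
        is stripped, and the placeholder is turned back into a dash. The
        function name, the preserve_intra_word_dashes flag and the "\x01"
        placeholder are assumptions.

```python
import re
import string

# Character class matching one ASCII punctuation character.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")
# A dash with a word character on both sides, as in "state-of-the-art".
INTRA_WORD_DASH_RE = re.compile(r"(?<=\w)-(?=\w)")
PLACEHOLDER = "\x01"  # assumption: never occurs in the input text


def remove_punctuation(text: str, preserve_intra_word_dashes: bool = False) -> str:
    """Strip ASCII punctuation, optionally keeping dashes inside words."""
    if preserve_intra_word_dashes:
        # Protect intra-word dashes before the punctuation pass.
        text = INTRA_WORD_DASH_RE.sub(PLACEHOLDER, text)
    text = PUNCT_RE.sub("", text)
    return text.replace(PLACEHOLDER, "-")


print(remove_punctuation("state-of-the-art, really!"))
print(remove_punctuation("state-of-the-art, really!", preserve_intra_word_dashes=True))
```

        A dash surrounded by spaces (as in "a - b") is not intra-word, so
        it is removed in both modes.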

2011-12-06  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (termFreq): Allow reordering of control option
        processing.

2011-11-17  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
        pdfinfo tool.

        * inst/stopwords/SMART.dat: Add SMART information retrieval system
        stopwords (which are also used by the MC toolkit).

        * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
        restrict how often a term may appear in each document (generalizes
        \code{minDocFreq}). Similarly the local option \code{wordLengths}
        for word length bounds (generalizes \code{minWordLength}).

        * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
        \code{bounds$global} for restricting how often a term is allowed
        to appear in different documents.

        * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
        local options delegated internally to termFreq() and global
        options which are processed by the term-document matrix
        constructor itself.
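
        The following is a hypothetical Python sketch of the split between
        local and global options. The names and defaults are assumptions
        and are not tm's API. term_freq() applies the per-document word
        length and frequency bounds. The matrix constructor then keeps only
        the terms whose document frequency lies within the global bounds.
        All bounds are inclusive (lower, upper) pairs.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Bounds = Tuple[float, float]  # inclusive (lower, upper)
NO_BOUNDS: Bounds = (0, math.inf)


def term_freq(tokens: Sequence[str],
              local_bounds: Bounds = NO_BOUNDS,
              word_lengths: Bounds = NO_BOUNDS) -> Dict[str, int]:
    """Count the terms of one document, applying only local options."""
    min_len, max_len = word_lengths
    counts = Counter(t for t in tokens if min_len <= len(t) <= max_len)
    lo, hi = local_bounds
    return {t: n for t, n in counts.items() if lo <= n <= hi}


def term_document_counts(docs: Sequence[Sequence[str]],
                         local_bounds: Bounds = NO_BOUNDS,
                         global_bounds: Bounds = NO_BOUNDS,
                         word_lengths: Bounds = NO_BOUNDS) -> List[Dict[str, int]]:
    """Delegate local options to term_freq(), then apply the global bound."""
    per_doc = [term_freq(doc, local_bounds, word_lengths) for doc in docs]
    # Document frequency of every term that survived the local filters.
    doc_freq = Counter(t for counts in per_doc for t in counts)
    lo, hi = global_bounds
    keep = {t for t, df in doc_freq.items() if lo <= df <= hi}
    return [{t: n for t, n in counts.items() if t in keep} for counts in per_doc]
```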

2011-11-15  Ingo Feinerer  <feinerer@logic.at>

        * man/getTokenizers.Rd: Document getTokenizers().

        * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().

2011-11-04  Ingo Feinerer  <feinerer@logic.at>

        * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.

        * man/combine.Rd: Document c.term_frequency().

2011-10-11  Ingo Feinerer  <feinerer@logic.at>

        * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
        can be accessed via '[' and not '[['.

2011-08-24  Ingo Feinerer  <feinerer@logic.at>

        * R/stopwords.R (stopwords): Raise an error if no stopwords are
        available for requested language. Suggested by Derek M Jones.

2011-05-27  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R (weightSMART): Implement Cosine and pivoted unique
        normalization.

2011-02-17  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (stemDocument.PlainTextDocument): Use language
        argument.

2011-02-04  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R: Store strings and connections instead of unevaluated
        calls.

2010-11-26  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (Corpus): Allow init and exit hooks for readers.

2010-10-22  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
        (instead of a list element).

2010-10-16  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
        documents by names (fallback to IDs if names are not set).

2010-08-25  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (c.Corpus): When concatenating corpora, the argument
        \code{recursive} now determines whether existing corpus meta data
        is used.

2010-08-06  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.

2010-06-17  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
        remove terms not occurring in the corpus anymore.

2010-06-02  Ingo Feinerer  <feinerer@logic.at>

        * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
        and Heaps' law.
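
        Zipf's law predicts a roughly linear relation between log term
        frequency and log frequency rank. Heaps' law predicts a similar
        relation between log vocabulary size and log number of tokens as
        text is added. Below is a Python sketch of the data behind such
        plots (not the code of Zipf_plot or Heaps_plot), with an ordinary
        least-squares fit for the slope. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def zipf_points(tokens: Iterable[str]) -> List[Point]:
    """(log rank, log frequency) pairs, most frequent term first."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return [(math.log(rank), math.log(f)) for rank, f in enumerate(freqs, start=1)]


def heaps_points(docs: Sequence[Sequence[str]]) -> List[Point]:
    """(log tokens seen, log vocabulary size) after each document.

    Assumes the first document is non-empty (log(0) is undefined).
    """
    seen, n_tokens, points = set(), 0, []
    for doc in docs:
        n_tokens += len(doc)
        seen.update(doc)
        points.append((math.log(n_tokens), math.log(len(seen))))
    return points


def fit_line(points: Sequence[Point]) -> Tuple[float, float]:
    """Ordinary least-squares (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx
```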

2010-05-18  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
        provided by a source.

2010-04-09  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (.Source): Provide document names.

2010-04-07  Ingo Feinerer  <feinerer@logic.at>

        * R/meta.R (`content_or_meta`): Utility function.

2010-03-19  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
        TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.

2010-03-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R (weightTfIdf): Added normalization option.

        * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
        analysis.
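
        Below is a Python sketch of tf-idf weighting with an optional
        normalization step, as in the weightTfIdf entry above. This is one
        common variant and is assumed here, not necessarily what
        weightTfIdf computes. Normalization divides each count by the
        document length. The weight is w = (tf / |d|) * log2(N / df), where
        N is the number of documents and df is the number of documents
        containing the term.

```python
import math
from typing import Dict, List


def weight_tf_idf(counts: List[Dict[str, int]], normalize: bool = True) -> List[Dict[str, float]]:
    """tf-idf weights for a list of per-document term counts (counts > 0)."""
    n_docs = len(counts)
    # Document frequency: number of documents in which each term occurs.
    doc_freq: Dict[str, int] = {}
    for doc in counts:
        for term in doc:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    weighted = []
    for doc in counts:
        length = sum(doc.values())
        weighted.append({
            term: ((n / length) if normalize else n) * math.log2(n_docs / doc_freq[term])
            for term, n in doc.items()
        })
    return weighted
```

        A term that occurs in every document gets idf log2(1) = 0, so its
        weight is 0 regardless of normalization.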

2010-02-25  Ingo Feinerer  <feinerer@logic.at>

        * R/score.R (tm_tag_score): Compute a score from the number of
        tags matching in a document.

2010-02-18  Ingo Feinerer  <feinerer@logic.at>

        * R/complete.R (stemCompletion): New completion heuristics.
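
        One way to complete stems heuristically is sketched below in Python.
        For each stem, collect the dictionary words that start with it, then
        pick the most frequent, the shortest or the longest one. The
        function, the heuristic labels, and the fallback of returning the
        stem unchanged when nothing matches are assumptions for
        illustration. They do not describe stemCompletion's exact
        behaviour.

```python
from collections import Counter
from typing import Iterable, List


def complete_stems(stems: Iterable[str], dictionary: Iterable[str],
                   heuristic: str = "prevalent") -> List[str]:
    """Map each stem to a full dictionary word that starts with it."""
    freq = Counter(dictionary)
    completed = []
    for stem in stems:
        candidates = [w for w in freq if w.startswith(stem)]
        if not candidates:
            completed.append(stem)  # assumed fallback: keep the stem itself
        elif heuristic == "prevalent":
            completed.append(max(candidates, key=lambda w: freq[w]))
        elif heuristic == "shortest":
            completed.append(min(candidates, key=len))
        elif heuristic == "longest":
            completed.append(max(candidates, key=len))
        else:
            raise ValueError(f"unknown heuristic: {heuristic}")
    return completed
```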

2010-02-17  Ingo Feinerer  <feinerer@logic.at>

        * R/plot.R (plot.TermDocumentMatrix): Memory improvements.

2010-02-06  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
        setOldClass(c(..., "list")) works.

2010-01-22  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (stemDocument.character): In case input is a
        simple character just delegate to the default Snowball stemmer.

2010-01-15  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XML, readRCV1): Extract more meta
        data.

2010-01-12  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (`Content<-`): Be careful with names attribute.

2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>

        * R/source.R (DirSource): Improved implementation especially when
        handling many (> 1M) files.

2009-12-22  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (getElem.URISource): Use encoding argument.

2009-12-11  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (setOldClass): Register S3 document classes to be
        recognized by S4 methods.

2009-11-25  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (termFreq): Add option to remove punctuation
        characters.

2009-11-19  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (c.TermDocumentMatrix): Added combine method for
        merging multiple term-document matrices.

2009-11-17  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R (setOldClass): Register S3 corpus classes to be
        recognized by S4 methods.

        * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
        that CRAN Mac OS X builds do not fail any longer.

2009-11-15  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (tokenize): Use scan(..., what = "character") instead
        of RWeka::AlphabeticTokenizer() as default.

2009-11-14  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removeWords.PlainTextDocument): Fix bug which
        caused words at the beginning or the end of a line not to be
        removed. Do not delete whitespace anymore.

2009-11-12  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Default to working directory if no path
        is specified.

2009-11-11  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Stop on empty directories.

2009-11-07  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
        named documents.

2009-10-21  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (removeWords): Improve regular expressions.

2009-10-19  Ingo Feinerer  <feinerer@logic.at>

        * R/meta.R (DublinCore): Allow lower case tags.

2009-10-09  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
        instead of x$children.

2009-09-15  Ingo Feinerer  <feinerer@logic.at>

        * R/preprocess.R (preprocessReut21578XML): Fix generated file names.

2009-09-06  Ingo Feinerer  <feinerer@logic.at>

        * R/: Use S3 instead of S4 class system.

2009-08-11  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readMail): Moved to tm.plugin.mail package.

2009-07-04  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
        postings are basically e-mails with some extra headers.

2009-07-03  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R: Move convertMboxEml, removeCitation,
        removeMultipart, and removeSignature to the tm.plugin.mail package
        since they are mainly utility functions (for handling e-mails) and
        not very framework specific.

2009-06-28  Ingo Feinerer  <feinerer@logic.at>

        * man/: Fix documentation.

2009-06-26  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readReut21578XMLasPlain): New reader which returns a
        plain text document instead of an XML document for texts of the
        Reuters-21578 dataset.

        * R/sparse.R: Removed since the slam package is now available on
        CRAN.

        * DESCRIPTION (Depends): Add slam package.

2009-06-17  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (stemDoc): Fix character(0) handling.

2009-06-12  Ingo Feinerer  <feinerer@logic.at>

        * R/doc.R (show): Pretty print.

2009-05-27  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
        gracefully.

2009-05-13  Ingo Feinerer  <feinerer@logic.at>

        * R/corpus.R: Make corpus virtual. Implement corpus with standard
        and permanent storage semantics.

        * DESCRIPTION: New major release. A *lot* of improvements.

2009-05-04  Ingo Feinerer  <feinerer@logic.at>

        * NAMESPACE: Export some simple_triplet_matrix functions.

2009-04-28  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Adapt tf-idf to new matrix format.

2009-04-27  Ingo Feinerer  <feinerer@logic.at>

        * R/matrix.R: Create two distinct classes for term-document and
        document-term matrices.

2009-04-26  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R: No longer use Matrix package. This reduces
        package start-up time significantly.

2009-04-11  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/tm.Rnw: Fix code/documentation mismatch.

2009-04-04  Ingo Feinerer  <feinerer@logic.at>

        * R/transform.R (tmReduce): Combine multiple maps into one
        transformation.
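
        Combining several maps into one transformation amounts to function
        composition. Each document is then visited once instead of once per
        transformation. Below is a minimal Python sketch under that reading.
        The name combine_transformations is hypothetical and is not tm's
        API.

```python
from functools import reduce
from typing import Callable, Sequence

Transform = Callable[[str], str]


def combine_transformations(transforms: Sequence[Transform]) -> Transform:
    """Compose transformations into one function, applied left to right."""
    def compose(f: Transform, g: Transform) -> Transform:
        return lambda doc: g(f(doc))
    # Start from the identity so that an empty list is a no-op.
    return reduce(compose, transforms, lambda doc: doc)


clean = combine_transformations([str.lower, str.strip, lambda s: s.replace("  ", " ")])
corpus = ["  Hello  World ", "TEXT Mining"]
print([clean(doc) for doc in corpus])
```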

2009-04-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Remove weightLogical since it does not return a
        dgCMatrix.

        * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
        or TermDocumentMatrix instead.

2009-03-28  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/extensions.Rnw: Finished vignette.

2009-03-27  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
        DocumentTermMatrix representations.

2009-03-23  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readXML): New reader for arbitrary XML files.

2009-03-22  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (CSVSource): Defunct (use DataframeSource instead).
        (XMLSource): New XMLSource class for arbitrary XML files.
        (Source): New slot Vectorized.

2009-03-21  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readTabular): Experimental reader for tabular data
        structures which can be customized via user-defined mappings.

        * R/reader.R: Always use UTC time zone.

        * R/AAA.R (.onLoad): No longer try to start an MPI cluster.

2009-03-20  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readDOC): Options can be passed over to antiword.

        * R/reader.R (readPDF): Options can be passed over to pdfinfo and
        pdftotext.

2009-03-10  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DirSource): Add pattern and ignore.case arguments
        which are internally passed over to list.files().

2009-03-02  Ingo Feinerer  <feinerer@logic.at>

        * inst/doc/tm.Rnw: Suppress pointless loading message.

2009-01-29  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION: Speed up package loading by moving packages not
        strictly necessary for normal operation from Depends to Suggests.

2009-01-08  Ingo Feinerer  <feinerer@logic.at>

        * R/reader.R (readNewsgroup): The date format is now configurable.

2008-12-20  Ingo Feinerer  <feinerer@logic.at>

        * R/preprocess.R (convertMboxEml): Fix off-by-one error.

2008-12-16  Ingo Feinerer  <feinerer@logic.at>

        * R/termdocmatrix.R (TermDocMatrix): Sort row indices.

2008-12-06  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (DataframeSource): New source class for data frames.

        * R/source.R: Fixed non-standard call evaluation.

2008-11-29  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (URISource): New source class for a single document.

2008-11-27  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R: Refactoring.

2008-11-25  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
        Rmpi installations more gracefully.

2008-11-08  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (Source): Add Length slot.

2008-11-06  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R: Unify duplicated .onLoad function.

2008-11-03  Ingo Feinerer  <feinerer@logic.at>

        * DESCRIPTION (Suggests): Added Rmpi.

2008-11-02  Ingo Feinerer  <feinerer@logic.at>

        * R/source.R (getElem): Fix 'no visible binding' warning.

        * man/WeightFunction.Rd: Fix signature.

2008-08-03  Ingo Feinerer  <feinerer@logic.at>

        * R/weight.R: Introduce name abbreviations for weighting functions.

2008-07-24  Ingo Feinerer  <feinerer@logic.at>

        * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.

        * R/cluster.R: Provide convenience functions for using an MPI
        cluster.

        * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
        available.

        * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
        available.

2008-07-17  Ingo Feinerer  <feinerer@logic.at>

        * R/textdoccol.R (lapply): Removed debug print out.

2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readRCV1): Improved meta data extraction from
        Reuters Corpus Volume 1 documents.

2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/transform.R: Ensure that all mappings preserve multiline
        structures.

2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/filter.R: Every filter now has an attribute (doclevel)
        indicating whether it should be applied at the document level.

        * R/textdoccol.R (tmFilter): Set searchFullText as new default
        filter.

2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/transform.R (replacePatterns): Replaced removeWords by
        replacePatterns. Suggested by Christian Buchta.

        * R/textdoccol.R (inspect): Improved formatting.

2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/CITATION: Updated JSS article information.

        * R/textdoccol.R (setAs): Added coerce method from list to
        corpus.

        * R/meta.R (meta): Improved meta data handling.

2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (materialize, tmMap): Improvements suggested by
        Christian Buchta.

        * inst/CITATION: Added template to include JSS article reference.

2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (tmMap): Introduced lazy mapping.

        * R/source.R: Added VectorSource.

2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Language codes should be in ISO 639-1 format.

        * R/textdoccol.R (asPlain): Preserve local meta data.

2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (writeCorpus): Function for writing a corpus
        containing plain text documents to disk.

2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
        always set correctly.

        * R/textdoccol.R: Set load = TRUE as default for load on demand
        since in most cases this is the wanted behaviour.

2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Renamed TextDocCol to Corpus, and Corpus to Content.

        * DESCRIPTION: Updated Version to 0.3 due to core name changes.

2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/meta.R (meta): New function for consistent access to meta data
        of document collections, repositories, and texts.

2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Better support for encodings.

2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
        selection when no reader argument is given.

2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/source.R (CSVSource): Now uses read.csv instead of scan
        internally.

2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (getReaders): Returns available reader functions.

        * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
        as default.

2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Shortened code, removed codetools
        variable warnings.

        * man/: Documentation for showMeta, added an example for tmMap.

        * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
        some minor typos fixed.

2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (showMeta): Added method for pretty printing a
        text document's meta data.

2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (TextDocCol): Better handling of empty
        arguments.

        * NAMESPACE: Exported readDOC.

        * man/completeStems.Rd: Added an example.

2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/stopwords.R (stopwords): Look up .dat files at every
        call. Allows users to modify stopword .dat files interactively.

2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (termFreq): Correct processing of empty
        documents.

2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/complete.R (completeStems): Completes (heuristically) word
        stems.

        * R/termdocmatrix.R (TermDocMatrix2): New modular
        constructor.

        * NAMESPACE: Exported termFreq.

2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readDOC): Added MS Word reader (using antiword).

2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/weight.R: Weighting functions for TermDocMatrix.

2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
        functions for accessing dimension, column, and row names.

        * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.

2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/removePunctuation.Rd: Added documentation. Function also
        exported to NAMESPACE.

2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/fungen.R: Use S4 class for function generators instead of S3
        attributes.

2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readPDF): Removed manual checks for pdftotext and
        pdfinfo. The system call gives a warning anyway.

2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (asPlain): Conversion from
        StructuredTextDocuments to PlainTextDocuments.

2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
        for accessing term-document matrices.

        * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
        are installed.

2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
        Christian Buchta.

2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML,
        preprocessReut21578XML).

2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/reader.R (readHTML): Added very simple HTML reader to obtain
        StructuredTextDocuments.

        * R/reader.R (readPDF): Added PDF reader.

2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Moved proxy from Depends to Imports to avoid name
        clashes.

        * inst/stopwords/english.dat: Added the term "yes" to stopwords.

        * R/termdocmatrix.R (dim): dim function for TermDocMatrix.

        * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.

2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/distmeasure.R (dissimilarity): Replaced dists call from
        package cba by new dist call from package proxy.

2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.

2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R: require() uses the quietly option to suppress
        loading messages.

2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/dictionary.R: Added dictionary support.

2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
        documents. This simplifies some functions, e.g., asPlain.

2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Fixed some typos in vignette.

2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (replaceWords): Added method to replace a set of
        words by a single word. Useful for synonyms.

2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TermDocMatrix.Rd: Fixed documentation on Data slot.

2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Small fix for dealing with empty
        vectors. Thanks to Ariel Maguyon for his error report.
        (removeSparseTerms): New function to remove columns from a
        term-document matrix exceeding a sparse factor.
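
        Below is a Python sketch of sparse term removal. It assumes the
        sparsity of a term is the share of documents that do not contain
        it, and that a term is kept only if its sparsity is strictly below
        the given factor. Documents are plain term-count dictionaries here,
        not tm's matrix classes, and the function name is hypothetical.

```python
from typing import Dict, List


def remove_sparse_terms(counts: List[Dict[str, int]], sparse: float) -> List[Dict[str, int]]:
    """Drop terms whose sparsity is not strictly below `sparse` (0 < sparse < 1)."""
    if not 0 < sparse < 1:
        raise ValueError("sparse must lie strictly between 0 and 1")
    n_docs = len(counts)
    # Number of documents in which each term has a non-zero count.
    doc_freq: Dict[str, int] = {}
    for doc in counts:
        for term, n in doc.items():
            if n > 0:
                doc_freq[term] = doc_freq.get(term, 0) + 1
    # Sparsity = share of documents in which the term does not occur.
    keep = {t for t, df in doc_freq.items() if 1 - df / n_docs < sparse}
    return [{t: n for t, n in doc.items() if t in keep} for doc in counts]
```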

2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/tmUpdate.Rd: Corrected documentation on readerControl
        parameter.

2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/sFilter.Rd: Corrected documentation on statement format (use
        '==' instead of '=').

2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/aobjects.R (StructuredTextDocument): Inherits from
        TextDocument.

2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
        on sparse matrices as proposed by Martin Maechler.
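
        Finding frequent terms needs only the per-term totals. With terms
        as rows, these can be computed by visiting just the stored non-zero
        entries of a sparse matrix. Below is a Python sketch using a
        triplet layout: a row index i and a value v for each non-zero
        entry, as in a simple triplet matrix. Column indices are not needed
        for row sums. The function and argument names are hypothetical, and
        indices are 0-based.

```python
from typing import List, Sequence


def find_freq_terms(i: Sequence[int], v: Sequence[float], terms: Sequence[str],
                    lowfreq: float = 0, highfreq: float = float("inf")) -> List[str]:
    """Return the terms whose total frequency lies in [lowfreq, highfreq].

    Triplet k says that term i[k] has count v[k] in some document. Only
    the stored non-zero entries are visited, so the cost grows with the
    number of non-zeros rather than with n_terms * n_docs.
    """
    totals = [0.0] * len(terms)
    for row, value in zip(i, v):
        totals[row] += value
    return [t for t, total in zip(terms, totals) if lowfreq <= total <= highfreq]
```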

2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
        \pkg{filehash} version deprecates them.

2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (textvector): Stemming is now performed before
        erasing stopwords.
        (weightMatrix): Adapted to handle sparse matrices.
        (TermDocMatrix): Sparse matrix is now efficiently built by
        direct stepwise insertion of row values into it.

2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
        due to ongoing problems. For our purposes the latter is as useful
        as the replaced package.

2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/TextDocCol.Rd: Replaced \code{readPlain} with
        \code{object@DefaultReader}.

        * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.

2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
        languages with available stopwords.

2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * inst/doc/tm.Rnw: Minor corrections in the vignette.

2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * DESCRIPTION: Update to version 0.2, since a lot of new features
        have been integrated.

        * inst/stopwords: Updated existing stopwords and added stopwords
        for various other languages.

2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated documentation.

        * Work/testDb.R: Script to test database stuff.

        * R/: Fixed various database related bugs. Seems to be rather
        usable now, but consider it alpha status for now.

2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/: Fixed some bugs related to database support.

2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Added a lot of examples to the manuals.

2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * man/: Updated parts of the documentation.

        * R/textdoccol.R (asPlain): Added conversion from newsgroup
        documents to plain text documents.

2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Finished experimental database support. Not yet
        intensively tested.

        * R/source.R: Now each source has a default reader.

        * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
        class anymore.

        * R/plaintextdoc.R: Custom show method for plain text documents.

        * R/aobjects.R: Added a class for structured text documents.

        * R/reader.R: Replaced remaining \code{parser} occurrences with
        \code{reader}.

        * R/textdoccol.R (summary): Indent tags.

        * R/textdoccol.R (removePunctuation): Transform method to remove
        punctuation marks.

2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R (sFilter): Simplified sFilter significantly by
        using prescindMeta().

2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/textdoccol.R: Improved database support.

2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>

        * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.

820            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
821            language code.
822    
823            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
824            into the parserControl argument.
825    
826            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
827    
828    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
829    
830            * Work/tmDataSetup.R: The datasets acq and crude can now be
831            created on the fly.
832    
833            * R/stopwords.R: Introduced a function returning the stopwords for
834            a given language (English, German and French at the moment).
835    
836            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
837            otherwise falls back to the Snowball package.
838    
839    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
840    
841            * man/dissimilarity-methods.Rd: Make clear that any method offered
842            by "dists" from package "cba" can be used.
843    
844    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
845    
846            * inst/doc/tm.Rnw: Fixed bug where quotes appeared as boxes,
847            following Kurt's LaTeX suggestion. Removed dots and underscores
848            from variable names for consistent naming.
849    
850            * DESCRIPTION: Update to version 0.1-2.
851    
852            * man/TextRepository.Rd: Fixed bug in documentation.
853    
854    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
855    
856            * DESCRIPTION: Update to version 0.1-1.
857    
858    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
859    
860            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
861            wordStem.
862    
863    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
864    
865            * R/: Changes due to Kurt's review.
866    
867    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * R/: Implemented improvements based upon comments by David
870            Meyer.
871    
872    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
873    
874            * inst/doc/: Rewrote vignette.
875    
876            * man/: Improved documentation.
877    
878    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
879    
880            * man/: Updated documentation.
881    
882            * DESCRIPTION: Changed package name to "tm". Updated version to
883            0.1 for first CRAN release.
884    
885            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
886            list archive example.
887    
888            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
889            archive example.
890    
891            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
892            from (several mails per box) mbox format to (single mail per file)
893            eml format.
894    
895    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
896    
897            * data/crude.rda: Rebuilt.
898    
899            * data/acq.rda: Rebuilt.
900    
901            * R/reader.R: Factored out reader and parser methods from
902            textdoccol.R.
903    
904            * R/source.R: Factored out Source methods from aobjects.R and
905            textdoccol.R.
906            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
907            feeds.
908    
909            * R/textdoccol.R (DirSource): Added support for recursive
910            traversal of directories.
911    
912    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
913    
914            * R/textdoccol.R ([[): Loads the document corpus automatically
915            into memory upon access.
916            (tm_transform, tm_filter): Removed several checks whether the
917            document is already loaded ([[ ensures this now).
918            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
919            mailing list archive.
920    
921    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * R/aobjects.R (TextDocument): Is now a virtual class.
924            (Source): Is now a virtual class.
925    
926    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
927    
928            * R/textdoccol.R (c): Support for an arbitrary number of document
929            collections.
930    
931    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
932    
933            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
934            append_meta and remove_meta.
935    
936            * R/textdoccol.R: Removed modify_metadata method.
937    
938            * R/textrepo.R: Removed modify_metadata method.
939    
940            * R/textdoccol.R (remove_meta): Supports removal of document
941            collection metadata and document (= in data frame) metadata.
942    
943    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
944    
945            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
946    
947            * data/crude.rda: Rebuilt.
948    
949            * data/acq.rda: Rebuilt.
950    
951            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
952    
953            * R/textdoccol.R ([): Bug fix for subsetting a document
954            collection's data frame.
955    
956    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
957    
958            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
959            to s_filter.
960    
961            * R/textdoccol.R: Local text documents' metadata can now be copied
962            to a document collection's data frame with prescind_meta.
963    
964    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
965    
966            * R/: Text documents' slot metadata is now accessible in s_filter.
967    
968            * R/: Rewrote s_filter function (still has some restrictions).
969    
970    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
971    
972            * R/: Various fixes in handling metadata.
973    
974            * R/: Added update mechanism for text document collections.
975    
976    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
977    
978            * R/: Merging of document collections now creates a binary tree
979            for reconstructing merged document collections.
980    
981            * R/: Redesign of metadata for document collections.
982    
983    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
984    
985            * R/: Messages now use \code{ngettext}.
986    
987    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
988    
989            * R/: Added functions for modifying and removing metadata.
990    
991    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
992    
993            * man/: Updated some documentation.
994    
995            * R/: Corrected some connection issues.
996    
997            * inst/doc: Worked on the vignette.
998    
999    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1000    
1001            * inst/: Added texts and started vignette.
1002    
1003            * R/: Final changes based upon David's comments.
1004    
1005    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1006    
1007            * NAMESPACE: Corrected exports (generic methods need exportMethods
1008            directives!).
1009    
1010    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1011    
1012            * R/: Modified the TextDocCol constructor and various parsers. It
1013            is now modular and supports various file formats via plugins (see
1014            the new "Source" class).
1015    
1016    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1017    
1018            * man/: Revised documentation after previous code changes.
1019    
1020    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1021    
1022            * R/: Remaining changes as discussed with David.
1023    
1024    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1025    
1026            * R/: Some changes as suggested by David. The rest will follow
1027            in the next few days.
1028    
1029    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1030    
1031            * man/: Finished documentation.
1032    
1033    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1034    
1035            * man/: Wrote some documentation.
1036    
1037    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1038    
1039            * R/: Further syntactic sugar in the form of additional assignment
1040            and accessor methods.
1041    
1042    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1043    
1044            * R/: Syntactic sugar in the form of "length", "show" and "summary"
1045            operators.
1046    
1047    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1048    
1049            * R/: Various updates, mainly to default operators (such as "[" or
1050            "c") and dissimilarities.
1051    
1052    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1053    
1054            * R/: Added similarity functions.
1055    
1056            * data/: Added English stopwords.
1057    
1058    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1059    
1060            * data/: Compiled examples for new features.
1061    
1062            * R/: Changes due to new structure.
1063    
1064            * NAMESPACE: Corrected namespace to reflect new structure.
1065    
1066            * R/termdocmatrix.R: Adapted for new naming scheme.
1067    
1068    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1069    
1070            * R/textdoccol.R: Adapted code for new class structure. Wrote
1071            several transform and filter functions operating on text document
1072            collections (also known as text document databases).
1073    
1074            * R/aobjects.R: Adapted class structure with inheritance,
1075            repositories and additional metadata. Loading files on demand is
1076            now possible.
1077    
1078    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1079    
1080            * R/: Some cosmetic cleanups.
1081    
1082            * inst/: Removed the clustering vignette. That and much more is now
1083            described in the JSS paper on text mining. Based on that
1084            article, a more elaborate vignette will be added in the future.
1085    
1086    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1087    
1088            * R/: Updated generic S4 methods to comply with signature changes
1089            in newer versions of R (> 2.3).
1090    
1091    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1092    
1093            * ext/R/importRIS.R: Automatic RIS import is now possible.
1094    
1095    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1096    
1097            * R/textdoccol.R: Added RIS HTML input format.
1098    
1099    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1100    
1101            * R/textdoccol.R: Fixed a bug that caused invalid text document
1102            collections when handling many input files.
1103    
1104    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1105    
1106            * R/textdoccol.R: Restructured and extended file import
1107            mechanism.
1108    
1109            * inst/doc/clustering.Rnw: Adapted vignette for use with
1110            ReutNews.rda.
1111    
1112            * man/ReutNews.Rd: Documentation for ReutNews.rda
1113    
1114            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1115    
1116    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1117    
1118            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1119            clustering facilities of this package.
1120    
1121    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1122    
1123            * R/aobjects.R: Changed package document structure to avoid class
1124            dependency problems.
1125    
1126    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1127    
1128            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
1129            data set.
1130    
1131            *  Finished documentation and reordered directory structure. Now "R
1132            CMD check textmin" works without errors.
1133    
1134    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1135    
1136            * src/: Various splits can now be easily created for the
1137            Reuters21578 data set.
1138    
1139    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1140    
1141            * Updated documentation.
1142    
1143    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1144    
1145            *  Wrote R documentation for some classes and methods.
1146    
1147    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1148    
1149            * R/textdoccol.R: The constructor of textdoccol allows import of
1150            CSV files. See the questionnaire data/Umfrage.csv for such an
1151            example. We are now able to import files in Reuters-21578 XML format.
1152    
1153            *  Changed class interfaces in various files. Weighting of the text
1154            matrix is now possible.
1155    
1156    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1157    
1158            * R/textdoccol.R: A term-document matrix can now be built if
1159            necessary (with buildTDM(...)) and stored in the tdm field of a
1160            text document collection.
1161    
1162            * R/textmatrix.R: Wrote an S4 class for term-document matrices.
1163    
1164    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1165    
1166            * R/textdoccol.R: We can now read a whole XML file containing
1167            several news items.
1168    
1169    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1170    
1171            * R/textdoccol.R: Set up an S4 class for a collection of text

Legend:
Removed from v.17  
changed lines
  Added in v.1161

root@r-forge.r-project.org
Thanks to:
Vienna University of Economics and Business Powered By FusionForge