[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC pkg/ChangeLog revision 1153, Thu Nov 17 15:45:31 2011 UTC
# Line 1  Line 1 
1    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
2    
3            * inst/stopwords/SMART.dat: Add SMART information retrieval system
4            stopwords (which are also used by the MC toolkit).
5    
6            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
7            restrict how often a term may appear in each document (generalizes
8            \code{minDocFreq}). Similarly, allow the local option \code{wordLengths}
9            for word length bounds (generalizes \code{minWordLength}).
10    
11            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
12            \code{bounds$global} for restricting how often a term is allowed
13            to appear in different documents.
14    
15            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
16            local options delegated internally to termFreq() and global
17            options which are processed by the term-document matrix
18            constructor itself.
19    
20    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
21    
22            * man/getTokenizers.Rd: Document getTokenizers().
23    
24            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
25    
26    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
27    
28            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
29    
30            * man/combine.Rd: Document c.term_frequency().
31    
32    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
33    
34            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
35            can be accessed via '[' and not '[['.
36    
37    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
38    
39            * R/stopwords.R (stopwords): Raise an error if no stopwords are
40            available for requested language. Suggested by Derek M Jones.
41    
42    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
43    
44            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
45            normalization.
46    
47    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
48    
49            * R/transform.R (stemDocument.PlainTextDocument): Use language
50            argument.
51    
52    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
53    
54            * R/source.R: Store strings and connections instead of unevaluated
55            calls.
56    
57    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
58    
59            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
60    
61    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
62    
63            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
64            (instead of a list element).
65    
66    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
67    
68            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
69            documents by names (fallback to IDs if names are not set).
70    
71    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
72    
73            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
74            \code{recursive} now determines whether existing corpus meta data
75            is used.
76    
77    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
78    
79            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
80    
81    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
82    
83            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
84            remove terms not occurring in the corpus anymore.
85    
86    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
87    
88            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
89            and Heaps' law.
90    
91    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
92    
93            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
94            provided by a source.
95    
96    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
97    
98            * R/source.R (.Source): Provide document names.
99    
100    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
101    
102            * R/meta.R (`content_or_meta`): Utility function.
103    
104    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
105    
106            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
107            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
108    
109    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
110    
111            * R/weight.R (weightTfIdf): Added normalization option.
112    
113            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
114            analysis.
115    
116    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
117    
118            * R/score.R (tm_tag_score): Compute a score from the number of
119            tags matching in a document.
120    
121    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
122    
123            * R/complete.R (stemCompletion): New completion heuristics.
124    
125    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
126    
127            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
128    
129    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
130    
131            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
132            setOldClass(c(..., "list")) works.
133    
134    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
135    
136            * R/transform.R (stemDocument.character): In case input is a
137            simple character just delegate to the default Snowball stemmer.
138    
139    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
140    
141            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
142            data.
143    
144    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
145    
146            * R/doc.R (`Content<-`): Be careful with names attribute.
147    
148    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
149    
150            * R/source.R (DirSource): Improved implementation especially when
151            handling many (> 1M) files.
152    
153    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
154    
155            * R/source.R (getElem.URISource): Use encoding argument.
156    
157    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
158    
159            * R/doc.R (setOldClass): Register S3 document classes to be
160            recognized by S4 methods.
161    
162    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
163    
164            * R/matrix.R (termFreq): Add option to remove punctuation
165            characters.
166    
167    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
168    
169            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
170            merging multiple term-document matrices.
171    
172    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
173    
174            * R/corpus.R (setOldClass): Register S3 corpus classes to be
175            recognized by S4 methods.
176    
177            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
178            that CRAN Mac OS X builds do not fail any longer.
179    
180    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
181    
182            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
183            of RWeka::AlphabeticTokenizer() as default.
184    
185    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
186    
187            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
188            caused words at the beginning or the end of a line not to be removed. Do
189            not delete whitespace anymore.
190    
191    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
192    
193            * R/source.R (DirSource): Default to working directory if no path
194            is specified.
195    
196    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
197    
198            * R/source.R (DirSource): Stop on empty directories.
199    
200    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
201    
202            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
203            named documents.
204    
205    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
206    
207            * R/transform.R (removeWords): Improve regular expressions.
208    
209    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
210    
211            * R/meta.R (DublinCore): Allow lower case tags.
212    
213    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
214    
215            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
216            instead of x$children.
217    
218    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
219    
220            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
221    
222    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
223    
224            * R/: Use S3 instead of S4 class system.
225    
226    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
227    
228            * R/reader.R (readMail): Moved to tm.plugin.mail package.
229    
230    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
231    
232            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
233            postings are basically e-mails with some extra headers.
234    
235    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
236    
237            * R/transform.R: Move convertMboxEml, removeCitation,
238            removeMultipart, and removeSignature to the tm.plugin.mail package
239            since they are mainly utility functions (for handling e-mails) and
240            not very framework specific.
241    
242    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
243    
244            * man/: Fix documentation.
245    
246    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
247    
248            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
249            plain text document instead of an XML document for texts of the
250            Reuters-21578 dataset.
251    
252            * R/sparse.R: Removed since the slam package is now available on
253            CRAN.
254    
255            * DESCRIPTION (Depends): Add slam package.
256    
257    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
258    
259            * R/transform.R (stemDoc): Fix character(0) handling.
260    
261    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
262    
263            * R/doc.R (show): Pretty print.
264    
265    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
266    
267            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
268            gracefully.
269    
270    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
271    
272            * R/corpus.R: Make corpus virtual. Implement corpus with standard
273            and permanent storage semantics.
274    
275            * DESCRIPTION: New major release. A *lot* of improvements.
276    
277    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
278    
279            * NAMESPACE: Export some simple_triplet_matrix functions.
280    
281    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
282    
283            * R/weight.R: Adapt tf-idf to new matrix format.
284    
285    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
286    
287            * R/matrix.R: Create two distinct classes for term-document and
288            document-term matrices.
289    
290    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
291    
292            * R/termdocmatrix.R: No longer use Matrix package. This reduces
293            package start-up time significantly.
294    
295    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
296    
297            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
298    
299    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
300    
301            * R/transform.R (tmReduce): Combine multiple maps into one
302            transformation.
303    
304    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
305    
306            * R/weight.R: Remove weightLogical since it does not return a
307            dgCMatrix.
308    
309            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
310            or TermDocumentMatrix instead.
311    
312    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
313    
314            * inst/doc/extensions.Rnw: Finished vignette.
315    
316    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
317    
318            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
319            DocumentTermMatrix representations.
320    
321    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
322    
323            * R/reader.R (readXML): New reader for arbitrary XML files.
324    
325    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
326    
327            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
328            (XMLSource): New XMLSource class for arbitrary XML files.
329            (Source): New slot Vectorized.
330    
331    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
332    
333            * R/reader.R (readTabular): Experimental reader for tabular data
334            structures which can be customized via user-defined mappings.
335    
336            * R/reader.R: Always use UTC time zone.
337    
338            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
339    
340    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
341    
342            * R/reader.R (readDOC): Options can be passed over to antiword.
343    
344            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
345            pdftotext.
346    
347    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
348    
349            * R/source.R (DirSource): Add pattern and ignore.case arguments
350            which are internally passed over to list.files().
351    
352    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
353    
354            * inst/doc/tm.Rnw: Suppress pointless loading message.
355    
356    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
357    
358            * DESCRIPTION: Speed up package loading (via moving packages not
359            strictly necessary for normal operation to Suggests instead of
360            Depends).
361    
362    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
363    
364            * R/reader.R (readNewsgroup): The date format is now configurable.
365    
366    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
367    
368            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
369    
370    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
371    
372            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
373    
374    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
375    
376            * R/source.R (DataframeSource): New source class for data frames.
377    
378            * R/source.R: Fixed non-standard call evaluation.
379    
380    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
381    
382            * R/source.R (URISource): New source class for a single document.
383    
384    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
385    
386            * R/source.R: Refactoring.
387    
388    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
389    
390            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
391            Rmpi installations more gracefully.
392    
393    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
394    
395            * R/source.R (Source): Add Length slot.
396    
397    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
398    
399            * R/AAA.R: Unify duplicated .onLoad function.
400    
401    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
402    
403            * DESCRIPTION (Suggests): Added Rmpi.
404    
405    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
406    
407            * R/source.R (getElem): Fix 'no visible binding' warning.
408    
409            * man/WeightFunction.Rd: Fix signature.
410    
411    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
412    
413            * R/weight.R: Introduce name abbreviations for weighting functions.
414    
415    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
416    
417            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
418    
419            * R/cluster.R: Provide convenience functions for using an MPI
420            cluster.
421    
422            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
423            available.
424    
425            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
426            available.
427    
428    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
429    
430            * R/textdoccol.R (lapply): Removed debug print out.
431    
432    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
433    
434            * R/reader.R (readRCV1): Improved meta data extraction from
435            Reuters Corpus Volume 1 documents.
436    
437    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
438    
439            * R/transform.R: Ensure that all mappings preserve multiline
440            structures.
441    
442    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
443    
444            * R/filter.R: Every filter now has an attribute indicating whether
445            it should be applied at the document level (doclevel).
446    
447            * R/textdoccol.R (tmFilter): Set searchFullText as new default
448            filter.
449    
450    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
451    
452            * R/transform.R (replacePatterns): Replaced removeWords by
453            replacePatterns. Suggested by Christian Buchta.
454    
455            * R/textdoccol.R (inspect): Improved formatting.
456    
457    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
458    
459            * inst/CITATION: Updated JSS article information.
460    
461            * R/textdoccol.R (setAs): Added coerce method from list to
462            corpus.
463    
464            * R/meta.R (meta): Improved meta data handling.
465    
466    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
467    
468            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
469            Christian Buchta.
470    
471            * inst/CITATION: Added template to include JSS article reference.
472    
473    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/textdoccol.R (tmMap): Introduced lazy mapping.
476    
477            * R/source.R: Added VectorSource.
478    
479    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
480    
481            * man/: Language codes should be in ISO 639-1 format.
482    
483            * R/textdoccol.R (asPlain): Preserve local meta data.
484    
485    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
486    
487            * R/textdoccol.R (writeCorpus): Function for writing a corpus
488            containing plain text documents to disk.
489    
490    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
491    
492            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
493            always set correctly.
494    
495            * R/textdoccol.R: Set load = TRUE as default for load on demand
496            since in most cases this is the wanted behaviour.
497    
498    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
499    
500            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
501    
502            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
503    
504    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
505    
506            * R/meta.R (meta): New function for consistent access to meta data
507            of document collections, repositories, and texts.
508    
509    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
510    
511            * R/: Better support for encodings.
512    
513    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
514    
515            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
516            selection when no reader argument is given.
517    
518    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
519    
520            * R/source.R (CSVSource): Now uses read.csv instead of scan
521            internally.
522    
523    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
524    
525            * R/reader.R (getReaders): Returns available reader functions.
526    
527            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
528            as default.
529    
530    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
531    
532            * R/stopwords.R (stopwords): Shortened code, removed codetools
533            variable warnings.
534    
535            * man/: Documentation for showMeta, added an example for tmMap.
536    
537            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
538            some minor typos fixed.
539    
540    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
541    
542            * R/aobjects.R (showMeta): Added method for pretty printing a
543            text document's meta data.
544    
545    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
546    
547            * R/textdoccol.R (TextDocCol): Better handling of empty
548            arguments.
549    
550            * NAMESPACE: Exported readDOC.
551    
552            * man/completeStems.Rd: Added an example.
553    
554    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * R/stopwords.R (stopwords): Look up .dat files at every
557            call. Allows users to modify stopword .dat files interactively.
558    
559    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
560    
561            * R/termdocmatrix.R (termFreq): Correct processing of empty
562            documents.
563    
564    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
565    
566            * man/: Updated documentation.
567    
568    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
569    
570            * R/complete.R (completeStems): Completes (heuristically) word
571            stems.
572    
573            * R/termdocmatrix.R (TermDocMatrix2): New modular
574            constructor.
575    
576            * NAMESPACE: Exported termFreq.
577    
578    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
579    
580            * R/reader.R (readDOC): Added MS Word reader (using antiword).
581    
582    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
583    
584            * R/weight.R: Weighting functions for TermDocMatrix.
585    
586    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
587    
588            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
589            functions for accessing dimension, column, and row names.
590    
591            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
592    
593    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
594    
595            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
596    
597    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
598    
599            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
600    
601    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
602    
603            * R/reader.R (readPDF): Removed manual checks for pdftotext and
604            pdfinfo. The system call gives a warning anyway.
605    
606    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
607    
608            * R/textdoccol.R (asPlain): Conversion from
609            StructuredTextDocuments to PlainTextDocuments.
610    
611    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
612    
613            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
614            for accessing term-document matrices.
615    
616            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
617            are installed.
618    
619    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
620    
621            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
622            Christian Buchta.
623    
624    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
625    
626            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
627    
628    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
629    
630            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
631    
632            * R/reader.R (readPDF): Added PDF reader.
633    
634    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
635    
636            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
637    
638            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
639    
640            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
641    
642            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
643    
644    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
645    
646            * R/distmeasure.R (dissimilarity): Replaced dists call from
647            package cba by new dist call from package proxy.
648    
649    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
650    
651            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
652    
653    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
654    
655            * R/termdocmatrix.R: require() uses the quietly option to suppress
656            loading messages.
657    
658    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
659    
660            * R/dictionary.R: Added dictionary support.
661    
662    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
663    
664            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
665            documents. This simplifies some functions, e.g., asPlain.
666    
667    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
668    
669            * inst/doc/tm.Rnw: Fixed some typos in vignette.
670    
671    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
672    
673            * R/textdoccol.R (replaceWords): Added method to replace a set of
674            words by a single word. Useful for synonyms.
675    
676    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
677    
678            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
679    
680    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
681    
682            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
683            vectors. Thanks to Ariel Maguyon for his error report.
684            (removeSparseTerms): New function to remove columns from a
685            term-document matrix exceeding a sparse factor.
686    
687    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
688    
689            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
690    
691    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
692    
693            * man/sFilter.Rd: Corrected documentation on statement format (use
694            '==' instead of '=').
695    
696    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
697    
698            * R/aobjects.R (StructuredTextDocument): Inherits from
699            TextDocument.
700    
701    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
702    
703            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
704            on sparse matrices as proposed by Martin Maechler.
705    
706    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
707    
708            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
709            latest \pkg{filehash} version deprecates them.
710    
711    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
712    
713            * R/termdocmatrix.R (textvector): Stemming is now performed before
714            erasing stopwords.
715            (weightMatrix): Adapted to handle sparse matrices.
716            (TermDocMatrix): Sparse matrix is now efficiently built by
717            direct stepwise insertion of row values into it.
718    
719    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
720    
721            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
722            due to ongoing problems. For our purposes the latter is as useful
723            as the replaced package.
724    
725    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
726    
727            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
728    
729            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
730    
731    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
732    
733            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
734            languages with available stopwords.
735    
736    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
737    
738            * inst/doc/tm.Rnw: Minor corrections in the vignette.
739    
740    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
741    
742            * DESCRIPTION: Update to version 0.2, since a lot of new features
743            have been integrated.
744    
745            * inst/stopwords: Updated existing stopwords and added stopwords
746            for various other languages.
747    
748    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
749    
750            * man/: Updated documentation.
751    
752            * Work/testDb.R: Script to test database stuff.
753    
754            * R/: Fixed various database related bugs. Seems to be rather
755            usable now, i.e., consider it alpha status for now.
756    
757    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
758    
759            * R/: Fixed some bugs related to database support.
760    
761    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * man/: Added a lot of examples to the manuals.
764    
765    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
766    
767            * man/: Updated parts of the documentation.
768    
769            * R/textdoccol.R (asPlain): Added conversion from newsgroup
770            documents to plain text documents.
771    
772    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
773    
774            * R/textdoccol.R: Finished experimental database support. Not yet
775            intensively tested.
776    
777            * R/source.R: Now each source has a default reader.
778    
779            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
780            class anymore.
781    
782            * R/plaintextdoc.R: Custom show method for plain text documents.
783    
784            * R/aobjects.R: Added a class for structured text documents.
785    
786            * R/reader.R: Replaced remaining \code{parser} occurrences with
787            \code{reader}.
788    
789            * R/textdoccol.R (summary): Indent tags.
790    
791            * R/textdoccol.R (removePunctuation): Transform method to remove
792            punctuation marks.
793    
794    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
795    
796            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
797            using prescindMeta().
798    
799    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
800    
801            * R/textdoccol.R: Improved database support.
802    
803    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
804    
805            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
806    
807            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
808            language code.
809    
810            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
811            into parserControl argument.
812    
813            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
814    
815    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
816    
817            * Work/tmDataSetup.R: The datasets acq and crude can now be
818            created on the fly.
819    
820            * R/stopwords.R: Introduced a function returning the stopwords for
821            a given language (English, German and French at the moment).
822    
823            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
824            otherwise falls back to Snowball package.
825    
826    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
827    
828            * man/dissimilarity-methods.Rd: Make clear that any method offered
829            by "dists" from package "cba" can be used.
830    
831    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
832    
833            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
834            to Kurt's LaTeX suggestion. Removed points and underscores in
835            variable names for consistent naming.
836    
837            * DESCRIPTION: Update to version 0.1-2.
838    
839            * man/TextRepository.Rd: Fixed bug in documentation.
840    
841    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
842    
843            * DESCRIPTION: Update to version 0.1-1.
844    
845    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
846    
847            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
848            wordStem.
849    
850    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
851    
852            * R/: Changes due to Kurt's review.
853    
854    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
855    
856            * R/: Implemented improvements based upon comments by David
857            Meyer.
858    
859    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
860    
861            * inst/doc/: Rewrote vignette.
862    
863            * man/: Improved documentation.
864    
865    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
866    
867            * man/: Updated documentation.
868    
869            * DESCRIPTION: Changed package name to "tm". Updated version to
870            0.1 for first CRAN release.
871    
872            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
873            list archive example.
874    
875            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
876            archive example.
877    
878            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
879            from (several mails per box) mbox format to (single mail per file)
880            eml format.
881    
882    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
883    
884            * data/crude.rda: Rebuilt.
885    
886            * data/acq.rda: Rebuilt.
887    
888            * R/reader.R: Factored out reader and parser methods from
889            textdoccol.R.
890    
891            * R/source.R: Factored out Source methods from aobjects.R and
892            textdoccol.R.
893            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
894            feeds.
895    
896            * R/textdoccol.R (DirSource): Added support for recursive
897            traversal of directories.
898    
899    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
900    
901            * R/textdoccol.R ([[): Loads the document corpus automatically
902            into memory upon access.
903            (tm_transform, tm_filter): Removed several checks whether the
904            document is already loaded ([[ ensures this now).
905            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
906            mailing list archive.
907    
908    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
909    
910            * R/aobjects.R (TextDocument): Is now a virtual class.
911            (Source): Is now a virtual class.
912    
913    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
914    
915            * R/textdoccol.R (c): Support for an arbitrary number of document
916            collections.
917    
918    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
919    
920            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
921            append_meta and remove_meta.
922    
923            * R/textdoccol.R: Removed modify_metadata method.
924    
925            * R/textrepo.R: Removed modify_metadata method.
926    
927            * R/textdoccol.R (remove_meta): Supports removal of document
928            collection metadata and document (= in data frame) metadata.
929    
930    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
931    
932            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
933    
934            * data/crude.rda: Rebuilt.
935    
936            * data/acq.rda: Rebuilt.
937    
938            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
939    
940            * R/textdoccol.R ([): Bug fix for subsetting a document
941            collection's data frame.
942    
943    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
944    
945            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
946            to s_filter.
947    
948            * R/textdoccol.R: Local text documents' metadata can now be copied
949            to a document collection's data frame with prescind_meta.
950    
951    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
952    
953            * R/: Text documents' slot metadata is now accessible in s_filter.
954    
955            * R/: Rewrote s_filter function (still has some restrictions).
956    
957    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
958    
959            * R/: Various fixes in handling metadata.
960    
961            * R/: Added update mechanism for text document collections.
962    
963    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
964    
965            * R/: Merging of document collections now creates a binary tree
966            for reconstructing merged document collections.
967    
968            * R/: Redesign of metadata for document collections.
969    
970    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
971    
972            * R/: Messages now use \code{ngettext}.
973    
974    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
975    
976            * R/: Added functions for modifying and removing metadata.
977    
978    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
979    
980            * man/: Updated some documentation.
981    
982            * R/: Corrected some connection issues.
983    
984            * inst/doc: Worked on the vignette.
985    
986    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
987    
988            * inst/: Added texts and started vignette.
989    
990            * R/: Final changes based upon David's comments.
991    
992    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
993    
994            * NAMESPACE: Corrected exports (generic methods need exportMethods
995            directives!).
996    
997    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
998    
999            * R/: Modified the TextDocCol constructor and various parsers. It
1000            is now modular and supports various file formats via plugins (see
1001            the new "Source" class).
1002    
1003    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1004    
1005            * man/: Revised documentation after previous code changes.
1006    
1007    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1008    
1009            * R/: Remaining changes as discussed with David.
1010    
1011    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1012    
1013            * R/: Some changes as suggested by David. The rest will follow
1014            within the next days.
1015    
1016    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1017    
1018            * man/: Finished documentation.
1019    
1020    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1021    
1022            * man/: Wrote some documentation.
1023    
1024    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1025    
1026            * R/: Further syntactic sugar in form of additional assignment and
1027            accessor methods.
1028    
1029    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1030    
1031            * R/: Syntactic sugar in form of "length", "show" and "summary"
1032            operators.
1033    
1034    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1035    
1036            * R/: Various updates. Mainly on default operators ("[" or "c")
1037            and dissimilarities.
1038    
1039    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1040    
1041            * R/: Added similarity functions.
1042    
1043            * data/: Added English stopwords.
1044    
1045    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1046    
1047            * data/: Examples compiled for new features.
1048    
1049            * R/: Changes due to new structure.
1050    
1051            * NAMESPACE: Corrected namespace to reflect new structure.
1052    
1053            * R/termdocmatrix.R: Adapted for new naming scheme.
1054    
1055    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1056    
1057            * R/textdoccol.R: Adapted code for new class structure. Wrote
1058            several transform and filter functions operating on text document
1059            collections (alias text document databases).
1060    
1061            * R/aobjects.R: Adapted class structure with inheritance,
1062            repositories and additional meta data. Loading files on demand is
1063            now possible.
1064    
1065    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1066    
1067            * R/: Some cosmetic cleanups.
1068    
1069            * inst/: Removed vignette on clustering. That and much more is now
1070            described in the JSS paper on text mining. Based upon that
1071            article a more elaborate vignette will be incorporated in the future.
1072    
1073    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1074    
1075            * R/: Updated generic S4 methods to comply with signature changes
1076            in newer versions of R (> 2.3).
1077    
1078    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1079    
1080            * ext/R/importRIS.R: Automatic RIS import is now possible.
1081    
1082    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1083    
1084            * R/textdoccol.R: Added RIS HTML input format.
1085    
1086    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1087    
1088            * R/textdoccol.R: Removed bug that caused invalid text document
1089            collections when handling many input files.
1090    
1091    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1092    
1093            * R/textdoccol.R: Restructured and extended file import
1094            mechanism.
1095    
1096            * inst/doc/clustering.Rnw: Adapted vignette for use with
1097            ReutNews.rda
1098    
1099            * man/ReutNews.Rd: Documentation for ReutNews.rda
1100    
1101            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1102    
1103    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1104    
1105            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1106            clustering facilities of this package.
1107    
1108    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1109    
1110            * R/aobjects.R: Changed package document structure to avoid class
1111            dependency problems.
1112    
1113    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1114    
1115            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
1116            data set.
1117    
1118            *  Finished documentation and reordered directory structure. Now "R
1119            CMD check textmin" works without errors.
1120    
1121    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1122    
1123            * src/: Various splits can now be easily created for the
1124            Reuters21578 data set.
1125    
1126    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1127    
1128            *  Updated documentation
1129    
1130    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1131    
1132            *  Wrote R documentation for some classes and methods.
1133    
1134    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1135    
1136            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
1137            files. See the questionnaire data/Umfrage.csv for such an example.
1138            We are now able to import files in Reuters-21578 XML format.
1139    
1140            *  Changed class interfaces in various files. Weighting of the text
1141            matrix is now possible.
1142    
1143    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1144    
1145            * R/textdoccol.R: One can build term-document matrices if
1146            necessary (with buildTDM(...)) and fill the field tdm from a text
1147            document collection with it.
1148    
1149            * R/textmatrix.R: Wrote S4 class for term-document matrices.
1150    
1151    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1152    
1153            * R/textdoccol.R: We now can read in a whole XML file with several
1154            news items.
1155    
1156    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1157    
1158            * R/textdoccol.R: Set up an S4 class for a collection of text
