Diff of /pkg/ChangeLog

Old: trunk/R/trunk/ChangeLog, revision 34, Thu Dec 22 15:18:10 2005 UTC
New: pkg/ChangeLog, revision 1169, Sat Jan 14 11:32:38 2012 UTC
1    2012-01-14  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/source.R (XMLSource, getElem.XMLSource): Simplifications as
4            proposed by Milan Bouchet-Valat.
5    
6    2012-01-11  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/matrix.R (termFreq): Fix processing of user provided
9            stopwords. Reported by Bettina Grün.
10    
11    2011-12-23  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/matrix.R (termFreq): Fix invalid handling of
14            control$wordLengths[1]. Reported by Steven C. Bagley.
15    
16    2011-12-17  Ingo Feinerer  <feinerer@logic.at>
17    
18            * DESCRIPTION (Version): Prepare for CRAN Christmas release.
19    
20    2011-12-12  Ingo Feinerer  <feinerer@logic.at>
21    
22            * R/utils.R (map_IETF_Snowball): Map empty input to "porter".
23    
24    2011-12-07  Ingo Feinerer  <feinerer@logic.at>
25    
26            * R/transform.R (removePunctuation): Add option to preserve
27            intra-word dashes.
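
A minimal base R sketch of what such an option could look like. The helper name `strip_punct` and its argument are hypothetical and this is not presented as the package's implementation; it only illustrates protecting dashes that sit between word characters while other punctuation is removed.

```r
# Hypothetical helper (not the tm code): remove punctuation from a character
# vector, optionally keeping dashes that sit between two word characters.
strip_punct <- function(x, keep_intra_word_dashes = FALSE) {
  if (!keep_intra_word_dashes)
    return(gsub("[[:punct:]]+", "", x))
  # Swap intra-word dashes for a control character that is not punctuation.
  x <- gsub("(\\w)-(\\w)", "\\1\001\\2", x)
  # Remove all remaining punctuation, including free-standing dashes.
  x <- gsub("[[:punct:]]+", "", x)
  # Restore the protected dashes.
  gsub("\001", "-", x, fixed = TRUE)
}

plain <- strip_punct("well-known, fact!")
kept  <- strip_punct("well-known, fact!", keep_intra_word_dashes = TRUE)
```

Here `plain` loses the dash while `kept` retains it. Chains of single letters such as "a-b-c" are an edge case that this simple pattern does not fully cover.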
28    
29    2011-12-06  Ingo Feinerer  <feinerer@logic.at>
30    
31            * R/matrix.R (termFreq): Allow reordering of control option
32            processing.
33    
34    2011-11-17  Ingo Feinerer  <feinerer@logic.at>
35    
36            * R/reader.R (readPDF): Use tools:::pdf_info() instead of external
37            pdfinfo tool.
38    
39            * inst/stopwords/SMART.dat: Add SMART information retrieval system
40            stopwords (which are also used by the MC toolkit).
41    
42            * R/matrix.R (termFreq): Allow local option \code{bounds$local} to
43            restrict how often a term may appear in each document (generalizes
44            \code{minDocFreq}). Similarly, the local option \code{wordLengths}
45            sets word length bounds (generalizes \code{minWordLength}).
46    
47            * R/matrix.R (TermDocumentMatrix.VCorpus): New global option
48            \code{bounds$global} for restricting how often a term is allowed
49            to appear in different documents.
50    
51            * R/matrix.R (TermDocumentMatrix.VCorpus): Distinguish between
52            local options delegated internally to termFreq() and global
53            options which are processed by the term-document matrix
54            constructor itself.
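
The distinction between local and global bounds described above can be sketched on a plain count matrix. This is an illustration under stated assumptions, not the package's code: the helper names `apply_local` and `apply_global` and the toy data are made up, and rows are assumed to be terms and columns documents.

```r
# Toy term-document counts (rows = terms, columns = documents); made-up data.
m <- matrix(c(3, 0, 1,
              1, 1, 1,
              0, 5, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("corpus", "the", "zipf"), c("d1", "d2", "d3")))

# Local bound (hypothetical helper): within each document, discard counts
# that fall outside [lower, upper].
apply_local <- function(m, lower = 1, upper = Inf) {
  m[m < lower | m > upper] <- 0
  m
}

# Global bound (hypothetical helper): keep only terms whose number of
# containing documents lies in [lower, upper].
apply_global <- function(m, lower = 1, upper = Inf) {
  doc_freq <- rowSums(m > 0)
  m[doc_freq >= lower & doc_freq <= upper, , drop = FALSE]
}

local_filtered  <- apply_local(m, lower = 2)
global_filtered <- apply_global(m, lower = 2)
```

The local filter works per document and so can run while each document is counted, whereas the global filter needs the document frequencies of the whole matrix, which is why the two kinds of option are handled at different stages.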
55    
56    2011-11-15  Ingo Feinerer  <feinerer@logic.at>
57    
58            * man/getTokenizers.Rd: Document getTokenizers().
59    
60            * man/tokenizer.Rd: Document MC_tokenizer() and scan_tokenizer().
61    
62    2011-11-04  Ingo Feinerer  <feinerer@logic.at>
63    
64            * man/matrix.Rd: Document as.TermDocumentMatrix.term_frequency.
65    
66            * man/combine.Rd: Document c.term_frequency().
67    
68    2011-10-11  Ingo Feinerer  <feinerer@logic.at>
69    
70            * R/meta.R (`meta<-.Corpus`): Assume that the replacement value
71            can be accessed via '[' and not '[['.
72    
73    2011-08-24  Ingo Feinerer  <feinerer@logic.at>
74    
75            * R/stopwords.R (stopwords): Raise an error if no stopwords are
76            available for requested language. Suggested by Derek M Jones.
77    
78    2011-05-27  Ingo Feinerer  <feinerer@logic.at>
79    
80            * R/weight.R (weightSMART): Implement Cosine and pivoted unique
81            normalization.
82    
83    2011-02-17  Ingo Feinerer  <feinerer@logic.at>
84    
85            * R/transform.R (stemDocument.PlainTextDocument): Use language
86            argument.
87    
88    2011-02-04  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/source.R: Store strings and connections instead of unevaluated
91            calls.
92    
93    2010-11-26  Ingo Feinerer  <feinerer@logic.at>
94    
95            * R/corpus.R (Corpus): Allow init and exit hooks for readers.
96    
97    2010-10-22  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/matrix.R (.TermDocumentMatrix): Make Weighting an attribute
100            (instead of a list element).
101    
102    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
105            documents by names (fallback to IDs if names are not set).
106    
107    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
110            \code{recursive} now determines whether existing corpus meta data
111            is used.
112    
113    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
114    
115            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
116    
117    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
118    
119            * R/matrix.R (TermDocumentMatrix): If a dictionary is given do not
120            remove terms not occurring in the corpus anymore.
121    
122    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
123    
124            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
125            and Heaps' law.
126    
127    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
128    
129            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
130            provided by a source.
131    
132    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
133    
134            * R/source.R (.Source): Provide document names.
135    
136    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
137    
138            * R/meta.R (`content_or_meta`): Utility function.
139    
140    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
141    
142            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
143            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
144    
145    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
146    
147            * R/weight.R (weightTfIdf): Added normalization option.
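
As a rough illustration of tf-idf weighting with an optional normalization step, here is a base R sketch. The function name `tf_idf`, the base-2 logarithm, and the choice to normalize by document length are assumptions made for this example and are not claimed to match the package's definition.

```r
# Hypothetical tf-idf on a dense term-document count matrix
# (rows = terms, columns = documents). Assumes every term occurs in at
# least one document and no document is empty.
tf_idf <- function(m, normalize = TRUE) {
  tf <- m
  if (normalize)
    tf <- sweep(m, 2, colSums(m), "/")   # divide counts by document length
  idf <- log2(ncol(m) / rowSums(m > 0))  # rarer terms receive larger weights
  tf * idf                               # idf is recycled along the rows
}

# Made-up counts for three terms in three documents.
m <- matrix(c(3, 0, 1,
              1, 1, 1,
              0, 5, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("corpus", "the", "zipf"), c("d1", "d2", "d3")))
w <- tf_idf(m)
```

A term that occurs in every document, such as "the" here, receives an inverse document frequency of zero and therefore a weight of zero everywhere.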
148    
149            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
150            analysis.
151    
152    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
153    
154            * R/score.R (tm_tag_score): Compute a score from the number of
155            tags matching in a document.
156    
157    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
158    
159            * R/complete.R (stemCompletion): New completion heuristics.
160    
161    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
162    
163            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
164    
165    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
166    
167            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
168            setOldClass(c(..., "list")) works.
169    
170    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/transform.R (stemDocument.character): In case input is a
173            simple character just delegate to the default Snowball stemmer.
174    
175    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
176    
177            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
178            data.
179    
180    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
181    
182            * R/doc.R (`Content<-`): Be careful with names attribute.
183    
184    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
185    
186            * R/source.R (DirSource): Improved implementation especially when
187            handling many (> 1M) files.
188    
189    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
190    
191            * R/source.R (getElem.URISource): Use encoding argument.
192    
193    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
194    
195            * R/doc.R (setOldClass): Register S3 document classes to be
196            recognized by S4 methods.
197    
198    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
199    
200            * R/matrix.R (termFreq): Add option to remove punctuation
201            characters.
202    
203    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
204    
205            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
206            merging multiple term-document matrices.
207    
208    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
209    
210            * R/corpus.R (setOldClass): Register S3 corpus classes to be
211            recognized by S4 methods.
212    
213            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
214            that CRAN Mac OS X builds do not fail any longer.
215    
216    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
217    
218            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
219            of RWeka::AlphabeticTokenizer() as default.
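
For illustration, whitespace tokenization with base R's scan() might look like the following. The sample text is made up, and the `quote = ""` and `quiet = TRUE` arguments are choices for this sketch rather than something the entry specifies.

```r
# Split a string on whitespace with scan(); punctuation stays attached to
# the neighbouring token because only whitespace acts as a separator.
txt <- "Text mining  with R,\nmade simple."
tokens <- scan(text = txt, what = "character", quote = "", quiet = TRUE)
```

Runs of spaces and line breaks are treated as a single separator, so no empty tokens are produced.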
220    
221    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
222    
223            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
224            caused words at the beginning or the end of a line not to be removed. Do
225            not delete whitespace anymore.
226    
227    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
228    
229            * R/source.R (DirSource): Default to working directory if no path
230            is specified.
231    
232    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
233    
234            * R/source.R (DirSource): Stop on empty directories.
235    
236    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
237    
238            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
239            named documents.
240    
241    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
242    
243            * R/transform.R (removeWords): Improve regular expressions.
244    
245    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
246    
247            * R/meta.R (DublinCore): Allow lower case tags.
248    
249    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
250    
251            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
252            instead of x$children.
253    
254    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
255    
256            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
257    
258    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
259    
260            * R/: Use S3 instead of S4 class system.
261    
262    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
263    
264            * R/reader.R (readMail): Moved to tm.plugin.mail package.
265    
266    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
267    
268            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
269            postings are basically e-mails with some extra headers.
270    
271    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
272    
273            * R/transform.R: Move convertMboxEml, removeCitation,
274            removeMultipart, and removeSignature to the tm.plugin.mail package
275            since they are mainly utility functions (for handling e-mails) and
276            not very framework specific.
277    
278    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
279    
280            * man/: Fix documentation.
281    
282    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
283    
284            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
285            plain text document instead of an XML document for texts of the
286            Reuters-21578 dataset.
287    
288            * R/sparse.R: Removed since the slam package is now available on
289            CRAN.
290    
291            * DESCRIPTION (Depends): Add slam package.
292    
293    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
294    
295            * R/transform.R (stemDoc): Fix character(0) handling.
296    
297    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
298    
299            * R/doc.R (show): Pretty print.
300    
301    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
302    
303            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
304            gracefully.
305    
306    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
307    
308            * R/corpus.R: Make corpus virtual. Implement corpus with standard
309            and permanent storage semantics.
310    
311            * DESCRIPTION: New major release. A *lot* of improvements.
312    
313    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
314    
315            * NAMESPACE: Export some simple_triplet_matrix functions.
316    
317    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
318    
319            * R/weight.R: Adapt tf-idf to new matrix format.
320    
321    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
322    
323            * R/matrix.R: Create two distinct classes for term-document and
324            document-term matrices.
325    
326    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
327    
328            * R/termdocmatrix.R: No longer use Matrix package. This reduces
329            package start-up time significantly.
330    
331    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
332    
333            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
334    
335    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
336    
337            * R/transform.R (tmReduce): Combine multiple maps into one
338            transformation.
339    
340    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
341    
342            * R/weight.R: Remove weightLogical since it does not return a
343            dgCMatrix.
344    
345            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
346            or TermDocumentMatrix instead.
347    
348    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
349    
350            * inst/doc/extensions.Rnw: Finished vignette.
351    
352    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
353    
354            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
355            DocumentTermMatrix representations.
356    
357    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
358    
359            * R/reader.R (readXML): New reader for arbitrary XML files.
360    
361    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
362    
363            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
364            (XMLSource): New XMLSource class for arbitrary XML files.
365            (Source): New slot Vectorized.
366    
367    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
368    
369            * R/reader.R (readTabular): Experimental reader for tabular data
370            structures which can be customized via user-defined mappings.
371    
372            * R/reader.R: Always use UTC time zone.
373    
374            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
375    
376    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
377    
378            * R/reader.R (readDOC): Options can be passed over to antiword.
379    
380            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
381            pdftotext.
382    
383    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
384    
385            * R/source.R (DirSource): Add pattern and ignore.case arguments
386            which are internally passed over to list.files().
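
The effect of these two arguments can be tried directly with base R's list.files(). The temporary directory and file names below are made up for the example.

```r
# Create a scratch directory with a few empty files (hypothetical names).
dir <- tempfile("texts")
dir.create(dir)
invisible(file.create(file.path(dir, c("a.txt", "B.TXT", "notes.md"))))

# Select files by regular expression; ignore.case makes the match
# case-insensitive, so both "a.txt" and "B.TXT" are returned.
found <- list.files(dir, pattern = "\\.txt$", ignore.case = TRUE)
```

Without `ignore.case = TRUE`, only the lower-case "a.txt" would match this pattern.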
387    
388    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
389    
390            * inst/doc/tm.Rnw: Suppress pointless loading message.
391    
392    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
393    
394            * DESCRIPTION: Speed up package loading (via moving packages not
395            strictly necessary for normal operation to Suggests instead of
396            Depends).
397    
398    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
399    
400            * R/reader.R (readNewsgroup): The date format is now configurable.
401    
402    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
403    
404            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
405    
406    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
407    
408            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
409    
410    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
411    
412            * R/source.R (DataframeSource): New source class for data frames.
413    
414            * R/source.R: Fixed non-standard call evaluation.
415    
416    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
417    
418            * R/source.R (URISource): New source class for a single document.
419    
420    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
421    
422            * R/source.R: Refactoring.
423    
424    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
425    
426            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
427            Rmpi installations more gracefully.
428    
429    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
430    
431            * R/source.R (Source): Add Length slot.
432    
433    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
434    
435            * R/AAA.R: Unify duplicated .onLoad function.
436    
437    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
438    
439            * DESCRIPTION (Suggests): Added Rmpi.
440    
441    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
442    
443            * R/source.R (getElem): Fix 'no visible binding' warning.
444    
445            * man/WeightFunction.Rd: Fix signature.
446    
447    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
448    
449            * R/weight.R: Introduce name abbreviations for weighting functions.
450    
451    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
452    
453            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
454    
455            * R/cluster.R: Provide convenience functions for using a MPI
456            cluster.
457    
458            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
459            available.
460    
461            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
462            available.
463    
464    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
465    
466            * R/textdoccol.R (lapply): Removed debug print out.
467    
468    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
469    
470            * R/reader.R (readRCV1): Improved meta data extraction from
471            Reuters Corpus Volume 1 documents.
472    
473    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
474    
475            * R/transform.R: Ensure that all mappings preserve multiline
476            structures.
477    
478    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
479    
480            * R/filter.R: Every filter now has an attribute (doclevel)
481            indicating whether it should be applied at the document level.
482    
483            * R/textdoccol.R (tmFilter): Set searchFullText as new default
484            filter.
485    
486    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
487    
488            * R/transform.R (replacePatterns): Replaced removeWords by
489            replacePatterns. Suggested by Christian Buchta.
490    
491            * R/textdoccol.R (inspect): Improved formatting.
492    
493    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
494    
495            * inst/CITATION: Updated JSS article information.
496    
497            * R/textdoccol.R (setAs): Added coerce method from list to
498            corpus.
499    
500            * R/meta.R (meta): Improved meta data handling.
501    
502    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
503    
504            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
505            Christian Buchta.
506    
507            * inst/CITATION: Added template to include JSS article reference.
508    
509    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
510    
511            * R/textdoccol.R (tmMap): Introduced lazy mapping.
512    
513            * R/source.R: Added VectorSource.
514    
515    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
516    
517            * man/: Language codes should be in ISO 639-1 format.
518    
519            * R/textdoccol.R (asPlain): Preserve local meta data.
520    
521    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * R/textdoccol.R (writeCorpus): Function for writing a corpus
524            containing plain text documents to disk.
525    
526    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
527    
528            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
529            always set correctly.
530    
531            * R/textdoccol.R: Set load = TRUE as default for load on demand
532            since in most cases this is the wanted behaviour.
533    
534    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
535    
536            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
537    
538            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
539    
540    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
541    
542            * R/meta.R (meta): New function for consistent access to meta data
543            of document collections, repositories, and texts.
544    
545    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
546    
547            * R/: Better support for encodings.
548    
549    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
550    
551            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
552            selection when no reader argument is given.
553    
554    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * R/source.R (CSVSource): Now uses read.csv instead of scan
557            internally.
558    
559    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
560    
561            * R/reader.R (getReaders): Returns available reader functions.
562    
563            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
564            as default.
565    
566    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
567    
568            * R/stopwords.R (stopwords): Shortened code, removed codetools
569            variable warnings.
570    
571            * man/: Documentation for showMeta, added an example for tmMap.
572    
573            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
574            some minor typos fixed.
575    
576    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
577    
578            * R/aobjects.R (showMeta): Added method for pretty printing a
579            text document's meta data.
580    
581    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
582    
583            * R/textdoccol.R (TextDocCol): Better handling of empty
584            arguments.
585    
586            * NAMESPACE: Exported readDOC.
587    
588            * man/completeStems.Rd: Added an example.
589    
590    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
591    
592            * R/stopwords.R (stopwords): Look up .dat files at every
593            call. Allows users to modify stopword .dat files interactively.
594    
595    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
596    
597            * R/termdocmatrix.R (termFreq): Correct processing of empty
598            documents.
599    
600    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
601    
602            * man/: Updated documentation.
603    
604    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
605    
606            * R/complete.R (completeStems): Completes (heuristically) word
607            stems.
608    
609            * R/termdocmatrix.R (TermDocMatrix2): New modular
610            constructor.
611    
612            * NAMESPACE: Exported termFreq.
613    
614    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
615    
616            * R/reader.R (readDOC): Added MS Word reader (using antiword).
617    
618    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
619    
620            * R/weight.R: Weighting functions for TermDocMatrix.
621    
622    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
623    
624            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
625            functions for accessing dimension, column, and row names.
626    
627            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
628    
629    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
630    
631            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
632    
633    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
634    
635            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
636    
637    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
638    
639            * R/reader.R (readPDF): Removed manual checks for pdftotext and
640            pdfinfo. The system call gives a warning anyway.
641    
642    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
643    
644            * R/textdoccol.R (asPlain): Conversion from
645            StructuredTextDocuments to PlainTextDocuments.
646    
647    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
648    
649            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
650            for accessing term-document matrices.
651    
652            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
653            are installed.
654    
655    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
658            Christian Buchta.
659    
660    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
663    
664    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
665    
666            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
667    
668            * R/reader.R (readPDF): Added PDF reader.
669    
670    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
673    
674            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
675    
676            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
677    
678            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
679    
680    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
681    
682            * R/distmeasure.R (dissimilarity): Replaced dists call from
683            package cba by new dist call from package proxy.
684    
685    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
686    
687            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
688    
689    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
690    
691            * R/termdocmatrix.R: require() uses the quietly option to suppress
692            loading messages.
693    
694    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
695    
696            * R/dictionary.R: Added dictionary support.
697    
698    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
699    
700            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
701            documents. This simplifies some functions, e.g., asPlain.
702    
703    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
704    
705            * inst/doc/tm.Rnw: Fixed some typos in vignette.
706    
707    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
708    
709            * R/textdoccol.R (replaceWords): Added method to replace a set of
710            words by a single word. Useful for synonyms.
711    
712    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
713    
714            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
715    
716    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
717    
718            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
719            vectors. Thanks to Ariel Maguyon for his error report.
720            (removeSparseTerms): New function to remove columns from a
721            term-document matrix exceeding a sparse factor.
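
A small base R sketch of the idea behind removing sparse terms. The helper name `drop_sparse_terms`, the documents-by-terms orientation, and the exact threshold comparison are assumptions for this illustration and may differ from the package's behaviour.

```r
# Hypothetical helper: drop term columns whose share of zero entries
# exceeds the sparse factor (rows = documents, columns = terms).
drop_sparse_terms <- function(m, sparse) {
  stopifnot(sparse > 0, sparse < 1)
  zero_share <- colMeans(m == 0)
  m[, zero_share <= sparse, drop = FALSE]
}

# Made-up counts for three documents and three terms.
dtm <- matrix(c(3, 1, 0,
                0, 1, 5,
                1, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("d1", "d2", "d3"), c("corpus", "the", "zipf")))
dense <- drop_sparse_terms(dtm, sparse = 0.5)
```

In this toy matrix "zipf" is absent from two of three documents, so it is dropped at a sparse factor of 0.5 while the other two terms are kept.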
722    
723    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
724    
725            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
726    
727    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
728    
729            * man/sFilter.Rd: Corrected documentation on statement format (use
730            '==' instead of '=').
731    
732    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
733    
734            * R/aobjects.R (StructuredTextDocument): Inherits from
735            TextDocument.
736    
737    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
738    
739            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
740            on sparse matrices as proposed by Martin Maechler.
741    
742    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
743    
744            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
745            latest \pkg{filehash} version deprecates them.
746    
747    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
748    
749            * R/termdocmatrix.R (textvector): Stemming is now performed before
750            erasing stopwords.
751            (weightMatrix): Adapted to handle sparse matrices.
752            (TermDocMatrix): Sparse matrix is now efficiently built by
753            direct stepwise insertion of row values into it.
754    
755    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
756    
757            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
758            due to ongoing problems. For our purposes the latter is as useful
759            as the replaced package.
760    
761    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
764    
765            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
766    
767    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
768    
769            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
770            languages with available stopwords.
771    
772    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
773    
774            * inst/doc/tm.Rnw: Minor corrections in the vignette.
775    
776    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
777    
778            * DESCRIPTION: Update to version 0.2, since a lot of new features
779            have been integrated.
780    
781            * inst/stopwords: Updated existing stopwords and added stopwords
782            for various other languages.
783    
784    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
785    
786            * man/: Updated documentation.
787    
788            * Work/testDb.R: Script to test database stuff.
789    
790            * R/: Fixed various database related bugs. Database support seems
791            fairly usable now, but consider it alpha status for the time being.
792    
793    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
794    
795            * R/: Fixed some bugs related to database support.
796    
797    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
798    
799            * man/: Added a lot of examples to the manuals.
800    
801    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
802    
803            * man/: Updated parts of the documentation.
804    
805            * R/textdoccol.R (asPlain): Added conversion from newsgroup
806            documents to plain text documents.
807    
808    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
809    
810            * R/textdoccol.R: Finished experimental database support. Not yet
811            intensively tested.
812    
813            * R/source.R: Now each source has a default reader.
814    
815            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
816            class anymore.
817    
818            * R/plaintextdoc.R: Custom show method for plain text documents.
819    
820            * R/aobjects.R: Added a class for structured text documents.
821    
822            * R/reader.R: Replaced remaining \code{parser} occurrences with
823            \code{reader}.
824    
825            * R/textdoccol.R (summary): Indent tags.
826    
827            * R/textdoccol.R (removePunctuation): Transform method to remove
828            punctuation marks.
829    
830    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
831    
832            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
833            using prescindMeta().
834    
835    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
836    
837            * R/textdoccol.R: Improved database support.
838    
839    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
840    
841            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
842    
843            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
844            language code.
845    
846            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
847            into parserControl argument.
848    
849            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
850    
851    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
852    
853            * Work/tmDataSetup.R: The datasets acq and crude can now be
854            created on the fly.
855    
856            * R/stopwords.R: Introduced a function returning the stopwords for
857            a given language (English, German and French at the moment).
858    
859            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
860            otherwise falls back to Snowball package.
861    
862    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
863    
864            * man/dissimilarity-methods.Rd: Make clear that any method offered
865            by "dists" from package "cba" can be used.
866    
867    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
868    
869            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes bug according
870            to Kurt's LaTeX suggestion. Removed points and underscores in
871            variable names for consistent naming.
872    
873            * DESCRIPTION: Update to version 0.1-2.
874    
875            * man/TextRepository.Rd: Fixed bug in documentation.
876    
877    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
878    
879            * DESCRIPTION: Update to version 0.1-1.
880    
881    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
882    
883            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
884            wordStem.
885    
886    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
887    
888            * R/: Changes due to Kurt's review.
889    
890    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
891    
892            * R/: Implemented improvements based upon comments by David
893            Meyer.
894    
895    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
896    
897            * inst/doc/: Rewrote vignette.
898    
899            * man/: Improved documentation.
900    
901    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
902    
903            * man/: Updated documentation.
904    
905            * DESCRIPTION: Changed package name to "tm". Updated version to
906            0.1 for first CRAN release.
907    
908            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
909            list archive example.
910    
911            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
912            archive example.
913    
914            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
915            from (several mails per box) mbox format to (single mail per file)
916            eml format.
917    
918    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
919    
920            * data/crude.rda: Rebuilt.
921    
922            * data/acq.rda: Rebuilt.
923    
924            * R/reader.R: Factored out reader and parser methods from
925            textdoccol.R.
926    
927            * R/source.R: Factored out Source methods from aobjects.R and
928            textdoccol.R.
929            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
930            feeds.
931    
932            * R/textdoccol.R (DirSource): Added support for recursive
933            traversal of directories.
934    
935    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
936    
937            * R/textdoccol.R ([[): Loads the document corpus automatically
938            into memory upon access.
939            (tm_transform, tm_filter): Removed several checks whether the
940            document is already loaded ([[ ensures this now).
941            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
942            mailing list archive.
943    
944    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
945    
946            * R/aobjects.R (TextDocument): Is now a virtual class.
947            (Source): Is now a virtual class.
948    
949    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
950    
951            * R/textdoccol.R (c): Support for an arbitrary number of document
952            collections.
953    
954    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
955    
956            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
957            append_meta and remove_meta.
958    
959            * R/textdoccol.R: Removed modify_metadata method.
960    
961            * R/textrepo.R: Removed modify_metadata method.
962    
963            * R/textdoccol.R (remove_meta): Supports removal of document
964            collection metadata and document (= in data frame) metadata.
965    
966    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
967    
968            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
969    
970            * data/crude.rda: Rebuilt.
971    
972            * data/acq.rda: Rebuilt.
973    
974            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
975    
976            * R/textdoccol.R ([): Bug fix for subsetting a document
977            collection's data frame.
978    
979    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
980    
981            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
982            to s_filter.
983    
984            * R/textdoccol.R: Local text documents' metadata can now be copied
985            to a document collection's data frame with prescind_meta.
986    
987    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
988    
989            * R/: Text documents' slot metadata is now accessible in s_filter.
990    
991            * R/: Rewrote s_filter function (still has some restrictions).
992    
993    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
994    
995            * R/: Various fixes in handling metadata.
996    
997            * R/: Added update mechanism for text document collections.
998    
999    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1000    
1001            * R/: Merging of document collections now creates a binary tree
1002            for reconstructing merged document collections.
1003    
1004            * R/: Redesign of metadata for document collections.
1005    
1006    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1007    
1008            * R/: Messages now use \code{ngettext}.
1009    
1010    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1011    
1012            * R/: Added functions for modifying and removing metadata.
1013    
1014    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1015    
1016            * man/: Updated some documentation.
1017    
1018            * R/: Corrected some connection issues.
1019    
1020            * inst/doc: Worked on the vignette.
1021    
1022    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1023    
1024            * inst/: Added texts and started vignette.
1025    
1026            * R/: Final changes based upon David's comments.
1027    
1028    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1029    
1030            * NAMESPACE: Corrected exports (generic methods need exportMethods
1031            directives!).
1032    
1033    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1034    
1035            * R/: Modified the TextDocCol constructor and various parsers. It
1036            is now modular and supports various file formats via plugins (see
1037            the new "Source" class).
1038    
1039    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1040    
1041            * man/: Revised documentation after previous code changes.
1042    
1043    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1044    
1045            * R/: Remaining changes as discussed with David.
1046    
1047    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1048    
1049            * R/: Some changes as suggested by David. The rest will follow
1050            within the next few days.
1051    
1052    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1053    
1054            * man/: Finished documentation.
1055    
1056    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1057    
1058            * man/: Wrote some documentation.
1059    
1060    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1061    
1062            * R/: Further syntactic sugar in form of additional assignment and
1063            accessor methods.
1064    
1065    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1066    
1067            * R/: Syntactic sugar in form of "length", "show" and "summary"
1068            operators.
1069    
1070    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1071    
1072            * R/: Various updates, mainly to default operators ("[" or "c")
1073            and dissimilarities.
1074    
1075    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1076    
1077            * R/: Added similarity functions.
1078    
1079            * data/: Added English stopwords.
1080    
1081    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1082    
1083            * data/: Examples compiled for new features.
1084    
1085            * R/: Changes due to new structure.
1086    
1087            * NAMESPACE: Corrected namespace to reflect new structure.
1088    
1089            * R/termdocmatrix.R: Adapted for new naming scheme.
1090    
1091    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1092    
1093            * R/textdoccol.R: Adapted code for new class structure. Wrote
1094            several transform and filter functions operating on text document
1095            collections (alias text document databases).
1096    
1097            * R/aobjects.R: Adapted class structure with inheritance,
1098            repositories and additional meta data. Loading files on demand is
1099            now possible.
1100    
1101    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1102    
1103            * R/: Some cosmetic cleanups.
1104    
1105            * inst/: Removed vignette on clustering. That and much more is now
1106            described in the JSS paper on text mining. A more elaborate
1107            vignette based on that article will be incorporated in the future.
1108    
1109    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1110    
1111            * R/: Updated generic S4 methods to comply with signature changes
1112            in newer versions of R (> 2.3).
1113    
1114    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1115    
1116            * ext/R/importRIS.R: Automatic RIS import is now possible.
1117    
1118    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1119    
1120            * R/textdoccol.R: Added RIS HTML input format.
1121    
1122    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1123    
1124            * R/textdoccol.R: Removed bug that caused invalid text document
1125            collections when handling many input files.
1126    
1127    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1128    
1129            * R/textdoccol.R: Restructured and extended file import
1130            mechanism.
1131    
1132            * inst/doc/clustering.Rnw: Adapted vignette for use with
1133            ReutNews.rda.
1134    
1135            * man/ReutNews.Rd: Documentation for ReutNews.rda.
1136    
1137            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1138    
1139    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1140    
1141            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
