[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 17, Sat Nov 5 14:47:12 2005 UTC
pkg/ChangeLog revision 1095, Wed Aug 25 19:05:38 2010 UTC
1    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
4            \code{recursive} now determines whether existing corpus meta data
5            is used.
6    
7    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
8    
9            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
10    
11    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
12    
13            * R/matrix.R (TermDocumentMatrix): If a dictionary is given, no
14            longer remove terms that do not occur in the corpus.
15    
16    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
17    
18            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
19            and Heaps' law.
20    
21    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
22    
23            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
24            provided by a source.
25    
26    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
27    
28            * R/source.R (.Source): Provide document names.
29    
30    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
31    
32            * R/meta.R (`content_or_meta`): Utility function.
33    
34    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
35    
36            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
37            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
38    
39    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
40    
41            * R/weight.R (weightTfIdf): Added normalization option.
42    
43            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
44            analysis.
45    
46    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
47    
48            * R/score.R (tm_tag_score): Compute a score from the number of
49            tags matching in a document.
50    
51    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
52    
53            * R/complete.R (stemCompletion): New completion heuristics.
54    
55    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
56    
57            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
58    
59    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
60    
61            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
62            setOldClass(c(..., "list")) works.
63    
64    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
65    
66            * R/transform.R (stemDocument.character): In case input is a
67            simple character just delegate to the default Snowball stemmer.
68    
69    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
70    
71            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
72            data.
73    
74    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
75    
76            * R/doc.R (`Content<-`): Be careful with names attribute.
77    
78    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
79    
80            * R/source.R (DirSource): Improved implementation especially when
81            handling many (> 1M) files.
82    
83    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
84    
85            * R/source.R (getElem.URISource): Use encoding argument.
86    
87    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
88    
89            * R/doc.R (setOldClass): Register S3 document classes to be
90            recognized by S4 methods.
91    
92    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
93    
94            * R/matrix.R (termFreq): Add option to remove punctuation
95            characters.
96    
97    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
100            merging multiple term-document matrices.
101    
102    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/corpus.R (setOldClass): Register S3 corpus classes to be
105            recognized by S4 methods.
106    
107            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
108            that CRAN Mac OS X builds do not fail any longer.
109    
110    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
111    
112            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
113            of RWeka::AlphabeticTokenizer() as default.
114    
115    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
116    
117            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
118            caused words at the beginning or the end of a line not to be removed. Do
119            not delete whitespace anymore.
120    
121    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
122    
123            * R/source.R (DirSource): Default to working directory if no path
124            is specified.
125    
126    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
127    
128            * R/source.R (DirSource): Stop on empty directories.
129    
130    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
131    
132            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
133            named documents.
134    
135    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
136    
137            * R/transform.R (removeWords): Improve regular expressions.
138    
139    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
140    
141            * R/meta.R (DublinCore): Allow lower case tags.
142    
143    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
144    
145            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
146            instead of x$children.
147    
148    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
149    
150            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
151    
152    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
153    
154            * R/: Use S3 instead of S4 class system.
155    
156    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
157    
158            * R/reader.R (readMail): Moved to tm.plugin.mail package.
159    
160    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
161    
162            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
163            postings are basically e-mails with some extra headers.
164    
165    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
166    
167            * R/transform.R: Move convertMboxEml, removeCitation,
168            removeMultipart, and removeSignature to the tm.plugin.mail package
169            since they are mainly utility functions (for handling e-mails) and
170            not very framework specific.
171    
172    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
173    
174            * man/: Fix documentation.
175    
176    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
177    
178            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
179            plain text document instead of an XML document for texts of the
180            Reuters-21578 dataset.
181    
182            * R/sparse.R: Removed since the slam package is now available on
183            CRAN.
184    
185            * DESCRIPTION (Depends): Add slam package.
186    
187    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
188    
189            * R/transform.R (stemDoc): Fix character(0) handling.
190    
191    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
192    
193            * R/doc.R (show): Pretty print.
194    
195    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
196    
197            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
198            gracefully.
199    
200    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
201    
202            * R/corpus.R: Make corpus virtual. Implement corpus with standard
203            and permanent storage semantics.
204    
205            * DESCRIPTION: New major release. A *lot* of improvements.
206    
207    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
208    
209            * NAMESPACE: Export some simple_triplet_matrix functions.
210    
211    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
212    
213            * R/weight.R: Adapt tf-idf to new matrix format.
214    
215    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
216    
217            * R/matrix.R: Create two distinct classes for term-document and
218            document-term matrices.
219    
220    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
221    
222            * R/termdocmatrix.R: No longer use Matrix package. This reduces
223            package start-up time significantly.
224    
225    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
226    
227            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
228    
229    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
230    
231            * R/transform.R (tmReduce): Combine multiple maps into one
232            transformation.
233    
234    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
235    
236            * R/weight.R: Remove weightLogical since it does not return a
237            dgCMatrix.
238    
239            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
240            or TermDocumentMatrix instead.
241    
242    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
243    
244            * inst/doc/extensions.Rnw: Finished vignette.
245    
246    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
247    
248            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
249            DocumentTermMatrix representations.
250    
251    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
252    
253            * R/reader.R (readXML): New reader for arbitrary XML files.
254    
255    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
256    
257            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
258            (XMLSource): New XMLSource class for arbitrary XML files.
259            (Source): New slot Vectorized.
260    
261    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
262    
263            * R/reader.R (readTabular): Experimental reader for tabular data
264            structures which can be customized via user-defined mappings.
265    
266            * R/reader.R: Always use UTC time zone.
267    
268            * R/AAA.R (.onLoad): No longer try to start an MPI cluster.
269    
270    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
271    
272            * R/reader.R (readDOC): Options can be passed over to antiword.
273    
274            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
275            pdftotext.
276    
277    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
278    
279            * R/source.R (DirSource): Add pattern and ignore.case arguments
280            which are internally passed over to list.files().
281    
282    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
283    
284            * inst/doc/tm.Rnw: Suppress pointless loading message.
285    
286    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
287    
288            * DESCRIPTION: Speed up package loading (via moving packages not
289            strictly necessary for normal operation to Suggests instead of
290            Depends).
291    
292    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
293    
294            * R/reader.R (readNewsgroup): The date format is now configurable.
295    
296    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
297    
298            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
299    
300    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
301    
302            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
303    
304    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
305    
306            * R/source.R (DataframeSource): New source class for data frames.
307    
308            * R/source.R: Fixed non-standard call evaluation.
309    
310    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
311    
312            * R/source.R (URISource): New source class for a single document.
313    
314    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
315    
316            * R/source.R: Refactoring.
317    
318    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
319    
320            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
321            Rmpi installations more gracefully.
322    
323    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
324    
325            * R/source.R (Source): Add Length slot.
326    
327    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
328    
329            * R/AAA.R: Unify duplicated .onLoad function.
330    
331    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
332    
333            * DESCRIPTION (Suggests): Added Rmpi.
334    
335    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
336    
337            * R/source.R (getElem): Fix 'no visible binding' warning.
338    
339            * man/WeightFunction.Rd: Fix signature.
340    
341    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
342    
343            * R/weight.R: Introduce name abbreviations for weighting functions.
344    
345    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
346    
347            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
348    
349            * R/cluster.R: Provide convenience functions for using an MPI
350            cluster.
351    
352            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
353            available.
354    
355            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
356            available.
357    
358    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
359    
360            * R/textdoccol.R (lapply): Removed debug print out.
361    
362    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
363    
364            * R/reader.R (readRCV1): Improved meta data extraction from
365            Reuters Corpus Volume 1 documents.
366    
367    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
368    
369            * R/transform.R: Ensure that all mappings preserve multiline
370            structures.
371    
372    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
373    
374            * R/filter.R: Every filter now has an attribute indicating whether
375            it should be applied at the document level (doclevel).
376    
377            * R/textdoccol.R (tmFilter): Set searchFullText as new default
378            filter.
379    
380    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
381    
382            * R/transform.R (replacePatterns): Replaced removeWords by
383            replacePatterns. Suggested by Christian Buchta.
384    
385            * R/textdoccol.R (inspect): Improved formatting.
386    
387    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
388    
389            * inst/CITATION: Updated JSS article information.
390    
391            * R/textdoccol.R (setAs): Added coerce method from list to
392            corpus.
393    
394            * R/meta.R (meta): Improved meta data handling.
395    
396    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
397    
398            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
399            Christian Buchta.
400    
401            * inst/CITATION: Added template to include JSS article reference.
402    
403    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
404    
405            * R/textdoccol.R (tmMap): Introduced lazy mapping.
406    
407            * R/source.R: Added VectorSource.
408    
409    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
410    
411            * man/: Language codes should be in ISO 639-1 format.
412    
413            * R/textdoccol.R (asPlain): Preserve local meta data.
414    
415    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
416    
417            * R/textdoccol.R (writeCorpus): Function for writing a corpus
418            containing plain text documents to disk.
419    
420    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
421    
422            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
423            always set correctly.
424    
425            * R/textdoccol.R: Set load = TRUE as default for load on demand
426            since in most cases this is the wanted behaviour.
427    
428    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
429    
430            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
431    
432            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
433    
434    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
435    
436            * R/meta.R (meta): New function for consistent access to meta data
437            of document collections, repositories, and texts.
438    
439    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
440    
441            * R/: Better support for encodings.
442    
443    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
444    
445            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
446            selection when no reader argument is given.
447    
448    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
449    
450            * R/source.R (CSVSource): Now uses read.csv instead of scan
451            internally.
452    
453    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/reader.R (getReaders): Returns available reader functions.
456    
457            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
458            as default.
459    
460    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
461    
462            * R/stopwords.R (stopwords): Shortened code, removed codetools
463            variable warnings.
464    
465            * man/: Documentation for showMeta, added an example for tmMap.
466    
467            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
468            some minor typos fixed.
469    
470    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
471    
472            * R/aobjects.R (showMeta): Added method for pretty printing a
473            text document's meta data.
474    
475    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
476    
477            * R/textdoccol.R (TextDocCol): Better handling of empty
478            arguments.
479    
480            * NAMESPACE: Exported readDOC.
481    
482            * man/completeStems.Rd: Added an example.
483    
484    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
485    
486            * R/stopwords.R (stopwords): Look up .dat files at every
487            call. Allows users to modify stopword .dat files interactively.
488    
489    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
490    
491            * R/termdocmatrix.R (termFreq): Correct processing of empty
492            documents.
493    
494    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * man/: Updated documentation.
497    
498    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
499    
500            * R/complete.R (completeStems): Completes (heuristically) word
501            stems.
502    
503            * R/termdocmatrix.R (TermDocMatrix2): New modular
504            constructor.
505    
506            * NAMESPACE: Exported termFreq.
507    
508    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
509    
510            * R/reader.R (readDOC): Added MS Word reader (using antiword).
511    
512    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
513    
514            * R/weight.R: Weighting functions for TermDocMatrix.
515    
516    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
517    
518            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
519            functions for accessing dimension, column, and row names.
520    
521            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
522    
523    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
524    
525            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
526    
527    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
528    
529            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
530    
531    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
532    
533            * R/reader.R (readPDF): Removed manual checks for pdftotext and
534            pdfinfo. The system call gives a warning anyway.
535    
536    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
537    
538            * R/textdoccol.R (asPlain): Conversion from
539            StructuredTextDocuments to PlainTextDocuments.
540    
541    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
542    
543            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
544            for accessing term-document matrices.
545    
546            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
547            are installed.
548    
549    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
550    
551            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
552            Christian Buchta.
553    
554    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
557    
558    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
559    
560            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
561    
562            * R/reader.R (readPDF): Added PDF reader.
563    
564    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
565    
566            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
567    
568            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
569    
570            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
571    
572            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
573    
574    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
575    
576            * R/distmeasure.R (dissimilarity): Replaced dists call from
577            package cba by new dist call from package proxy.
578    
579    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
580    
581            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
582    
583    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
584    
585            * R/termdocmatrix.R: require() uses the quietly option to suppress
586            loading messages.
587    
588    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * R/dictionary.R: Added dictionary support.
591    
592    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
593    
594            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
595            documents. This simplifies some functions, e.g., asPlain.
596    
597    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
598    
599            * inst/doc/tm.Rnw: Fixed some typos in vignette.
600    
601    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
602    
603            * R/textdoccol.R (replaceWords): Added method to replace a set of
604            words by a single word. Useful for synonyms.
605    
606    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
607    
608            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
609    
610    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
611    
612            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
613            vectors. Thanks to Ariel Maguyon for his error report.
614            (removeSparseTerms): New function to remove columns from a
615            term-document matrix exceeding a sparse factor.
616    
617    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
618    
619            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
620    
621    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
622    
623            * man/sFilter.Rd: Corrected documentation on statement format (use
624            '==' instead of '=').
625    
626    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
627    
628            * R/aobjects.R (StructuredTextDocument): Inherits from
629            TextDocument.
630    
631    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
634            on sparse matrices as proposed by Martin Maechler.
635    
636    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
637    
638            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the latest
639            \pkg{filehash} version deprecates them.
640    
641    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
642    
643            * R/termdocmatrix.R (textvector): Stemming is now performed before
644            erasing stopwords.
645            (weightMatrix): Adapted to handle sparse matrices.
646            (TermDocMatrix): Sparse matrix is now efficiently built by
647            direct stepwise insertion of row values into it.
648    
649    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
650    
651            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
652            due to ongoing problems. For our purposes the latter is as useful
653            as the replaced package.
654    
655    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
656    
657            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
658    
659            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
660    
661    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
662    
663            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
664            languages with available stopwords.
665    
666    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
667    
668            * inst/doc/tm.Rnw: Minor corrections in the vignette.
669    
670    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
671    
672            * DESCRIPTION: Update to version 0.2, since a lot of new features
673            have been integrated.
674    
675            * inst/stopwords: Updated existing stopwords and added stopwords
676            for various other languages.
677    
678    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
679    
680            * man/: Updated documentation.
681    
682            * Work/testDb.R: Script to test database stuff.
683    
684            * R/: Fixed various database-related bugs. Seems to be rather
685            usable now, i.e., consider it alpha status for now.
686    
687    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
688    
689            * R/: Fixed some bugs related to database support.
690    
691    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
692    
693            * man/: Added a lot of examples to the manuals.
694    
695    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
696    
697            * man/: Updated parts of the documentation.
698    
699            * R/textdoccol.R (asPlain): Added conversion from newsgroup
700            documents to plain text documents.
701    
702    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
703    
704            * R/textdoccol.R: Finished experimental database support. Not yet
705            intensively tested.
706    
707            * R/source.R: Now each source has a default reader.
708    
709            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
710            class anymore.
711    
712            * R/plaintextdoc.R: Custom show method for plain text documents.
713    
714            * R/aobjects.R: Added a class for structured text documents.
715    
716            * R/reader.R: Replaced remaining \code{parser} occurrences with
717            \code{reader}.
718    
719            * R/textdoccol.R (summary): Indent tags.
720    
721            * R/textdoccol.R (removePunctuation): Transform method to remove
722            punctuation marks.
723    
724    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
725    
726            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
727            using prescindMeta().
728    
729    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
730    
731            * R/textdoccol.R: Improved database support.
732    
733    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
734    
735            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
736    
737            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
738            language code.
739    
740            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
741            into parserControl argument.
742    
743            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
744    
745    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
746    
747            * Work/tmDataSetup.R: The datasets acq and crude can now be
748            created on the fly.
749    
750            * R/stopwords.R: Introduced a function returning the stopwords for
751            a given language (English, German and French at the moment).
752    
753            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
754            otherwise falls back to Snowball package.
755    
756    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
757    
758            * man/dissimilarity-methods.Rd: Make clear that any method offered
759            by "dists" from package "cba" can be used.
760    
761    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
764            to Kurt's LaTeX suggestion. Removed points and underscores in
765            variable names for consistent naming.
766    
767            * DESCRIPTION: Update to version 0.1-2.
768    
769            * man/TextRepository.Rd: Fixed bug in documentation.
770    
771    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
772    
773            * DESCRIPTION: Update to version 0.1-1.
774    
775    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
776    
777            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
778            wordStem.
779    
780    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
781    
782            * R/: Changes due to Kurt's review.
783    
784    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
785    
786            * R/: Implemented improvements based upon comments by David
787            Meyer.
788    
789    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
790    
791            * inst/doc/: Rewrote vignette.
792    
793            * man/: Improved documentation.
794    
795    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
796    
797            * man/: Updated documentation.
798    
799            * DESCRIPTION: Changed package name to "tm". Updated version to
800            0.1 for first CRAN release.
801    
802            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
803            list archive example.
804    
805            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
806            archive example.
807    
808            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
809            from (several mails per box) mbox format to (single mail per file)
810            eml format.
811    
812    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
813    
814            * data/crude.rda: Rebuilt.
815    
816            * data/acq.rda: Rebuilt.
817    
818            * R/reader.R: Factored out reader and parser methods from
819            textdoccol.R.
820    
821            * R/source.R: Factored out Source methods from aobjects.R and
822            textdoccol.R.
823            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
824            feeds.
825    
826            * R/textdoccol.R (DirSource): Added support for recursive
827            traversal of directories.
828    
829    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
830    
831            * R/textdoccol.R ([[): Loads the document corpus automatically
832            into memory upon access.
833            (tm_transform, tm_filter): Removed several checks whether the
834            document is already loaded ([[ ensures this now).
835            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
836            mailing list archive.
837    
838    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
839    
840            * R/aobjects.R (TextDocument): Is now a virtual class.
841            (Source): Is now a virtual class.
842    
843    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
844    
845            * R/textdoccol.R (c): Support for an arbitrary number of document
846            collections.
847    
848    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
849    
850            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
851            append_meta and remove_meta.
852    
853            * R/textdoccol.R: Removed modify_metadata method.
854    
855            * R/textrepo.R: Removed modify_metadata method.
856    
857            * R/textdoccol.R (remove_meta): Supports removal of document
858            collection metadata and document (= in data frame) metadata.
859    
860    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
861    
862            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
863    
864            * data/crude.rda: Rebuilt.
865    
866            * data/acq.rda: Rebuilt.
867    
868            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
869    
870            * R/textdoccol.R ([): Bug fix for subsetting a document
871            collection's data frame.
872    
873    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
874    
875            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
876            to s_filter.
877    
878            * R/textdoccol.R: Local text documents' metadata can now be copied
879            to a document collection's data frame with prescind_meta.
880    
881    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
882    
883            * R/: Text documents' slot metadata is now accessible in s_filter.
884    
885            * R/: Rewrote s_filter function (still has some restrictions).
886    
887    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
888    
889            * R/: Various fixes in handling metadata.
890    
891            * R/: Added update mechanism for text document collections.
892    
893    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
894    
895            * R/: Merging of document collections now creates a binary tree
896            for reconstructing merged document collections.
897    
898            * R/: Redesign of metadata for document collections.
899    
900    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
901    
902            * R/: Messages now use \code{ngettext}.
903    
904    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
905    
906            * R/: Added functions for modifying and removing metadata.
907    
908    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
909    
910            * man/: Updated some documentation.
911    
912            * R/: Corrected some connection issues.
913    
914            * inst/doc: Worked on the vignette.
915    
916    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
917    
918            * inst/: Added texts and started vignette.
919    
920            * R/: Final changes based upon David's comments.
921    
922    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
923    
924            * NAMESPACE: Corrected exports (generic methods need exportMethods
925            directives!).
926    
927    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * R/: Modified the TextDocCol constructor and various parsers. It
930            is now modular and supports various file formats via plugins (see
931            the new "Source" class).
932    
933    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
934    
935            * man/: Revised documentation after previous code changes.
936    
937    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
938    
939            * R/: Remaining changes as discussed with David.
940    
941    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
942    
943            * R/: Some changes as suggested by David. The rest will follow
944            within the next few days.
945    
946    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
947    
948            * man/: Finished documentation.
949    
950    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
951    
952            * man/: Wrote some documentation.
953    
954    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
955    
956            * R/: Further syntactic sugar in form of additional assignment and
957            accessor methods.
958    
959    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
960    
961            * R/: Syntactic sugar in form of "length", "show" and "summary"
962            operators.
963    
964    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
965    
966            * R/: Diverse updates. Mainly on default operators ("[" or "c")
967            and dissimilarities.
968    
969    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
970    
971            * R/: Added similarity functions.
972    
973            * data/: Added English stopwords.
974    
975    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
976    
977            * data/: Examples compiled for new features.
978    
979            * R/: Changes due to new structure.
980    
981            * NAMESPACE: Corrected namespace to reflect new structure.
982    
983            * R/termdocmatrix.R: Adapted for new naming scheme.
984    
985    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
986    
987            * R/textdoccol.R: Adapted code for new class structure. Wrote
988            several transform and filter functions operating on text document
989            collections (alias text document databases).
990    
991            * R/aobjects.R: Adapted class structure with inheritance,
992            repositories and additional meta data. Loading files on demand is
993            now possible.
994    
995    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
996    
997            * R/: Some cosmetic cleanups.
998    
999            * inst/: Removed vignette on clustering. That and much more is now
1000            described in the JSS paper on text mining. Based upon that
1001            article an elaborated vignette will be incorporated in the future.
1002    
1003    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1004    
1005            * R/: Updated generic S4 methods to comply with signature changes
1006            in newer versions of R (> 2.3)
1007    
1008    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1009    
1010            * ext/R/importRIS.R: Automatic RIS import is now possible.
1011    
1012    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1013    
1014            * R/textdoccol.R: Added RIS HTML input format.
1015    
1016    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1017    
1018            * R/textdoccol.R: Removed bug that caused invalid text document
1019            collections when handling many input files.
1020    
1021    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1022    
1023            * R/textdoccol.R: Restructured and extended file import
1024            mechanism.
1025    
1026            * inst/doc/clustering.Rnw: Adapted vignette for use with
1027            ReutNews.rda
1028    
1029            * man/ReutNews.Rd: Documentation for ReutNews.rda
1030    
1031            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1032    
1033    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1034    
1035            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1036            clustering facilities of this package.
1037    
1038    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1039    
1040            * R/aobjects.R: Changed package document structure to avoid class
1041            dependency problems.
1042    
1043    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1044    
1045            *  Wrote a script for the ModLewis Split for the Reuters-21578 XML
1046            data set.
1047    
1048            *  Finished documentation and reordered directory structure. Now "R
1049            CMD check textmin" works without errors.
1050    
1051    2005-12-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1052    
1053            * src/: Various splits can now be easily created for the
1054            Reuters21578 data set.
1055    
1056    2005-12-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1057    
1058            *  Updated documentation
1059    
1060    2005-11-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1061    
1062            *  Wrote R documentation for some classes and methods.
1063    
1064    2005-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1065    
1066            * R/textdoccol.R: Constructor of textdoccol allows import of CSV
1067            files. See the questionnaire data/Umfrage.csv for such an example.
1068            We are now able to import files in Reuters-21578 XML format.
1069    
1070            *  Changed class interfaces in various files. Weighting of the text
1071            matrix is now possible.
1072    
1073    2005-11-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1074    
1075            * R/textdoccol.R: One can build term-document matrices if
1076            necessary (with buildTDM(...)) and fill the field tdm from a text
1077            document collection with it.
1078    
1079            * R/textmatrix.R: Wrote S4 class for term-document matrices.
1080    
1081    2005-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1082    
1083            * R/textdoccol.R: We now can read in a whole XML file with several
1084            news items.
1085    
1086    2005-11-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1087    
1088            * R/textdoccol.R: Set up an S4 class for a collection of text
