[tm] Diff of /pkg/ChangeLog

trunk/R/trunk/ChangeLog revision 28, Tue Dec 6 13:46:33 2005 UTC
pkg/ChangeLog revision 1102, Sat Oct 16 10:01:09 2010 UTC
1    2010-10-16  Ingo Feinerer  <feinerer@logic.at>
2    
3            * R/corpus.R (`[[.VCorpus`, `[[.PCorpus`): Access individual
4            documents by document ID.
5    
6    2010-08-25  Ingo Feinerer  <feinerer@logic.at>
7    
8            * R/corpus.R (c.Corpus): When concatenating corpora, the argument
9            \code{recursive} now determines whether existing corpus meta data
10            is used.
11    
12    2010-08-06  Ingo Feinerer  <feinerer@logic.at>
13    
14            * R/transform.R: Removed convert_UTF_8(). Use enc2utf8() instead.
15    
16    2010-06-17  Ingo Feinerer  <feinerer@logic.at>
17    
18            * R/matrix.R (TermDocumentMatrix): If a dictionary is given, no
19            longer remove terms that do not occur in the corpus.
20    
21    2010-06-02  Ingo Feinerer  <feinerer@logic.at>
22    
23            * R/plot.R (Zipf_plot, Heaps_plot): Plotting functions for Zipf's
24            and Heaps' law.
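
The rank-frequency relationship that a Zipf plot visualizes can be illustrated with a minimal sketch. It is written in Python for illustration and is not the code of Zipf_plot; the helper names rank_frequencies and loglog_slope are hypothetical, and the tokens are assumed to be already split. Under Zipf's law the fitted slope of log(frequency) against log(rank) is expected to be close to -1.

```python
import math
from collections import Counter


def rank_frequencies(tokens):
    """Return term frequencies sorted in decreasing order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two frequencies, all of them positive.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Plotting log(rank) against log(frequency) and overlaying the fitted line would give the usual picture; the plotting step is left out here to keep the sketch dependency-free.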
25    
26    2010-05-18  Ingo Feinerer  <feinerer@logic.at>
27    
28            * R/corpus.R (Corpus, PCorpus): Use element names as IDs if
29            provided by a source.
30    
31    2010-04-09  Ingo Feinerer  <feinerer@logic.at>
32    
33            * R/source.R (.Source): Provide document names.
34    
35    2010-04-07  Ingo Feinerer  <feinerer@logic.at>
36    
37            * R/meta.R (`content_or_meta`): Utility function.
38    
39    2010-03-19  Ingo Feinerer  <feinerer@logic.at>
40    
41            * R/reader.R (readReut21578XML, readReut21578XMLasPlain): Extract
42            TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags.
43    
44    2010-03-03  Ingo Feinerer  <feinerer@logic.at>
45    
46            * R/weight.R (weightTfIdf): Added normalization option.
47    
48            * man/tm_tag_score.Rd: Add General Inquirer example for sentiment
49            analysis.
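
One common reading of a normalization option for tf-idf weighting is to divide each raw term count by the document length before multiplying by the inverse document frequency. The sketch below is in Python for illustration and is not the code of weightTfIdf; the function name tf_idf, the dict-based input, and the base-2 logarithm are assumptions.

```python
import math


def tf_idf(counts, normalize=False):
    """Weight term counts by tf-idf.

    counts: list of {term: count} dicts, one per document.
    normalize: if True, divide each count by the document length first.
    """
    n_docs = len(counts)
    # Document frequency: number of documents containing each term.
    doc_freq = {}
    for doc in counts:
        for term in doc:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    weighted = []
    for doc in counts:
        length = sum(doc.values())
        row = {}
        for term, count in doc.items():
            tf = count / length if normalize and length else count
            row[term] = tf * math.log2(n_docs / doc_freq[term])
        weighted.append(row)
    return weighted
```

With normalization, long documents no longer dominate simply because they contain more tokens; a term occurring in every document gets weight 0 in either mode.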
50    
51    2010-02-25  Ingo Feinerer  <feinerer@logic.at>
52    
53            * R/score.R (tm_tag_score): Compute a score from the number of
54            tags matching in a document.
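
The idea of scoring a document by its number of matching tags can be sketched as follows. This is a Python illustration under the assumption that the document is already tokenized and that every occurrence of a tag counts once; tag_score is a hypothetical name, not the tm_tag_score implementation.

```python
def tag_score(tokens, tags):
    """Count how many tokens of a document occur in the tag set."""
    tag_set = set(tags)
    return sum(1 for token in tokens if token in tag_set)
```

For example, scoring a document against a list of positive terms and a list of negative terms and comparing the two counts gives a crude sentiment indicator.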
55    
56    2010-02-18  Ingo Feinerer  <feinerer@logic.at>
57    
58            * R/complete.R (stemCompletion): New completion heuristics.
59    
60    2010-02-17  Ingo Feinerer  <feinerer@logic.at>
61    
62            * R/plot.R (plot.TermDocumentMatrix): Memory improvements.
63    
64    2010-02-06  Ingo Feinerer  <feinerer@logic.at>
65    
66            * DESCRIPTION (Depends): Depend on R (>= 2.10.0) to ensure that
67            setOldClass(c(..., "list")) works.
68    
69    2010-01-22  Ingo Feinerer  <feinerer@logic.at>
70    
71            * R/transform.R (stemDocument.character): In case input is a
72            simple character just delegate to the default Snowball stemmer.
73    
74    2010-01-15  Ingo Feinerer  <feinerer@logic.at>
75    
76            * R/reader.R (readReut21578XML, readRCV1): Extract more meta
77            data.
78    
79    2010-01-12  Ingo Feinerer  <feinerer@logic.at>
80    
81            * R/doc.R (`Content<-`): Be careful with names attribute.
82    
83    2010-01-07  Stefan Theussl  <stefan.theussl@wu.ac.at>
84    
85            * R/source.R (DirSource): Improved implementation especially when
86            handling many (> 1M) files.
87    
88    2009-12-22  Ingo Feinerer  <feinerer@logic.at>
89    
90            * R/source.R (getElem.URISource): Use encoding argument.
91    
92    2009-12-11  Ingo Feinerer  <feinerer@logic.at>
93    
94            * R/doc.R (setOldClass): Register S3 document classes to be
95            recognized by S4 methods.
96    
97    2009-11-25  Ingo Feinerer  <feinerer@logic.at>
98    
99            * R/matrix.R (termFreq): Add option to remove punctuation
100            characters.
101    
102    2009-11-19  Ingo Feinerer  <feinerer@logic.at>
103    
104            * R/matrix.R (c.TermDocumentMatrix): Added combine method for
105            merging multiple term-document matrices.
106    
107    2009-11-17  Ingo Feinerer  <feinerer@logic.at>
108    
109            * R/corpus.R (setOldClass): Register S3 corpus classes to be
110            recognized by S4 methods.
111    
112            * man/plot.Rd: Use \dontrun{} in \examples{} section in the hope
113            that CRAN Mac OS X builds do not fail any longer.
114    
115    2009-11-15  Ingo Feinerer  <feinerer@logic.at>
116    
117            * R/matrix.R (tokenize): Use scan(..., what = "character") instead
118            of RWeka::AlphabeticTokenizer() as default.
119    
120    2009-11-14  Ingo Feinerer  <feinerer@logic.at>
121    
122            * R/transform.R (removeWords.PlainTextDocument): Fix bug which
123            caused words at the beginning or the end of a line not to be removed. Do
124            not delete whitespace anymore.
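
A word-removal pattern that depends on surrounding whitespace misses words at the start or end of a line, which is the kind of bug described above. A minimal sketch of the alternative, using word-boundary anchors so that matching does not depend on neighbouring whitespace and whitespace is left in place, is shown below. It is Python, not the package's R code, and remove_words is a hypothetical name.

```python
import re


def remove_words(text, words):
    """Delete whole-word occurrences of `words`, leaving whitespace untouched."""
    # \b matches at word boundaries, including the start and end of a line.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.sub(pattern, "", text)
```

Because only the word itself is deleted, the surrounding spaces remain; collapsing them would be a separate whitespace-stripping step.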
125    
126    2009-11-12  Ingo Feinerer  <feinerer@logic.at>
127    
128            * R/source.R (DirSource): Default to working directory if no path
129            is specified.
130    
131    2009-11-11  Ingo Feinerer  <feinerer@logic.at>
132    
133            * R/source.R (DirSource): Stop on empty directories.
134    
135    2009-11-07  Ingo Feinerer  <feinerer@logic.at>
136    
137            * R/matrix.R (TermDocumentMatrix): Avoid prefixes originating from
138            named documents.
139    
140    2009-10-21  Ingo Feinerer  <feinerer@logic.at>
141    
142            * R/transform.R (removeWords): Improve regular expressions.
143    
144    2009-10-19  Ingo Feinerer  <feinerer@logic.at>
145    
146            * R/meta.R (DublinCore): Allow lower case tags.
147    
148    2009-10-09  Ingo Feinerer  <feinerer@logic.at>
149    
150            * R/source.R (GmaneSource, ReutersSource): Use xmlChildren(x)
151            instead of x$children.
152    
153    2009-09-15  Ingo Feinerer  <feinerer@logic.at>
154    
155            * R/preprocess.R (preprocessReut21578XML): Fix generated file names.
156    
157    2009-09-06  Ingo Feinerer  <feinerer@logic.at>
158    
159            * R/: Use S3 instead of S4 class system.
160    
161    2009-08-11  Ingo Feinerer  <feinerer@logic.at>
162    
163            * R/reader.R (readMail): Moved to tm.plugin.mail package.
164    
165    2009-07-04  Ingo Feinerer  <feinerer@logic.at>
166    
167            * R/reader.R (readNewsgroup): Rename to readMail as newsgroup
168            postings are basically e-mails with some extra headers.
169    
170    2009-07-03  Ingo Feinerer  <feinerer@logic.at>
171    
172            * R/transform.R: Move convertMboxEml, removeCitation,
173            removeMultipart, and removeSignature to the tm.plugin.mail package
174            since they are mainly utility functions (for handling e-mails) and
175            not very framework specific.
176    
177    2009-06-28  Ingo Feinerer  <feinerer@logic.at>
178    
179            * man/: Fix documentation.
180    
181    2009-06-26  Ingo Feinerer  <feinerer@logic.at>
182    
183            * R/reader.R (readReut21578XMLasPlain): New reader which returns a
184            plain text document instead of an XML document for texts of the
185            Reuters-21578 dataset.
186    
187            * R/sparse.R: Removed since the slam package is now available on
188            CRAN.
189    
190            * DESCRIPTION (Depends): Add slam package.
191    
192    2009-06-17  Ingo Feinerer  <feinerer@logic.at>
193    
194            * R/transform.R (stemDoc): Fix character(0) handling.
195    
196    2009-06-12  Ingo Feinerer  <feinerer@logic.at>
197    
198            * R/doc.R (show): Pretty print.
199    
200    2009-05-27  Ingo Feinerer  <feinerer@logic.at>
201    
202            * R/matrix.R (print.TermDocumentMatrix): Handle empty matrices
203            gracefully.
204    
205    2009-05-13  Ingo Feinerer  <feinerer@logic.at>
206    
207            * R/corpus.R: Make corpus virtual. Implement corpus with standard
208            and permanent storage semantics.
209    
210            * DESCRIPTION: New major release. A *lot* of improvements.
211    
212    2009-05-04  Ingo Feinerer  <feinerer@logic.at>
213    
214            * NAMESPACE: Export some simple_triplet_matrix functions.
215    
216    2009-04-28  Ingo Feinerer  <feinerer@logic.at>
217    
218            * R/weight.R: Adapt tf-idf to new matrix format.
219    
220    2009-04-27  Ingo Feinerer  <feinerer@logic.at>
221    
222            * R/matrix.R: Create two distinct classes for term-document and
223            document-term matrices.
224    
225    2009-04-26  Ingo Feinerer  <feinerer@logic.at>
226    
227            * R/termdocmatrix.R: No longer use Matrix package. This reduces
228            package start-up time significantly.
229    
230    2009-04-11  Ingo Feinerer  <feinerer@logic.at>
231    
232            * inst/doc/tm.Rnw: Fix code/documentation mismatch.
233    
234    2009-04-04  Ingo Feinerer  <feinerer@logic.at>
235    
236            * R/transform.R (tmReduce): Combine multiple maps into one
237            transformation.
238    
239    2009-04-03  Ingo Feinerer  <feinerer@logic.at>
240    
241            * R/weight.R: Remove weightLogical since it does not return a
242            dgCMatrix.
243    
244            * R/termdocmatrix.R: Removed TermDocMatrix. Use DocumentTermMatrix
245            or TermDocumentMatrix instead.
246    
247    2009-03-28  Ingo Feinerer  <feinerer@logic.at>
248    
249            * inst/doc/extensions.Rnw: Finished vignette.
250    
251    2009-03-27  Ingo Feinerer  <feinerer@logic.at>
252    
253            * R/termdocmatrix.R: Start to work on new TermDocumentMatrix and
254            DocumentTermMatrix representations.
255    
256    2009-03-23  Ingo Feinerer  <feinerer@logic.at>
257    
258            * R/reader.R (readXML): New reader for arbitrary XML files.
259    
260    2009-03-22  Ingo Feinerer  <feinerer@logic.at>
261    
262            * R/source.R (CSVSource): Defunct (use DataframeSource instead).
263            (XMLSource): New XMLSource class for arbitrary XML files.
264            (Source): New slot Vectorized.
265    
266    2009-03-21  Ingo Feinerer  <feinerer@logic.at>
267    
268            * R/reader.R (readTabular): Experimental reader for tabular data
269            structures which can be customized via user-defined mappings.
270    
271            * R/reader.R: Always use UTC time zone.
272    
273            * R/AAA.R (.onLoad): No longer try to start a MPI cluster.
274    
275    2009-03-20  Ingo Feinerer  <feinerer@logic.at>
276    
277            * R/reader.R (readDOC): Options can be passed over to antiword.
278    
279            * R/reader.R (readPDF): Options can be passed over to pdfinfo and
280            pdftotext.
281    
282    2009-03-10  Ingo Feinerer  <feinerer@logic.at>
283    
284            * R/source.R (DirSource): Add pattern and ignore.case arguments
285            which are internally passed over to list.files().
286    
287    2009-03-02  Ingo Feinerer  <feinerer@logic.at>
288    
289            * inst/doc/tm.Rnw: Suppress pointless loading message.
290    
291    2009-01-29  Ingo Feinerer  <feinerer@logic.at>
292    
293            * DESCRIPTION: Speed up package loading (via moving packages not
294            strictly necessary for normal operation to Suggests instead of
295            Depends).
296    
297    2009-01-08  Ingo Feinerer  <feinerer@logic.at>
298    
299            * R/reader.R (readNewsgroup): The date format is now configurable.
300    
301    2008-12-20  Ingo Feinerer  <feinerer@logic.at>
302    
303            * R/preprocess.R (convertMboxEml): Fix off-by-one error.
304    
305    2008-12-16  Ingo Feinerer  <feinerer@logic.at>
306    
307            * R/termdocmatrix.R (TermDocMatrix): Sort row indices.
308    
309    2008-12-06  Ingo Feinerer  <feinerer@logic.at>
310    
311            * R/source.R (DataframeSource): New source class for data frames.
312    
313            * R/source.R: Fixed non-standard call evaluation.
314    
315    2008-11-29  Ingo Feinerer  <feinerer@logic.at>
316    
317            * R/source.R (URISource): New source class for a single document.
318    
319    2008-11-27  Ingo Feinerer  <feinerer@logic.at>
320    
321            * R/source.R: Refactoring.
322    
323    2008-11-25  Ingo Feinerer  <feinerer@logic.at>
324    
325            * R/AAA.R (.onLoad, .Last): Use tryCatch() to handle misconfigured
326            Rmpi installations more gracefully.
327    
328    2008-11-08  Ingo Feinerer  <feinerer@logic.at>
329    
330            * R/source.R (Source): Add Length slot.
331    
332    2008-11-06  Ingo Feinerer  <feinerer@logic.at>
333    
334            * R/AAA.R: Unify duplicated .onLoad function.
335    
336    2008-11-03  Ingo Feinerer  <feinerer@logic.at>
337    
338            * DESCRIPTION (Suggests): Added Rmpi.
339    
340    2008-11-02  Ingo Feinerer  <feinerer@logic.at>
341    
342            * R/source.R (getElem): Fix 'no visible binding' warning.
343    
344            * man/WeightFunction.Rd: Fix signature.
345    
346    2008-08-03  Ingo Feinerer  <feinerer@logic.at>
347    
348            * R/weight.R: Introduce name abbreviations for weighting functions.
349    
350    2008-07-24  Ingo Feinerer  <feinerer@logic.at>
351    
352            * R/AAA.R (.onLoad, .Last): Start and stop MPI cluster.
353    
354            * R/cluster.R: Provide convenience functions for using a MPI
355            cluster.
356    
357            * R/termdocmatrix.R (TermDocMatrix): Use MPI cluster if
358            available.
359    
360            * R/textdoccol.R (tmIndex, tmFilter, tmMap): Use MPI cluster if
361            available.
362    
363    2008-07-17  Ingo Feinerer  <feinerer@logic.at>
364    
365            * R/textdoccol.R (lapply): Removed debug print out.
366    
367    2008-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
368    
369            * R/reader.R (readRCV1): Improved meta data extraction from
370            Reuters Corpus Volume 1 documents.
371    
372    2008-05-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
373    
374            * R/transform.R: Ensure that all mappings preserve multiline
375            structures.
376    
377    2008-05-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
378    
379            * R/filter.R: Every filter now has an attribute (doclevel)
380            indicating whether it should be applied at document level.
381    
382            * R/textdoccol.R (tmFilter): Set searchFullText as new default
383            filter.
384    
385    2008-04-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
386    
387            * R/transform.R (replacePatterns): Replaced removeWords by
388            replacePatterns. Suggested by Christian Buchta.
389    
390            * R/textdoccol.R (inspect): Improved formatting.
391    
392    2008-04-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
393    
394            * inst/CITATION: Updated JSS article information.
395    
396            * R/textdoccol.R (setAs): Added coerce method from list to
397            corpus.
398    
399            * R/meta.R (meta): Improved meta data handling.
400    
401    2008-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
402    
403            * R/textdoccol.R (materialize, tmMap): Improvements suggested by
404            Christian Buchta.
405    
406            * inst/CITATION: Added template to include JSS article reference.
407    
408    2008-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
409    
410            * R/textdoccol.R (tmMap): Introduced lazy mapping.
411    
412            * R/source.R: Added VectorSource.
413    
414    2008-02-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
415    
416            * man/: Language codes should be in ISO 639-1 format.
417    
418            * R/textdoccol.R (asPlain): Preserve local meta data.
419    
420    2008-01-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
421    
422            * R/textdoccol.R (writeCorpus): Function for writing a corpus
423            containing plain text documents to disk.
424    
425    2008-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
426    
427            * R/termdocmatrix.R (TermDocMatrix): Ensure that dimnames are
428            always set correctly.
429    
430            * R/textdoccol.R: Set load = TRUE as default for load on demand
431            since in most cases this is the wanted behaviour.
432    
433    2008-01-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
434    
435            * R/: Renamed TextDocCol to Corpus, and Corpus to Content.
436    
437            * DESCRIPTION: Updated Version to 0.3 due to core name changes.
438    
439    2008-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
440    
441            * R/meta.R (meta): New function for consistent access to meta data
442            of document collections, repositories, and texts.
443    
444    2008-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
445    
446            * R/: Better support for encodings.
447    
448    2008-01-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
449    
450            * R/textdoccol.R (TextDocCol): Fixed bug regarding default reader
451            selection when no reader argument is given.
452    
453    2008-01-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
454    
455            * R/source.R (CSVSource): Now uses read.csv instead of scan
456            internally.
457    
458    2008-01-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
459    
460            * R/reader.R (getReaders): Returns available reader functions.
461    
462            * R/termdocmatrix.R (TermDocMatrix): Set new modular constructor
463            as default.
464    
465    2007-12-02  Ingo Feinerer  <h0125130@wu-wien.ac.at>
466    
467            * R/stopwords.R (stopwords): Shortened code, removed codetools
468            variable warnings.
469    
470            * man/: Documentation for showMeta, added an example for tmMap.
471    
472            * inst/doc/tm.Rnw: Updated vignette, comments on MS word reader,
473            some minor typos fixed.
474    
475    2007-12-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
476    
477            * R/aobjects.R (showMeta): Added method for pretty printing a
478            text document's meta data.
479    
480    2007-11-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
481    
482            * R/textdoccol.R (TextDocCol): Better handling of empty
483            arguments.
484    
485            * NAMESPACE: Exported readDOC.
486    
487            * man/completeStems.Rd: Added an example.
488    
489    2007-11-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
490    
491            * R/stopwords.R (stopwords): Look up .dat files at every
492            call. Allows users to modify stopword .dat files interactively.
493    
494    2007-11-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
495    
496            * R/termdocmatrix.R (termFreq): Correct processing of empty
497            documents.
498    
499    2007-10-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
500    
501            * man/: Updated documentation.
502    
503    2007-10-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
504    
505            * R/complete.R (completeStems): Completes (heuristically) word
506            stems.
507    
508            * R/termdocmatrix.R (TermDocMatrix2): New modular
509            constructor.
510    
511            * NAMESPACE: Exported termFreq.
512    
513    2007-10-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
514    
515            * R/reader.R (readDOC): Added MS Word reader (using antiword).
516    
517    2007-10-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
518    
519            * R/weight.R: Weighting functions for TermDocMatrix.
520    
521    2007-10-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
522    
523            * R/termdocmatrix.R (dimnames, colnames, rownames): Wrapper
524            functions for accessing dimension, column, and row names.
525    
526            * R/plot.R (plot.TermDocMatrix): Plot correlations between terms.
527    
528    2007-09-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
529    
530            * man/removePunctuation.Rd: Added documentation. Function also exported to NAMESPACE.
531    
532    2007-08-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
533    
534            * R/fungen.R: Use S4 class for function generators instead of S3 attributes.
535    
536    2007-07-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
537    
538            * R/reader.R (readPDF): Removed manual checks for pdftotext and
539            pdfinfo. The system call gives a warning anyway.
540    
541    2007-07-28  Ingo Feinerer  <h0125130@wu-wien.ac.at>
542    
543            * R/textdoccol.R (asPlain): Conversion from
544            StructuredTextDocuments to PlainTextDocuments.
545    
546    2007-07-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
547    
548            * R/termdocmatrix.R: Added convenience methods ("[", nrow, ncol)
549            for accessing term-document matrices.
550    
551            * inst/doc/tm.Rnw: readPDF is only called if pdftotext and pdfinfo
552            are installed.
553    
554    2007-07-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
555    
556            * R/termdocmatrix.R (TermDocMatrix): Improved efficiency. Kudos to
557            Christian Buchta.
558    
559    2007-07-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
560    
561            * inst/doc/tm.Rnw: Update vignette (readPDF, readHTML, preprocessReut21578XML).
562    
563    2007-07-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
564    
565            * R/reader.R (readHTML): Added very simple HTML reader to obtain StructuredTextDocuments.
566    
567            * R/reader.R (readPDF): Added PDF reader.
568    
569    2007-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
570    
571            * DESCRIPTION: Moved proxy from Depends to Imports to avoid name clashes.
572    
573            * inst/stopwords/english.dat: Added the term "yes" to stopwords.
574    
575            * R/termdocmatrix.R (dim): dim function for TermDocMatrix.
576    
577            * R/preprocess.R (convertMboxEml): Accepts gzipped mboxes.
578    
579    2007-07-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
580    
581            * R/distmeasure.R (dissimilarity): Replaced dists call from
582            package cba by new dist call from package proxy.
583    
584    2007-07-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
585    
586            * inst/doc/tm.Rnw: Described removeSparseTerms and Dictionary.
587    
588    2007-06-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
589    
590            * R/termdocmatrix.R: require() uses the quietly option to suppress
591            loading messages.
592    
593    2007-06-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
594    
595            * R/dictionary.R: Added dictionary support.
596    
597    2007-06-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
598    
599            * R/aobjects.R: Added classes for Reuters21578 XML and RCV1
600            documents. This simplifies some functions, e.g., asPlain.
601    
602    2007-06-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
603    
604            * inst/doc/tm.Rnw: Fixed some typos in vignette.
605    
606    2007-06-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
607    
608            * R/textdoccol.R (replaceWords): Added method to replace a set of
609            words by a single word. Useful for synonyms.
610    
611    2007-05-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
612    
613            * man/TermDocMatrix.Rd: Fixed documentation on Data slot.
614    
615    2007-05-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
616    
617            * R/termdocmatrix.R (textvector): Small fix for dealing with empty
618            vectors. Thanks to Ariel Maguyon for his error report.
619            (removeSparseTerms): New function to remove columns from a
620            term-document matrix exceeding a sparse factor.
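
Removing terms whose sparsity exceeds a given factor can be sketched as below. The sketch is in Python and is not the removeSparseTerms implementation; it assumes sparsity means the fraction of documents in which a term does not occur, and that a term exactly at the threshold is kept. The name remove_sparse_terms and the dict-based matrix are hypothetical.

```python
def remove_sparse_terms(doc_term_counts, sparse):
    """Drop terms absent from more than a fraction `sparse` of the documents.

    doc_term_counts: list of {term: count} dicts, one per document.
    """
    n_docs = len(doc_term_counts)
    # Document frequency: number of documents with a positive count.
    doc_freq = {}
    for doc in doc_term_counts:
        for term, count in doc.items():
            if count > 0:
                doc_freq[term] = doc_freq.get(term, 0) + 1
    keep = {t for t, df in doc_freq.items() if 1 - df / n_docs <= sparse}
    return [{t: c for t, c in doc.items() if t in keep} for doc in doc_term_counts]
```

A factor near 1 keeps almost every term, while a small factor keeps only terms that appear in most documents.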
621    
622    2007-05-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
623    
624            * man/tmUpdate.Rd: Corrected documentation on readerControl parameter.
625    
626    2007-05-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
627    
628            * man/sFilter.Rd: Corrected documentation on statement format (use
629            '==' instead of '=').
630    
631    2007-05-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
632    
633            * R/aobjects.R (StructuredTextDocument): Inherits from
634            TextDocument.
635    
636    2007-05-04  Ingo Feinerer  <h0125130@wu-wien.ac.at>
637    
638            * R/termdocmatrix.R (findFreqTerms): Perform efficient computation
639            on sparse matrices as proposed by Martin Maechler.
640    
641    2007-04-27  Ingo Feinerer  <h0125130@wu-wien.ac.at>
642    
643            * R/textdoccol.R: Removed \code{dbDisconnect} calls since the
644            latest \pkg{filehash} version deprecates them.
645    
646    2007-04-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
647    
648            * R/termdocmatrix.R (textvector): Stemming is now performed before
649            erasing stopwords.
650            (weightMatrix): Adapted to handle sparse matrices.
651            (TermDocMatrix): Sparse matrix is now efficiently built by
652            direct stepwise insertion of row values into it.
653    
654    2007-04-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
655    
656            * DESCRIPTION: Replaced \pkg{filehashSQLite} with \pkg{filehash}
657            due to ongoing problems. For our purposes the latter is as useful
658            as the replaced package.
659    
660    2007-04-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
661    
662            * man/TextDocCol.Rd: Replaced \code{readPlain} with \code{object@DefaultReader}.
663    
664            * man/TermDocMatrix.Rd: Remove deprecated \code{language} argument.
665    
666    2007-04-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
667    
668            * R/resolve.R (resolveISOCode): Added ISO 639-1 codes for
669            languages with available stopwords.
670    
671    2007-04-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
672    
673            * inst/doc/tm.Rnw: Minor corrections in the vignette.
674    
675    2007-04-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
676    
677            * DESCRIPTION: Update to version 0.2, since a lot of new features
678            have been integrated.
679    
680            * inst/stopwords: Updated existing stopwords and added stopwords
681            for various other languages.
682    
683    2007-04-10  Ingo Feinerer  <h0125130@wu-wien.ac.at>
684    
685            * man/: Updated documentation.
686    
687            * Work/testDb.R: Script to test database stuff.
688    
689            * R/: Fixed various database related bugs. Seems to be rather
690            usable now, i.e., consider it alpha status for now.
691    
692    2007-04-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
693    
694            * R/: Fixed some bugs related to database support.
695    
696    2007-04-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
697    
698            * man/: Added a lot of examples to the manuals.
699    
700    2007-04-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
701    
702            * man/: Updated parts of the documentation.
703    
704            * R/textdoccol.R (asPlain): Added conversion from newsgroup
705            documents to plain text documents.
706    
707    2007-04-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
708    
709            * R/textdoccol.R: Finished experimental database support. Not yet
710            intensively tested.
711    
712            * R/source.R: Now each source has a default reader.
713    
714            * R/reader.R: \code{FunctionGenerator} is now an attribute, not a
715            class anymore.
716    
717            * R/plaintextdoc.R: Custom show method for plain text documents.
718    
719            * R/aobjects.R: Added a class for structured text documents.
720    
721            * R/reader.R: Replaced remaining \code{parser} occurrences with
722            \code{reader}.
723    
724            * R/textdoccol.R (summary): Indent tags.
725    
726            * R/textdoccol.R (removePunctuation): Transform method to remove
727            punctuation marks.
728    
729    2007-03-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
730    
731            * R/textdoccol.R (sFilter): Simplified sFilter significantly by
732            using prescindMeta().
733    
734    2007-03-18  Ingo Feinerer  <h0125130@wu-wien.ac.at>
735    
736            * R/textdoccol.R: Improved database support.
737    
738    2007-03-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
739    
740            * R/termdocmatrix.R (TermDocMatrix): Uses sparse matrices.
741    
742            * R/resolve.R (resolveISOcode): Extracts the language from an ISO
743            language code.
744    
745            * R/textdoccol.R (TextDocCol): Refactored several parser arguments
746            into parserControl argument.
747    
748            * R/aobjects.R (TextDocument): Introduced the "Language" slot.
749    
750    2007-03-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
751    
752            * Work/tmDataSetup.R: The datasets acq and crude can now be
753            created on the fly.
754    
755            * R/stopwords.R: Introduced a function returning the stopwords for
756            a given language (English, German and French at the moment).
757    
758            * R/textdoccol.R (stemDoc): Stemming uses Rstem if available,
759            otherwise falls back to Snowball package.
760    
761    2007-01-30  Ingo Feinerer  <h0125130@wu-wien.ac.at>
762    
763            * man/dissimilarity-methods.Rd: Make clear that any method offered
764            by "dists" from package "cba" can be used.
765    
766    2007-01-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
767    
768            * inst/doc/tm.Rnw: Fixed quotes-appearing-as-boxes-bug according
769            to Kurt's latex suggestion. Removed points and underscores in
770            variable names for consistent naming.
771    
772            * DESCRIPTION: Update to version 0.1-2.
773    
774            * man/TextRepository.Rd: Fixed bug in documentation.
775    
776    2007-01-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
777    
778            * DESCRIPTION: Update to version 0.1-1.
779    
780    2007-01-09  Ingo Feinerer  <h0125130@wu-wien.ac.at>
781    
782            * R/textdoccol.R (stemDoc): Use Rstem::wordStem instead of
783            wordStem.
784    
785    2007-01-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
786    
787            * R/: Changes due to Kurt's review.
788    
789    2006-12-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
790    
791            * R/: Implemented improvements based upon comments by David
792            Meyer.
793    
794    2006-12-17  Ingo Feinerer  <h0125130@wu-wien.ac.at>
795    
796            * inst/doc/: Rewrote vignette.
797    
798            * man/: Improved documentation.
799    
800    2006-12-16  Ingo Feinerer  <h0125130@wu-wien.ac.at>
801    
802            * man/: Updated documentation.
803    
804            * DESCRIPTION: Changed package name to "tm". Updated version to
805            0.1 for first CRAN release.
806    
807            * inst/texts/gmane.comp.lang.r.general.mbox: mbox Gmane R mailing
808            list archive example.
809    
810            * inst/texts/gmane.comp.lang.r.gr.rdf: RSS Gmane R mailing list
811            archive example.
812    
813            * R/preprocess.R (convert_mbox_eml): A simple e-mail converter
814            from (several mails per box) mbox format to (single mail per file)
815            eml format.
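
Converting an mbox (several mails per file) into single-mail units amounts to splitting at the separator lines. A minimal sketch follows, assuming the common convention that each message starts with a line beginning with "From " and that body lines starting with "From " have been escaped by the writer. It is Python for illustration, not the convert_mbox_eml code, and split_mbox is a hypothetical name; writing each returned string to its own .eml file is left out.

```python
def split_mbox(mbox_text):
    """Split mbox text into a list of messages, one string per mail.

    The "From " separator line is kept at the top of each message.
    """
    messages = []
    current = []
    for line in mbox_text.splitlines(keepends=True):
        # A separator line closes the message collected so far.
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages
```

Joining the returned messages reproduces the original text, so the split itself loses no data.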
816    
817    2006-12-08  Ingo Feinerer  <h0125130@wu-wien.ac.at>
818    
819            * data/crude.rda: Rebuilt.
820    
821            * data/acq.rda: Rebuilt.
822    
823            * R/reader.R: Factored out reader and parser methods from
824            textdoccol.R.
825    
826            * R/source.R: Factored out Source methods from aobjects.R and
827            textdoccol.R.
828            (GmaneRSource): Encapsulates Gmane R mailing list archive RSS
829            feeds.
830    
831            * R/textdoccol.R (DirSource): Added support for recursive
832            traversal of directories.
833    
834    2006-12-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
835    
836            * R/textdoccol.R ([[): Loads the document corpus automatically
837            into memory upon access.
838            (tm_transform, tm_filter): Removed several checks whether the
839            document is already loaded ([[ ensures this now).
840            (gmane_r_reader): Reader for RSS feeds as provided by the Gmane R
841            mailing list archive.
842    
843    2006-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
844    
845            * R/aobjects.R (TextDocument): Is now a virtual class.
846            (Source): Is now a virtual class.
847    
848    2006-12-05  Ingo Feinerer  <h0125130@wu-wien.ac.at>
849    
850            * R/textdoccol.R (c): Support for an arbitrary number of document
851            collections.
852    
853    2006-11-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
854    
855            * R/textrepo.R: Updated TextRepository (constructor), append_elem,
856            append_meta and remove_meta.
857    
858            * R/textdoccol.R: Removed modify_metadata method.
859    
860            * R/textrepo.R: Removed modify_metadata method.
861    
862            * R/textdoccol.R (remove_meta): Supports removal of document
863            collection metadata and document (= in data frame) metadata.
864    
865    2006-11-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
866    
867            * R/textdoccol.R (append_doc): Bug fix for handling empty metadata.
868    
869            * data/crude.rda: Rebuilt.
870    
871            * data/acq.rda: Rebuilt.
872    
873            * inst/doc/textmin.Rnw: Updated vignette to reflect code changes.
874    
875            * R/textdoccol.R ([): Bug fix for subsetting a document
876            collection's data frame.
877    
878    2006-11-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
879    
880            * R/textdoccol.R: Bug fixes in s_filter. Added full query support
881            to s_filter.
882    
883            * R/textdoccol.R: Local text documents' metadata can now be copied
884            to a document collection's data frame with prescind_meta.
885    
886    2006-11-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
887    
888            * R/: Text documents' slot metadata is now accessible in s_filter.
889    
890            * R/: Rewrote s_filter function (still has some restrictions).
891    
892    2006-11-20  Ingo Feinerer  <h0125130@wu-wien.ac.at>
893    
894            * R/: Various fixes in handling metadata.
895    
896            * R/: Added update mechanism for text document collections.
897    
898    2006-11-19  Ingo Feinerer  <h0125130@wu-wien.ac.at>
899    
900            * R/: Merging of document collections now creates a binary tree
901            for reconstructing merged document collections.
902    
903            * R/: Redesign of metadata for document collections.
904    
905    2006-11-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
906    
907            * R/: Messages now use \code{ngettext}.
908    
909    2006-11-03  Ingo Feinerer  <h0125130@wu-wien.ac.at>
910    
911            * R/: Added functions for modifying and removing metadata.
912    
913    2006-11-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
914    
915            * man/: Updated some documentation.
916    
917            * R/: Corrected some connection issues.
918    
919            * inst/doc: Worked on the vignette.
920    
921    2006-10-31  Ingo Feinerer  <h0125130@wu-wien.ac.at>
922    
923            * inst/: Added texts and started vignette.
924    
925            * R/: Final changes based upon David's comments.
926    
927    2006-10-29  Ingo Feinerer  <h0125130@wu-wien.ac.at>
928    
929            * NAMESPACE: Corrected exports (generic methods need exportMethods
930            directives!).
931    
932    2006-10-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
933    
934            * R/: Modified the TextDocCol constructor and various parsers. It
935            is now modular and supports various file formats via plugins (see
936            the new "Source" class).
937    
938    2006-10-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
939    
940            * man/: Revised documentation after previous code changes.
941    
942    2006-10-23  Ingo Feinerer  <h0125130@wu-wien.ac.at>
943    
944            * R/: Remaining changes as discussed with David.
945    
946    2006-10-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
947    
948            * R/: Some changes as suggested by David. The rest will follow
949            within the next few days.
950    
951    2006-09-26  Ingo Feinerer  <h0125130@wu-wien.ac.at>
952    
953            * man/: Finished documentation.
954    
955    2006-09-25  Ingo Feinerer  <h0125130@wu-wien.ac.at>
956    
957            * man/: Wrote some documentation.
958    
959    2006-09-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
960    
961            * R/: Further syntactic sugar in the form of additional assignment
962            and accessor methods.
963    
964    2006-09-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
965    
966            * R/: Syntactic sugar in the form of "length", "show" and "summary"
967            operators.
968    
969    2006-08-24  Ingo Feinerer  <h0125130@wu-wien.ac.at>
970    
971            * R/: Various updates, mainly to default operators ("[" or "c")
972            and dissimilarities.
973    
974    2006-08-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
975    
976            * R/: Added similarity functions.
977    
978            * data/: Added English stopwords.
979    
980    2006-08-07  Ingo Feinerer  <h0125130@wu-wien.ac.at>
981    
982            * data/: Examples compiled for new features.
983    
984            * R/: Changes due to new structure.
985    
986            * NAMESPACE: Corrected namespace to reflect new structure.
987    
988            * R/termdocmatrix.R: Adapted for new naming scheme.
989    
990    2006-08-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
991    
992            * R/textdoccol.R: Adapted code for new class structure. Wrote
993            several transform and filter functions operating on text document
994            collections (alias text document databases).
995    
996            * R/aobjects.R: Adapted class structure with inheritance,
997            repositories and additional meta data. Loading files on demand is
998            now possible.
999    
1000    2006-07-13  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1001    
1002            * R/: Some cosmetic cleanups.
1003    
1004            * inst/: Removed vignette on clustering. That and much more is now
1005            described in the JSS paper on text mining. A more elaborate
1006            vignette based upon that article will be incorporated in the future.
1007    
1008    2006-07-01  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1009    
1010            * R/: Updated generic S4 methods to comply with signature changes
1011            in newer versions of R (> 2.3).
1012    
1013    2006-03-12  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1014    
1015            * ext/R/importRIS.R: Automatic RIS import is now possible.
1016    
1017    2006-02-14  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1018    
1019            * R/textdoccol.R: Added RIS HTML input format.
1020    
1021    2006-01-21  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1022    
1023            * R/textdoccol.R: Removed bug that caused invalid text document
1024            collections when handling many input files.
1025    
1026    2006-01-11  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1027    
1028            * R/textdoccol.R: Restructured and extended file import
1029            mechanism.
1030    
1031            * inst/doc/clustering.Rnw: Adapted vignette for use with
1032            ReutNews.rda.
1033    
1034            * man/ReutNews.Rd: Documentation for ReutNews.rda.
1035    
1036            * data/ReutNews.rda: A tiny Reuters21578 example data set.
1037    
1038    2005-12-22  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1039    
1040            * inst/doc/clustering.Rnw: Wrote a small vignette to present the
1041            clustering facilities of this package.
1042    
1043    2005-12-15  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1044    
1045            * R/aobjects.R: Changed package document structure to avoid class
1046            dependency problems.
1047    
1048    2005-12-06  Ingo Feinerer  <h0125130@wu-wien.ac.at>
1049    
1050            * Wrote a script for the ModLewis Split for the Reuters-21578 XML
1051            data set.
1052    
1053            * Finished documentation and reordered directory structure. Now "R
1054            CMD check textmin" works without errors.
1055    
