% Source file: /pkg/inst/NEWS.Rd (package tm), revision 1558
\name{NEWS}
\title{News for Package 'tm'}
\encoding{UTF-8}
\section{Changes in tm version 0.7-8}{
  \subsection{BUG FIXES}{
    \itemize{
      \item Fix invalid counting in \code{stemCompletion()} with
      \code{type = "prevalent"}. Reported by Bernard Chang.
      \item \code{tm_index()} now interprets all non-\code{TRUE} logical values
      returned by the filter function as \code{FALSE}. This fixes corner cases
      where filter functions return \code{logical(0)} or \code{NA}. Reported
      by Tom Nicholls.
    }
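The coercion rule described for \code{tm_index()} can be illustrated in base R. This is a minimal sketch of the rule only, not the package's implementation; the \code{results} list is a hypothetical stand-in for the values a filter function might return for four documents.

```r
# Hypothetical filter results, one per document (not produced by tm itself).
results <- list(TRUE, FALSE, NA, logical(0))

# isTRUE() is TRUE only for a single non-missing TRUE value, so NA and
# zero-length results are treated as FALSE.
keep <- vapply(results, isTRUE, logical(1))
print(keep)
```

Only the first hypothetical document would be selected under this rule.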
  }
}
\section{Changes in tm version 0.7-6}{
  \subsection{NEW FEATURES}{
    \itemize{
      \item \code{TermDocumentMatrix.SimpleCorpus()} now also honors a
      logical \code{removePunctuation} control option (default: false).
    }
  }
  \subsection{BUG FIXES}{
    \itemize{
      \item Sync encoding fixes in \code{TermDocumentMatrix.SimpleCorpus()} with
      \code{Boost_tokenizer()}.
    }
  }
}
\section{Changes in tm version 0.7-5}{
  \subsection{BUG FIXES}{
    \itemize{
      \item Handle \code{NA}s consistently in tokenizers.
    }
  }
}
\section{Changes in tm version 0.7-4}{
  \subsection{BUG FIXES}{
    \itemize{
      \item Keep document names in \code{tm_map.SimpleCorpus()}.
      \item Fix encoding problems in \code{scan_tokenizer()} and
      \code{Boost_tokenizer()}.
    }
  }
}
\section{Changes in tm version 0.7-3}{
  \subsection{BUG FIXES}{
    \itemize{
      \item \code{scan_tokenizer()} now works with character vectors and
      character strings.
      \item \code{removePunctuation()} now works again in \code{latin1} locales.
      \item Handle empty term-document matrices gracefully.
    }
  }
}
\section{Changes in tm version 0.7-2}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item \code{DataframeSource} now only processes data frames with the two
      mandatory columns \code{"doc_id"} and \code{"text"}. Additional columns
      are used as document level metadata. This implements compatibility with
      \emph{Text Interchange Formats} corpora
      (\url{https://github.com/ropensci/tif}).
      \item \code{readTabular()} has been removed. Use \code{DataframeSource}
      instead.
      \item \code{removeNumbers()} and \code{removePunctuation()} now have an
      argument \code{ucp} to check for Unicode general categories \code{Nd}
      (decimal digits) and \code{P} (punctuation), respectively. Contributed
      by Kurt Hornik.
      \item The package \pkg{xml2} is now imported for \acronym{XML}
      functionality instead of the (\acronym{CRAN} maintainer orphaned)
      package \pkg{XML}.
    }
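The new \code{DataframeSource} layout and the \code{ucp} argument might be used as in the following sketch. The data frame contents and the \code{topic} column are hypothetical; only \code{doc_id} and \code{text} are required by the entry above.

```r
library(tm)

# Hypothetical input: the two mandatory columns plus one metadata column.
docs <- data.frame(
  doc_id = c("doc_1", "doc_2"),
  text = c("This is a text.", "This is another one."),
  topic = c("first", "second"),
  stringsAsFactors = FALSE
)

corpus <- VCorpus(DataframeSource(docs))
meta(corpus)                 # document level metadata: the topic column
as.character(corpus[[1]])    # text of the first document

# ucp = TRUE checks Unicode general categories (P and Nd), per the entry above.
removePunctuation("Hello, world!", ucp = TRUE)
removeNumbers("Version 42", ucp = TRUE)
```

Data previously read via \code{readTabular()} would presumably need its columns renamed to \code{doc_id} and \code{text} before being passed to \code{DataframeSource}.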
  }
  \subsection{NEW FEATURES}{
    \itemize{
      \item \code{Boost_tokenizer} provides a tokenizer based on the Boost
      (\url{http://www.boost.org}) Tokenizer.
    }
  }
  \subsection{BUG FIXES}{
    \itemize{
      \item Correctly handle the \code{dictionary} argument when constructing a
      term-document matrix from a \code{SimpleCorpus} (reported by Joe
      Corrigan) or from a \code{VCorpus} (reported by Mark Rosenstein).
    }
  }
}
\section{Changes in tm version 0.7-1}{
  \subsection{BUG FIXES}{
    \itemize{
      \item Compilation fixes for Clang's libc++.
    }
  }
}
\section{Changes in tm version 0.7}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item \code{inspect.TermDocumentMatrix()} now displays a sample instead
      of the full matrix. The full dense representation is available via
      \code{as.matrix()}.
    }
  }
  \subsection{NEW FEATURES}{
    \itemize{
      \item \code{SimpleCorpus} provides a corpus which is optimized for the
      most common usage scenario: importing plain texts from files in a
      directory or directly from a vector in \R, preprocessing and transforming
      the texts, and finally exporting them to a term-document matrix. The aim
      is to boost performance and minimize memory pressure. It loads all
      documents into memory, and is designed for medium-sized to large data
      sets.
      \item \code{inspect()} on text documents as a shorthand for
      \code{writeLines(as.character())}.
      \item \code{findMostFreqTerms()} finds the most frequent terms in a
      document-term or term-document matrix, or a vector of term frequencies.
      \item \code{tm_parLapply()} is now internally used for the parallelization
      of transformations, filters, and term-document matrix construction. The
      preferred parallelization engine can be registered via
      \code{tm_parLapply_engine()}. The default is to use no parallelization
      (instead of \code{\link[parallel]{mclapply}} (package \pkg{parallel}) in
      previous versions).
    }
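A short sketch of how the features listed above might fit together. The toy texts are hypothetical, and the function-style engine using \code{mclapply} is one possible choice (it relies on forking, so it is meant for Unix-alikes); it is not the only supported engine.

```r
library(tm)

# Hypothetical toy input; SimpleCorpus works on in-memory plain texts.
txt <- c(a = "apples and oranges", b = "apples and pears and apples")
corpus <- SimpleCorpus(VectorSource(txt))

tdm <- TermDocumentMatrix(corpus)
inspect(tdm)                     # prints a sample of the matrix
as.matrix(tdm)                   # full dense representation
findMostFreqTerms(tdm, n = 2L)   # most frequent terms per document

# Register a parallelization engine, build a matrix, then reset to the
# default of no parallelization.
tm_parLapply_engine(function(X, FUN, ...) {
  parallel::mclapply(X, FUN, ..., mc.cores = 2L)
})
tdm_par <- TermDocumentMatrix(VCorpus(VectorSource(txt)))
tm_parLapply_engine(NULL)
```

Passing \code{NULL} restores the sequential default, which is usually the safer choice for small corpora where the parallel overhead outweighs the gain.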
  }
}
\section{Changes in tm version 0.6-2}{
  \subsection{BUG FIXES}{
    \itemize{
      \item \code{format.PlainTextDocument()} now reports only one character
      count for a whole document.
    }
  }
}
\section{Changes in tm version 0.6-1}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item \code{format.PlainTextDocument()} now displays a compact
      representation instead of the content. Use \code{as.character()} to
      obtain the character content (which in turn can be applied to a corpus
      via \code{lapply()}).
    }
  }
  \subsection{NEW FEATURES}{
    \itemize{
      \item \code{ZipSource()} for processing ZIP files.
      \item Sources now provide \code{open()} and \code{close()}.
      \item \code{termFreq()} now accepts \code{Span_Tokenizer} and
      \code{Token_Tokenizer} (both from package \pkg{NLP}) objects as
      tokenizers.
      \item \code{readTagged()}, a reader for text documents containing
      POS-tagged words.
    }
  }
  \subsection{BUG FIXES}{
    \itemize{
      \item The function \code{removeWords()} now correctly processes words
      that are truncations of others. Reported by Александр Труфанов.
    }
  }
}
\section{Changes in tm version 0.6}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item \code{DirSource()} and \code{URISource()} now use the argument
      \code{encoding} for conversion via \code{iconv()} to \code{"UTF-8"}.
      \item \code{termFreq()} now uses \code{words()} as the default tokenizer.
      \item Text documents now provide the functions \code{content()} and
      \code{as.character()} to access the (possibly raw) document content and
      the natural language text in a suitable (not necessarily structured)
      form.
      \item The internal representation of corpora, sources, and text documents
      changed. Saved objects created with older \pkg{tm} versions are
      incompatible and need to be rebuilt.
    }
  }
  \subsection{NEW FEATURES}{
    \itemize{
      \item \code{DirSource()} and \code{URISource()} now have a \code{mode}
      argument specifying how elements should be read (no read, binary, text).
      \item Improved high-level documentation on corpora (\code{?Corpus}), text
      documents (\code{?TextDocument}), sources (\code{?Source}), and readers
      (\code{?Reader}).
      \item Integration with package \pkg{NLP}.
      \item Romanian stopwords. Suggested by Cristian Chirita.
      \item \code{words.PlainTextDocument()} delivers word tokens in the
      document.
    }
  }
  \subsection{BUG FIXES}{
    \itemize{
      \item The function \code{stemCompletion()} now avoids spurious duplicate
      results. Reported by Seong-Hyeon Kim.
    }
  }
  \subsection{DEPRECATED & DEFUNCT}{
    \itemize{
      \item The following functions have been removed:
      \itemize{
        \item \code{Author()}, \code{DateTimeStamp()}, \code{CMetaData()},
        \code{content_meta()}, \code{DMetaData()}, \code{Description()},
        \code{Heading()}, \code{ID()}, \code{Language()},
        \code{LocalMetaData()}, \code{Origin()}, \code{prescindMeta()},
        \code{sFilter()} (use \code{meta()} instead).
        \item \code{dissimilarity()} (use \code{proxy::dist()} instead).
        \item \code{makeChunks()} (use \code{[} and \code{[[} manually).
        \item \code{summary.Corpus()} and \code{summary.TextRepository()}
        (\code{print()} now gives a more informative but succinct overview).
        \item \code{TextRepository()} and \code{RepoMetaData()} (use e.g. a
        list to store multiple corpora instead).
      }
    }
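As a sketch of the suggested migration to \code{meta()}, the removed accessors could be replaced as follows. The document and its tag values are hypothetical.

```r
library(tm)

# Hypothetical document with a few metadata tags set at construction.
doc <- PlainTextDocument("Some text.", author = "Jane Doe", id = "doc-1")

meta(doc, "author")                   # in place of Author(doc)
meta(doc, "id")                       # in place of ID(doc)
meta(doc, "heading") <- "A heading"   # replacement form sets a tag
meta(doc)                             # all document level metadata
```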
  }
}
\section{Changes in tm version 0.5-10}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item License changed to GPL-3 (from GPL-2 | GPL-3).
      \item The following functions have been renamed:
      \itemize{
        \item \code{tm_tag_score()} to \code{tm_term_score()}.
      }
    }
  }
  \subsection{DEPRECATED & DEFUNCT}{
    \itemize{
      \item The following functions have been removed:
      \itemize{
        \item \code{Dictionary()} (use a character vector instead; use
        \code{Terms()} to extract terms from a document-term or term-document
        matrix),
        \item \code{GmaneSource()} (but still available via an example in
        \code{XMLSource()}),
        \item \code{preprocessReut21578XML()} (moved to package
        \pkg{tm.corpus.Reuters21578}),
        \item \code{readGmane()} (but still available via an example in
        \code{readXML()}),
        \item \code{searchFullText()} and \code{tm_intersect()}
        (use \code{grep()} instead).
      }
      \item The following S3 classes are no longer registered as S4 classes:
      \itemize{
        \item \code{VCorpus} and \code{PlainTextDocument}.
      }
    }
  }
}
\section{Changes in tm version 0.5-9}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item Stemming functionality is now provided by the package
      \pkg{SnowballC} replacing packages \pkg{Snowball} and \pkg{RWeka}.
      \item All stopword lists (besides Catalan and SMART) available via
      \code{stopwords()} now come from the Snowball stemmer project.
      \item Transformations, filters, and term-document matrix construction
      now use \code{\link[parallel]{mclapply}} (package \pkg{parallel}).
      Packages \pkg{snow} and \pkg{Rmpi} are no longer used.
    }
  }
  \subsection{DEPRECATED & DEFUNCT}{
    \itemize{
      \item The following functions have been removed:
      \itemize{
        \item \code{tm_startCluster()} and \code{tm_stopCluster()}.
      }
    }
  }
}
\section{Changes in tm version 0.5-8}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item The function \code{termFreq()} now processes the
      \code{tolower} and \code{tokenize} options first.
    }
  }
  \subsection{NEW FEATURES}{
    \itemize{
      \item Catalan stopwords. Requested by Xavier Fernández i Marín.
    }
  }
  \subsection{BUG FIXES}{
    \itemize{
      \item The function \code{termFreq()} now correctly accepts
      user-provided stopwords. Reported by Bettina Grün.
      \item The function \code{termFreq()} now correctly handles the
      lower bound of the option \code{wordLength}. Reported by Steven
      C. Bagley.
    }
  }
}
\section{Changes in tm version 0.5-7}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item The function \code{termFreq()} provides two new arguments for
      generalized bounds checking of term frequencies and word
      lengths. These replace the arguments \code{minDocFreq} and
      \code{minWordLength}.
      \item The function \code{termFreq()} is now sensitive to the order of
      control options.
    }
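A minimal sketch of bounds checking in \code{termFreq()}. The entry above does not name the two arguments; the control options \code{bounds} and \code{wordLengths} used here are taken from the current \code{termFreq()} documentation and are an assumption with respect to version 0.5-7. The input text is hypothetical.

```r
library(tm)

doc <- PlainTextDocument("a bb bb ccc ccc ccc dddd")

# Keep terms of 2 to 3 characters that occur at least twice in the document.
tf <- termFreq(doc, control = list(
  wordLengths = c(2, 3),
  bounds = list(local = c(2, Inf))
))
print(tf)
```

Both options take a lower and an upper limit, which is what makes them more general than the former minimum-only arguments.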
  }
  \subsection{NEW FEATURES}{
    \itemize{
      \item Weighting schemata for term-document matrices in SMART notation.
      \item Local and global options for term-document matrix
      construction.
      \item SMART stopword list was added.
    }
  }
}
\section{Changes in tm version 0.5-5}{
  \subsection{NEW FEATURES}{
    \itemize{
      \item Access documents in a corpus by names (fallback to IDs if names are
      not set), i.e., allow a string for the corpus operator \code{[[}.
    }
  }
  \subsection{BUG FIXES}{
    \itemize{
      \item The function \code{findFreqTerms()} now checks bounds on a global level
      (to comply with the manual page) instead of per document. Reported
      and fixed by Thomas Zapf-Schramm.
    }
  }
}
\section{Changes in tm version 0.5-4}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item Use IETF language tags for language codes (instead of ISO 639-2).
    }
  }
  \subsection{NEW FEATURES}{
    \itemize{
      \item The function \code{tm_tag_score()} provides functionality to score
      documents based on the number of tags found. This is useful for
      sentiment analysis.
      \item The weighting function for term frequency-inverse document
      frequency \code{weightTfIdf()} now has an option for term
      normalization.
      \item Plotting functions to test for Zipf's and Heaps' law on a
      term-document matrix were added: \code{Zipf_plot()} and
      \code{Heaps_plot()}. Contributed by Kurt Hornik.
    }
  }
}
\section{Changes in tm version 0.5-3}{
  \subsection{NEW FEATURES}{
    \itemize{
      \item The reader function \code{readRCV1asPlain()} was added and combines the
      functionality of \code{readRCV1()} and \code{as.PlainTextDocument()}.
      \item The function \code{stemCompletion()} has a set of new heuristics.
    }
  }
}
\section{Changes in tm version 0.5-2}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item The function \code{termFreq()}, which is used for building a
      term-document matrix, now uses a whitespace-oriented tokenizer
      by default.
    }
  }
  \subsection{NEW FEATURES}{
    \itemize{
      \item A combine method for merging multiple term-document matrices
      was added (\code{c.TermDocumentMatrix()}).
      \item The function \code{termFreq()} now has an option to remove
      punctuation characters.
    }
  }
  \subsection{DEPRECATED & DEFUNCT}{
    \itemize{
      \item The following functions have been removed:
      \itemize{
        \item \code{CSVSource()} (use
        \code{DataframeSource(read.csv(..., stringsAsFactors = FALSE))}
        instead), and
        \item \code{TermDocMatrix()} (use \code{DocumentTermMatrix()} instead).
      }
    }
  }
  \subsection{BUG FIXES}{
    \itemize{
      \item \code{removeWords()} no longer skips words at the beginning or the end
      of a line. Reported by Mark Kimpel.
    }
  }
}
\section{Changes in tm version 0.5-1}{
  \subsection{BUG FIXES}{
    \itemize{
      \item \code{preprocessReut21578XML()} no longer generates invalid file names.
    }
  }
}
\section{Changes in tm version 0.5}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item All classes, functions, and generics are reimplemented using
      the S3 class system.
      \item The following functions have been renamed:
      \itemize{
        \item \code{activateCluster()} to \code{tm_startCluster()},
        \item \code{asPlain()} to \code{as.PlainTextDocument()},
        \item \code{deactivateCluster()} to \code{tm_stopCluster()},
        \item \code{tmFilter()} to \code{tm_filter()},
        \item \code{tmIndex()} to \code{tm_index()},
        \item \code{tmIntersect()} to \code{tm_intersect()}, and
        \item \code{tmMap()} to \code{tm_map()}.
      }
      \item Mail handling functionality is factored out to the
      \pkg{tm.plugin.mail} package.
    }
  }
  \subsection{DEPRECATED & DEFUNCT}{
    \itemize{
      \item The following functions have been removed:
      \itemize{
        \item \code{tmTolower()} (use \code{tolower()} instead), and
        \item \code{replacePatterns()} (use \code{gsub()} instead).
      }
    }
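The base R replacements named above might be applied to a corpus as sketched below. The wrapper \code{content_transformer()} is not part of version 0.5; it is used here on the assumption that a current \pkg{tm} version is installed, and the \code{swap} helper and the texts are hypothetical.

```r
library(tm)

corpus <- VCorpus(VectorSource(c("Hello World", "Hello R")))

# tolower() in place of tmTolower(); the wrapper keeps each result a
# text document rather than a bare character vector.
corpus <- tm_map(corpus, content_transformer(tolower))

# gsub() in place of replacePatterns(); swap is a hypothetical helper.
swap <- content_transformer(function(x, pattern, replacement) {
  gsub(pattern, replacement, x, fixed = TRUE)
})
corpus <- tm_map(corpus, swap, pattern = "hello", replacement = "goodbye")
as.character(corpus[[1]])
```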
  }
}
\section{Changes in tm version 0.4}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item The Corpus class is now virtual, providing an abstract
      interface.
      \item VCorpus, the default implementation of the abstract corpus
      interface (by subclassing), provides a corpus with volatile (=
      standard \R object) semantics. It loads all documents into
      memory, and is designed for small to medium-sized data sets.
      \item PCorpus, an implementation of the abstract corpus interface (by
      subclassing), provides a corpus with permanent storage
      semantics. The actual data is stored in an external database
      (file) object (as supported by the \pkg{filehash} package), with
      automatic (un-)loading into memory. It is designed for systems
      with small memory.
      \item Language codes are now in ISO 639-2 (instead of ISO 639-1).
      \item Reader functions no longer have a load argument for lazy
      loading.
    }
  }
  \subsection{NEW FEATURES}{
    \itemize{
      \item The reader function \code{readReut21578XMLasPlain()} was added and
      combines the functionality of \code{readReut21578XML()} and \code{asPlain()}.
    }
  }
  \subsection{BUG FIXES}{
    \itemize{
      \item \code{weightTfIdf()} no longer applies a binary weighting to an input
      matrix in term frequency format (which happened only in 0.3-4).
    }
  }
}
\section{Changes in tm version 0.3-4}{
  \subsection{SIGNIFICANT USER-VISIBLE CHANGES}{
    \itemize{
      \item \code{.onLoad()} no longer tries to start an MPI cluster (which often
      failed in misconfigured environments). Use \code{activateCluster()}
      and \code{deactivateCluster()} instead.
      \item DocumentTermMatrix (the improved reimplementation of defunct
      TermDocMatrix) does not use the \pkg{Matrix} package anymore.
    }
  }
  \subsection{NEW FEATURES}{
    \itemize{
      \item The \code{DirSource()} constructor now accepts the two new (optional)
      arguments \code{pattern} and \code{ignore.case}. With \code{pattern} one
      can define a regular expression for selecting only matching files, and
      \code{ignore.case} specifies whether pattern-matching is
      case-sensitive.
      \item The \code{readNewsgroup()} reader function can now be configured for
      custom date formats (via the \code{DateFormat} argument).
      \item The \code{readPDF()} reader function can now be configured (via the
      \code{PdfinfoOptions} and \code{PdftotextOptions} arguments).
      \item The \code{readDOC()} reader function can now be configured (via the
      \code{AntiwordOptions} argument).
      \item Sources can now be vectorized. This allows faster corpus
      construction.
      \item New XMLSource class for arbitrary XML files.
      \item The new \code{readTabular()} reader function allows creating a
      custom reader configured via mappings from a tabular data
      structure.
      \item The new \code{readXML()} reader function allows reading arbitrary
      XML files which are described with a specification.
      \item The new \code{tmReduce()} transformation allows combining multiple
      maps into one transformation.
    }
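The \code{pattern} and \code{ignore.case} arguments of \code{DirSource()} could be used as in this sketch, written against a current \pkg{tm} version. The directory and file names are hypothetical and are created in a temporary location.

```r
library(tm)

# Hypothetical demo directory with three small files.
demo_dir <- file.path(tempdir(), "tm_dirsource_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("first document", file.path(demo_dir, "a.txt"))
writeLines("second document", file.path(demo_dir, "b.TXT"))
writeLines("not selected", file.path(demo_dir, "c.csv"))

# Select only files whose names end in .txt, ignoring case.
src <- DirSource(demo_dir, pattern = "[.]txt$", ignore.case = TRUE)
corpus <- VCorpus(src)
length(corpus)
```

With \code{ignore.case = FALSE} (the default) the file \code{b.TXT} would not match the pattern.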
  }
  \subsection{DEPRECATED & DEFUNCT}{
    \itemize{
      \item CSVSource is defunct (use DataframeSource instead).
      \item weightLogical is defunct.
      \item TermDocMatrix is defunct (use DocumentTermMatrix or
      TermDocumentMatrix instead).
    }
  }
}
\section{Changes in tm version 0.3-3}{
  \subsection{NEW FEATURES}{
    \itemize{
      \item The abstract Source class gets a default implementation for
      the \code{stepNext()} method. It increments the position counter by
      one, a reasonable value for most sources. For special purposes
      custom methods can be created via overloading \code{stepNext()} of
      the subclass.
      \item New URISource class for a single document identified by a
      Uniform Resource Identifier.
      \item New DataframeSource for documents stored in a data frame. Each
      row is interpreted as a single document.
    }
  }
  \subsection{BUG FIXES}{
    \itemize{
      \item Fix off-by-one error in \code{convertMboxEml()} function. Reported by
      Angela Bohn.
      \item Sort row indices in sparse term-document matrices. Kudos to
      Martin Mächler for his suggestions.
      \item Sources and readers no longer evaluate calls in a non-standard
      way.
    }
  }
}
\section{Changes in tm version 0.3-2}{
  \subsection{NEW FEATURES}{
    \itemize{
      \item Weighting functions now have an Acronym slot containing
      abbreviations of the weighting functions' names. This is highly
      useful when generating tables that indicate which weighting
      scheme was actually used for your experiments.
      \item The functions \code{tmFilter()}, \code{tmIndex()}, \code{tmMap()}
      and \code{TermDocMatrix()} can now use an MPI cluster (via the
      \pkg{snow} and \pkg{Rmpi} packages) if
      available. Use \code{(de)activateCluster()} to manually override
      cluster usage settings. Special thanks to Stefan Theussl for
      his constructive comments.
      \item The Source class receives a new Length slot. It contains the
      number of elements provided by the source (although there
      might be rare cases where the number cannot be determined in
      advance, in which case it should be set to zero).
    }
  }
}
