[tm] Log of /pkg/DESCRIPTION

Revision 1481
Modified Sat May 20 10:28:00 2017 UTC by feinerer
File length: 839 byte(s)
Diff to previous 1480
Support TIF for DataframeSource

See Text Interchange Formats (TIF, https://github.com/ropensci/tif) and
readtext (https://github.com/kbenoit/readtext).

Revision 1480
Modified Fri Apr 28 15:13:08 2017 UTC by feinerer
File length: 839 byte(s)
Diff to previous 1477
Use antiword and pdftools packages for DOC and PDF extraction

Revision 1477
Modified Sat Mar 25 18:12:22 2017 UTC by feinerer
File length: 979 byte(s)
Diff to previous 1475
Fix term-document matrix construction

Terms with a match in the user-provided dictionary were counted only once when
constructing a term-document matrix from a VCorpus (resulting in a binary
weighting). Reported by Mark Rosenstein.

Include a unit test to avoid future regressions.
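To illustrate the bug described above, here is a small sketch in Python. It is not tm's R implementation. The function names `dictionary_counts` and `dictionary_counts_buggy` and the sample document are hypothetical. The sketch contrasts true term frequencies for dictionary terms with the reported behaviour, in which each matching term counted at most once (binary weighting).

```python
from collections import Counter

def dictionary_counts(tokens, dictionary):
    """Term frequency of every dictionary term in one document (hypothetical helper)."""
    counts = Counter(t for t in tokens if t in dictionary)
    # Every dictionary term gets a column, even when its count is zero.
    return {term: counts.get(term, 0) for term in sorted(dictionary)}

def dictionary_counts_buggy(tokens, dictionary):
    """Reproduces the reported bug: presence/absence instead of counts."""
    present = set(tokens) & set(dictionary)
    return {term: int(term in present) for term in sorted(dictionary)}

doc = "the cat sat on the mat with the cat".split()
dictionary = {"cat", "mat", "dog"}
print(dictionary_counts(doc, dictionary))        # {'cat': 2, 'dog': 0, 'mat': 1}
print(dictionary_counts_buggy(doc, dictionary))  # {'cat': 1, 'dog': 0, 'mat': 1}
```

A regression test for this kind of fix only needs one document in which a dictionary term occurs more than once.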

Revision 1475
Modified Wed Mar 22 14:53:08 2017 UTC by feinerer
File length: 969 byte(s)
Diff to previous 1474
Fix variable name

Revision 1474
Modified Tue Mar 21 19:26:21 2017 UTC by feinerer
File length: 969 byte(s)
Diff to previous 1473
Fix 'dictionary' argument handling.

Revision 1473
Modified Thu Mar 2 13:59:04 2017 UTC by feinerer
File length: 962 byte(s)
Diff to previous 1470
Try to fix fallout from Clang fix.

Revision 1470
Modified Mon Feb 27 08:03:50 2017 UTC by khornik
File length: 960 byte(s)
Diff to previous 1468
New release.

Revision 1468
Modified Mon Jan 23 16:55:16 2017 UTC by feinerer
File length: 960 byte(s)
Diff to previous 1461
Update location of XML-encoded version of Reuters-21578

Revision 1461
Modified Sat Jan 14 14:51:22 2017 UTC by feinerer
File length: 960 byte(s)
Diff to previous 1455
Implement [ and [[ for selected sources

Neither [ nor [[ is considered part of the API; both are provided for
convenience. Providing them is also good practice, as sources typically
report a length.

Revision 1455
Modified Sat Jan 7 12:06:13 2017 UTC by feinerer
File length: 960 byte(s)
Diff to previous 1446
Gracefully handle content_transformer in tm_map.SimpleCorpus()

Revision 1446
Modified Wed Nov 2 15:41:49 2016 UTC by feinerer
File length: 960 byte(s)
Diff to previous 1445
Revive parallel::mclapply()

Experiments show that, with the right hardware, mclapply() gives measurable
performance gains. So re-enable it, despite substantial drawbacks (RAM and
CPU overhead) in some scenarios.

Revision 1445
Modified Sun Oct 9 09:30:58 2016 UTC by feinerer
File length: 950 byte(s)
Diff to previous 1443
Speed up termFreq(), general cleanup

- Avoid parallel::mclapply()
- Use custom .table()
- Use rep.int(), rep_len() and lengths()
- Fix typos
- Shorten overlong lines
- Consistent formatting

Revision 1443
Modified Mon Aug 22 11:26:41 2016 UTC by feinerer
File length: 960 byte(s)
Diff to previous 1442
Process all arguments in tm_map.SimpleCorpus()

Revision 1442
Modified Sat Aug 6 17:19:22 2016 UTC by feinerer
File length: 960 byte(s)
Diff to previous 1441
Recheck local bounds after stemming in TermDocumentMatrix.SimpleCorpus()

Revision 1441
Modified Sat Aug 6 16:46:33 2016 UTC by feinerer
File length: 960 byte(s)
Diff to previous 1440
Simplify termFreq()

- Return table instead of named integer vector (avoids internal conversion)
- Always skip terms with a zero count

Revision 1440
Modified Sat Jul 30 06:34:57 2016 UTC by feinerer
File length: 960 byte(s)
Diff to previous 1439
Corpus() now chooses between SimpleCorpus and VCorpus based on its arguments

Revision 1439
Modified Sun Jul 17 07:02:40 2016 UTC by feinerer
File length: 964 byte(s)
Diff to previous 1438
Polish TermDocumentMatrix.SimpleCorpus()

Revision 1438
Modified Sat Jul 16 18:32:59 2016 UTC by feinerer
File length: 964 byte(s)
Diff to previous 1437
Use Rcpp for efficient term-document matrix construction from a SimpleCorpus

Revision 1437
Modified Wed Jul 13 19:23:49 2016 UTC by feinerer
File length: 938 byte(s)
Diff to previous 1435
Add SimpleCorpus

SimpleCorpus provides a corpus that is optimized for the most common usage
scenario: importing plain texts from files in a directory or directly from a
vector in R, preprocessing and transforming the texts, and finally exporting
them to a term-document matrix. The aim is to boost performance and minimize
memory pressure. It loads all documents into memory and is designed for
medium-sized to large data sets.

Revision 1435
Modified Wed Nov 18 09:53:21 2015 UTC by feinerer
File length: 938 byte(s)
Diff to previous 1433
Provide inspect.TextDocument() as shorthand for writeLines(as.character())

Revision 1433
Modified Thu Jul 2 10:52:03 2015 UTC by feinerer
File length: 936 byte(s)
Diff to previous 1432
Avoid simplification to ensure that the result is a named list

Revision 1432
Modified Wed Jul 1 19:17:31 2015 UTC by feinerer
File length: 936 byte(s)
Diff to previous 1431
Update NEWS

Revision 1431
Modified Wed Jul 1 10:32:42 2015 UTC by khornik
File length: 938 byte(s)
Diff to previous 1430
Improve namespace.

Revision 1430
Modified Tue Jun 9 12:34:44 2015 UTC by feinerer
File length: 922 byte(s)
Diff to previous 1426
Highlight the character representation of documents

Revision 1426
Modified Wed May 6 16:43:10 2015 UTC by feinerer
File length: 922 byte(s)
Diff to previous 1421
Avoid overlong line

Revision 1421
Modified Mon May 4 19:21:11 2015 UTC by feinerer
File length: 922 byte(s)
Diff to previous 1420
Do not require() Rgraphviz

Revision 1420
Modified Mon May 4 19:04:00 2015 UTC by feinerer
File length: 913 byte(s)
Diff to previous 1419
Accept NLP::Span_Tokenizer

Revision 1419
Modified Sat May 2 17:23:47 2015 UTC by feinerer
File length: 913 byte(s)
Diff to previous 1417
Sync format()/print() with NLP

Revision 1417
Modified Tue Apr 28 18:02:42 2015 UTC by feinerer
File length: 913 byte(s)
Diff to previous 1413
Mark scan_tokenizer() and MC_tokenizer() as NLP::Token_Tokenizer

Revision 1413
Modified Sat Apr 4 08:21:38 2015 UTC by feinerer
File length: 911 byte(s)
Diff to previous 1409
Correctly process words that are truncations of others

Revision 1409
Modified Fri Feb 27 16:10:18 2015 UTC by feinerer
File length: 911 byte(s)
Diff to previous 1406
Add as.VCorpus.list()

Revision 1406
Modified Mon Feb 23 17:21:49 2015 UTC by feinerer
File length: 911 byte(s)
Diff to previous 1401
Add readTagged(): a reader for text documents containing POS-tagged words

Revision 1401
Modified Wed Jan 28 18:38:56 2015 UTC by feinerer
File length: 911 byte(s)
Diff to previous 1399
Sync documentation with code (log2 vs. log)

Reported by Marcus Spies.

Revision 1399
Modified Wed Jan 21 15:31:33 2015 UTC by feinerer
File length: 911 byte(s)
Diff to previous 1397
Show TOPICS categories

Reported by Diego M. Barreiro Fandiño.

Revision 1397
Modified Fri Sep 12 19:30:27 2014 UTC by feinerer
File length: 911 byte(s)
Diff to previous 1390
Add open() and close() for sources

Useful for sources with complex or expensive setup, e.g., database connections
or file handles.

Revision 1390
Modified Fri Jun 6 12:37:33 2014 UTC by feinerer
File length: 909 byte(s)
Diff to previous 1384
Ensure data types

Revision 1384
Modified Sun Jun 1 07:59:56 2014 UTC by feinerer
File length: 909 byte(s)
Diff to previous 1379
Improve handling of empty matrices

Revision 1379
Modified Tue May 27 17:55:29 2014 UTC by feinerer
File length: 909 byte(s)
Diff to previous 1376
Provide names<-() for VCorpus and PCorpus

Revision 1376
Modified Wed May 21 14:36:35 2014 UTC by feinerer
File length: 909 byte(s)
Diff to previous 1375
Remove names() from Source API

Revision 1375
Modified Tue May 20 18:21:27 2014 UTC by feinerer
File length: 909 byte(s)
Diff to previous 1369
Do not force author to be a person object

Suggested by Milan Bouchet-Valat.

Revision 1369
Modified Tue Apr 29 07:42:53 2014 UTC by feinerer
File length: 909 byte(s)
Diff to previous 1365
Fall back to English if meta(doc, "language") is invalid

Revision 1365
Modified Mon Apr 28 14:02:53 2014 UTC by feinerer
File length: 909 byte(s)
Diff to previous 1360
Fix and improve documentation, suggest tm.lexicon.GeneralInquirer

Revision 1360
Modified Fri Apr 25 18:10:01 2014 UTC by khornik
File length: 881 byte(s)
Diff to previous 1358
Add info on additional repository hosting Rcampdf.

Revision 1358
Modified Thu Apr 24 07:43:38 2014 UTC by feinerer
File length: 831 byte(s)
Diff to previous 1348
Document content_transformer()

Revision 1348
Modified Tue Apr 22 07:09:41 2014 UTC by feinerer
File length: 831 byte(s)
Diff to previous 1345
Provide as.VCorpus() generic

Revision 1345
Modified Sun Apr 20 16:48:32 2014 UTC by feinerer
File length: 831 byte(s)
Diff to previous 1336
Update NEWS

Revision 1336
Modified Sat Apr 19 08:59:39 2014 UTC by feinerer
File length: 831 byte(s)
Diff to previous 1332
Implement and describe Source API

Revision 1332
Modified Fri Apr 18 09:00:55 2014 UTC by feinerer
File length: 824 byte(s)
Diff to previous 1323
Update TextDocument documentation

Revision 1323
Modified Sat Apr 12 17:16:38 2014 UTC by feinerer
File length: 824 byte(s)
Diff to previous 1320
Use tools::find_gs_cmd()

Revision 1320
Modified Sun Apr 6 07:05:45 2014 UTC by feinerer
File length: 817 byte(s)
Diff to previous 1319
Use words() as default tokenizer in termFreq()

Revision 1319
Modified Wed Apr 2 18:03:37 2014 UTC by feinerer
File length: 817 byte(s)
Diff to previous 1316
Provide words.PlainTextDocument(), clean NAMESPACE

Revision 1316
Modified Mon Mar 31 14:41:41 2014 UTC by feinerer
File length: 817 byte(s)
Diff to previous 1309
Remove dissimilarity() (a trivial wrapper around proxy::dist())

Revision 1309
Modified Wed Mar 26 09:15:04 2014 UTC by feinerer
File length: 824 byte(s)
Diff to previous 1307
Move content and meta generics to package NLP

Revision 1307
Modified Tue Mar 25 12:15:51 2014 UTC by feinerer
File length: 808 byte(s)
Diff to previous 1300
Redesign corpora

Revision 1300
Modified Fri Mar 21 14:30:05 2014 UTC by feinerer
File length: 808 byte(s)
Diff to previous 1299
Redesign text documents

This is a major change and causes fallout. Soon to be fixed ...

Revision 1299
Modified Fri Mar 21 09:45:14 2014 UTC by feinerer
File length: 813 byte(s)
Diff to previous 1297
Use setNames() instead of structure(..., names)

Revision 1297
Modified Thu Mar 20 18:43:22 2014 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1295
Redesign sources

Revision 1295
Modified Tue Feb 25 10:54:41 2014 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1294
Export pGetElem.URISource

Revision 1294
Modified Sun Feb 23 07:41:45 2014 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1293
Avoid spurious duplicate results

Revision 1293
Modified Thu Feb 20 14:39:33 2014 UTC by khornik
File length: 804 byte(s)
Diff to previous 1292
Commit to trigger rebuild.

Revision 1292
Modified Tue Jan 28 16:31:18 2014 UTC by feinerer
File length: 804 byte(s)
Diff to previous 1289
Process three-letter codes; based on Kurt's contribution

Revision 1289
Modified Mon Jan 13 16:23:06 2014 UTC by khornik
File length: 804 byte(s)
Diff to previous 1279
Need R >= 3.0.0.

Revision 1279
Modified Tue Jan 7 06:29:58 2014 UTC by feinerer
File length: 805 byte(s)
Diff to previous 1268
Improve documentation, update ChangeLog, prepare for CRAN release

Revision 1268
Modified Wed Dec 18 16:37:48 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1266
Show label for single result item, do not export findAssocs.matrix()

Revision 1266
Modified Sun Dec 15 09:14:16 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1261
Allow multiple terms for findAssocs(), make it more efficient on sparse matrices

Revision 1261
Modified Fri Sep 27 09:37:35 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1258
Allow multiple URIs for URISource, default to vectorized sources, simplify eoi()

Revision 1258
Modified Fri Sep 20 12:15:42 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1257
Remove GmaneSource() and readGmane(), simplify readers, improve documentation

Revision 1257
Modified Thu Sep 19 10:48:07 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1255
Export Source constructor, extend documentation

Revision 1255
Modified Wed Sep 11 07:30:06 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1254
Rename tm_tag_score() to tm_term_score()

Revision 1254
Modified Sat Sep 7 08:45:50 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1253
Avoid tm::

Revision 1253
Modified Fri Aug 30 10:03:09 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1252
Remove getFilters(), searchFullText(), and tm_intersect() (use grep() instead)

Revision 1252
Modified Mon Aug 26 14:00:31 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1251
Report non-existent or non-readable files

Revision 1251
Modified Wed Aug 21 08:44:25 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1247
Document readPDF() rewrite

Revision 1247
Modified Tue Aug 20 16:45:26 2013 UTC by feinerer
File length: 806 byte(s)
Diff to previous 1245
Suggest Rcampdf

Revision 1245
Modified Tue Aug 20 07:48:15 2013 UTC by feinerer
File length: 797 byte(s)
Diff to previous 1243
Suggest Rpoppler

Revision 1243
Modified Mon Aug 19 09:37:32 2013 UTC by feinerer
File length: 787 byte(s)
Diff to previous 1242
Interface several PDF extraction engines (draft)

Revision 1242
Modified Mon Aug 19 05:33:57 2013 UTC by feinerer
File length: 787 byte(s)
Diff to previous 1240
Do not register VCorpus and PlainTextDocument as S4 classes anymore

Revision 1240
Modified Sun Aug 18 13:18:28 2013 UTC by khornik
File length: 796 byte(s)
Diff to previous 1238
Mention pdf_info.ps copyright.

Revision 1238
Modified Fri Aug 9 08:49:58 2013 UTC by feinerer
File length: 653 byte(s)
Diff to previous 1234
Switch to GPL-3

Revision 1234
Modified Thu Jul 25 17:45:00 2013 UTC by feinerer
File length: 658 byte(s)
Diff to previous 1231
Report NA instead of an error when the prevalent heuristic finds no completions; reformatting

Revision 1231
Modified Wed Jul 10 06:51:26 2013 UTC by feinerer
File length: 658 byte(s)
Diff to previous 1228
Use pdfinfo command line tool

Revision 1228
Modified Mon Jun 17 08:28:18 2013 UTC by feinerer
File length: 644 byte(s)
Diff to previous 1227
s/Suggests/Imports/ for parallel package

Revision 1227
Modified Sun Jun 16 08:37:10 2013 UTC by feinerer
File length: 644 byte(s)
Diff to previous 1226
Use package parallel instead of Rmpi and snow

Revision 1226
Modified Sun Jun 16 07:38:58 2013 UTC by feinerer
File length: 646 byte(s)
Diff to previous 1223
Document SnowballC switch in NEWS

Revision 1223
Modified Sat Jun 15 10:59:56 2013 UTC by feinerer
File length: 648 byte(s)
Diff to previous 1220
Handle (but warn about) invalid/empty document IDs in term-document matrix construction

Revision 1220
Modified Tue Jun 11 08:37:43 2013 UTC by feinerer
File length: 648 byte(s)
Diff to previous 1216
Use SnowballC instead of Snowball and RWeka

Revision 1216
Modified Thu Apr 11 12:05:53 2013 UTC by feinerer
File length: 654 byte(s)
Diff to previous 1211
Document UCP change

Revision 1211
Modified Mon Jan 28 08:40:24 2013 UTC by feinerer
File length: 654 byte(s)
Diff to previous 1205
Update version and date for CRAN release

Revision 1205
Modified Fri Jan 11 19:43:40 2013 UTC by khornik
File length: 654 byte(s)
Diff to previous 1201
Update version and date.

Revision 1201
Modified Fri Dec 14 15:08:35 2012 UTC by feinerer
File length: 654 byte(s)
Diff to previous 1200
Update Version as well

Revision 1200
Modified Fri Dec 14 15:07:38 2012 UTC by feinerer
File length: 652 byte(s)
Diff to previous 1199
Ensure dimnames of type character when generating a simple_triplet_matrix

Revision 1199
Modified Mon Dec 10 14:37:54 2012 UTC by feinerer
File length: 652 byte(s)
Diff to previous 1198
Document right-to-left folding in tm_reduce

Revision 1198
Modified Tue Dec 4 13:19:31 2012 UTC by feinerer
File length: 652 byte(s)
Diff to previous 1197
Prepare for CRAN release

Revision 1197
Modified Tue Dec 4 12:54:31 2012 UTC by feinerer
File length: 654 byte(s)
Diff to previous 1195
Update version and date

Revision 1195
Modified Mon Nov 26 10:10:07 2012 UTC by feinerer
File length: 654 byte(s)
Diff to previous 1194
Make termFreq() more visible in TermDocumentMatrix() documentation

Revision 1194
Modified Fri Nov 2 15:15:03 2012 UTC by feinerer
File length: 654 byte(s)
Diff to previous 1191
Ensure data types for document creation

Revision 1191
Modified Wed Oct 3 17:31:39 2012 UTC by feinerer
File length: 654 byte(s)
Diff to previous 1189
Gracefully handle empty columns and rows in weighting functions

Revision 1189
Modified Thu Aug 16 05:31:22 2012 UTC by feinerer
File length: 654 byte(s)
Diff to previous 1188
Update Authors@R

Revision 1188
Modified Fri Jul 27 08:47:50 2012 UTC by feinerer
File length: 589 byte(s)
Diff to previous 1181
Allow more simultaneous (stop)words in removeWords()

Revision 1181
Modified Thu Mar 8 11:22:56 2012 UTC by feinerer
File length: 589 byte(s)
Diff to previous 1176
Performance improvement as suggested by Milan Bouchet-Valat

Revision 1176
Modified Fri Feb 3 07:22:32 2012 UTC by feinerer
File length: 589 byte(s)
Diff to previous 1175
Prepare for CRAN minor patch release

Revision 1175
Modified Wed Feb 1 06:08:02 2012 UTC by feinerer
File length: 589 byte(s)
Diff to previous 1174
Readers can now set the document language

Revision 1174
Modified Mon Jan 23 09:55:47 2012 UTC by feinerer
File length: 589 byte(s)
Diff to previous 1173
Add Catalan stopwords

Revision 1173
Modified Mon Jan 16 15:05:22 2012 UTC by feinerer
File length: 589 byte(s)
Diff to previous 1169
Process tolower and tokenize options first in termFreq()

Revision 1169
Modified Sat Jan 14 11:32:38 2012 UTC by feinerer
File length: 589 byte(s)
Diff to previous 1168
Simplify XMLSource; Use vignettes directory

Revision 1168
Modified Wed Jan 11 10:35:44 2012 UTC by feinerer
File length: 589 byte(s)
Diff to previous 1167
Fix processing of user provided stopwords

Revision 1167
Modified Fri Dec 23 09:44:33 2011 UTC by feinerer
File length: 589 byte(s)
Diff to previous 1166
Fix invalid handling of control[1] argument to termFreq()

Revision 1166
Modified Sat Dec 17 10:32:05 2011 UTC by feinerer
File length: 587 byte(s)
Diff to previous 1164
Prepare for CRAN Christmas release

Revision 1164
Modified Mon Dec 12 06:42:28 2011 UTC by feinerer
File length: 658 byte(s)
Diff to previous 1161
Map empty input to 'porter'

Revision 1161
Modified Wed Dec 7 06:10:32 2011 UTC by feinerer
File length: 657 byte(s)
Diff to previous 1159
Add option to removePunctuation() to preserve intra-word dashes

Revision 1159
Modified Tue Dec 6 15:11:45 2011 UTC by feinerer
File length: 657 byte(s)
Diff to previous 1157
Make termFreq() sensitive to the order of control options

Revision 1157
Modified Thu Nov 17 17:20:31 2011 UTC by feinerer
File length: 657 byte(s)
Diff to previous 1155
Depend on R >= 2.14.0

Revision 1155
Modified Thu Nov 17 16:53:26 2011 UTC by feinerer
File length: 657 byte(s)
Diff to previous 1153
Use tools:::pdf_info() instead of external pdfinfo tool

Revision 1153
Modified Thu Nov 17 15:45:31 2011 UTC by feinerer
File length: 671 byte(s)
Diff to previous 1151
Add SMART stopword list

Revision 1151
Modified Thu Nov 17 14:21:49 2011 UTC by feinerer
File length: 671 byte(s)
Diff to previous 1150
Add generalized bounds checking

Revision 1150
Modified Tue Nov 15 15:37:17 2011 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1149
Document MC_tokenizer(), scan_tokenizer(), and getTokenizers()

Revision 1149
Modified Fri Nov 4 15:48:50 2011 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1142
Export and document c.term_frequency() and as.TermDocumentMatrix.term_frequency()

Revision 1142
Modified Tue Aug 30 10:16:29 2011 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1139
Documentation for weighting schemata in SMART notation

Revision 1139
Modified Wed Aug 24 15:21:02 2011 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1136
Raise error if no stopwords are available for requested language

Revision 1136
Modified Fri May 27 11:50:39 2011 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1135
Improve SMART weighting (still buggy)

Revision 1135
Modified Fri Apr 15 06:18:54 2011 UTC by khornik
File length: 567 byte(s)
Diff to previous 1128
Export and document Blei et al. reader.

Revision 1128
Modified Fri Apr 8 17:36:10 2011 UTC by khornik
File length: 567 byte(s)
Diff to previous 1122
Add functionality for obtaining DTMs and TDMs from t/f matrices coercible
to simple triplet matrices.

Revision 1122
Modified Sun Feb 20 07:38:31 2011 UTC by feinerer
File length: 567 byte(s)
Diff to previous 1121
Use document language for stemDocument().

Revision 1121
Modified Thu Feb 17 17:13:45 2011 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1117
Bug fix. Use language argument for stemDocument().

Revision 1117
Modified Fri Feb 4 20:44:37 2011 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1114
Sources now store strings and connections instead of unevaluated calls. Improve documentation.

Revision 1114
Modified Fri Nov 26 14:05:54 2010 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1113
Allow init and exit hooks for readers

Revision 1113
Modified Thu Nov 11 15:22:22 2010 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1110
First draft of words()

Revision 1110
Modified Fri Oct 29 13:59:52 2010 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1108
Add OpenOffice reader, getTokenizer() lists available tokenizers

Revision 1108
Modified Fri Oct 22 18:32:47 2010 UTC by feinerer
File length: 569 byte(s)
Diff to previous 1107
Change Weighting from list element to attribute, access documents by name

Revision 1107
Modified Mon Oct 18 09:26:16 2010 UTC by khornik
File length: 569 byte(s)
Diff to previous 1102
Improve code/docs for system requirements.

Revision 1102
Modified Sat Oct 16 10:01:09 2010 UTC by feinerer
File length: 486 byte(s)
Diff to previous 1101
Access documents by their document ID

Revision 1101
Modified Thu Oct 14 13:03:25 2010 UTC (7 years, 11 months ago) by feinerer
File length: 486 byte(s)
Diff to previous 1098
Update NEWS

Revision 1098
Modified Mon Sep 27 13:44:38 2010 UTC (8 years ago) by khornik
File length: 486 byte(s)
Diff to previous 1093
New release.

Revision 1093
Modified Mon Aug 23 18:36:33 2010 UTC (8 years, 1 month ago) by feinerer
File length: 486 byte(s)
Diff to previous 1092
Allow removePunctuation parameter for termFreq() to be a function or a list

Revision 1092
Modified Mon Aug 23 18:19:53 2010 UTC (8 years, 1 month ago) by feinerer
File length: 486 byte(s)
Diff to previous 1091
Add SystemRequirements for antiword, pdfinfo, and pdftotext

Revision 1091
Modified Thu Aug 19 08:22:22 2010 UTC (8 years, 1 month ago) by feinerer
File length: 390 byte(s)
Diff to previous 1084
Prepare for new CRAN release

Revision 1084
Modified Fri Aug 6 21:47:23 2010 UTC (8 years, 1 month ago) by feinerer
File length: 393 byte(s)
Diff to previous 1080
Remove convert_UTF_8() (use enc2utf8() instead)

Revision 1080
Modified Thu Jun 17 13:47:05 2010 UTC (8 years, 3 months ago) by feinerer
File length: 393 byte(s)
Diff to previous 1075
Use all words from a dictionary when tabulating against it in a term-document matrix

Revision 1075
Modified Wed Jun 2 17:52:04 2010 UTC (8 years, 3 months ago) by feinerer
File length: 393 byte(s)
Diff to previous 1073
Plotting functions for Zipf's and Heaps' law

Revision 1073
Modified Fri May 28 12:32:46 2010 UTC (8 years, 4 months ago) by feinerer
File length: 392 byte(s)
Diff to previous 1070
Use IETF language tags for language codes

Revision 1070
Modified Tue May 18 08:58:22 2010 UTC (8 years, 4 months ago) by feinerer
File length: 392 byte(s)
Diff to previous 1068
Use element names as document IDs if provided by a source

Revision 1068
Modified Wed May 5 10:09:47 2010 UTC (8 years, 4 months ago) by feinerer
File length: 392 byte(s)
Diff to previous 1067
Improve stem completion.

Revision 1067
Modified Sun Apr 11 06:38:47 2010 UTC (8 years, 5 months ago) by feinerer
File length: 392 byte(s)
Diff to previous 1063
Use match() instead of %in%

Revision 1063
Modified Fri Apr 9 10:36:39 2010 UTC (8 years, 5 months ago) by feinerer
File length: 392 byte(s)
Diff to previous 1062
Sources can now provide document names

Revision 1062
Modified Wed Apr 7 17:25:20 2010 UTC (8 years, 5 months ago) by feinerer
File length: 392 byte(s)
Diff to previous 1061
content_or_meta utility function

Revision 1061
Modified Fri Mar 19 11:41:37 2010 UTC (8 years, 6 months ago) by feinerer
File length: 392 byte(s)
Diff to previous 1059
Extract TOPICS, LEWISSPLIT, CGISPLIT, and OLDID meta tags from Reuters-21578 documents

Revision 1059
Modified Mon Mar 15 21:34:29 2010 UTC (8 years, 6 months ago) by feinerer
File length: 392 byte(s)
Diff to previous 1055
Depend on recent slam version.

Revision 1055
Modified Mon Mar 15 15:55:02 2010 UTC (8 years, 6 months ago) by feinerer
File length: 391 byte(s)
Diff to previous 1054
First attempt at weightings using SMART notation.

Revision 1054
Modified Fri Mar 12 15:56:30 2010 UTC (8 years, 6 months ago) by feinerer
File length: 391 byte(s)
Diff to previous 1048
Restore names of dimnames after subsetting.

Revision 1048
Modified Wed Mar 3 06:14:10 2010 UTC (8 years, 6 months ago) by feinerer
File length: 391 byte(s)
Diff to previous 1047
Add General Inquirer example for sentiment analysis.

Revision 1047
Modified Fri Feb 26 15:08:01 2010 UTC (8 years, 7 months ago) by feinerer
File length: 391 byte(s)
Diff to previous 1042
Avoid Internet access for examples in the documentation.

Revision 1042
Modified Fri Feb 19 07:26:44 2010 UTC (8 years, 7 months ago) by feinerer
File length: 389 byte(s)
Diff to previous 1041
Prepare for new CRAN release.

Revision 1041
Modified Thu Feb 18 06:15:15 2010 UTC (8 years, 7 months ago) by feinerer
File length: 391 byte(s)
Diff to previous 1040
Added new stem completion heuristics. Improved plot function for term-document matrices.

Revision 1040
Modified Sat Feb 6 10:33:03 2010 UTC (8 years, 7 months ago) by feinerer
File length: 391 byte(s)
Diff to previous 1039
Depend on R (>= 2.10.0).

Revision 1039
Modified Fri Jan 22 13:01:33 2010 UTC (8 years, 8 months ago) by feinerer
File length: 390 byte(s)
Diff to previous 1038
Add stemDocument.character().

Revision 1038
Modified Fri Jan 15 12:12:41 2010 UTC (8 years, 8 months ago) by feinerer
File length: 390 byte(s)
Diff to previous 1035
Extract more meta data from Reuters Corpus Volume 1 data set.

Revision 1035
Modified Thu Jan 14 08:59:43 2010 UTC (8 years, 8 months ago) by feinerer
File length: 390 byte(s)
Diff to previous 1034
Add readRCV1asPlain reader.

Revision 1034
Modified Tue Jan 12 16:47:41 2010 UTC (8 years, 8 months ago) by feinerer
File length: 390 byte(s)
Diff to previous 1033
Be careful with names attribute.

Revision 1033
Modified Sat Jan 9 09:33:54 2010 UTC (8 years, 8 months ago) by feinerer
File length: 388 byte(s)
Diff to previous 1032
Clean up and prepare for CRAN release.

Revision 1032
Modified Thu Jan 7 12:09:51 2010 UTC (8 years, 8 months ago) by stefan7th
File length: 389 byte(s)
Diff to previous 1030
changelog, new version

Revision 1030
Modified Tue Jan 5 17:38:58 2010 UTC (8 years, 8 months ago) by dmeyer
File length: 389 byte(s)
Diff to previous 1029
rowSums -> row_sums due to change in slam

Revision 1029
Modified Tue Dec 22 13:40:25 2009 UTC (8 years, 9 months ago) by feinerer
File length: 390 byte(s)
Diff to previous 1026
Use encoding argument in URISource.

Revision 1026
Modified Fri Dec 11 10:31:42 2009 UTC (8 years, 9 months ago) by feinerer
File length: 390 byte(s)
Diff to previous 1025
Fix c.TermDocumentMatrix().

Revision 1025
Modified Fri Dec 11 08:56:22 2009 UTC (8 years, 9 months ago) by feinerer
File length: 390 byte(s)
Diff to previous 1023
Register S3 document classes to be recognized by S4 methods.

Revision 1023
Modified Wed Nov 25 06:08:20 2009 UTC (8 years, 10 months ago) by feinerer
File length: 390 byte(s)
Diff to previous 1022
Add option to termFreq() to remove punctuation characters.

Revision 1022
Modified Thu Nov 19 21:33:19 2009 UTC (8 years, 10 months ago) by feinerer
File length: 390 byte(s)
Diff to previous 1020
Added a combine method for merging multiple term-document matrices.

Revision 1020
Modified Tue Nov 17 09:16:13 2009 UTC (8 years, 10 months ago) by feinerer
File length: 381 byte(s)
Diff to previous 1019
Use \dontrun{} in plot.TermDocumentMatrix \examples{} section in the hope that CRAN Mac OS X builds do not fail any longer due to missing Rgraphviz dependencies.

Revision 1019
Modified Mon Nov 16 08:20:55 2009 UTC (8 years, 10 months ago) by feinerer
File length: 381 byte(s)
Diff to previous 1018
Use whitespace oriented tokenizer instead of AlphabeticTokenizer (from RWeka) as default.

Revision 1018
Modified Sun Nov 15 15:53:49 2009 UTC (8 years, 10 months ago) by feinerer
File length: 381 byte(s)
Diff to previous 1017
Fix bug in removeWords(). Refactoring of term-document matrix constructor. Clean up of defunct functions.

Revision 1017
Modified Thu Nov 12 16:18:54 2009 UTC (8 years, 10 months ago) by feinerer
File length: 381 byte(s)
Diff to previous 1015
Improve DirSource().

Revision 1015
Modified Sat Nov 7 11:15:19 2009 UTC (8 years, 10 months ago) by feinerer
File length: 381 byte(s)
Diff to previous 1014
Avoid prefixes from named documents when building a term-document matrix.

Revision 1014
Modified Tue Oct 27 15:14:55 2009 UTC (8 years, 11 months ago) by feinerer
File length: 379 byte(s)
Diff to previous 1013
Update version for CRAN upload.

Revision 1013
Modified Wed Oct 21 12:34:39 2009 UTC (8 years, 11 months ago) by feinerer
File length: 381 byte(s)
Diff to previous 1011
Improve regular expressions in removeWords().

Revision 1011
Modified Mon Oct 19 12:20:43 2009 UTC (8 years, 11 months ago) by feinerer
File length: 381 byte(s)
Diff to previous 1010
Allow lower case Dublin Core tags.

Revision 1010
Modified Fri Oct 9 12:48:37 2009 UTC (8 years, 11 months ago) by feinerer
File length: 381 byte(s)
Diff to previous 1009
Use xmlChildren().

Revision 1009
Modified Sat Oct 3 07:00:48 2009 UTC (8 years, 11 months ago) by feinerer
File length: 381 byte(s)
Diff to previous 1007
Fix typo.

Revision 1007
Modified Tue Sep 15 18:02:44 2009 UTC (9 years ago) by feinerer
File length: 381 byte(s)
Diff to previous 1004
Fix generated file names.

Revision 1004
Modified Tue Sep 8 10:28:28 2009 UTC (9 years ago) by feinerer
File length: 377 byte(s)
Diff to previous 1003
Improve vignette.

Revision 1003
Modified Tue Sep 8 06:00:14 2009 UTC (9 years ago) by feinerer
File length: 377 byte(s)
Diff to previous 1001
Remove extra LICENCE file, as we want GPL.

Revision 1001
Modified Mon Sep 7 20:23:44 2009 UTC (9 years ago) by feinerer
File length: 392 byte(s)
Diff to previous 996
Add copyright and licence statements.

Revision 996
Modified Mon Sep 7 08:27:30 2009 UTC (9 years ago) by feinerer
File length: 372 byte(s)
Diff to previous 993
Small fix in meta().

Revision 993
Modified Sun Sep 6 17:51:08 2009 UTC (9 years ago) by feinerer
File length: 372 byte(s)
Diff to previous 988
Update NEWS.

Revision 988
Modified Fri Sep 4 12:27:12 2009 UTC (9 years ago) by feinerer
File length: 372 byte(s)
Diff to previous 987
Update documentation.

Revision 987
Modified Wed Sep 2 17:54:45 2009 UTC (9 years ago) by feinerer
File length: 386 byte(s)
Diff to previous 986
Update documentation.

Revision 986
Modified Tue Sep 1 15:33:30 2009 UTC (9 years ago) by feinerer
File length: 386 byte(s)
Diff to previous 985
Further changes due to S3 class system.

Revision 985
Modified Thu Aug 27 18:09:05 2009 UTC (9 years, 1 month ago) by feinerer
File length: 386 byte(s)
Diff to previous 981
Use S3 instead of S4 class system.

Revision 981
Modified Fri Aug 7 09:04:37 2009 UTC (9 years, 1 month ago) by feinerer
File length: 399 byte(s)
Diff to previous 973
Factor out mail handling functionality to tm.plugin.mail package.

Revision 973
Modified Sat Jul 4 08:10:25 2009 UTC (9 years, 2 months ago) by feinerer
File length: 399 byte(s)
Diff to previous 972
Rename readNewsgroup to readMail.

Revision 972
Modified Fri Jul 3 16:16:59 2009 UTC (9 years, 2 months ago) by feinerer
File length: 399 byte(s)
Diff to previous 969
Move removeCitation, removeMultipart, and removeSignature to the tau package.

Revision 969
Modified Tue Jun 30 09:31:12 2009 UTC (9 years, 2 months ago) by feinerer
File length: 395 byte(s)
Diff to previous 968
Imports slam (instead of Depends).

Revision 968
Modified Tue Jun 30 07:08:54 2009 UTC (9 years, 2 months ago) by feinerer
File length: 387 byte(s)
Diff to previous 963
Remove internal tm functions provided by slam.

Revision 963
Modified Mon Jun 29 07:01:19 2009 UTC (9 years, 2 months ago) by feinerer
File length: 376 byte(s)
Diff to previous 962
Rename SCorpus to VCorpus (Volatile Corpus).

Revision 962
Modified Sun Jun 28 15:52:33 2009 UTC (9 years, 3 months ago) by feinerer
File length: 376 byte(s)
Diff to previous 960
Fix documentation.

Revision 960
Modified Fri Jun 26 17:43:45 2009 UTC (9 years, 3 months ago) by feinerer
File length: 376 byte(s)
Diff to previous 959
Add slam dependency and readReut21578XMLasPlain reader.

Revision 959
Modified Wed Jun 17 18:22:35 2009 UTC (9 years, 3 months ago) by feinerer
File length: 370 byte(s)
Diff to previous 958
Fix character(0) handling in stemDoc().

Revision 958
Modified Sat Jun 13 06:06:42 2009 UTC (9 years, 3 months ago) by feinerer
File length: 370 byte(s)
Diff to previous 957
Code cleanup.

Revision 957
Modified Fri Jun 12 12:47:57 2009 UTC (9 years, 3 months ago) by feinerer
File length: 370 byte(s)
Diff to previous 956
Pretty print.

Revision 956
Modified Fri Jun 12 06:41:09 2009 UTC (9 years, 3 months ago) by gruen
File length: 370 byte(s)
Diff to previous 954
year 2009

Revision 954
Modified Wed May 27 18:33:32 2009 UTC (9 years, 4 months ago) by feinerer
File length: 370 byte(s)
Diff to previous 952
Handle empty matrices gracefully.

Revision 952
Modified Mon May 18 13:43:01 2009 UTC (9 years, 4 months ago) by feinerer
File length: 370 byte(s)
Diff to previous 950
Further work on FCorpus integration.

Revision 950
Modified Thu May 14 15:17:18 2009 UTC (9 years, 4 months ago) by feinerer
File length: 370 byte(s)
Diff to previous 946
Experimental FCorpus (fast corpus).

Revision 946
Modified Wed May 13 18:07:35 2009 UTC (9 years, 4 months ago) by feinerer
File length: 370 byte(s)
Diff to previous 945
A lot of major improvements (see NEWS).

Revision 945
Modified Mon May 4 10:57:01 2009 UTC (9 years, 4 months ago) by feinerer
File length: 461 byte(s)
Diff to previous 942
Export some simple_triplet_matrix functions.

Revision 942
Modified Tue Apr 28 11:02:24 2009 UTC (9 years, 5 months ago) by feinerer
File length: 459 byte(s)
Diff to previous 941
Adapt tf-idf to new matrix format.

Revision 941
Modified Mon Apr 27 15:36:43 2009 UTC (9 years, 5 months ago) by feinerer
File length: 459 byte(s)
Diff to previous 938
Create two distinct classes for term-document and document-term matrices.

Revision 938
Modified Sat Apr 25 19:05:50 2009 UTC (9 years, 5 months ago) by feinerer
File length: 459 byte(s)
Diff to previous 937
Get rid of Matrix package dependency.

Revision 937
Modified Thu Apr 16 21:09:49 2009 UTC (9 years, 5 months ago) by feinerer
File length: 469 byte(s)
Diff to previous 930
Documentation update. Remove some require() calls.

Revision 930
Modified Sat Apr 11 08:49:37 2009 UTC (9 years, 5 months ago) by feinerer
File length: 469 byte(s)
Diff to previous 929
Fix code/documentation mismatch in vignette.

Revision 929
Modified Thu Apr 9 06:22:21 2009 UTC (9 years, 5 months ago) by feinerer
File length: 469 byte(s)
Diff to previous 928
Always use Snowball for stemming.

Revision 928
Modified Sat Apr 4 18:27:35 2009 UTC (9 years, 5 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 926
Update documentation.

Revision 926
Modified Sat Apr 4 06:50:02 2009 UTC (9 years, 5 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 923
tmReduce() allows combining multiple maps into one transformation.

Revision 923
Modified Fri Apr 3 08:07:20 2009 UTC (9 years, 5 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 922
Further work on new TermDocumentMatrix.

Revision 922
Modified Tue Mar 31 16:41:02 2009 UTC (9 years, 5 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 919
Fix invalid slot access in subset method for TermDocumentMatrix.

Revision 919
Modified Sat Mar 28 17:13:29 2009 UTC (9 years, 6 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 917
Finished vignette 'Extensions: How to Handle Custom File Formats'.

Revision 917
Modified Fri Mar 27 11:55:45 2009 UTC (9 years, 6 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 915
Update documentation.

Revision 915
Modified Wed Mar 25 20:04:35 2009 UTC (9 years, 6 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 914
Improve readCustom().

Revision 914
Modified Tue Mar 24 20:10:57 2009 UTC (9 years, 6 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 909
Use readXML() for readGmane().

Revision 909
Modified Sun Mar 22 12:45:59 2009 UTC (9 years, 6 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 904
Sources can now be vectorized.

Revision 904
Modified Sat Mar 21 08:15:11 2009 UTC (9 years, 6 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 900
No longer try to start an MPI cluster in .onLoad().

Revision 900
Modified Fri Mar 20 16:50:27 2009 UTC (9 years, 6 months ago) by feinerer
File length: 480 byte(s)
Diff to previous 895
Add URL to DESCRIPTION. Use Reduce() function.

Revision 895
Modified Tue Mar 10 17:59:34 2009 UTC (9 years, 6 months ago) by feinerer
File length: 442 byte(s)
Diff to previous 886
Add pattern and ignore.case arguments to DirSource constructor.

Revision 886
Modified Thu Jan 29 22:47:34 2009 UTC (9 years, 7 months ago) by feinerer
File length: 442 byte(s)
Diff to previous 885
Speed up package loading (Depends -> Suggests).

Revision 885
Modified Thu Jan 29 09:34:44 2009 UTC (9 years, 7 months ago) by stefan7th
File length: 436 byte(s)
Copied from: pkg/tm/DESCRIPTION revision 884
Diff to previous 884
moved package to /pkg

Revision 884
Modified Wed Jan 28 10:24:27 2009 UTC (9 years, 7 months ago) by stefan7th
Original Path: pkg/tm/DESCRIPTION
File length: 436 byte(s)
Diff to previous 882
R-Forge transition completed

Revision 882
Modified Thu Jan 8 15:35:49 2009 UTC (9 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 436 byte(s)
Diff to previous 881
The readNewsgroup() reader function can now be configured for

Revision 881
Modified Sat Dec 20 09:06:13 2008 UTC (9 years, 9 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 434 byte(s)
Diff to previous 877
Fix off-by-one error in convertMboxEml() function.

Revision 877
Modified Tue Dec 16 11:31:47 2008 UTC (9 years, 9 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 436 byte(s)
Diff to previous 875
Sort row indices when generating a term-document matrix (fixes a problem with the Matrix package).

Revision 875
Modified Sat Dec 6 13:25:03 2008 UTC (9 years, 9 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 436 byte(s)
Diff to previous 874
Fixed non-standard call evaluation.

Revision 874
Modified Sat Nov 29 16:24:45 2008 UTC (9 years, 9 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 436 byte(s)
Diff to previous 873
New URISource.

Revision 873
Modified Thu Nov 27 08:26:53 2008 UTC (9 years, 10 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 436 byte(s)
Diff to previous 872
Code refactoring for sources.

Revision 872
Modified Tue Nov 25 16:36:08 2008 UTC (9 years, 10 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 436 byte(s)
Diff to previous 870
Use tryCatch() to handle misconfigured Rmpi installations more gracefully.

Revision 870
Modified Mon Nov 10 15:29:22 2008 UTC (9 years, 10 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 434 byte(s)
Diff to previous 869
Fix documentation and codoc mismatches.

Revision 869
Modified Sat Nov 8 09:16:37 2008 UTC (9 years, 10 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 434 byte(s)
Diff to previous 868
Sources now have a Length slot. Knowing the length in advance makes corpus construction roughly 8 times faster.

Revision 868
Modified Mon Nov 3 16:43:04 2008 UTC (9 years, 10 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 434 byte(s)
Diff to previous 866
Add Rmpi to Suggests of tm.

Revision 866
Modified Sun Nov 2 09:11:00 2008 UTC (9 years, 10 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 428 byte(s)
Diff to previous 865
Fixed variable binding warning and signature mismatch in documentation.

Revision 865
Modified Sun Aug 3 13:20:22 2008 UTC (10 years, 1 month ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 428 byte(s)
Diff to previous 862
Introduce name abbreviations for weighting functions.

Revision 862
Modified Thu Jul 24 10:41:25 2008 UTC (10 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 428 byte(s)
Diff to previous 861
Use namespace.

Revision 861
Modified Thu Jul 24 09:55:09 2008 UTC (10 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 434 byte(s)
Diff to previous 860
tmIndex(), tmFilter(), tmMap(), and TermDocMatrix() now use an MPI cluster if available.

Revision 860
Modified Fri Jul 18 05:05:20 2008 UTC (10 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 424 byte(s)
Diff to previous 857
Removed some forgotten debug output.

Revision 857
Modified Tue Jul 8 16:01:47 2008 UTC (10 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 429 byte(s)
Diff to previous 856
Removed tm-internal. Better (consistent) naming for dictionary functions.

Revision 856
Modified Fri Jun 6 11:45:39 2008 UTC (10 years, 3 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 429 byte(s)
Diff to previous 854
Improved meta data extraction from Reuters Corpus Volume 1 documents.

Revision 854
Modified Sun May 25 13:15:06 2008 UTC (10 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 429 byte(s)
Diff to previous 853
searchFullText is now the default function used for tmFilter and tmIndex.

Revision 853
Modified Sun May 18 13:09:35 2008 UTC (10 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 429 byte(s)
Diff to previous 852
Improved stem completion. Some documentation fixes.

Revision 852
Modified Wed May 14 14:35:32 2008 UTC (10 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 429 byte(s)
Diff to previous 850
Minor documentation fix.

Revision 850
Modified Thu May 1 16:37:19 2008 UTC (10 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 429 byte(s)
Diff to previous 849
Removed Encoding tag in DESCRIPTION.

Revision 849
Modified Wed Apr 30 06:05:48 2008 UTC (10 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 445 byte(s)
Diff to previous 848
Removed PDF example from vignette to avoid R CMD check warnings under Windows.

Revision 848
Modified Tue Apr 29 16:51:43 2008 UTC (10 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 443 byte(s)
Diff to previous 847
Improved vignette.

Revision 847
Modified Sun Apr 27 16:16:47 2008 UTC (10 years, 5 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 443 byte(s)
Diff to previous 843
Improved manuals.

Revision 843
Modified Fri Apr 25 12:31:51 2008 UTC (10 years, 5 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 445 byte(s)
Diff to previous 839
Added Dublin Core documentation.

Revision 839
Modified Wed Apr 23 12:35:01 2008 UTC (10 years, 5 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 445 byte(s)
Diff to previous 837
Added documentation for VectorSource.

Revision 837
Modified Wed Apr 23 09:16:25 2008 UTC (10 years, 5 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 445 byte(s)
Diff to previous 836
Improved show methods.

Revision 836
Modified Sat Apr 19 17:08:07 2008 UTC (10 years, 5 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 445 byte(s)
Diff to previous 834
Improved meta data handling. Added coerce method from list to corpus. Updated CITATION file.

Revision 834
Modified Wed Mar 26 13:57:07 2008 UTC (10 years, 6 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 445 byte(s)
Diff to previous 833
Commented out faulty code parts (relevant under Windows) in vignette.

Revision 833
Modified Fri Mar 21 10:55:11 2008 UTC (10 years, 6 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 445 byte(s)
Diff to previous 832
Included improvements suggested by Christian Buchta. Added CITATION file.

Revision 832
Modified Wed Mar 12 12:59:48 2008 UTC (10 years, 6 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 358 byte(s)
Diff to previous 831
Added VectorSource.

Revision 831
Modified Wed Mar 12 09:10:46 2008 UTC (10 years, 6 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 358 byte(s)
Diff to previous 829
Fixed bug in [[<- (reported by Christian Buchta).

Revision 829
Modified Mon Mar 10 22:55:39 2008 UTC (10 years, 6 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 358 byte(s)
Diff to previous 828
First version of working lazy mapping.

Revision 828
Modified Sun Mar 9 07:47:15 2008 UTC (10 years, 6 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 358 byte(s)
Diff to previous 827
Some preliminary code for lazy mapping.

Revision 827
Modified Mon Feb 25 16:52:55 2008 UTC (10 years, 7 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 358 byte(s)
Diff to previous 825
Small bug fix.

Revision 825
Modified Sat Feb 23 09:47:28 2008 UTC (10 years, 7 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 358 byte(s)
Diff to previous 824
Update documentation: language codes should be in ISO 639-1 format.

Revision 824
Modified Sun Feb 10 09:58:43 2008 UTC (10 years, 7 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 383 byte(s)
Diff to previous 822
Documentation update.

Revision 822
Modified Wed Feb 6 13:06:15 2008 UTC (10 years, 7 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 379 byte(s)
Diff to previous 820
Renamed completeStems to stemCompletion (suggested by David Meyer).

Revision 820
Modified Fri Feb 1 10:05:21 2008 UTC (10 years, 7 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 379 byte(s)
Diff to previous 816
Documentation update.

Revision 816
Modified Thu Jan 24 14:36:41 2008 UTC (10 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 379 byte(s)
Diff to previous 813
Renamed TextDocCol to Corpus, and Corpus to Content.

Revision 813
Modified Tue Jan 22 18:46:13 2008 UTC (10 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 384 byte(s)
Diff to previous 810
New function meta() for consistent access to metadata of document collections, repositories, and texts.

Revision 810
Modified Mon Jan 21 17:14:06 2008 UTC (10 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 384 byte(s)
Diff to previous 808
Better support for encodings.

Revision 808
Modified Sun Jan 13 16:18:27 2008 UTC (10 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 384 byte(s)
Diff to previous 807
Fixed bug regarding default reader selection when no reader argument is given.

Revision 807
Modified Sat Jan 5 10:35:53 2008 UTC (10 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 384 byte(s)
Diff to previous 806
CSVSource now uses read.csv instead of scan internally.

Revision 806
Modified Wed Jan 2 10:29:14 2008 UTC (10 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 384 byte(s)
Diff to previous 805
Modular TermDocMatrix constructor is now default.

Revision 805
Modified Tue Jan 1 14:10:40 2008 UTC (10 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 384 byte(s)
Diff to previous 802
Added function (getReaders) returning all available reader functions.

Revision 802
Modified Sun Dec 2 09:28:41 2007 UTC (10 years, 9 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 384 byte(s)
Diff to previous 799
See ChangeLog.

Revision 799
Modified Thu Nov 29 11:05:23 2007 UTC (10 years, 9 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 368 byte(s)
Diff to previous 796
Better handling of empty arguments in TextDocCol. Exported readDOC.

Revision 796
Modified Tue Nov 6 15:22:34 2007 UTC (10 years, 10 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 368 byte(s)
Diff to previous 795
Correct processing of empty documents.

Revision 795
Modified Sat Oct 27 09:14:35 2007 UTC (10 years, 11 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 368 byte(s)
Diff to previous 790
Updated documentation

Revision 790
Modified Sun Oct 21 08:27:13 2007 UTC (10 years, 11 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 367 byte(s)
Diff to previous 785
Exported termFreq to NAMESPACE. New modular constructor for TermDocMatrix (called TermDocMatrix2 at the moment).

Revision 785
Modified Sat Oct 13 10:46:28 2007 UTC (10 years, 11 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 367 byte(s)
Diff to previous 780
Added plot function for term-document matrices.

Revision 780
Modified Sat Sep 29 13:24:17 2007 UTC (10 years, 11 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 356 byte(s)
Diff to previous 777
Added three transformations often used for e-mail analyses.

Revision 777
Modified Tue Aug 28 07:19:12 2007 UTC (11 years, 1 month ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 356 byte(s)
Diff to previous 776
Function generators are now real S4 classes instead of S3 attributes.

Revision 776
Modified Sun Jul 29 15:27:41 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 356 byte(s)
Diff to previous 775
Removed manual pdftotext and pdfinfo checks (the system call gives a warning anyway).

Revision 775
Modified Sat Jul 28 13:57:02 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 356 byte(s)
Diff to previous 774
Added conversion (asPlain) from StructuredTextDocuments to PlainTextDocuments.

Revision 774
Modified Sat Jul 21 16:25:54 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 356 byte(s)
Diff to previous 773
Added convenience methods for term-document matrices.

Revision 773
Modified Sat Jul 21 12:05:08 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 356 byte(s)
Diff to previous 772
Vignette: readPDF is only called if pdftotext and pdfinfo are installed.

Revision 772
Modified Fri Jul 20 14:00:58 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 356 byte(s)
Diff to previous 771
Updated TODO list.

Revision 771
Modified Thu Jul 19 07:59:20 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 354 byte(s)
Diff to previous 770
Updated version for new CRAN release.

Revision 770
Modified Tue Jul 17 12:41:04 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 354 byte(s)
Diff to previous 769
Improved TermDocMatrix's efficiency. Kudos to Christian Buchta.

Revision 769
Modified Sun Jul 15 16:31:59 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 340 byte(s)
Diff to previous 766
Fixed bug in tmUpdate.

Revision 766
Modified Sat Jul 14 08:46:23 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 340 byte(s)
Diff to previous 765
Added PDF reader based on pdftotext and pdfinfo.

Revision 765
Modified Fri Jul 13 15:53:45 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 340 byte(s)
Diff to previous 764
See ChangeLog.

Revision 764
Modified Wed Jul 11 17:36:17 2007 UTC (11 years, 2 months ago) by hornik
Original Path: trunk/tm/DESCRIPTION
File length: 332 byte(s)
Diff to previous 763
Canonicalize license info.

Revision 763
Modified Wed Jul 11 11:56:44 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 340 byte(s)
Diff to previous 762
Changed from cba to new proxy package for computing (dis)similarities.

Revision 762
Modified Wed Jul 11 06:46:17 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 761
Updated vignette.

Revision 761
Modified Tue Jul 10 14:59:57 2007 UTC (11 years, 2 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 760
Updated vignette.

Revision 760
Modified Thu Jun 21 22:40:15 2007 UTC (11 years, 3 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 757
require() uses the quietly option to suppress loading messages.

Revision 757
Modified Thu Jun 7 17:41:56 2007 UTC (11 years, 3 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 756
Added classes for Reuters21578 XML and RCV1 documents.

Revision 756
Modified Wed Jun 6 17:12:11 2007 UTC (11 years, 3 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 755
Fixed some typos in vignette.

Revision 755
Modified Sun Jun 3 17:20:40 2007 UTC (11 years, 3 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 754
Added replaceWords function.

Revision 754
Modified Tue May 22 18:11:22 2007 UTC (11 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 752
Fixed documentation.

Revision 752
Modified Sat May 19 22:39:04 2007 UTC (11 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 751
Small bug fix in textvector(). Added new function removeSparseTerms().

Revision 751
Modified Tue May 15 18:01:43 2007 UTC (11 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 750
Fixed documentation for tmUpdate.

Revision 750
Modified Fri May 11 16:46:15 2007 UTC (11 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 749
Fixed documentation.

Revision 749
Modified Tue May 8 17:26:09 2007 UTC (11 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 748
StructuredTextDocument inherits from TextDocument.

Revision 748
Modified Fri May 4 18:52:42 2007 UTC (11 years, 4 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 747
findFreqTerms now operates (very) efficiently on (big) sparse matrices. Thanks to Martin Maechler.

Revision 747
Modified Fri Apr 27 18:16:53 2007 UTC (11 years, 5 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 745
Removed dbDisconnect calls since they were deprecated by the latest filehash release.

Revision 745
Modified Mon Apr 23 00:57:26 2007 UTC (11 years, 5 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 336 byte(s)
Diff to previous 741
Fixed dimnames in sparse matrix. Updated date in DESCRIPTION.

Revision 741
Modified Sat Apr 21 18:35:16 2007 UTC (11 years, 5 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 336 byte(s)
Diff to previous 732
Switched back to filehash instead of filehashSQLite.

Revision 732
Modified Wed Apr 11 18:11:54 2007 UTC (11 years, 5 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 342 byte(s)
Diff to previous 730
Added stopwords for various languages.

Revision 730
Modified Wed Apr 11 02:15:10 2007 UTC (11 years, 5 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 344 byte(s)
Diff to previous 716
Updated documentation.

Revision 716
Modified Thu Mar 15 17:22:39 2007 UTC (11 years, 6 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 338 byte(s)
Diff to previous 713
Some improvements for TermDocMatrix.

Revision 713
Modified Wed Mar 14 13:44:11 2007 UTC (11 years, 6 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 330 byte(s)
Diff to previous 712
Added Snowball support. Added function returning stopwords (English, German, French).

Revision 712
Modified Sun Mar 4 15:18:36 2007 UTC (11 years, 6 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 320 byte(s)
Diff to previous 711
Started to implement database support to optimize RAM usage, i.e., minimize RAM demand if necessary.

Revision 711
Modified Tue Jan 30 13:03:55 2007 UTC (11 years, 7 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 310 byte(s)
Diff to previous 708
Fixed bug in documentation.

Revision 708
Modified Mon Jan 22 10:34:12 2007 UTC (11 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 310 byte(s)
Diff to previous 704
Fixed bug in documentation.

Revision 704
Modified Fri Jan 12 10:05:15 2007 UTC (11 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 310 byte(s)
Diff to previous 702
Update to version 0.1-1.

Revision 702
Modified Tue Jan 9 09:39:33 2007 UTC (11 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 308 byte(s)
Diff to previous 698
wordStem now explicitly uses Rstem namespace.

Revision 698
Modified Sat Jan 6 17:05:44 2007 UTC (11 years, 8 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 297 byte(s)
Diff to previous 693
Changes due to Kurt's review.

Revision 693
Modified Fri Dec 22 13:21:30 2006 UTC (11 years, 9 months ago) by feinerer
Original Path: trunk/tm/DESCRIPTION
File length: 297 byte(s)
Diff to previous 690
Renamed textmin directory to tm since the package name changed.

Revision 690
Modified Sat Dec 16 17:22:56 2006 UTC (11 years, 9 months ago) by feinerer
Original Path: trunk/textmin/DESCRIPTION
File length: 297 byte(s)
Diff to previous 78
Renamed package to 'tm'. Updated documentation (man) for CRAN release.

Revision 78
Modified Wed Nov 29 14:56:36 2006 UTC (11 years, 9 months ago) by zeileis
Original Path: trunk/textmin/DESCRIPTION
File length: 306 byte(s)
Diff to previous 67
Removed old repository structure; now only R packages.

Revision 67
Modified Wed Nov 1 17:29:59 2006 UTC (11 years, 10 months ago) by feinerer
Original Path: trunk/R/textmin/DESCRIPTION
File length: 306 byte(s)
Diff to previous 63
See ChangeLog

Revision 63
Modified Thu Oct 26 14:59:09 2006 UTC (11 years, 11 months ago) by feinerer
Original Path: trunk/R/textmin/DESCRIPTION
File length: 315 byte(s)
Diff to previous 47
See ChangeLog.

Revision 47
Modified Mon Jul 10 12:22:35 2006 UTC (12 years, 2 months ago) by feinerer
Original Path: trunk/R/textmin/DESCRIPTION
File length: 310 byte(s)
Copied from: trunk/R/trunk/DESCRIPTION revision 44
Diff to previous 46
Renamed tm directory to textmin.

Revision 46
Modified Wed Jul 5 18:08:41 2006 UTC (12 years, 2 months ago) by meyer
Original Path: trunk/R/tm/DESCRIPTION
File length: 310 byte(s)
Copied from: trunk/R/trunk/DESCRIPTION revision 44
Diff to previous 45
move

Revision 45
Modified Wed Jul 5 17:27:29 2006 UTC (12 years, 2 months ago) by meyer
Original Path: trunk/R/trunk/tm/DESCRIPTION
File length: 310 byte(s)
Copied from: trunk/R/trunk/DESCRIPTION revision 44
Diff to previous 28
move in subdir

Revision 28
Modified Tue Dec 6 13:46:33 2005 UTC (12 years, 9 months ago) by feinerer
Original Path: trunk/R/trunk/DESCRIPTION
File length: 310 byte(s)
Diff to previous 25
See ChangeLog

Revision 25
Modified Wed Nov 30 18:53:50 2005 UTC (12 years, 9 months ago) by feinerer
Original Path: trunk/R/trunk/DESCRIPTION
File length: 294 byte(s)
Diff to previous 16
See ChangeLog

Revision 16
Added Fri Oct 7 09:42:57 2005 UTC (12 years, 11 months ago) by feinerer
Original Path: trunk/R/trunk/DESCRIPTION
File length: 272 byte(s)
Textmatrix code runs. Simple k-means text clustering (similarity based upon word frequencies) works.

R-Forge@R-project.org
Thanks to:
Vienna University of Economics and Business University of Wisconsin - Madison Powered By FusionForge