July 2016
Intermediate to advanced
344 pages
10h 11m
English
Various data sets have been made available under text mining-friendly licences. Some of these have become popular choices for testing approaches to text and data mining. One benefit of using standard data sets is the ability to benchmark directly against other approaches to the same problem, which permits standardisation in evaluation.
In this section we list a number of standard data sets and competitions, referencing the areas in which they are primarily used. This list is not exhaustive and is intended to serve as an introduction. We indicate the availability and licensing of each data set where relevant. Although data sets are often used for more than one ...