C
Datasets
Besides the tiny weather family of datasets presented in Chapter 1 and the artificially generated datasets used in some chapters, the R code examples use a set of real datasets originating from various sources. They are all available for download from the UCI Machine Learning Repository. Except for those used by the case studies in Chapter 20, the datasets do not actually have to be downloaded from the repository, since they are also available in the R packages mlbench and datasets. It still makes sense to check the corresponding UCI pages for some basic characteristics of the data as well as information about their origin and past usage. The table presented below lists all the UCI datasets used in this book, providing their original repository names as well as R package names, where available. The corresponding links to the UCI pages can be constructed using the following simple template:
http://archive.ics.uci.edu/ml/datasets/name
with name replaced by the UCI dataset name.
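The template can be applied to several dataset names at once. The following is a minimal sketch: the helper name uci_url is hypothetical and not part of the book's code, and the names passed to it are taken from the table in this appendix.

```r
# Base of the URL template given in the text.
uci_base <- "http://archive.ics.uci.edu/ml/datasets/"

# Hypothetical helper (not from the book): append a UCI dataset name
# to the base URL. paste0() is vectorized, so a vector of names works.
uci_url <- function(name) paste0(uci_base, name)

# A few UCI names copied from the table in this appendix.
uci_names <- c("Census-Income+(KDD)", "Covertype", "Glass+Identification")
uci_urls <- uci_url(uci_names)
print(uci_urls)
```

The names are used exactly as listed, including the + signs and parentheses, since the template only calls for plain substitution.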
Dataset | UCI name | R package/name |
Census Income | Census-Income+(KDD) | |
Communities and Crime | Communities+and+Crime | |
Cover Type | Covertype | |
Boston Housing | Housing | mlbench/BostonHousing |
Glass | Glass+Identification | mlbench/Glass |
HouseVotes84 | Congressional+Voting+Records | mlbench/HouseVotes84 |
Iris | Iris | datasets/iris |
Pima Indians Diabetes | Pima+Indians+Diabetes | mlbench/PimaIndiansDiabetes |
Soybean | Soybean+(Large) | mlbench/Soybean |
Vehicle Silhouettes | Statlog+(Vehicle+Silhouettes) | mlbench/Vehicle |
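For the datasets that have an entry in the R package/name column, loading them might look like the sketch below. It assumes that the mlbench package may or may not be installed, so that part is guarded; the variable name iris_dim is introduced only for illustration.

```r
# iris ships with the datasets package, which is attached by default.
data(iris)
iris_dim <- dim(iris)  # number of rows and columns
print(iris_dim)

# mlbench is a contributed package; load from it only if it is installed
# (install.packages("mlbench") would be needed otherwise).
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(BostonHousing, package = "mlbench")
  data(Glass, package = "mlbench")
  str(BostonHousing)  # quick overview of the attributes
}
```

The remaining mlbench datasets in the table can be loaded the same way by substituting their names in the data() call.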