
Machine Learning with R, the tidyverse, and mlr

by Hefin Rhys

April 2020

Intermediate to advanced

536 pages

16h 55m

English

Manning Publications


1.1. What is machine learning?
1.2. Classes of machine learning algorithms
1.3. Thinking about the ethical impact of machine learning
1.4. Why use R for machine learning?
1.5. Which datasets will we use?
1.6. What will you learn in this book?
Summary

2.1. What is the tidyverse, and what is tidy data?
2.2. Loading the tidyverse
2.3. What the tibble package is and what it does
2.4. What the dplyr package is and what it does
2.5. What the ggplot2 package is and what it does
2.6. What the tidyr package is and what it does
2.7. What the purrr package is and what it does
Summary
Solutions to exercises

3.1. What is the k-nearest neighbors algorithm?
3.2. Building your first kNN model
3.3. Balancing two sources of model error: The bias-variance trade-off
3.4. Using cross-validation to tell if we’re overfitting or underfitting
3.5. Cross-validating our kNN model
3.6. What algorithms can learn, and what they must be told: Parameters and hyperparameters
3.7. Tuning k to improve the model
3.8. Strengths and weaknesses of kNN
Summary
Solutions to exercises

4.1. What is logistic regression?
4.2. Building your first logistic regression model
4.3. Cross-validating the logistic regression model
4.4. Interpreting the model: The odds ratio
4.5. Using our model to make predictions
4.6. Strengths and weaknesses of logistic regression
Summary
Solutions to exercises

5.1. What is discriminant analysis?
5.2. Building your first linear and quadratic discriminant models
5.3. Strengths and weaknesses of LDA and QDA
Summary
Solutions to exercises

6.1. What is the naive Bayes algorithm?
6.2. Building your first naive Bayes model
6.3. Strengths and weaknesses of naive Bayes
6.4. What is the support vector machine (SVM) algorithm?
6.5. Building your first SVM model
6.6. Cross-validating our SVM model
6.7. Strengths and weaknesses of the SVM algorithm
Summary
Solutions to exercises

7.1. What is the recursive partitioning algorithm?
7.2. Building your first decision tree model
7.3. Loading and exploring the zoo dataset
7.4. Training the decision tree model
7.5. Cross-validating our decision tree model
7.6. Strengths and weaknesses of tree-based algorithms
Summary

8.1. Ensemble techniques: Bagging, boosting, and stacking
8.2. Building your first random forest model
8.3. Building your first XGBoost model
8.4. Strengths and weaknesses of tree-based algorithms
8.5. Benchmarking algorithms against each other
Summary

9.1. What is linear regression?
9.2. Building your first linear regression model
9.3. Strengths and weaknesses of linear regression
Summary
Solutions to exercises

10.1. Making linear regression nonlinear with polynomial terms
10.2. More flexibility: Splines and generalized additive models
10.3. Building your first GAM
10.4. Strengths and weaknesses of GAMs
Summary
Solutions to exercises

11.1. What is regularization?
11.2. What is ridge regression?
11.3. What is the L2 norm, and how does ridge regression use it?
11.4. What is the L1 norm, and how does LASSO use it?
11.5. What is elastic net?
11.6. Building your first ridge, LASSO, and elastic net models
11.7. Benchmarking ridge, LASSO, elastic net, and OLS against each other
11.8. Strengths and weaknesses of ridge, LASSO, and elastic net
Summary
Solutions to exercises

12.1. Using k-nearest neighbors to predict a continuous variable
12.2. Using tree-based learners to predict a continuous variable
12.3. Building your first kNN regression model
12.4. Building your first random forest regression model
12.5. Building your first XGBoost regression model
12.6. Benchmarking the kNN, random forest, and XGBoost model-building processes
12.7. Strengths and weaknesses of kNN, random forest, and XGBoost
Summary
Solutions to exercises

13.1. Why dimension reduction?
13.2. What is principal component analysis?
13.3. Building your first PCA model
13.4. Strengths and weaknesses of PCA
Summary
Solutions to exercises

14.1. What is t-SNE?
14.2. Building your first t-SNE embedding
14.3. What is UMAP?
14.4. Building your first UMAP model
14.5. Strengths and weaknesses of t-SNE and UMAP
Summary
Solutions to exercises

15.1. Prerequisites: Grids of nodes and manifolds
15.2. What are self-organizing maps?
15.3. Building your first SOM
15.4. What is locally linear embedding?
15.5. Building your first LLE
15.6. Building an LLE of our flea data
15.7. Strengths and weaknesses of SOMs and LLE
Summary
Solutions to exercises

16.1. What is k-means clustering?
16.2. Building your first k-means model
16.3. Strengths and weaknesses of k-means clustering
Summary
Solutions to exercises

17.1. What is hierarchical clustering?
17.2. Building your first agglomerative hierarchical clustering model
17.3. How stable are our clusters?
17.4. Strengths and weaknesses of hierarchical clustering
Summary
Solutions to exercises

18.1. What is density-based clustering?
18.2. Building your first DBSCAN model
18.3. Building your first OPTICS model
18.4. Strengths and weaknesses of density-based clustering
Summary
Solutions to exercises

19.1. What is mixture model clustering?
19.2. Building your first Gaussian mixture model for clustering
19.3. Strengths and weaknesses of mixture model clustering
Summary
Solutions to exercises

20.1. A brief recap of machine learning concepts
20.2. Where can you go from here?
20.3. The last word

Content preview from Machine Learning with R, the tidyverse, and mlr

Chapter 3. Classifying based on similarities with k-nearest neighbors

This chapter covers

Understanding the bias-variance trade-off
Underfitting vs. overfitting
Using cross-validation to assess model performance
Building a k-nearest neighbors classifier
Tuning hyperparameters

This is probably the most important chapter of the entire book. In it, I’m going to show you how the k-nearest neighbors (kNN) algorithm works, and we’re going to use it to classify potential diabetes patients. In addition, I’m going to use the kNN algorithm to teach you some essential concepts in machine learning that we will rely on for the rest of the book.

By the end of this chapter, not only will you understand and be able to use the kNN algorithm to make classification ...



ISBN: 9781617296574