Tidy Modeling with R

by Max Kuhn, Julia Silge

July 2022

Beginner to intermediate

381 pages

9h 22m

English

O'Reilly Media, Inc.

Preface
Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
I. Introduction
1. Software for Modeling
Fundamentals for Modeling Software; Types of Models; Descriptive Models; Inferential Models; Predictive Models; Connections Between Types of Models; Some Terminology; How Does Modeling Fit into the Data Analysis Process?; Chapter Summary
2. A Tidyverse Primer
Tidyverse Principles; Design for Humans; Reuse Existing Data Structures; Design for the Pipe and Functional Programming; Examples of Tidyverse Syntax; Chapter Summary
3. A Review of R Modeling Fundamentals
An Example; What Does the R Formula Do?; Why Tidiness Is Important for Modeling; Combining Base R Models and the Tidyverse; The tidymodels Metapackage; Chapter Summary
II. Modeling Basics
4. The Ames Housing Data
Exploring Features of Homes in Ames; Chapter Summary
5. Spending Our Data
Common Methods for Splitting Data; What About a Validation Set?; Multilevel Data; Other Considerations for a Data Budget; Chapter Summary
6. Fitting Models with parsnip
Create a Model; Use the Model Results; Make Predictions; parsnip-Extension Packages; Creating Model Specifications; Chapter Summary
7. A Model Workflow
Where Does the Model Begin and End?; Workflow Basics; Adding Raw Variables to the workflow(); How Does a workflow() Use the Formula?; Tree-Based Models; Special Formulas and Inline Functions; Creating Multiple Workflows at Once; Evaluating the Test Set; Chapter Summary
8. Feature Engineering with Recipes
A Simple recipe() for the Ames Housing Data; Using Recipes; How Data Are Used by the recipe(); Examples of Steps; Encoding Qualitative Data in a Numeric Format; Interaction Terms; Spline Functions; Feature Extraction; Row Sampling Steps; General Transformations; Natural Language Processing; Skipping Steps for New Data; Tidy a recipe(); Column Roles; Chapter Summary
9. Judging Model Effectiveness
Performance Metrics and Inference; Regression Metrics; Binary Classification Metrics; Multiclass Classification Metrics; Chapter Summary
III. Tools for Creating Effective Models
10. Resampling for Evaluating Performance
The Resubstitution Approach; Resampling Methods; Cross-Validation; Repeated Cross-Validation; Leave-One-Out Cross-Validation; Monte Carlo Cross-Validation; Validation Sets; Bootstrapping; Rolling Forecasting Origin Resampling; Estimating Performance; Parallel Processing; Saving the Resampled Objects; Chapter Summary
11. Comparing Models with Resampling
Creating Multiple Models with Workflow Sets; Comparing Resampled Performance Statistics; Simple Hypothesis Testing Methods; Bayesian Methods; A Random Intercept Model; The Effect of the Amount of Resampling; Chapter Summary
12. Model Tuning and the Dangers of Overfitting
Model Parameters; Tuning Parameters for Different Types of Models; What Do We Optimize?; The Consequences of Poor Parameter Estimates; Two General Strategies for Optimization; Tuning Parameters in tidymodels; Chapter Summary
13. Grid Search
Regular and Nonregular Grids; Regular Grids; Nonregular Grids; Evaluating the Grid; Finalizing the Model; Tools for Creating Tuning Specifications; Tools for Efficient Grid Search; Submodel Optimization; Parallel Processing; Benchmarking Boosted Trees; Access to Global Variables; Racing Methods; Chapter Summary
14. Iterative Search
A Support Vector Machine Model; Bayesian Optimization; A Gaussian Process Model; Acquisition Functions; The tune_bayes() Function; Simulated Annealing; Simulated Annealing Search Process; The tune_sim_anneal() Function; Chapter Summary
15. Screening Many Models
Modeling Concrete Mixture Strength; Creating the Workflow Set; Tuning and Evaluating the Models; Efficiently Screening Models; Finalizing a Model; Chapter Summary
IV. Beyond the Basics
16. Dimensionality Reduction
What Problems Can Dimensionality Reduction Solve?; A Picture Is Worth a Thousand… Beans; A Starter Recipe; Recipes in the Wild; Preparing a Recipe; Baking the Recipe; Feature Extraction Techniques; Principal Component Analysis; Partial Least Squares; Independent Component Analysis; Uniform Manifold Approximation and Projection; Modeling; Chapter Summary
17. Encoding Categorical Data
Is an Encoding Necessary?; Encoding Ordinal Predictors; Using the Outcome for Encoding Predictors; Effect Encodings in tidymodels; Effect Encodings with Partial Pooling; Feature Hashing; More Encoding Options; Chapter Summary
18. Explaining Models and Predictions
Software for Model Explanations; Local Explanations; Global Explanations; Building Global Explanations from Local Explanations; Back to Beans!; Chapter Summary
19. When Should You Trust Your Predictions?
Equivocal Results; Determining Model Applicability; Chapter Summary
20. Ensembles of Models
Creating the Training Set for Stacking; Blend the Predictions; Fit the Member Models; Test Set Results; Chapter Summary
21. Inferential Analysis
Inference for Count Data; Comparisons with Two-Sample Tests; Log-Linear Models; A More Complex Model; More Inferential Analysis; Chapter Summary
A. Recommended Preprocessing
References
Index
About the Authors

Content preview from Tidy Modeling with R

Preface

Welcome to Tidy Modeling with R! This book is a guide to using tidymodels, a collection of software for model building in the R programming language, and it has two main goals:

First and foremost, this book provides a practical introduction to using these specific R packages to create models. We focus on a dialect of R called the tidyverse that is designed with a consistent, human-centered philosophy, and we demonstrate how the tidyverse and the tidymodels packages can be used to produce high-quality statistical and machine learning models.
Second, this book will show you how to develop good methodology and statistical practices. Whenever possible, our software, documentation, and other materials attempt to prevent common pitfalls.

In Chapter 1, we outline a taxonomy for models and highlight what good software for modeling is like. The ideas and syntax of the tidyverse, which we introduce (or review) in Chapter 2, are the basis for the tidymodels approach to these challenges of methodology and practice. Chapter 3 provides a quick tour of conventional base R modeling functions and summarizes the unmet needs in that area.

After that, this book is separated into parts, starting with the basics of modeling with tidy data principles. Chapters 4–9 introduce an example data set on house prices and demonstrate how to use the fundamental tidymodels packages: recipes, parsnip, workflows, yardstick, and others.

The next part of the book moves forward with more details ...


Publisher Resources

ISBN: 9781492096474
