Preface
Statistics is a subject of amazingly many uses and surprisingly few effective practitioners.
Bradley Efron and R. J. Tibshirani, An Introduction to the Bootstrap (1993)
Welcome to Behavioral Data Analysis with R and Python! That we live in the age of data has become a platitude. Engineers now routinely use data from sensors on machines and turbines to predict when these will fail and to perform preventive maintenance. Similarly, marketers use troves of data, from your demographic information to your past purchases, to determine which ad to serve you and when. As the phrase goes, “Data is the new oil,” and algorithms are the new combustion engine powering our economy forward.
Most books on analytics, machine learning, and data science implicitly presume that the problems that engineers and marketers are trying to solve can be handled with the same approaches and tools. Sure, the variables have different names and there is some domain-specific knowledge to acquire, but k-means clustering is k-means clustering, whether you’re clustering data about turbines or posts on social media. By adopting machine learning tools wholesale this way, businesses have often been able to accurately predict behaviors, but at the expense of a deeper and richer understanding of what’s actually going on. This has fed into the criticism of data science models as “black boxes.”
Instead of aiming for accurate but opaque predictions, this book strives to answer the question, “What drives behavior?” If we ...