Preface
The goal of statistical data analysis is to extract the maximum information from the data, and to present a product that is as accurate and as useful as possible.
— David W. Scott, Multivariate Density Estimation: Theory, Practice and Visualization, 1992
My purpose in writing this book is to introduce the mathematically sophisticated reader to a large number of topics and techniques in the field variously known as machine learning, statistical learning, or predictive modeling. I believe that a deeper understanding of the subject as a whole will be obtained from reflection on an intuitive understanding of many techniques rather than a very detailed understanding of only one or two, and the book is structured accordingly. I have omitted many details while focusing on what I think shows “what is really going on.” For details, the reader will be directed to the relevant literature or to the exercises, which form an integral part of the text.
No work this small on a subject this large can be self-contained. Some undergraduate-level calculus, linear algebra, and probability is assumed without reference, as are a few basic ideas from statistics. All of the techniques discussed here can, I hope, be implemented using this book and a mid-level programming language (such as C), and explicit implementation of many techniques using both R and Python is presented in the last chapter.
The reader may detect a coverage bias in favor of classification over regression. This is deliberate. ...