Essential Math for Data Science

by Thomas Nield
Released October 2022
Publisher(s): O'Reilly Media, Inc.
ISBN: 9781098102869

Book description

To succeed in data science you need some math proficiency. But not just any math. This common-sense guide provides a clear, plain English survey of the math you'll need in data science, including probability, statistics, hypothesis testing, linear algebra, machine learning, and calculus.

Practical examples with Python code will help you see how the math applies to the work you'll be doing, providing a clear understanding of how concepts work under the hood while connecting them to applications like machine learning. You'll get a solid foundation in the math essential for data science, but more importantly, you'll be able to use it to:

  • Recognize the nuances and pitfalls of probability math
  • Master statistics and hypothesis testing (and avoid common pitfalls)
  • Discover practical applications of probability, statistics, calculus, and machine learning
  • Intuitively understand linear algebra as a transformation of space, not just grids of numbers being multiplied and added
  • Perform calculus derivatives and integrals completely from scratch in Python
  • Apply what you've learned to machine learning, including linear regression, logistic regression, and neural networks

Table of contents

  Preface
  1. Basic Math and Calculus Review
    1. Number Theory
    2. Order of Operations
    3. Variables
    4. Functions
    5. Summations
    6. Exponents
    7. Logarithms
    8. Euler’s Number and Natural Logarithms
      1. Natural Logarithms
    9. Limits
    10. Derivatives
    11. Integrals
    12. Conclusion
    13. Exercises
  2. Probability
    1. Understanding Probability
      1. Probability versus Statistics
    2. Probability Math
      1. Joint Probabilities
      2. Union Probabilities
      3. Conditional Probability and Bayes' Theorem
      4. Joint and Union Conditional Probabilities
    3. Binomial Distribution
    4. Beta Distribution
    5. Conclusion
    6. Exercises
  3. Descriptive and Inferential Statistics
    1. What Is Data?
    2. Descriptive versus Inferential Statistics
    3. Populations, Samples, and Bias
    4. Descriptive Statistics
      1. Mean and Weighted Mean
      2. Median
      3. Mode
      4. Variance and Standard Deviation
      5. The Normal Distribution
      6. The Inverse Cumulative Density Function (CDF)
    5. Inferential Statistics
      1. The Central Limit Theorem
      2. Confidence Intervals
      3. Understanding P-Values
      4. Hypothesis Testing
    6. The T-Distribution: Dealing with Small Samples
    7. Big Data Considerations and Texas Sharpshooter Fallacy
    8. Conclusions
    9. Exercises
  4. Linear Algebra
    1. What Is a Vector?
      1. Adding and Combining Vectors
      2. Scaling Vectors
      3. Span and Linear Dependence
    2. Linear Transformations
      1. Basis Vectors
      2. Matrix Vector Multiplication
    3. Matrix Multiplication
    4. Determinants
    5. Systems of Equations and Inverse Matrices
    6. Eigenvectors and Eigenvalues
    7. Conclusion
    8. Exercises
  5. Linear Regression
    1. A Basic Linear Regression
    2. Residuals and Squared Errors
    3. Finding the Best Fit Line
      1. Closed Form Equation
      2. Inverse Matrix Techniques
      3. Gradient Descent
    4. Overfitting and Variance
    5. Stochastic Gradient Descent
    6. The Correlation Coefficient
    7. Statistical Significance
    8. Coefficient of Determination
    9. Standard Error of the Estimate
    10. Prediction Intervals
    11. Train/Test Splits
    12. Multiple Linear Regression
    13. Conclusions
    14. Exercises
  6. Logistic Regression and Classification
    1. Understanding Logistic Regression
    2. Performing a Logistic Regression
      1. Logistic Function
      2. Fitting the Logistic Curve
    3. Multivariable Logistic Regression
    4. Understanding the Log-Odds
    5. R-Squared
    6. P-Values
    7. Train/Test Splits
    8. Confusion Matrices
      1. Bayes' Theorem and the Confusion Matrix
    9. Receiver Operating Characteristic (ROC)/Area Under the Curve (AUC)
    10. Class Imbalance
    11. Conclusions
    12. Exercises
  7. Neural Networks
    1. When to Use Neural Networks and Deep Learning
    2. A Simple Neural Network
      1. Activation Functions
      2. Forward Propagation
    3. Backpropagation
      1. The Chain Rule
      2. Calculating the Weight and Bias Derivatives
      3. Stochastic Gradient Descent
    4. Using Scikit-Learn
    5. Limitations of Neural Networks and Deep Learning
    6. Conclusions
