Essential Math for Data Science

Book description

Master the math needed to excel in data science, machine learning, and statistics. In this book, author Thomas Nield guides you through areas like calculus, probability, linear algebra, and statistics, and shows how they apply to techniques like linear regression, logistic regression, and neural networks. Along the way, you'll also gain practical insights into the state of data science and learn how to use those insights to advance your career.

Learn how to:

  • Use Python code and libraries like SymPy, NumPy, and scikit-learn to explore essential mathematical concepts like calculus, linear algebra, statistics, and machine learning
  • Understand techniques like linear regression, logistic regression, and neural networks in plain English, with minimal mathematical notation and jargon
  • Perform descriptive statistics and hypothesis testing on a dataset to interpret p-values and statistical significance
  • Manipulate vectors and matrices and perform matrix decomposition
  • Integrate and build upon incremental knowledge of calculus, probability, statistics, and linear algebra, and apply it to regression models including neural networks
  • Navigate a data science career practically, avoiding common pitfalls, assumptions, and biases while tuning your skill set to stand out in the job market

Table of contents

  1. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. O’Reilly Online Learning
    4. How to Contact Us
    5. Acknowledgments
  2. Chapter 1. Basic Math and Calculus Review
    1. Number Theory
    2. Order of Operations
    3. Variables
    4. Functions
    5. Summations
    6. Exponents
    7. Logarithms
    8. Euler’s Number and Natural Logarithms
      1. Euler’s Number
      2. Natural Logarithms
    9. Limits
    10. Derivatives
      1. Partial Derivatives
      2. The Chain Rule
    11. Integrals
    12. Conclusion
    13. Exercises
  3. Chapter 2. Probability
    1. Understanding Probability
      1. Probability Versus Statistics
    2. Probability Math
      1. Joint Probabilities
      2. Union Probabilities
      3. Conditional Probability and Bayes’ Theorem
      4. Joint and Union Conditional Probabilities
    3. Binomial Distribution
    4. Beta Distribution
    5. Conclusion
    6. Exercises
  4. Chapter 3. Descriptive and Inferential Statistics
    1. What Is Data?
    2. Descriptive Versus Inferential Statistics
    3. Populations, Samples, and Bias
    4. Descriptive Statistics
      1. Mean and Weighted Mean
      2. Median
      3. Mode
      4. Variance and Standard Deviation
      5. The Normal Distribution
      6. The Inverse CDF
      7. Z-Scores
    5. Inferential Statistics
      1. The Central Limit Theorem
      2. Confidence Intervals
      3. Understanding P-Values
      4. Hypothesis Testing
    6. The T-Distribution: Dealing with Small Samples
    7. Big Data Considerations and the Texas Sharpshooter Fallacy
    8. Conclusion
    9. Exercises
  5. Chapter 4. Linear Algebra
    1. What Is a Vector?
      1. Adding and Combining Vectors
      2. Scaling Vectors
      3. Span and Linear Dependence
    2. Linear Transformations
      1. Basis Vectors
      2. Matrix Vector Multiplication
    3. Matrix Multiplication
    4. Determinants
    5. Special Types of Matrices
      1. Square Matrix
      2. Identity Matrix
      3. Inverse Matrix
      4. Diagonal Matrix
      5. Triangular Matrix
      6. Sparse Matrix
    6. Systems of Equations and Inverse Matrices
    7. Eigenvectors and Eigenvalues
    8. Conclusion
    9. Exercises
  6. Chapter 5. Linear Regression
    1. A Basic Linear Regression
    2. Residuals and Squared Errors
    3. Finding the Best Fit Line
      1. Closed Form Equation
      2. Inverse Matrix Techniques
      3. Gradient Descent
    4. Overfitting and Variance
    5. Stochastic Gradient Descent
    6. The Correlation Coefficient
    7. Statistical Significance
    8. Coefficient of Determination
    9. Standard Error of the Estimate
    10. Prediction Intervals
    11. Train/Test Splits
    12. Multiple Linear Regression
    13. Conclusion
    14. Exercises
  7. Chapter 6. Logistic Regression and Classification
    1. Understanding Logistic Regression
    2. Performing a Logistic Regression
      1. Logistic Function
      2. Fitting the Logistic Curve
    3. Multivariable Logistic Regression
    4. Understanding the Log-Odds
    5. R-Squared
    6. P-Values
    7. Train/Test Splits
    8. Confusion Matrices
    9. Bayes’ Theorem and Classification
    10. Receiver Operator Characteristics/Area Under Curve
    11. Class Imbalance
    12. Conclusion
    13. Exercises
  8. Chapter 7. Neural Networks
    1. When to Use Neural Networks and Deep Learning
    2. A Simple Neural Network
      1. Activation Functions
      2. Forward Propagation
    3. Backpropagation
      1. Calculating the Weight and Bias Derivatives
      2. Stochastic Gradient Descent
    4. Using scikit-learn
    5. Limitations of Neural Networks and Deep Learning
    6. Conclusion
    7. Exercise
  9. Chapter 8. Career Advice and the Path Forward
    1. Redefining Data Science
    2. A Brief History of Data Science
    3. Finding Your Edge
      1. SQL Proficiency
      2. Programming Proficiency
      3. Data Visualization
      4. Knowing Your Industry
      5. Productive Learning
      6. Practitioner Versus Advisor
    4. What to Watch Out For in Data Science Jobs
      1. Role Definition
      2. Organizational Focus and Buy-In
      3. Adequate Resources
      4. Reasonable Objectives
      5. Competing with Existing Systems
      6. A Role Is Not What You Expected
    5. Does Your Dream Job Not Exist?
    6. Where Do I Go Now?
    7. Conclusion
  10. Appendix A. Supplemental Topics
    1. Using LaTeX Rendering with SymPy
    2. Binomial Distribution from Scratch
    3. Beta Distribution from Scratch
    4. Deriving Bayes’ Theorem
    5. CDF and Inverse CDF from Scratch
    6. Use e to Predict Event Probability Over Time
    7. Hill Climbing and Linear Regression
    8. Hill Climbing and Logistic Regression
    9. A Brief Intro to Linear Programming
    10. MNIST Classifier Using scikit-learn
  11. Appendix B. Exercise Answers
    1. Chapter 1
    2. Chapter 2
    3. Chapter 3
    4. Chapter 4
    5. Chapter 5
    6. Chapter 6
    7. Chapter 7
  12. Index
  13. About the Author

Product information

  • Title: Essential Math for Data Science
  • Author(s): Thomas Nield
  • Release date: June 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098102937