
Essential Math for Data Science

Name: Essential Math for Data Science
Author: Thomas Nield
ISBN: 9781098102937

by Thomas Nield

May 2022

Intermediate to advanced

352 pages

9h 15m

English

O'Reilly Media, Inc.



Preface
Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. Basic Math and Calculus Review
Number Theory; Order of Operations; Variables; Functions; Summations; Exponents; Logarithms; Euler’s Number and Natural Logarithms; Euler’s Number; Natural Logarithms; Limits; Derivatives; Partial Derivatives; The Chain Rule; Integrals; Conclusion; Exercises
2. Probability
Understanding Probability; Probability Versus Statistics; Probability Math; Joint Probabilities; Union Probabilities; Conditional Probability and Bayes’ Theorem; Joint and Union Conditional Probabilities; Binomial Distribution; Beta Distribution; Conclusion; Exercises
3. Descriptive and Inferential Statistics
What Is Data?; Descriptive Versus Inferential Statistics; Populations, Samples, and Bias; Descriptive Statistics; Mean and Weighted Mean; Median; Mode; Variance and Standard Deviation; The Normal Distribution; The Inverse CDF; Z-Scores; Inferential Statistics; The Central Limit Theorem; Confidence Intervals; Understanding P-Values; Hypothesis Testing; The T-Distribution: Dealing with Small Samples; Big Data Considerations and the Texas Sharpshooter Fallacy; Conclusion; Exercises
4. Linear Algebra
What Is a Vector?; Adding and Combining Vectors; Scaling Vectors; Span and Linear Dependence; Linear Transformations; Basis Vectors; Matrix Vector Multiplication; Matrix Multiplication; Determinants; Special Types of Matrices; Square Matrix; Identity Matrix; Inverse Matrix; Diagonal Matrix; Triangular Matrix; Sparse Matrix; Systems of Equations and Inverse Matrices; Eigenvectors and Eigenvalues; Conclusion; Exercises
5. Linear Regression
A Basic Linear Regression; Residuals and Squared Errors; Finding the Best Fit Line; Closed Form Equation; Inverse Matrix Techniques; Matrix Decomposition; Gradient Descent; Overfitting and Variance; Stochastic Gradient Descent; The Correlation Coefficient; Statistical Significance; Coefficient of Determination; Standard Error of the Estimate; Prediction Intervals; Train/Test Splits; Multiple Linear Regression; Conclusion; Exercises
6. Logistic Regression and Classification
Understanding Logistic Regression; Performing a Logistic Regression; Logistic Function; Fitting the Logistic Curve; Multivariable Logistic Regression; Understanding the Log-Odds; R-Squared; P-Values; Train/Test Splits; Confusion Matrices; Bayes’ Theorem and Classification; Receiver Operator Characteristics/Area Under Curve; Class Imbalance; Conclusion; Exercises
7. Neural Networks
When to Use Neural Networks and Deep Learning; A Simple Neural Network; Activation Functions; Forward Propagation; Backpropagation; Calculating the Weight and Bias Derivatives; Stochastic Gradient Descent; Using scikit-learn; Limitations of Neural Networks and Deep Learning; Conclusion; Exercise
8. Career Advice and the Path Forward
Redefining Data Science; A Brief History of Data Science; Finding Your Edge; SQL Proficiency; Programming Proficiency; Data Visualization; Knowing Your Industry; Productive Learning; Practitioner Versus Advisor; What to Watch Out For in Data Science Jobs; Role Definition; Organizational Focus and Buy-In; Adequate Resources; Reasonable Objectives; Competing with Existing Systems; A Role Is Not What You Expected; Does Your Dream Job Not Exist?; Where Do I Go Now?; Conclusion
A. Supplemental Topics
Using LaTeX Rendering with SymPy; Binomial Distribution from Scratch; Beta Distribution from Scratch; Deriving Bayes’ Theorem; CDF and Inverse CDF from Scratch; Use e to Predict Event Probability Over Time; Hill Climbing and Linear Regression; Hill Climbing and Logistic Regression; A Brief Intro to Linear Programming; MNIST Classifier Using scikit-learn

B. Exercise Answers
Chapter 1; Chapter 2; Chapter 3; Chapter 4; Chapter 5; Chapter 6; Chapter 7
Index
About the Author

Overview

Master the math needed to excel in data science, machine learning, and statistics. In this book, author Thomas Nield guides you through areas like calculus, probability, linear algebra, and statistics, and shows how they apply to techniques like linear regression, logistic regression, and neural networks. Along the way, you'll also gain practical insights into the state of data science and how to use those insights to make the most of your career.

Learn how to:

Use Python code and libraries like SymPy, NumPy, and scikit-learn to explore essential mathematical concepts like calculus, linear algebra, statistics, and machine learning
Understand techniques like linear regression, logistic regression, and neural networks in plain English, with minimal mathematical notation and jargon
Perform descriptive statistics and hypothesis testing on a dataset to interpret p-values and statistical significance
Manipulate vectors and matrices and perform matrix decomposition
Integrate and build upon incremental knowledge of calculus, probability, statistics, and linear algebra, and apply it to regression models including neural networks
Navigate practically through a data science career and avoid common pitfalls, assumptions, and biases while tuning your skill set to stand out in the job market
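
As a rough illustration of the kind of Python the list above refers to, here is a minimal sketch of descriptive statistics, z-scores, and the normal CDF and inverse CDF. It is not taken from the book: the sample values and the `z_score` helper are hypothetical, and it uses only the Python standard library (`statistics`, Python 3.8 or later) rather than the libraries named above.

```python
from statistics import NormalDist, mean, median, stdev

# Hypothetical sample, invented purely for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_mean = mean(sample)      # arithmetic mean
sample_median = median(sample)  # middle value of the sorted data
sample_std = stdev(sample)      # sample standard deviation (n - 1 denominator)


def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from mu (hypothetical helper)."""
    return (x - mu) / sigma


# Standard normal distribution: CDF and inverse CDF.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
p_below_one = standard_normal.cdf(1.0)   # P(Z <= 1)
z_975 = standard_normal.inv_cdf(0.975)   # cutoff leaving 2.5% in the upper tail

print("mean:", sample_mean, "median:", sample_median, "std:", round(sample_std, 3))
print("z-score of 9.0:", round(z_score(9.0, sample_mean, sample_std), 3))
print("P(Z <= 1):", round(p_below_one, 4), "z at 0.975:", round(z_975, 2))
```

The inverse CDF value at 0.975 is the cutoff commonly used for a two-sided 95% confidence interval, which ties the descriptive and inferential topics in the list together.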

Publisher Resources

ISBN: 9781098102920
Errata Page
