Book description
As an aspiring data scientist, you appreciate why organizations rely on data for important decisions, whether it's for companies designing websites, cities deciding how to improve services, or scientists discovering how to stop the spread of disease. And you want the skills required to distill a messy pile of data into actionable insights. We call this the data science lifecycle: the process of collecting, wrangling, analyzing, and drawing conclusions from data.
Learning Data Science is the first book to cover foundational skills in both programming and statistics that encompass this entire lifecycle. It's aimed at those who wish to become data scientists or who already work with data scientists, and at data analysts who wish to cross the "technical/nontechnical" divide. If you have a basic knowledge of Python programming, you'll learn how to work with data using industry-standard tools like pandas.
- Refine a question of interest to one that can be studied with data
- Pursue data collection that may involve text processing, web scraping, etc.
- Glean valuable insights about data through data cleaning, exploration, and visualization
- Learn how to use modeling to describe the data
- Generalize findings beyond the data
Table of contents
- I. The Data Science Lifecycle
- 1. The Data Science Lifecycle
- 2. Questions and Data Scope
- 3. Simulation and Data Design
- The Urn Model
- Example: Simulating Election Poll Bias and Variance
- Example: Simulating a Randomized Trial for a Vaccine
- Example: Measuring Air Quality
- Summary
- 4. Modeling with Summary Statistics
- 5. Case Study: Why is my Bus Always Late?
- II. Rectangular Data
- 6. Working With Dataframes Using pandas
- Subsetting
- Aggregating
- Joining
- Transforming
- How are Dataframes Different from Other Data Representations?
- Summary
- 7. Working With Relations Using SQL
- Subsetting
- Aggregating
- Joining
- Transforming and Common Table Expressions
- Summary
- III. Understanding The Data
- 8. Wrangling Files
- Data Source Examples
- File Formats
- File Encoding
- File Size
- The Shell and Command Line Tools
- Table Shape and Granularity
- Summary
- 9. Wrangling Dataframes
- Example: Wrangling CO2 Measurements from Mauna Loa Observatory
- Quality Checks
- Missing Values and Records
- Transformations and Timestamps
- Modifying Structure
- Example: Wrangling Restaurant Safety Violations
- Summary
- 10. Exploratory Data Analysis
- Feature Types
- What to Look For in a Distribution
- What to Look For in a Relationship
- Comparisons in Multivariate Settings
- Guidelines for Exploration
- Example: Sale Prices for Houses
- Summary
- 11. Data Visualization
- Choosing Scale to Reveal Structure
- Smoothing and Aggregating Data
- Facilitating Meaningful Comparisons
- Incorporating the Data Design
- Adding Context
- Creating Plots Using plotly
- Other Tools for Visualization
- Summary
- 12. Case Study: How Accurate are Air Quality Measurements?
- Question, Design, and Scope
- Finding Collocated Sensors
- Wrangling and Cleaning AQS Sensor Data
- Wrangling PurpleAir Sensor Data
- Exploring PurpleAir and AQS Measurements
- Creating a Model to Correct PurpleAir Measurements
- Summary
- IV. Other Data Sources
- 13. Working with Text
- Examples of Text and Tasks
- String Manipulation
- Regular Expressions
- Text Analysis
- Summary
- 14. Data Exchange
- V. Linear Modeling
- 15. Linear Models
- Simple Linear Model
- Example: A Simple Linear Model for Air Quality
- Fitting the Simple Linear Model
- Multiple Linear Model
- Fitting the Multiple Linear Model
- Example: Where is the Land of Opportunity?
- Feature Engineering for Numeric Measurements
- Feature Engineering for Categorical Measurements
- Summary
- 16. Model Selection
- 17. Theory for Inference and Prediction
- Distributions: Population, Empirical, Sampling
- Basics of Hypothesis Testing
- Bootstrapping for Inference
- Basics of Confidence Intervals
- Basics of Prediction Intervals
- Probability for Inference and Prediction
- Summary
- 18. Case Study: How to Weigh a Donkey
- Donkey Study Question and Scope
- Wrangling and Transforming
- Exploring
- Modeling a Donkey’s Weight
- Summary
- VI. Classification
- 19. Classification
- Example: Wind Damaged Trees
- Modeling and Classification
- Modeling Proportions (and Probabilities)
- A Loss Function for the Logistic Model
- From Probabilities to Classification
- Summary
- 20. Numerical Optimization
- 21. Case Study: Detecting Fake News
- About the Authors
Product information
- Title: Learning Data Science
- Author(s):
- Release date: September 2023
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098113001