Video description
If you're a fledgling data scientist with only cursory statistical training and little experience with real-world data sets, you may feel like you're stumbling around in the dark when you're asked to interpret and present data to decision makers. How do you validate the data? What analytic model should you use? How do you differentiate between correlation and causation? How do you ensure that your data is solid and your conclusions are on target?
Allen Downey, Professor of Computer Science at Olin College of Engineering, author of Think Stats, Think Python, and Think Complexity, provides safe passage around the common pitfalls of exploratory data analysis, so you can manage, analyze, and present data with confidence.
- Learn the fundamental tools and methodologies used in data science
- Discover best practices regarding the ETL (Extract, Transform, and Load) process and data validation
- Use the open science framework: practice version control, replication, and data pipelining
- Grasp the effectiveness of CDFs (cumulative distribution functions) in visualizing distributions
- Choose the correct analytic model for your data
- Comprehend statistical inference, effect size, confidence intervals, and hypothesis testing
- Discern the relationship between variables: understand scatter plots and scatter plot alternatives
- Understand correlation, linear least squares, linear regression, and logistic regression
- Master the Zen of testing your data and your conclusions
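As a rough illustration of the CDF topic listed above, here is a minimal sketch of an empirical cumulative distribution function in plain Python. It is not taken from the course materials: the function name `empirical_cdf` and the sample values are hypothetical, chosen only to show the idea that a CDF maps each value to the fraction of the sample at or below it.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function F where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def cdf(x):
        # bisect_right returns how many of the sorted values are <= x
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical sample, made up for illustration only
weights = [6.1, 7.4, 7.4, 8.0, 8.3, 9.2]
cdf = empirical_cdf(weights)

print(cdf(7.4))   # fraction of values at or below 7.4
print(cdf(5.0))   # below the smallest value
print(cdf(10.0))  # above the largest value
```

Plotting `cdf(x)` against the sorted values gives a step curve that rises from 0 to 1. Unlike a histogram, it needs no bin width to be chosen, which is one reason CDFs are often used to compare distributions.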
Table of contents
- Introduction to Data Exploration
  - Opportunities and Goals 00:04:34
  - The State of Data 00:03:04
  - Data Optimism 00:02:52
- Getting Started
  - Software Setup, IPython, and Import and Validation 00:11:54
  - Data Organization 00:04:45
- Visualizing Distributions
  - PMFs and CDFs 00:15:13
- Relationships Between Variables
  - Scatterplots 00:13:53
  - Correlation and Least Squares 00:11:48
- Statistical Inference
  - Introduction to Statistical Inference 00:05:44
  - Effect Size 00:13:00
  - Effect Size, Difference in Proportions 00:06:18
  - Quantifying Precision 00:20:46
  - Hypothesis Testing 00:16:35
- Regression
  - Linear Regression 00:20:33
  - Logistic Regression 00:11:48
- Modeling Distributions
  - Modeling Distributions 00:14:16
- Survival Analysis
  - Survival Analysis 00:17:03
- Inspection Paradox
  - Inspection Paradox 00:16:04
Product information
- Title: Data Exploration in Python
- Author(s): Allen Downey
- Release date: November 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491938324