Video Description
The perfect follow-up to Pandas Data Analysis with Python Fundamentals LiveLessons for the aspiring data scientist
Overview
In Pandas Data Cleaning and Modeling with Python LiveLessons, Daniel Y. Chen builds upon the foundation he laid in Pandas Data Analysis with Python Fundamentals LiveLessons. In this LiveLesson, Dan teaches you the techniques and skills you need to clean and process your data. Dan shows you how to do data munging using some of the built-in Python libraries that can be used to clean data loaded into pandas. Once your data is clean, you will want to analyze it, so Dan next introduces other libraries that are used for model fitting.
About the Instructor
Daniel Y. Chen is a graduate student in the interdisciplinary Ph.D. program in Genetics, Bioinformatics & Computational Biology (GBCB) at Virginia Tech. He is involved with Software Carpentry as an instructor and lesson maintainer. He completed his master’s degree in public health, in epidemiology, at Columbia University Mailman School of Public Health, and currently works at the Social and Decision Analytics Laboratory under the Biocomplexity Institute of Virginia Tech, where he works with data to inform policy decision-making. He is the author of Pandas for Everyone and Pandas Data Analysis with Python Fundamentals LiveLessons.
Skill Level
 Beginner to Intermediate
Learn How To
 Use pandas data types
 Convert data types
 Use string methods and regular expressions
 Apply functions to data
 Aggregate, transform, and filter data
 Use pandas and Python date and time methods
 Model data
Who Should Take This Course
 Those new to data science, particularly those with Python programming experience
Course Requirements
 Basic programming skills, particularly in Python
Lesson 1: Pandas Data Types
These lessons pick up where Pandas Data Analysis with Python Fundamentals LiveLessons left off. You learned the basics of subsetting, combining, and reshaping data. Now you can start learning how to clean your data. That begins with learning the data types and how to find them in your data. Next comes converting from one type to another, including converting data into numeric and string values. The lesson finishes with categorical data.
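As a rough illustration of the topics in this lesson (not taken from the course itself), the sketch below inspects column types, converts a column to numeric and string values, and creates a categorical column. The toy data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: both columns are loaded as text
df = pd.DataFrame({
    "age": ["34", "51", "missing"],
    "sex": ["F", "M", "F"],
})

# Find the data type of each column
print(df.dtypes)

# Convert to numeric; errors="coerce" turns unparseable values into NaN
df["age_num"] = pd.to_numeric(df["age"], errors="coerce")

# Convert the numeric column to string values
df["age_str"] = df["age_num"].astype(str)

# Convert a text column to categorical data
df["sex_cat"] = df["sex"].astype("category")
print(df["sex_cat"].cat.categories)
```

Coercing with errors="coerce" is only one option; whether a bad value should become missing or raise an error depends on your data.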
Lesson 2: Unstructured Text and Strings in Pandas
There are vast stores of data available as unstructured text. Understanding how to work with text data in Python is important when your dataset has text that needs to be processed. The lesson begins with a basic overview of strings and the built-in Python string methods. Next, Dan covers how to format strings. This will make your code more legible and can make the output more consistent and “prettier.” Dan then introduces regular expressions with the built-in regular expressions library (2.5) and shows how you can use regular expressions to do pattern matching. Finally, Dan shows you a quick example of the more capable, but not built-in, regex library.
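A minimal sketch of these ideas, using only Python's standard library: string methods, string formatting, and pattern matching with the built-in re module. The example record and the phone-number pattern are hypothetical and are not drawn from the course.

```python
import re

# Hypothetical example record
record = "Name: Ada Lovelace, Phone: 555-0142"

# Built-in string methods: split on the comma, then strip the field label
name_field = record.split(",")[0]
name = name_field.replace("Name: ", "").strip()

# String formatting: upper-case the name and left-align it in 15 characters
label = f"{name.upper():<15}|"

# Regular expression: three digits, a hyphen, then four digits
match = re.search(r"\d{3}-\d{4}", record)
phone = match.group() if match else None

print(name, label, phone)
```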
Lesson 3: Applying Functions to Data
Applying functions is a fundamental skill when working with data. Application of functions incorporates many skills used in programming and data analytics. Instead of writing for loops to perform calculations and data manipulations, we write functions that work on a column-by-column or row-by-row basis. Dan begins with a quick introduction to functions in Python. Then, he turns to using simple functions on a toy dataset to see how apply works. Next, he applies functions on an actual dataset. You then learn how to write vectorized functions, functions that work on an element-wise basis. Finally, Dan takes a look at lambda functions for one-off calculations.
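The sketch below shows one way these pieces might fit together on a toy dataset. The function names (my_sq, value_range, avg_2_mod) and the data are hypothetical, and np.vectorize is just one of several ways to vectorize a function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

def my_sq(x):
    """Square a single value."""
    return x ** 2

def value_range(values):
    """Return max minus min of a whole column or row."""
    return values.max() - values.min()

# apply on a Series calls the function once per element
squared = df["a"].apply(my_sq)

# apply on a DataFrame works column by column (default) or row by row (axis=1)
col_ranges = df.apply(value_range)
row_ranges = df.apply(value_range, axis=1)

def avg_2_mod(x, y):
    """Average two values; the `if` only makes sense for single values."""
    if x == 20:
        return np.nan
    return (x + y) / 2

# np.vectorize lets the scalar function run element-wise over whole columns
avg_2_mod_vec = np.vectorize(avg_2_mod)
averaged = avg_2_mod_vec(df["a"], df["b"])

# A lambda function for a one-off calculation
plus_one = df["a"].apply(lambda x: x + 1)
```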
Lesson 4: Breaking Up Computations Using groupby Operations: split-apply-combine
groupby operations follow the mantra of split-apply-combine: your data is split and partitioned by a variable or variables, functions are applied to each partition, and the results are combined back into a single result. This technique is used heavily on distributed systems when the data can no longer fit on a single machine. There are three common operations when performing a groupby. First, there is aggregation, where you summarize each group into a single value. For example, calculating the average life expectancy for each year in your data is an aggregation. Second, there is transformation, where you perform a calculation within each individual group without collapsing it to a single value. Third, there is filtration, where you reduce your data based on a calculation within a group.
Dan also looks further into the groupby object itself and how you can iterate over your groups. Finally, he demonstrates the MultiIndex and how you can chain multiple groupby calculations together.
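To make the three operations concrete, here is a small sketch on hypothetical life-expectancy data. The column names and values are invented for illustration, and the threshold used for filtering is arbitrary.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "year": [1952, 1952, 1952, 2007, 2007],
    "country": ["A", "B", "C", "A", "B"],
    "lifeExp": [40.0, 60.0, 50.0, 70.0, 80.0],
})

grouped = df.groupby("year")

# Aggregation: one summary value per group
avg_life = grouped["lifeExp"].mean()

# Transformation: a per-group calculation that keeps every original row
centered = grouped["lifeExp"].transform(lambda x: x - x.mean())

# Filtration: keep only the groups that have at least three rows
big_groups = grouped.filter(lambda g: len(g) >= 3)

# Iterating over the groupby object yields (key, sub-DataFrame) pairs
sizes = {year: len(group) for year, group in grouped}

# Grouping by two variables produces a result with a MultiIndex
by_year_country = df.groupby(["year", "country"])["lifeExp"].mean()
flat = by_year_country.reset_index()  # back to ordinary columns
```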
Lesson 5: Dates and Times in Python and Pandas
One of pandas’ strong suits is handling dates and times in time-series data. There are many convenient functions and methods that make working with and processing datetime data much easier in pandas. Dan begins by looking at Python’s datetime objects and how to create them. Next, you learn how you can convert columns in your data into datetime objects. He then shows you how you can load data directly as datetimes, without the intermediate step of loading it first and converting it later. Once you have your data stored as proper date and time objects, Dan shows you how you can extract various datetime components and how you can perform calculations and create Timedeltas. Then Dan shows you other functions and methods you can use on datetimes, and how you can download stock data from the internet. Once you have your data processed the way you want, Dan takes you back to the basics and you learn how you can leverage dates and times to subset your data. From there you learn how you can create ranges of dates, followed by an example of shifting date values. Finally, Dan covers how you can resample your dates and how you can convert dates and times across various time zones.
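The following sketch walks through several of these steps on a tiny, hypothetical dataset held in memory. The CSV text, column names, and time zones are assumptions chosen for illustration; downloading stock data is left out.

```python
import io
from datetime import datetime

import pandas as pd

# Python's own datetime object
d = datetime(2018, 1, 15)

# Hypothetical CSV content standing in for a file on disk
csv_text = "date,value\n2018-01-01,1.0\n2018-01-02,2.0\n2018-02-01,3.0\n2018-02-03,4.0\n"

# Convert a text column to datetimes after loading
raw = pd.read_csv(io.StringIO(csv_text))
raw["date"] = pd.to_datetime(raw["date"])

# Or parse the dates directly while loading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Extract a date component and compute a Timedelta column
df["month"] = df["date"].dt.month
df["elapsed"] = df["date"] - df["date"].min()

# Subset by date, shift values, and resample to monthly totals
ts = df.set_index("date")["value"]
january = ts.loc["2018-01"]
shifted = ts.shift(1)
monthly = ts.resample("MS").sum()  # "MS" means month-start frequency

# Create a date range and convert it between time zones
rng = pd.date_range("2018-01-01", periods=3, freq="D")
eastern = rng.tz_localize("UTC").tz_convert("America/New_York")
```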
Lesson 6: Modeling: Connecting to the World Outside of Pandas
Once you have your data processed the way you want, you can begin modeling your data to gain insights. This lesson begins to expand our world from pandas to other Python libraries used to model data. Dan begins with linear regression and how it is performed in two very popular modeling libraries: statsmodels and scikit-learn. While linear regression is great if your outcome or response variable is continuous, you can use logistic regression when your outcome of interest is a binary variable. When you begin working with count data, you use a Poisson or negative binomial model, depending on the assumptions and characteristics of your data. Next, Dan introduces you to survival models, used when you have censored data and want to model the time until a particular event occurs. Dan then covers how you can perform model diagnostics and compare model performance by looking at residuals, ANOVA, AIC, BIC, and k-fold cross-validation. He then covers how you can obtain a more parsimonious model that can better predict future data points by using regularization techniques, and the lesson concludes by introducing clustering techniques and how you can use principal components analysis to visualize your k-means results.
About Pearson Video Training
Pearson publishes expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. These professional and personal technology videos feature world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include: IT Certification, Network Security, Cisco Technology, Programming, Web Development, Mobile Development, and more. Learn more about Pearson Video training at http://www.informit.com/video.
Table of Contents
 Introduction

Lesson 1: Pandas Data Types
 Learning objectives 00:00:55
 1.1 Understand Pandas data types 00:01:19
 1.2 Convert types 00:07:39
 1.3 Convert and manipulate categorical data 00:03:33

Lesson 2: Unstructured Text and Strings in Pandas
 Learning objectives 00:00:50
 2.1 Understand strings 00:02:41
 2.2 Use string methods 00:04:08
 2.3 Use more string methods 00:03:03
 2.4 Utilize string formatting 00:07:05
 2.5 Utilize regular expressions (regex) 00:13:08
 2.6 Access the regex library 00:01:28

Lesson 3: Applying Functions to Data
 Learning objectives 00:00:53
 3.1 Use functions 00:02:43
 3.2 Use apply basics 00:07:21
 3.3 Use apply columnwise and rowwise 00:07:15
 3.4 Use vectorized functions 00:06:24
 3.5 Use lambda functions 00:02:53

Lesson 4: Breaking Up Computations Using groupby Operations: split-apply-combine
 Learning objectives 00:01:09
 4.1 Aggregate data 00:07:50
 4.2 Transform data 00:05:36
 4.3 Filter data 00:01:55
 4.4 Use the pandas.core.groupby.DataFrameGroupBy object 00:06:02
 4.5 Work with a MultiIndex 00:07:27

Lesson 5: Dates and Times in Python and Pandas
 Learning objectives 00:01:21
 5.1 Use Python’s datetime 00:01:39
 5.2 Convert to datetime 00:03:08
 5.3 Load data with dates 00:01:27
 5.4 Extract date components 00:03:46
 5.5 Implement date calculations and Timedeltas 00:01:55
 5.6 Use datetime methods 00:02:50
 5.7 Get stock data 00:01:28
 5.8 Subset data based on dates 00:04:04
 5.9 Use date ranges 00:05:05
 5.10 Shift values 00:09:24
 5.11 Do resampling 00:02:15
 5.12 Work with time zones 00:04:19

Lesson 6: Modeling: Connecting to the World Outside of Pandas
 Learning objectives 00:01:24
 6.1 Use linear regression 00:12:54
 6.2 Use logistic regression 00:07:17
 6.3 Use a Poisson or Negative Binomial model 00:04:12
 6.4 Use survival models 00:08:15
 6.5 Use diagnostics 00:20:46
 6.6 Use regularization 00:05:49
 6.7 Use clustering and PCA 00:09:13
 Summary
Product Information
 Title: Pandas Data Cleaning and Modeling with Python
 Author(s): Daniel Y. Chen
 Release date: January 2018
 Publisher(s): Addison-Wesley Professional
 ISBN: 0135170192