Applied Data Science with Python and Jupyter

Book Description

Master data exploration by creating reproducible data processing pipelines, visualizations, and prediction models for your applications.

Key Features

  • Get up and running with the Jupyter ecosystem and some example datasets
  • Learn about key machine learning concepts such as SVM, KNN classifiers, and Random Forests
  • Discover how you can use web scraping to gather and parse your own bespoke datasets

Book Description

Getting started with data science doesn't have to be an uphill battle. Applied Data Science with Python and Jupyter is a step-by-step guide for beginners who know a little Python and want a fast-paced introduction to data science. In this book, you'll learn each stage of the standard data workflow: collecting, cleaning, investigating, visualizing, and modeling data. You'll start with the basics of Jupyter, which is the backbone of the book. After familiarizing yourself with its standard features, you'll see it in practice in a first analysis. In the next lesson, you'll dive right into predictive analytics, implementing multiple classification algorithms. Finally, the book looks at data collection techniques: you'll see how web data can be acquired by scraping and via APIs, and then briefly explore interactive visualizations.

What you will learn

  • Get up and running with the Jupyter ecosystem
  • Identify potential areas of investigation and perform exploratory data analysis
  • Plan a machine learning classification strategy and train classification models
  • Use validation curves and dimensionality reduction to tune and enhance your models
  • Scrape tabular data from web pages and transform it into Pandas DataFrames
  • Create interactive, web-friendly visualizations to clearly communicate your findings

Who this book is for

Applied Data Science with Python and Jupyter is ideal for professionals with a variety of job descriptions across a large range of industries, given the rising popularity and accessibility of data science. You'll need some prior experience with Python; any prior work with libraries such as Pandas and Matplotlib will give you a useful head start.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Preface
    1. About the Book
      1. About the Author
      2. Objectives
      3. Audience
      4. Approach
      5. Minimum Hardware Requirements
      6. Software Requirements
      7. Installation and Setup
      8. Installing Anaconda
      9. Updating Jupyter and Installing Dependencies
      10. Additional Resources
      11. Conventions
  2. Jupyter Fundamentals
    1. Introduction
    2. Basic Functionality and Features
      1. What is a Jupyter Notebook and Why is it Useful?
      2. Navigating the Platform
      3. Exercise 1: Introducing Jupyter Notebooks
      4. Jupyter Features
      5. Exercise 2: Implementing Jupyter's Most Useful Features
      6. Converting a Jupyter Notebook to a Python Script
      7. Python Libraries
      8. Exercise 3: Importing the External Libraries and Setting Up the Plotting Environment
    3. Our First Analysis - The Boston Housing Dataset
      1. Loading the Data into Jupyter Using a Pandas DataFrame
      2. Exercise 4: Loading the Boston Housing Dataset
      3. Data Exploration
      4. Exercise 5: Analyzing the Boston Housing Dataset
      5. Introduction to Predictive Analytics with Jupyter Notebooks
      6. Exercise 6: Applying Linear Models With Seaborn and Scikit-learn
      7. Activity 1: Building a Third-Order Polynomial Model
      8. Using Categorical Features for Segmentation Analysis
      9. Exercise 7: Creating Categorical Fields From Continuous Variables and Make Segmented Visualizations
    4. Summary
  3. Data Cleaning and Advanced Machine Learning
    1. Introduction
    2. Preparing to Train a Predictive Model
      1. Determining a Plan for Predictive Analytics
      2. Exercise 8: Explore Data Preprocessing Tools and Methods
      3. Activity 2: Preparing to Train a Predictive Model for the Employee-Retention Problem
    3. Training Classification Models
      1. Introduction to Classification Algorithms
      2. Exercise 9: Training Two-Feature Classification Models With Scikit-learn
      3. The plot_decision_regions Function
      4. Exercise 10: Training K-nearest Neighbors for Our Model
      5. Exercise 11: Training a Random Forest
      6. Assessing Models With K-fold Cross-Validation and Validation Curves
      7. Exercise 12: Using K-fold Cross Validation and Validation Curves in Python With Scikit-learn
      8. Dimensionality Reduction Techniques
      9. Exercise 13: Training a Predictive Model for the Employee Retention Problem
    4. Summary
  4. Web Scraping and Interactive Visualizations
    1. Introduction
    2. Scraping Web Page Data
      1. Introduction to HTTP Requests
      2. Making HTTP Requests in the Jupyter Notebook
      3. Exercise 14: Handling HTTP Requests With Python in a Jupyter Notebook
      4. Parsing HTML in the Jupyter Notebook
      5. Exercise 15: Parsing HTML With Python in a Jupyter Notebook
      6. Activity 3: Web Scraping With Jupyter Notebooks
    3. Interactive Visualizations
      1. Building a DataFrame to Store and Organize Data
      2. Exercise 16: Building and Merging Pandas DataFrames
      3. Introduction to Bokeh
      4. Exercise 17: Introduction to Interactive Visualization With Bokeh
      5. Activity 4: Exploring Data with Interactive Visualizations
    4. Summary
  5. Appendix A
    1. Chapter 1: Jupyter Fundamentals
      1. Activity 1: Building a Third-Order Polynomial Model
    2. Chapter 2: Data Cleaning and Advanced Machine Learning
      1. Activity 2: Preparing to Train a Predictive Model for the Employee-Retention Problem
    3. Chapter 3: Web Scraping and Interactive Visualizations
      1. Activity 3: Web Scraping with Jupyter Notebooks
      2. Activity 4: Exploring Data with Interactive Visualizations