Practical Data Science Cookbook - Second Edition

Book Description

Over 85 recipes to help you complete real-world data science projects in R and Python

About This Book

  • Tackle every step in the data science pipeline and use it to acquire, clean, analyze, and visualize your data

  • Get beyond the theory and implement real-world projects in data science using R and Python

  • Easy-to-follow recipes help you understand and implement numerical computing concepts

Who This Book Is For

    If you are an aspiring data scientist who wants to learn data science and numerical programming concepts through hands-on, real-world project examples, this is the book for you. Whether you are brand new to data science or you are a seasoned expert, you will benefit from learning about the structure of real-world data science projects and the programming examples in R and Python.

What You Will Learn

  • Learn and understand the installation procedure and environment required for R and Python on various platforms

  • Prepare data for analysis by implementing various data science concepts such as acquisition, cleaning, and munging in R and Python

  • Build a predictive model and an exploratory model

  • Analyze the results of your model and create reports on the acquired data

  • Build various tree-based methods and random forests

In Detail

    As increasing amounts of data are generated each year, the need to analyze that data and create value from it is more important than ever. Companies that know what to do with their data and how to do it well will have a competitive advantage over companies that don’t. Because of this, there will be increasing demand for people who possess both the analytical and technical abilities to extract valuable insights from data and build solutions that put those insights to use.

    Starting with the basics, this book covers how to set up your numerical programming environment, introduces you to the data science pipeline, and guides you through several data projects in a step-by-step format. By sequentially working through the steps in each chapter, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations with examples using the two most popular programming languages for data analysis—R and Python.

    Style and approach

    This step-by-step guide to data science is full of hands-on examples of real-world data science tasks. Each recipe focuses on a particular task in the data science pipeline, ranging from readying the dataset to analytics and visualization.

    Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

    Table of Contents

    1. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Sections
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
        5. See also
      5. Conventions
      6. Reader feedback
      7. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    2. Preparing Your Data Science Environment
      1. Understanding the data science pipeline
        1. How to do it...
        2. How it works...
      2. Installing R on Windows, Mac OS X, and Linux
        1. How to do it...
        2. How it works...
        3. See also
      3. Installing libraries in R and RStudio
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      4. Installing Python on Linux and Mac OS X
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      5. Installing Python on Windows
        1. How to do it...
        2. How it works...
        3. See also
      6. Installing the Python data stack on Mac OS X and Linux
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      7. Installing extra Python packages
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      8. Installing and using virtualenv
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
    3. Driving Visual Analysis with Automobile Data with R
      1. Introduction
      2. Acquiring automobile fuel efficiency data
        1. Getting ready
        2. How to do it...
        3. How it works...
      3. Preparing R for your first project
        1. Getting ready
        2. How to do it...
        3. There's more...
        4. See also
      4. Importing automobile fuel efficiency data into R
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      5. Exploring and describing fuel efficiency data
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      6. Analyzing automobile fuel efficiency over time
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      7. Investigating the makes and models of automobiles
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
    4. Creating Application-Oriented Analyses Using Tax Data and Python
      1. Introduction
        1. An introduction to application-oriented approaches
      2. Preparing for the analysis of top incomes
        1. Getting ready
        2. How to do it...
        3. How it works...
      3. Importing and exploring the world's top incomes dataset
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      4. Analyzing and visualizing the top income data of the US
        1. Getting ready
        2. How to do it...
        3. How it works...
      5. Furthering the analysis of the top income groups of the US
        1. Getting ready
        2. How to do it...
        3. How it works...
      6. Reporting with Jinja2
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      7. Repeating the analysis in R
        1. Getting ready
        2. How to do it...
        3. There's more...
    5. Modeling Stock Market Data
      1. Introduction
        1. Requirements
      2. Acquiring stock market data
        1. How to do it...
      3. Summarizing the data
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      4. Cleaning and exploring the data
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      5. Generating relative valuations
        1. Getting ready
        2. How to do it...
        3. How it works...
      6. Screening stocks and analyzing historical prices
        1. Getting ready
        2. How to do it...
        3. How it works...
    6. Visually Exploring Employment Data
      1. Introduction
      2. Preparing for analysis
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      3. Importing employment data into R
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      4. Exploring the employment data
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      5. Obtaining and merging additional data
        1. Getting ready
        2. How to do it...
        3. How it works...
      6. Adding geographical information
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      7. Extracting state- and county-level wage and employment information
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      8. Visualizing geographical distributions of pay
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      9. Exploring where the jobs are, by industry
        1. How to do it...
        2. How it works...
        3. There's more...
        4. See also
      10. Animating maps for a geospatial time series
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      11. Benchmarking performance for some common tasks
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
    7. Driving Visual Analyses with Automobile Data
      1. Introduction
      2. Getting started with IPython
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      3. Exploring Jupyter Notebook
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      4. Preparing to analyze automobile fuel efficiencies
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      5. Exploring and describing fuel efficiency data with Python
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      6. Analyzing automobile fuel efficiency over time with Python
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      7. Investigating the makes and models of automobiles with Python
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
    8. Working with Social Graphs
      1. Introduction
        1. Understanding graphs and networks
      2. Preparing to work with social networks in Python
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      3. Importing networks
        1. Getting ready
        2. How to do it...
        3. How it works...
      4. Exploring subgraphs within a heroic network
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      5. Finding strong ties
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      6. Finding key players
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
          1. The betweenness centrality
          2. The closeness centrality
          3. The eigenvector centrality
          4. Deciding on centrality algorithm
      7. Exploring the characteristics of entire networks
        1. Getting ready
        2. How to do it...
        3. How it works...
      8. Clustering and community detection in social networks
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      9. Visualizing graphs
        1. Getting ready
        2. How to do it...
        3. How it works...
      10. Social networks in R
        1. Getting ready
        2. How to do it...
        3. How it works...
    9. Recommending Movies at Scale (Python)
      1. Introduction
      2. Modeling preference expressions
        1. How to do it...
        2. How it works...
      3. Understanding the data
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      4. Ingesting the movie review data
        1. Getting ready
        2. How to do it...
        3. How it works...
      5. Finding the highest-scoring movies
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      6. Improving the movie-rating system
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      7. Measuring the distance between users in the preference space
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      8. Computing the correlation between users
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      9. Finding the best critic for a user
        1. Getting ready
        2. How to do it...
        3. How it works...
      10. Predicting movie ratings for users
        1. Getting ready
        2. How to do it...
        3. How it works...
      11. Collaboratively filtering item by item
        1. Getting ready
        2. How to do it...
        3. How it works...
      12. Building a non-negative matrix factorization model
        1. How to do it...
        2. How it works...
        3. See also
      13. Loading the entire dataset into the memory
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      14. Dumping the SVD-based model to the disk
        1. How to do it...
        2. How it works...
      15. Training the SVD-based model
        1. How to do it...
        2. How it works...
        3. There's more...
      16. Testing the SVD-based model
        1. How to do it...
        2. How it works...
        3. There's more...
    10. Harvesting and Geolocating Twitter Data (Python)
      1. Introduction
      2. Creating a Twitter application
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      3. Understanding the Twitter API v1.1
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      4. Determining your Twitter followers and friends
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      5. Pulling Twitter user profiles
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      6. Making requests without running afoul of Twitter's rate limits
        1. Getting ready
        2. How to do it...
        3. How it works...
      7. Storing JSON data to disk
        1. Getting ready
        2. How to do it...
        3. How it works...
      8. Setting up MongoDB for storing Twitter data
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      9. Storing user profiles in MongoDB using PyMongo
        1. Getting ready
        2. How to do it...
        3. How it works...
      10. Exploring the geographic information available in profiles
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      11. Plotting geospatial data in Python
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
    11. Forecasting New Zealand Overseas Visitors
      1. Introduction
      2. The ts object
        1. Getting ready
        2. How to do it...
        3. How it works...
      3. Visualizing time series data
        1. Getting ready
        2. How to do it...
        3. How it works...
      4. Simple linear regression models
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      5. ACF and PACF
        1. Getting ready
        2. How to do it...
        3. How it works...
      6. ARIMA models
        1. Getting ready
        2. How to do it...
        3. How it works...
      7. Accuracy measurements
        1. Getting ready
        2. How to do it...
        3. How it works...
      8. Fitting seasonal ARIMA models
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
    12. German Credit Data Analysis
      1. Introduction
      2. Simple data transformations
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
      3. Visualizing categorical data
        1. Getting ready
        2. How to do it...
        3. How it works...
      4. Discriminant analysis
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      5. Dividing the data and the ROC
        1. Getting ready
        2. How to do it...
      6. Fitting the logistic regression model
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      7. Decision trees and rules
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. See also
      8. Decision tree for German data
        1. Getting ready
        2. How to do it...
        3. How it works...