Jupyter for Data Science

Book Description

Your one-stop guide to building an efficient data science pipeline using Jupyter

About This Book

  • Get the most out of your Jupyter notebook to complete the trickiest of tasks in data science
  • Learn all the tasks in the data science pipeline - from data acquisition to visualization - and implement them using Jupyter
  • Get ahead of the curve by mastering all the applications of Jupyter for data science with this unique and intuitive guide

Who This Book Is For

This book targets students and professionals who wish to master the use of Jupyter to perform a variety of data science tasks. Some programming experience with R or Python, and some basic understanding of Jupyter, is all you need to get started with this book.

What You Will Learn

  • Understand why Jupyter notebooks are a perfect fit for your data science tasks
  • Perform scientific computing and data analysis tasks with Jupyter
  • Interpret and explore different kinds of data visually with charts, histograms, and more
  • Extend SQL's capabilities with Jupyter notebooks
  • Combine the power of R and Python 3 with Jupyter to create dynamic notebooks
  • Create interactive dashboards and dynamic presentations
  • Master the best coding practices and deploy your Jupyter notebooks efficiently

In Detail

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. This book is a comprehensive guide to getting started with data science using the popular Jupyter notebook.

If you are familiar with Jupyter notebook and want to learn how to use its capabilities to perform various data science tasks, this is the book for you! From data exploration to visualization, this book will take you through every step of implementing an effective data science pipeline using Jupyter. You will also see how you can use Jupyter's features to share your documents and code with your colleagues. The book also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks.

By the end of this book, you will be able to use Jupyter comfortably to perform a variety of data science tasks.

Style and Approach

This book is a perfect blend of concepts and practical examples, written in a way that is very easy to understand and implement. It follows a logical flow where you will be able to build on your understanding of the different Jupyter features with every chapter.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Errata
      3. Piracy
      4. Questions
  2. Jupyter and Data Science
    1. Jupyter concepts
    2. A first look at the Jupyter user interface
      1. Detailing the Jupyter tabs
      2. What actions can I perform with Jupyter?
      3. What objects can Jupyter manipulate?
      4. Viewing the Jupyter project display
        1. File menu
        2. Edit menu
        3. View menu
        4. Insert menu
        5. Cell menu
        6. Kernel menu
        7. Help menu
        8. Icon toolbar
      5. How does it look when we execute scripts?
      6. Industry data science usage
      7. Real life examples
        1. Finance, Python - European call option valuation
        2. Finance, Python - Monte Carlo pricing
        3. Gambling, R - betting analysis
        4. Insurance, R - non-life insurance pricing
        5. Consumer products, R - marketing effectiveness
      8. Using Docker with Jupyter
        1. Using a public Docker service
        2. Installing Docker on your machine
      9. How to share notebooks with others
        1. Can you email a notebook?
        2. Sharing a notebook on Google Drive
        3. Sharing on GitHub
        4. Store as HTML on a web server
        5. Install Jupyter on a web server
      10. How can you secure a notebook?
        1. Access control
        2. Malicious content
    3. Summary
  3. Working with Analytical Data on Jupyter
    1. Data scraping with a Python notebook
    2. Using heavy-duty data processing functions in Jupyter
      1. Using NumPy functions in Jupyter
      2. Using pandas in Jupyter
        1. Use pandas to read text files in Jupyter
        2. Use pandas to read Excel files in Jupyter
        3. Using pandas to work with data frames
          1. Using the groupby function in a data frame
          2. Manipulating columns in a data frame
          3. Calculating outliers in a data frame
    3. Using SciPy in Jupyter
      1. Using SciPy integration in Jupyter
      2. Using SciPy optimization in Jupyter
      3. Using SciPy interpolation in Jupyter
      4. Using SciPy Fourier Transforms in Jupyter
      5. Using SciPy linear algebra in Jupyter
    4. Expanding on pandas data frames in Jupyter
      1. Sorting and filtering data frames in Jupyter/IPython
        1. Filtering a data frame
        2. Sorting a data frame
    5. Summary
  4. Data Visualization and Prediction
    1. Make a prediction using scikit-learn
    2. Make a prediction using R
    3. Interactive visualization
    4. Plotting using Plotly
    5. Creating a human density map
    6. Draw a histogram of social data
    7. Plotting 3D data
    8. Summary
  5. Data Mining and SQL Queries
    1. Special note for Windows installation
    2. Using Spark to analyze data
    3. Another MapReduce example
    4. Using SparkSession and SQL
    5. Combining datasets
    6. Loading JSON into Spark
    7. Using Spark pivot
    8. Summary
  6. R with Jupyter
    1. How to set up R for Jupyter
    2. R data analysis of the 2016 US election demographics
    3. Analyzing 2016 voter registration and voting
    4. Analyzing changes in college admissions
    5. Predicting airplane arrival time
    6. Summary
  7. Data Wrangling
    1. Reading a CSV file
    2. Reading another CSV file
    3. Manipulating data with dplyr
      1. Converting a data frame to a dplyr table
      2. Getting a quick overview of the data value ranges
    4. Sampling a dataset
      1. Filtering rows in a data frame
      2. Adding a column to a data frame
      3. Obtaining a summary on a calculated field
      4. Piping data between functions
      5. Obtaining the 99% quantile
      6. Obtaining a summary on grouped data
    5. Tidying up data with tidyr
    6. Summary
  8. Jupyter Dashboards
    1. Visualizing glyph ready data
    2. Publishing a notebook
      1. Font markdown
      2. List markdown
      3. Heading markdown
      4. Table markdown
      5. Code markdown
      6. More markdown
    3. Creating a Shiny dashboard
      1. R application coding
      2. Publishing your dashboard
    4. Building standalone dashboards
    5. Summary
  9. Statistical Modeling
    1. Converting JSON to CSV
    2. Evaluating Yelp reviews
      1. Summary data
      2. Review spread
      3. Finding the top rated firms
      4. Finding the most rated firms
      5. Finding all ratings for a top rated firm
      6. Determining the correlation between ratings and number of reviews
      7. Building a model of reviews
    3. Using Python to compare ratings
    4. Visualizing average ratings by cuisine
    5. Arbitrary search of ratings
    6. Determining relationships between number of ratings and ratings
    7. Summary
  10. Machine Learning Using Jupyter
    1. Naive Bayes
      1. Naive Bayes using R
      2. Naive Bayes using Python
    2. Nearest neighbor estimator
      1. Nearest neighbor using R
      2. Nearest neighbor using Python
    3. Decision trees
      1. Decision trees in R
      2. Decision trees in Python
    4. Neural networks
      1. Neural networks in R
    5. Random forests
      1. Random forests in R
    6. Summary
  11. Optimizing Jupyter Notebooks
    1. Deploying notebooks
      1. Deploying to JupyterHub
        1. Installing JupyterHub
        2. Accessing a JupyterHub Installation
      2. Jupyter hosting
    2. Optimizing your script
      1. Optimizing your Python scripts
        1. Determining how long a script takes
        2. Using Python regular expressions
        3. Using Python string handling
        4. Minimizing loop operations
        5. Profiling your script
      2. Optimizing your R scripts
        1. Using microbenchmark to profile R script
        2. Modifying provided functionality
        3. Optimizing name lookup
        4. Optimizing data frame value extraction
        5. Changing R Implementation
        6. Changing algorithms
    3. Monitoring Jupyter
    4. Caching your notebook
    5. Securing a notebook
      1. Managing notebook authorization
      2. Securing notebook content
    6. Scaling Jupyter Notebooks
    7. Sharing Jupyter Notebooks
      1. Sharing Jupyter Notebook on a notebook server
      2. Sharing encrypted Jupyter Notebook on a notebook server
      3. Sharing notebook on a web server
      4. Sharing notebook on Docker
    8. Converting a notebook
    9. Versioning a notebook
    10. Summary