Python for Data Science For Dummies, 2nd Edition

Book description

The fast and easy way to learn Python programming and statistics

Python is a general-purpose programming language, created in the late 1980s and named after Monty Python, that thousands of people use for everything from testing microchips at Intel to powering Instagram to building video games with the PyGame library.

Python for Data Science For Dummies is written for people who are new to data analysis, and it covers the basics of statistics and of programming for data analysis in Python. The book also covers Google Colab, which makes it possible to write Python code in the cloud.

  • Get started with data science and Python
  • Visualize information
  • Wrangle data
  • Learn from data

The book provides the statistical background needed to get started in data science programming, including probability, random distributions, hypothesis testing, confidence intervals, and building regression models for prediction.

Table of contents

  1. Cover
  2. Introduction
    1. About This Book
    2. Foolish Assumptions
    3. Icons Used in This Book
    4. Beyond the Book
    5. Where to Go from Here
  3. Part 1: Getting Started with Data Science and Python
    1. Chapter 1: Discovering the Match between Data Science and Python
      1. Defining the Sexiest Job of the 21st Century
      2. Creating the Data Science Pipeline
      3. Understanding Python’s Role in Data Science
      4. Learning to Use Python Fast
    2. Chapter 2: Introducing Python’s Capabilities and Wonders
      1. Why Python?
      2. Working with Python
      3. Performing Rapid Prototyping and Experimentation
      4. Considering Speed of Execution
      5. Visualizing Power
      6. Using the Python Ecosystem for Data Science
    3. Chapter 3: Setting Up Python for Data Science
      1. Considering the Off-the-Shelf Cross-Platform Scientific Distributions
      2. Installing Anaconda on Windows
      3. Installing Anaconda on Linux
      4. Installing Anaconda on Mac OS X
      5. Downloading the Datasets and Example Code
    4. Chapter 4: Working with Google Colab
      1. Defining Google Colab
      2. Getting a Google Account
      3. Working with Notebooks
      4. Performing Common Tasks
      5. Using Hardware Acceleration
      6. Executing the Code
      7. Viewing Your Notebook
      8. Sharing Your Notebook
      9. Getting Help
  4. Part 2: Getting Your Hands Dirty with Data
    1. Chapter 5: Understanding the Tools
      1. Using the Jupyter Console
      2. Using Jupyter Notebook
      3. Performing Multimedia and Graphic Integration
    2. Chapter 6: Working with Real Data
      1. Uploading, Streaming, and Sampling Data
      2. Accessing Data in Structured Flat-File Form
      3. Sending Data in Unstructured File Form
      4. Managing Data from Relational Databases
      5. Interacting with Data from NoSQL Databases
      6. Accessing Data from the Web
    3. Chapter 7: Conditioning Your Data
      1. Juggling between NumPy and pandas
      2. Validating Your Data
      3. Manipulating Categorical Variables
      4. Dealing with Dates in Your Data
      5. Dealing with Missing Data
      6. Slicing and Dicing: Filtering and Selecting Data
      7. Concatenating and Transforming
      8. Aggregating Data at Any Level
    4. Chapter 8: Shaping Data
      1. Working with HTML Pages
      2. Working with Raw Text
      3. Using the Bag of Words Model and Beyond
      4. Working with Graph Data
    5. Chapter 9: Putting What You Know in Action
      1. Contextualizing Problems and Data
      2. Considering the Art of Feature Creation
      3. Performing Operations on Arrays
  5. Part 3: Visualizing Information
    1. Chapter 10: Getting a Crash Course in MatPlotLib
      1. Starting with a Graph
      2. Setting the Axis, Ticks, Grids
      3. Defining the Line Appearance
      4. Using Labels, Annotations, and Legends
    2. Chapter 11: Visualizing the Data
      1. Choosing the Right Graph
      2. Creating Advanced Scatterplots
      3. Plotting Time Series
      4. Plotting Geographical Data
      5. Visualizing Graphs
  6. Part 4: Wrangling Data
    1. Chapter 12: Stretching Python’s Capabilities
      1. Playing with Scikit-learn
      2. Performing the Hashing Trick
      3. Considering Timing and Performance
      4. Running in Parallel on Multiple Cores
    2. Chapter 13: Exploring Data Analysis
      1. The EDA Approach
      2. Defining Descriptive Statistics for Numeric Data
      3. Counting for Categorical Data
      4. Creating Applied Visualization for EDA
      5. Understanding Correlation
      6. Modifying Data Distributions
    3. Chapter 14: Reducing Dimensionality
      1. Understanding SVD
      2. Performing Factor Analysis and PCA
      3. Understanding Some Applications
    4. Chapter 15: Clustering
      1. Clustering with K-means
      2. Performing Hierarchical Clustering
      3. Discovering New Groups with DBScan
    5. Chapter 16: Detecting Outliers in Data
      1. Considering Outlier Detection
      2. Examining a Simple Univariate Method
      3. Developing a Multivariate Approach
  7. Part 5: Learning from Data
    1. Chapter 17: Exploring Four Simple and Effective Algorithms
      1. Guessing the Number: Linear Regression
      2. Moving to Logistic Regression
      3. Making Things as Simple as Naïve Bayes
      4. Learning Lazily with Nearest Neighbors
    2. Chapter 18: Performing Cross-Validation, Selection, and Optimization
      1. Pondering the Problem of Fitting a Model
      2. Cross-Validating
      3. Selecting Variables Like a Pro
      4. Pumping Up Your Hyperparameters
    3. Chapter 19: Increasing Complexity with Linear and Nonlinear Tricks
      1. Using Nonlinear Transformations
      2. Regularizing Linear Models
      3. Fighting with Big Data Chunk by Chunk
      4. Understanding Support Vector Machines
      5. Playing with Neural Networks
    4. Chapter 20: Understanding the Power of the Many
      1. Starting with a Plain Decision Tree
      2. Making Machine Learning Accessible
      3. Boosting Predictions
  8. Part 6: The Part of Tens
    1. Chapter 21: Ten Essential Data Resources
      1. Discovering the News with Subreddit
      2. Getting a Good Start with KDnuggets
      3. Locating Free Learning Resources with Quora
      4. Gaining Insights with Oracle’s Data Science Blog
      5. Accessing the Huge List of Resources on Data Science Central
      6. Learning New Tricks from the Aspirational Data Scientist
      7. Obtaining the Most Authoritative Sources at Udacity
      8. Receiving Help with Advanced Topics at Conductrics
      9. Obtaining the Facts of Open Source Data Science from Masters
      10. Zeroing In on Developer Resources with Jonathan Bower
    2. Chapter 22: Ten Data Challenges You Should Take
      1. Meeting the Data Science London + Scikit-learn Challenge
      2. Predicting Survival on the Titanic
      3. Finding a Kaggle Competition that Suits Your Needs
      4. Honing Your Overfit Strategies
      5. Trudging Through the MovieLens Dataset
      6. Getting Rid of Spam E-mails
      7. Working with Handwritten Information
      8. Working with Pictures
      9. Analyzing Amazon.com Reviews
      10. Interacting with a Huge Graph
  9. Index
  10. About the Authors
  11. Advertisement Page
  12. Connect with Dummies
  13. End User License Agreement

Product information

  • Title: Python for Data Science For Dummies, 2nd Edition
  • Author(s): John Paul Mueller, Luca Massaron
  • Release date: February 2019
  • Publisher(s): For Dummies
  • ISBN: 9781119547624