Python for Data Science For Dummies

Book Description

Unleash the power of Python for your data analysis projects with For Dummies!

Python is the preferred programming language for data scientists and combines the best features of MATLAB, Mathematica, and R into libraries specific to data analysis and visualization. Python for Data Science For Dummies shows you how to take advantage of Python programming to acquire, organize, process, and analyze large amounts of information and use basic statistics concepts to identify trends and patterns. You'll get familiar with the Python development environment, manipulate data, design compelling visualizations, and solve scientific computing challenges as you work your way through this user-friendly guide.

  • Covers the fundamentals of Python data analysis programming and statistics to help you build a solid foundation in data science concepts like probability, random distributions, hypothesis testing, and regression models

  • Explains objects, functions, modules, and libraries and their role in data analysis

  • Walks you through some of the most widely used libraries, including NumPy, SciPy, BeautifulSoup, Pandas, and MatPlotLib

Whether you're new to data analysis or just new to Python, Python for Data Science For Dummies is your practical guide to getting a grip on data overload and doing interesting things with the oodles of information you uncover.

    Table of Contents

      1. Cover
      2. Introduction
        1. About This Book
        2. Foolish Assumptions
        3. Icons Used in This Book
        4. Beyond the Book
        5. Where to Go from Here
      3. Part I: Getting Started with Python for Data Science
        1. Chapter 1: Discovering the Match between Data Science and Python
          1. Defining the Sexiest Job of the 21st Century
          2. Creating the Data Science Pipeline
          3. Understanding Python’s Role in Data Science
          4. Learning to Use Python Fast
        2. Chapter 2: Introducing Python’s Capabilities and Wonders
          1. Why Python?
          2. Working with Python
          3. Performing Rapid Prototyping and Experimentation
          4. Considering Speed of Execution
          5. Visualizing Power
          6. Using the Python Ecosystem for Data Science
        3. Chapter 3: Setting Up Python for Data Science
          1. Considering the Off-the-Shelf Cross-Platform Scientific Distributions
          2. Installing Anaconda on Windows
          3. Installing Anaconda on Linux
          4. Installing Anaconda on Mac OS X
          5. Downloading the Datasets and Example Code
        4. Chapter 4: Reviewing Basic Python
          1. Working with Numbers and Logic
          2. Creating and Using Strings
          3. Interacting with Dates
          4. Creating and Using Functions
          5. Using Conditional and Loop Statements
          6. Storing Data Using Sets, Lists, and Tuples
          7. Defining Useful Iterators
          8. Indexing Data Using Dictionaries
      4. Part II: Getting Your Hands Dirty with Data
        1. Chapter 5: Working with Real Data
          1. Uploading, Streaming, and Sampling Data
          2. Accessing Data in Structured Flat-File Form
          3. Sending Data in Unstructured File Form
          4. Managing Data from Relational Databases
          5. Interacting with Data from NoSQL Databases
          6. Accessing Data from the Web
        2. Chapter 6: Conditioning Your Data
          1. Juggling between NumPy and pandas
          2. Validating Your Data
          3. Manipulating Categorical Variables
          4. Dealing with Dates in Your Data
          5. Dealing with Missing Data
          6. Slicing and Dicing: Filtering and Selecting Data
          7. Concatenating and Transforming
          8. Aggregating Data at Any Level
        3. Chapter 7: Shaping Data
          1. Working with HTML Pages
          2. Working with Raw Text
          3. Using the Bag of Words Model and Beyond
          4. Working with Graph Data
        4. Chapter 8: Putting What You Know in Action
          1. Contextualizing Problems and Data
          2. Considering the Art of Feature Creation
          3. Performing Operations on Arrays
      5. Part III: Visualizing the Invisible
        1. Chapter 9: Getting a Crash Course in MatPlotLib
          1. Starting with a Graph
          2. Setting the Axis, Ticks, Grids
          3. Defining the Line Appearance
          4. Using Labels, Annotations, and Legends
        2. Chapter 10: Visualizing the Data
          1. Choosing the Right Graph
          2. Creating Advanced Scatterplots
          3. Plotting Time Series
          4. Plotting Geographical Data
          5. Visualizing Graphs
        3. Chapter 11: Understanding the Tools
          1. Using the IPython Console
          2. Using IPython Notebook
          3. Performing Multimedia and Graphic Integration
      6. Part IV: Wrangling Data
        1. Chapter 12: Stretching Python’s Capabilities
          1. Playing with Scikit-learn
          2. Performing the Hashing Trick
          3. Considering Timing and Performance
          4. Running in Parallel
        2. Chapter 13: Exploring Data Analysis
          1. The EDA Approach
          2. Defining Descriptive Statistics for Numeric Data
          3. Counting for Categorical Data
          4. Creating Applied Visualization for EDA
          5. Understanding Correlation
          6. Modifying Data Distributions
        3. Chapter 14: Reducing Dimensionality
          1. Understanding SVD
          2. Performing Factor and Principal Component Analysis
          3. Understanding Some Applications
        4. Chapter 15: Clustering
          1. Clustering with K-means
          2. Performing Hierarchical Clustering
          3. Moving Beyond the Round-Shaped Clusters: DBScan
        5. Chapter 16: Detecting Outliers in Data
          1. Considering Detection of Outliers
          2. Examining a Simple Univariate Method
          3. Developing a Multivariate Approach
      7. Part V: Learning from Data
        1. Chapter 17: Exploring Four Simple and Effective Algorithms
          1. Guessing the Number: Linear Regression
          2. Moving to Logistic Regression
          3. Making Things as Simple as Naïve Bayes
          4. Learning Lazily with Nearest Neighbors
        2. Chapter 18: Performing Cross-Validation, Selection, and Optimization
          1. Pondering the Problem of Fitting a Model
          2. Cross-Validating
          3. Selecting Variables Like a Pro
          4. Pumping Up Your Hyperparameters
        3. Chapter 19: Increasing Complexity with Linear and Nonlinear Tricks
          1. Using Nonlinear Transformations
          2. Regularizing Linear Models
          3. Fighting with Big Data Chunk by Chunk
          4. Understanding Support Vector Machines
        4. Chapter 20: Understanding the Power of the Many
          1. Starting with a Plain Decision Tree
          2. Making Machine Learning Accessible
          3. Boosting Predictions
      8. Part VI: The Part of Tens
        1. Chapter 21: Ten Essential Data Science Resource Collections
          1. Gaining Insights with Data Science Weekly
          2. Obtaining a Resource List at U Climb Higher
          3. Getting a Good Start with KDnuggets
          4. Accessing the Huge List of Resources on Data Science Central
          5. Obtaining the Facts of Open Source Data Science from Masters
          6. Locating Free Learning Resources with Quora
          7. Receiving Help with Advanced Topics at Conductrics
          8. Learning New Tricks from the Aspirational Data Scientist
          9. Finding Data Intelligence and Analytics Resources at AnalyticBridge
          10. Zeroing In on Developer Resources with Jonathan Bower
        2. Chapter 22: Ten Data Challenges You Should Take
          1. Meeting the Data Science London + Scikit-learn Challenge
          2. Predicting Survival on the Titanic
          3. Finding a Kaggle Competition that Suits Your Needs
          4. Honing Your Overfit Strategies
          5. Trudging Through the MovieLens Dataset
          6. Getting Rid of Spam Emails
          7. Working with Handwritten Information
          8. Working with Pictures
          9. Analyzing Amazon.com Reviews
          10. Interacting with a Huge Graph
      9. About the Authors
      10. Cheat Sheet
      11. Connect with Dummies
      12. End User License Agreement