Foundations for Analytics with Python

Book description

If you’re like many of Excel’s 750 million users, you want to do more with your data—like repeating similar analyses over hundreds of files, or combining data in many files for analysis at one time. This practical guide shows ambitious non-programmers how to automate and scale the processing and analysis of data in different formats—by using Python. After author Clinton Brownley takes you through Python basics, you’ll be able to write simple scripts for processing data in spreadsheets as well as databases.

Table of contents

  1. Preface
    1. Why Read This Book? Why Learn These Skills?
    2. Who Is This Book For?
    3. Why Windows?
    4. Why Python?
    5. Base Python and pandas
    6. Anaconda Python
      1. Installing Anaconda Python (Windows or Mac)
    7. Text Editors
    8. Download Book Materials
    9. Overview of Chapters
    10. Conventions Used in This Book
    11. Using Code Examples
    12. O’Reilly Safari
    13. How to Contact Us
    14. Acknowledgments
  2. Chapter 1: Python Basics
    1. How to Create a Python Script
    2. How to Run a Python Script
    3. Useful Tips for Interacting with the Command Line
    4. Python’s Basic Building Blocks
      1. Numbers
      2. Strings
      3. Regular Expressions and Pattern Matching
      4. Dates
      5. Lists
      6. Tuples
      7. Dictionaries
      8. Control Flow
    5. Reading a Text File
      1. Create a Text File
      2. Script and Input File in Same Location
      3. Modern File-Reading Syntax
    6. Reading Multiple Text Files with glob
      1. Create Another Text File
    7. Writing to a Text File
      1. Add Code to
      2. Writing to a Comma-Separated Values (CSV) File
    8. print Statements
    9. Chapter Exercises
  3. Chapter 2: Comma-Separated Values (CSV) Files
    1. Base Python Versus pandas
      1. Read and Write a CSV File (Part 1)
      2. How Basic String Parsing Can Fail
      3. Read and Write a CSV File (Part 2)
    2. Filter for Specific Rows
      1. Value in Row Meets a Condition
      2. Value in Row Is in a Set of Interest
      3. Value in Row Matches a Pattern/Regular Expression
    3. Select Specific Columns
      1. Column Index Values
      2. Column Headings
    4. Select Contiguous Rows
    5. Add a Header Row
    6. Reading Multiple CSV Files
      1. Count Number of Files and Number of Rows and Columns in Each File
    7. Concatenate Data from Multiple Files
    8. Sum and Average a Set of Values per File
    9. Chapter Exercises
  4. Chapter 3: Excel Files
    1. Introspecting an Excel Workbook
    2. Processing a Single Worksheet
      1. Read and Write an Excel File
      2. Filter for Specific Rows
      3. Select Specific Columns
    3. Reading All Worksheets in a Workbook
      1. Filter for Specific Rows Across All Worksheets
      2. Select Specific Columns Across All Worksheets
    4. Reading a Set of Worksheets in an Excel Workbook
      1. Filter for Specific Rows Across a Set of Worksheets
    5. Processing Multiple Workbooks
      1. Count Number of Workbooks and Rows and Columns in Each Workbook
      2. Concatenate Data from Multiple Workbooks
      3. Sum and Average Values per Workbook and Worksheet
    6. Chapter Exercises
  5. Chapter 4: Databases
    1. Python’s Built-in sqlite3 Module
      1. Insert New Records into a Table
      2. Update Records in a Table
    2. MySQL Database
      1. Insert New Records into a Table
      2. Query a Table and Write Output to a CSV File
      3. Update Records in a Table
    3. Chapter Exercises
  6. Chapter 5: Applications
    1. Find a Set of Items in a Large Collection of Files
    2. Calculate a Statistic for Any Number of Categories from Data in a CSV File
    3. Calculate Statistics for Any Number of Categories from Data in a Text File
    4. Chapter Exercises
  7. Chapter 6: Figures and Plots
    1. matplotlib
      1. Bar Plot
      2. Histogram
      3. Line Plot
      4. Scatter Plot
      5. Box Plot
    2. pandas
    3. ggplot
    4. seaborn
  8. Chapter 7: Descriptive Statistics and Modeling
    1. Datasets
      1. Wine Quality
      2. Customer Churn
    2. Wine Quality
      1. Descriptive Statistics
      2. Grouping, Histograms, and t-tests
      3. Pairwise Relationships and Correlation
      4. Linear Regression with Least-Squares Estimation
      5. Interpreting Coefficients
      6. Standardizing Independent Variables
      7. Making Predictions
    3. Customer Churn
      1. Logistic Regression
      2. Interpreting Coefficients
      3. Making Predictions
  9. Chapter 8: Scheduling Scripts to Run Automatically
    1. Task Scheduler (Windows)
    2. The cron Utility (macOS and Unix)
      1. Crontab File: One-Time Set-up
      2. Adding Cron Jobs to the Crontab File
  10. Chapter 9: Where to Go from Here
    1. Additional Standard Library Modules and Built-in Functions
      1. Python Standard Library (PSL): A Few More Standard Modules
      2. Built-in Functions
    2. Python Package Index (PyPI): Additional Add-in Modules
      1. NumPy
      2. SciPy
      3. Scikit-Learn
      4. A Few Additional Add-in Packages
    3. Additional Data Structures
      1. Stacks
      2. Queues
      3. Graphs
      4. Trees
    4. Where to Go from Here
  11. Appendix A: Download Instructions
    1. Download Python 3
      1. Windows
      2. macOS
    2. Download the xlrd Package
      1. Windows
      2. macOS
    3. Download the MySQL Database Server
      1. Windows
      2. macOS
      3. Setting Up MySQL
    4. Download mysqlclient (Python 3.x)/MySQL-python (Python 2.x)
      1. Windows
      2. macOS
  12. Appendix B: Answers to Exercises
    1. Chapter 1
  13. Bibliography
  14. Index

Product information

  • Title: Foundations for Analytics with Python
  • Author(s): Clinton W. Brownley
  • Release date: August 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491922538