Date: 2021-12-03
Book description
The world around us is full of data that holds unique insights and valuable stories, and this book will help you uncover them. Whether you already work with data or want to learn more about its possibilities, the examples and techniques in this practical book will help you more easily clean, evaluate, and analyze data so that you can generate meaningful insights and compelling visualizations.
Complementing foundational concepts with expert advice, author Susan E. McGregor provides the resources you need to extract, evaluate, and analyze a wide variety of data sources and formats, along with the tools to communicate your findings effectively. This book delivers a methodical, jargon-free way for data practitioners at any level, from true novices to seasoned professionals, to harness the power of data.
- Use Python 3.8+ to read, write, and transform data from a variety of sources
- Understand and use programming basics in Python to wrangle data at scale
- Organize, document, and structure your code using best practices
- Collect data from structured data files, web pages, and APIs
- Perform basic statistical analyses to make meaning from datasets
- Visualize and present data in clear and compelling ways
Table of contents
- Preface
- 1. Introduction to Data Wrangling and Data Quality
- What Is “Data Wrangling”?
- What Is “Data Quality”?
- Why Python?
- Writing and “Running” Python
- Working with Python on Your Own Device
- Working with Python Online
- Hello World!
- Adding the Code
- Running the Code
- Documenting, Saving, and Versioning Your Work
- Conclusion
- 2. Introduction to Python
- The Programming “Parts of Speech”
- Taking Control: Loops and Conditionals
- Understanding Errors
- Hitting the Road with Citi Bike Data
- Conclusion
- 3. Understanding Data Quality
- 4. Working with File-Based and Feed-Based Data in Python
- Structured Versus Unstructured Data
- Working with Structured Data
- Real-World Data Wrangling: Understanding Unemployment
- Working with Unstructured Data
- Conclusion
- 5. Accessing Web-Based Data
- Accessing Online XML and JSON
- Introducing APIs
- Basic APIs: A Search Engine Example
- Specialized APIs: Adding Basic Authentication
- Reading API Documentation
- Protecting Your API Key When Using Python
- Specialized APIs: Working With OAuth
- API Ethics
- Web Scraping: The Data Source of Last Resort
- Conclusion
- 6. Assessing Data Quality
- The Pandemic and the PPP
- Assessing Data Integrity
- Assessing Data Fit
- Conclusion
- 7. Cleaning, Transforming, and Augmenting Data
- Selecting a Subset of Citi Bike Data
- De-crufting Data Files
- Decrypting Excel Dates
- Generating True CSVs from Fixed-Width Data
- Correcting for Spelling Inconsistencies
- The Circuitous Path to “Simple” Solutions
- Gotchas That Will Get Ya!
- Augmenting Your Data
- Conclusion
- 8. Structuring and Refactoring Your Code
- Revisiting Custom Functions
- Understanding Scope
- Defining the Parameters for Function “Ingredients”
- Return Values
- Climbing the “Stack”
- Refactoring for Fun and Profit
- Documenting Your Custom Scripts and Functions with pydoc
- The Case for Command-Line Arguments
- Where Scripts and Notebooks Diverge
- Conclusion
- 9. Introduction to Data Analysis
- Context Is Everything
- Same but Different
- What’s Typical? Evaluating Central Tendency
- Think Different: Identifying Outliers
- Visualization for Data Analysis
- The $2 Million Question
- Proportional Response
- Conclusion
- 10. Presenting Your Data
- Foundations for Visual Eloquence
- Making Your Data Statement
- Charts, Graphs, and Maps: Oh My!
- Elements of Eloquent Visuals
- From Basic to Beautiful: Customizing a Visualization with seaborn and matplotlib
- Beyond the Basics
- Conclusion
- 11. Beyond Python
- A. More Python Programming Resources
- B. A Bit More About Git
- C. Finding Data
- D. Resources for Visualization and Information Design
- Index
- About the Author
Product information
- Title: Practical Python Data Wrangling and Data Quality
- Author(s): Susan E. McGregor
- Release date: December 2021
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492091509
