Book description
A key task that any aspiring data-driven organization needs to learn is data wrangling, the process of converting raw data into something truly useful. This practical guide provides business analysts with an overview of various data wrangling techniques and tools, and puts the practice of data wrangling into context by asking, "What are you trying to do and why?"
Wrangling data consumes roughly 50-80% of an analyst’s time before any kind of analysis is possible. Written by key executives at Trifacta, this book walks you through the wrangling process by exploring several factors (time, granularity, scope, and structure) that you need to consider as you begin to work with data. You’ll gain a shared language and a comprehensive understanding of data wrangling, with an emphasis on recent agile analytic processes used by many of today’s data-driven organizations.
Appreciate the importance—and the satisfaction—of wrangling data the right way.
- Understand what kind of data is available
- Choose which data to use and at what level of detail
- Meaningfully combine multiple sources of data
- Decide how to distill the results to a size and shape that can drive downstream analysis
Table of contents
- Foreword
- 1. Introduction
- 2. A Data Workflow Framework
  - How Data Flows During and Across Projects
  - Connecting Analytic Actions to Data Movement: A Holistic Workflow Framework for Data Projects
  - Raw Data Stage Actions: Ingest Data and Create Metadata
  - Refined Data Stage Actions: Create Canonical Data and Conduct Ad Hoc Analyses
  - Production Data Stage Actions: Create Production Data and Build Automated Systems
  - Data Wrangling within the Workflow Framework
- 3. The Dynamics of Data Wrangling
- 4. Profiling
- 5. Transformation: Structuring
- 6. Transformation: Enriching
- 7. Using Transformation to Clean Data
- 8. Roles and Responsibilities
- 9. Data Wrangling Tools
Product information
- Title: Principles of Data Wrangling
- Author(s):
- Release date: July 2017
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491938928