4Essentials of Data Wrangling

Menal Dahiya, Nikita Malik* and Sakshi Rana

Dept. of Computer Applications, Maharaja Surajmal Institute, Janakpuri, New Delhi, India

Abstract

Fundamentally, data wrangling is an elaborate process of transforming, enriching, and mapping data from one raw data form into another, to make it more valuable for analysis and enhancing its quality. It is considered as a core task within every action that is performed in the workflow framework of data projects. Wrangling of data begins from accessing the data, followed by transforming it and profiling the transformed data. These wrangling tasks differ according to the types of transformations used. Sometimes, data wrangling can resemble traditional extraction, transformation, and loading (ETL) processes. Through this chapter, various kinds of data wrangling and how data wrangling actions differ across the workflow are described. The dynamics of data wrangling, core transformation and profiling tasks are also explored. This is followed by a case study based on a dataset on forest fires, modified using Excel or Python language, performing the desired transformation and profiling, and presenting statistical and visualization analyses.

Keywords: Data wrangling, workflow framework, data transformation, profiling, core profiling

4.1 Introduction

Data wrangling, which is also known as data munging, is a term that involves mapping data fields in a dataset starting from the source (its original raw form) to destination ...

Get Data Wrangling now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.