PART IV Data Wrangling
Data wrangling refers to the process of transforming a dataset into a form that is easier to understand and easier to work with. Data wrangling skills will help you to work faster and more efficiently. Good preparation of data will also lead to better models.
We will use data wrangling in its narrow sense: transforming data. Some businesses use the term in a broader sense that also includes data visualization and modelling. This broader sense can be useful when the “Data and Analytics Team” works on data end-to-end (from collecting the data from the database systems up to the final presentation). We will treat this in a separate part of this book: Part VII “Reporting” on page 685.
In many companies, “data wrangling” is used as roughly equivalent to “building a data-mart.” A data-mart is something like a supermarket for data, where the modeller can pick up the data in a format that is ready to use. The data-mart can also be seen as the product of data wrangling.
Data wrangling, just like modelling and writing code, is as much an art as it is a science. Wrangling in particular cannot be done without knowing the steps that come before and after it: we need to understand the whole story before we can do a good job. The main goal is to transform the data obtained from the transaction system or data-warehouse into a form that is directly useful for building a model.
For example, consider that we are making a credit scorecard ...