Chapter Twenty: Style Guide Example

Creating a clean data warehouse and data marts makes the data easier for everyone to understand. As an organization grows, more than one person will help clean and transform the data. Some aspects of cleaning are subjective, so we need to codify the choices we want everyone to make. For example, we will want to rename a column to make it more readable, but should we use snake case (e.g. example_new_column) or camel case (e.g. exampleNewColumn)? In the book we said snake case, but this is a judgment call and you could go with another convention.
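
To make the naming choice concrete, here is a minimal Python sketch of a helper that normalizes column names to snake case. The function name to_snake_case and the sample column names are hypothetical and chosen only for illustration. A real style guide would also need to say how to treat acronyms, digits, and reserved words.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column name to snake case (hypothetical helper)."""
    # Replace runs of whitespace or hyphens with a single underscore.
    name = re.sub(r"[\s\-]+", "_", name.strip())
    # Add an underscore between a lowercase letter or digit and an uppercase letter.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Split an acronym from the capitalized word that follows it.
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Collapse any repeated underscores and lowercase the result.
    return re.sub(r"_+", "_", name).lower()


if __name__ == "__main__":
    for raw in ["ExampleNewColumn", "exampleNewColumn", "Example New Column"]:
        print(raw, "->", to_snake_case(raw))
```

Running every incoming column name through one shared helper like this is one way to keep several contributors consistent with the written convention.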

Document cleaning goals and stylistic preferences in a style guide so that we clean the data consistently and others can help.

We have provided sample style guides at the end of the chapter.

Simplify

It's quite common for raw data to be extremely complicated. In most cases, data is created for use by applications, not for direct use by business users. Taking the time to simplify the data makes it much easier for business users to query it successfully.

Only Include Fields That Have an Apparent Analytical Purpose

If your system makes it easy to add or update columns later, it's best to start modeling with only the most relevant columns and exclude any that have no direct or apparent analytical purpose.
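
As a rough sketch of this idea, the Python snippet below keeps an explicit list of the columns to model and drops everything else from each raw record. The column names (order_id, customer_id, order_total, created_at) and the helper select_relevant are assumptions made up for this example, not fields from any real source.

```python
# Hypothetical allowlist of columns with a clear analytical purpose.
RELEVANT_COLUMNS = ["order_id", "customer_id", "order_total", "created_at"]


def select_relevant(rows, columns=RELEVANT_COLUMNS):
    """Return new rows containing only the listed columns.

    A column missing from a raw row is filled with None so that every
    output row has the same shape.
    """
    return [{col: row.get(col) for col in columns} for row in rows]


if __name__ == "__main__":
    raw_rows = [
        {
            "order_id": 1,
            "customer_id": 42,
            "order_total": 19.99,
            "created_at": "2023-01-05",
            "internal_sync_token": "abc123",  # application-only field, dropped
        }
    ]
    print(select_relevant(raw_rows))
```

Because the allowlist lives in one place, bringing a column in later is a one-line change, which fits the advice to start small.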

Extract Relevant Data from Complex Data Types

Application data sources may contain JSON, arrays, Hstore, and other complex data types. These are typically ...
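
To illustrate the idea in this section's heading, the sketch below pulls a few flat, queryable fields out of a JSON value. The payload shape (shipping_address, items) and the output column names are hypothetical; an actual source will have its own structure, and a warehouse would normally do this step in its own SQL dialect.

```python
import json


def extract_fields(raw_json: str) -> dict:
    """Flatten a hypothetical order payload into simple columns."""
    payload = json.loads(raw_json)
    # Fall back to empty containers when a key is missing or null.
    address = payload.get("shipping_address") or {}
    items = payload.get("items") or []
    return {
        "shipping_city": address.get("city"),
        "shipping_country": address.get("country"),
        "item_count": len(items),
    }


if __name__ == "__main__":
    raw = '{"shipping_address": {"city": "Austin", "country": "US"}, "items": [{"sku": "A"}, {"sku": "B"}]}'
    print(extract_fields(raw))
```

Exposing only the extracted fields means business users can query plain columns without having to parse the nested value themselves.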
