Chapter ThreeSource Data Best Practices

After establishing an SQL‐based BI tool to blend and analyze data, there are some best practices to keep in mind to support other people using the data. It's not crucial to spend too much time modeling or cleaning data at this point, but giving some direction can be invaluable at this stage.

Keep a Complexity Wiki Page

Data is inherently messy at the source stage and will be difficult for everyone to gain experience with and use. Since the amount of data sources is small, it is prudent to create a document or wiki page with warnings about known problems so others can navigate the data themselves.

Here are common data problems to document:

  • Descriptions of poorly named columns and tables.
  • Columns with nulls or mostly useless data.
  • Business logic.
  • Complex or confusing join paths.
  • Old or unused columns and tables.

Snippet Dictionary

Another useful document to keep is a dictionary of base queries or SQL snippets for standard metrics (Figure 3.1). When working with unmodeled data, it's common to repeat a lot of the same filters and joins. It's helpful to conveniently copy paste/snippets when needed, rather than to recreate each time they're needed.

These can be stored on a wiki, an SQL file, or even as a dashboard of these queries saved as charts. With the right BI product, the dashboard method is ideal, because these snippets are available as starter charts that can be easily duplicated and adjusted.

Figure 3.1 Using an SQL file in ...

Get The Informed Company now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.