Chapter 1. Why Self-Service Data Prep?

With every organization swimming in data lakes, repositories, and warehouses, never before have their employees had such an enormous opportunity to answer their questions with information rather than just their experience and gut instinct.

This isn’t that different from where organizations stood a decade ago, or even longer. What is different is who wants access to that data to answer their questions. No longer is the expectation that a separate function of the business will be responsible for getting that data; now, everyone feels they should have access to it. So what has changed? Self-service data visualization. What is about to change to take this to the next level? Self-service data preparation.

A Short History of Self-Service Data Visualization

More than a decade ago, all things data related were the domain of specialist teams. Data projects went to either Business Intelligence (BI) teams or Information Technology (IT) teams, who set up data infrastructure projects to produce reports. This was an expensive and time-consuming process that often resulted in products that were less than ideal for all concerned.

The reason this methodology doesn’t work is the iterative nature of BI. Humans are fundamentally intelligent creatures who like to explore a question, learn, and then ask more questions as they become more intrigued. Humans are also great at spotting visual patterns, and charting data sets helps them find those patterns. With the traditional IT or BI projects, once the first piece of analysis was delivered, the project was over. The initial question was answered, but because the people doing the analysis were different from those asking the questions, any follow-up questions simply went unanswered or were cobbled together from disparate reports or different levels of aggregation.

This all changed with the rise of self-service data visualization tools like Tableau Desktop. With these tools’ focus on the user, suddenly individuals were able to drag and drop data fields around the screen to form their own analysis, answer their own questions, and immediately ask their next questions in a visual way that allowed them to share their findings with others.

Over the past decade, data visualization and analysis have become increasingly important throughout the organization and a significant part of many roles; they are no longer considered solely the domain of IT or data teams. Analytical capacity has come to the business, rather than the business having to go and ask specialists to access and analyze the data. This represents a big transformation in how we work and requires us to rethink what skills people now need.

Accessing the “Right Data”

The rise, and entrenchment, of self-service data visualization into individuals’ roles has surfaced needs and tensions in the analytical cycle. The analytical cycle involves:

  1. Having a question posed by someone

  2. Sourcing data that may help answer the question

  3. Preparing the data for analysis

  4. Analyzing the data

  5. Forming new or additional questions (returning to step 1)

Enabling self-service requires opening access to data sources, which has traditionally been a pain point in this cycle. With the right data, optimized for use in visual analysis tools, we can now find answers as soon as the business expert can form the questions. But accessing the “right data” is not that easy. The data assets owned by organizations are optimized for storage, optimized for tools that now seem to work against users rather than with them, and regulated by strict security layers requiring coding to access the data.

Many data projects are now focused on extracting data from their storage locations. The specialists are focused on using data skills to:

  • Find data in existing repositories, including Excel workbooks

  • Find data in public or third-party repositories

  • Create feeds of data from previously inaccessible sources and systems

The gap in the analytical cycle now sits between taking these sources and preparing them for visual analytics. There are different levels of complexity to this work (as with any skill in life), from opening an Excel spreadsheet to running application programming interface (API) queries. The challenges of data preparation with Excel or coding are different, but they highlight why another solution is necessary.

Excel is very flexible and a tool many people are familiar with, but it is difficult to automate. Therefore, it often requires a lot of manual rework when data is updated. Excel’s nonvisual nature also makes it easier to introduce errors without noticing. With coding, having to make manual updates isn’t as much of an issue because the coding is often built to rerun at the press of a button. However, in most workplaces, fewer people understand coding than Excel, making it difficult to hand over this work to colleagues. And, as in Excel, it’s easy to make mistakes and not notice them because you don’t see the data until the end of the transformation.

So, although there is a significant need for individuals to be able to complete data preparation tasks, for a long time there has been a gap between that need and the ability to access tools that help fulfill it.
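To make the coding side of that trade-off concrete, here is a minimal sketch of scripted data preparation in Python. Everything in it is an assumption for illustration: the column names, the sample rows, and the `prepare` function are hypothetical and do not come from any real system. It shows why a script is easy to rerun when the data is updated, and also why a colleague who does not read code may struggle to take it over.

```python
import csv
import io

# Hypothetical raw export; in practice this text would be read from a file.
RAW = """region,sales
 North ,100
south,250
North,
"""


def prepare(raw_text):
    """Clean a raw CSV export: tidy region names and drop rows with no sales."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        sales = row["sales"].strip()
        if not sales:
            continue  # skip rows where the sales value is missing
        cleaned.append(
            {"region": row["region"].strip().title(), "sales": float(sales)}
        )
    return cleaned


# Rerunning the preparation on updated data is a single call.
result = prepare(RAW)
print(result)
```

Note that nothing is visible until the final `print`. If the cleaning logic silently dropped the wrong rows, you would not see it along the way, which is the weakness described above.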

The Self-Service Data Preparation Opportunity

This gap is being addressed by new tools that enable the business experts to access data and answer their questions using self-service visual analytics. Tableau Prep Builder makes the process of data preparation easier than other tools by bringing the same logic that enabled visual analytics to the data preparation process. By using a user interface (Figure 1-1) similar to the one that data visualizers are already accustomed to, Prep Builder makes the transition to self-service data preparation a simple one, even for those trying to complete these tasks for the first time.

Figure 1-1. The Tableau Prep Builder interface

Data preparation isn’t just the process of preparing a data set to make some charts as a one-off exercise. It encompasses many tasks, such as confirming the accuracy of the data points, removing extraneous data, reshaping the data set to optimize it for easy analysis, and doing all of this efficiently so people aren’t left waiting for the data to be ready before they can answer their questions. Good data preparation enables use of the data set in a timely manner, avoids wasting time on manual manipulation (thanks to tools like Tableau Prep Builder), and creates a repeatable process that anyone can use. The aim of this book is that the time you spend learning these data preparation skills will be repaid many times over as you begin to deploy them in the real world. After all, the more time you spend manually preparing data, the less time you have to analyze it, if you decide to use such a messy data set at all.
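As a rough illustration of three of those tasks (checking accuracy, removing extraneous data, and reshaping), the sketch below works through a tiny made-up table in Python. The store names, the `notes` column, the month columns, and the rule that negative unit counts are suspect are all assumptions invented for this example. Prep Builder is designed to let you do this kind of work visually rather than in code.

```python
# Hypothetical wide table: one column per month, plus a column nobody needs.
wide = [
    {"store": "A", "notes": "n/a", "Jan": 10, "Feb": 12},
    {"store": "B", "notes": "n/a", "Jan": 7, "Feb": -1},
]

MONTHS = ["Jan", "Feb"]


def reshape(rows):
    """Unpivot month columns into rows, dropping 'notes' and flagging bad values."""
    long_rows, problems = [], []
    for row in rows:
        for month in MONTHS:
            value = row[month]
            if value < 0:
                # Assumed accuracy rule: unit counts can never be negative.
                problems.append((row["store"], month, value))
                continue
            # Only the fields needed for analysis are carried forward.
            long_rows.append(
                {"store": row["store"], "month": month, "units": value}
            )
    return long_rows, problems


long_rows, problems = reshape(wide)
print(long_rows)
print(problems)
```

The result has one row per store and month, which is generally the easier shape for visual analysis, and the suspect value is set aside for review instead of flowing into a chart.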

To date, my career has taken me through all of the roles in the traditional analytics cycle—from a user suffering through the pain of waiting for reports to be built for me, to learning how to build the analysis for myself, to wrangling the data sets in the database, and finally, to training others in data visualization and data preparation skills.

There is still a significant gap between potential data preparers (“data preppers”) and skilled ones. Learning what to do with the self-service data preparation tools, and why they are needed, is a significant undertaking but one that is worthwhile. Prep Builder will make the knowledge I have gained over many years a lot easier for you to learn and deploy straightaway.

Tableau Prep Up and Running

This book is designed for those who would benefit from being able to prepare their own data but lack the skills to do so. Over the next chapters you’ll learn how to utilize the tools to tackle the tasks that are currently acting as roadblocks in your organization. The exercises will introduce commonly used techniques to solve these problems away from the pressure of the workplace. Over time, you will develop your own strategies to prepare the data exactly how you want it, whether it comes from files, databases, surveys, pivot tables, messy data, or tangled text fields. There isn’t a straightforward recipe to follow, but by practicing, you’ll soon be able to handle these challenges.

Summary

Hopefully, you now have a view of why self-service data preparation is vital and why learning the techniques this book covers will assist you in your work with data. How you should approach those challenges and where you should start are the subjects of the next chapter.
