Data distribution (source: Pixabay)

In a recent episode of the O’Reilly Media Podcast, David Hsieh, senior vice president of marketing at Qubole, sat down with John Slocum, vice president of MediaMath’s data management platform (DMP), to discuss DataOps in the media industry. “DataOps” refers to practices that promote communication between formerly siloed data, teams, and systems. As discussed in Creating a Data-Driven Enterprise with DataOps, a report published by Qubole and O’Reilly in 2016, DataOps combines process change, organizational realignment, and technology to foster relationships among everyone who handles data: developers, data engineers, data scientists, analysts, and business users. As a programmatic advertising platform, MediaMath has a unique view of the shifting business models across the media industry and of the role DataOps plays in those shifts.

During the podcast, Hsieh and Slocum discussed how data has transformed the culture and overall goals of media organizations over the past 10 years, and they shared some best practices for companies that are just beginning their journey toward becoming data driven.

Here are some highlights from the conversation:

Greater focus on outcomes, return on investment

What we've seen evolve specifically over the past 10 years, since the introduction of programmatic, is that clients are more focused on outcomes than they used to be, outcomes being return on marketing investment or return on spend. Previously, the goal might simply have been to spend a particular budget, to drive a particular number of clicks or visitors, or to drive reach. But by incorporating data into the analytics and optimization in our platform, we're able to get a sense of what clients should expect to achieve and help them achieve it. We find that our most sophisticated clients are able to differentiate themselves from their competition, and we see data providing that differentiation.

Rapid evolution of devices, tool sets leads to sophisticated usage of data

MediaMath has long offered simple aggregated reporting that helps advertisers understand the performance they're seeing in their campaigns. ... That was certainly good enough for the first few years of MediaMath's operation, but about four or five years ago, we started to see a lot of demand for more granular, more custom insight. Some of our more sophisticated clients were asking for the ability to see performance by a specific sample of audience data. Rather than seeing performance aggregated by campaign or strategy, they might want to sample performance elsewhere and look at audiences, which may not even be specific, defined audience segments, to see what's popping. ... That really requires storing the data at a user level.
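To make the distinction concrete, here is a minimal Python sketch (not MediaMath's implementation) of why user-level storage matters. It assumes a hypothetical `Impression` record that keeps each user's audience attributes on every row; because those attributes survive, performance can be sliced by any attribute after the fact, across campaigns, which a report pre-aggregated by campaign or strategy would not allow. All field and function names here are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    # Hypothetical user-level row; these field names are illustrative,
    # not MediaMath's schema.
    user_id: str
    campaign: str
    audience_attrs: frozenset  # e.g. frozenset({"sports_fan"})
    converted: bool


def conversion_rate_by_attribute(impressions):
    """Conversion rate for every audience attribute seen, across campaigns.

    This works only because each row still carries the user's attributes;
    a report pre-aggregated by campaign would have discarded them.
    """
    totals = defaultdict(lambda: [0, 0])  # attr -> [impressions, conversions]
    for imp in impressions:
        for attr in imp.audience_attrs:
            totals[attr][0] += 1
            totals[attr][1] += int(imp.converted)
    return {attr: conv / n for attr, (n, conv) in totals.items()}


rows = [
    Impression("u1", "c1", frozenset({"sports_fan"}), True),
    Impression("u2", "c1", frozenset({"sports_fan", "in_market_auto"}), False),
    Impression("u3", "c2", frozenset({"in_market_auto"}), True),
    Impression("u4", "c2", frozenset({"in_market_auto"}), True),
]
print(conversion_rate_by_attribute(rows))
```

With these made-up rows, `sports_fan` converts at 0.5 and `in_market_auto` at about 0.67, a cut that spans both campaigns and would be invisible in a per-campaign rollup.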

Data-driven cultures thirsty for self-service analytics, prioritize training on basics

We saw this thirst for more answers that the account team suspected they could find in the data, and they just needed a tool set to access that data, to work with it, and they couldn't wait for the analytics team at the time, or data engineers, to answer all of those questions for them. There was really a hunger for self-service access to this data, which Qubole started providing in the form of an analytics platform that somebody who wasn't a data engineer could work with and could use effectively to start asking and answering those questions of the data. I think that's common in data-driven organizations.

There are definitely some ETL processes we need to run on the incoming data: some data scrubbing, some compression, some processing before we expose that user-level data back in our platform. We're using AWS services to manage the data, so we're able to permission it appropriately using roles and by assigning the proper user privileges on top of it. We don't want to expose our source of truth across the organization to everyone with read/write/delete privileges, because we don't want anyone messing up that data set. So it's important to think that through: make sure you have admin control over the data sets you're exposing, plus a basic user group or user role that keeps newer or less practiced users from getting into trouble.

Then work with those users as well: identify the folks who need access to this tool and who want access to it, and train them on the basics. Write a query that filters on the data's partitions so you're not kicking off massive MapReduce jobs that tie up your cluster for the rest of the afternoon. Little things like that, which seem simple to a more experienced analyst, are important to communicate to a larger user base.
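The partition advice can be illustrated with a small Python sketch. It assumes a hypothetical Hive-style layout in which event files are stored under one directory per day (`events/date=YYYY-MM-DD/...`); filtering on the partition column lets the engine skip every directory outside the requested range instead of scanning the whole table. This only mimics what a query engine does when a `WHERE` clause filters on a partition column; it is not Qubole's or MediaMath's code, and the paths are made up.

```python
from datetime import date


# Hypothetical Hive-style layout: one directory per day, e.g.
# "events/date=2016-10-01/part-0000". The paths are illustrative only.
def partition_date(path: str) -> date:
    """Extract the value of the `date=` partition segment from a file path."""
    for segment in path.split("/"):
        if segment.startswith("date="):
            return date.fromisoformat(segment[len("date="):])
    raise ValueError(f"no date partition in {path!r}")


def prune(paths, start: date, end: date):
    """Keep only files whose date partition falls within [start, end].

    Roughly what an engine does for
        SELECT ... FROM events WHERE date BETWEEN '2016-10-01' AND '2016-10-03'
    Without a filter on the partition column, every file would be read.
    """
    return [p for p in paths if start <= partition_date(p) <= end]


all_files = [f"events/date=2016-10-{day:02d}/part-0000" for day in range(1, 31)]
to_scan = prune(all_files, date(2016, 10, 1), date(2016, 10, 3))
print(f"scanning {len(to_scan)} of {len(all_files)} files")  # scanning 3 of 30 files
```

Skipping 27 of 30 daily partitions is the kind of difference that separates a quick query from one that occupies the cluster for the afternoon.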

Start at the beginning: Know the problem you’re trying to solve

I think the most important thing to consider before pursuing a data-driven approach is the problem you're trying to solve. Ultimately, what are you trying to do with data: what do you think, or suspect, you can do with it? You might not know the exact answer to that question, and that's totally fine. For MediaMath, what we think we can do with data is drive outcomes and performance for our clients, and a variety of questions stem from that overarching goal. ... Who do you want helping you reach that objective? How do you see that happening? What tool(s) do you need to get there, and what approach do you want to take? The rest is execution.

This post is part of a collaboration between O'Reilly and Qubole. See our statement of editorial independence.
