Integrating across silos with operational data hubs

The O'Reilly Podcast: Ken Krupa on the challenge of data integration, and a solution.

By Jon Bruner
June 6, 2017
Storage silos. (source: Pixabay)

Over the last five or so years, leaders at big companies have recognized that data can be an essential driver of business value. They’ve invested quickly in data and analytics systems, but they’ve often wound up with duplicate systems serving different stakeholders. The result is siloed data. The dream of big data—that managers could draw together data from disparate functions like operations, marketing, and sales to make better decisions—can feel out of reach.

As a result, data integration has become a priority for managers who have collected useful data from a lot of different sources, but aren’t yet able to draw it together. In this podcast episode, Ken Krupa, enterprise CTO at MarkLogic, spells out the challenge—the integration of what he calls “observe-the-business data,” which managers use for developing strategy, and “run-the-business data,” which drives operations—and outlines an approach for dealing with it known as the “operational data hub.”

Krupa also emphasizes the importance of “data-centric architecture”—a term he uses to refer to architectures that collect and expose data beyond what’s strictly needed for a particular use case. “Typical service-oriented architecture is just about the coarse-grained endpoints,” he says, whereas the operational data hub provides a view into the underlying data at a more granular level. “You don’t just have to deal with black box services,” he says.
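To make the contrast concrete, here is a minimal sketch in Python. It is an illustration only, not MarkLogic's API or Krupa's design: the class names (`CoarseGrainedService`, `DataHub`), the `Record` fields, and the sample data are all hypothetical. The service answers one predefined question, while the hub keeps the underlying records, with their source, available for questions nobody anticipated.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One record ingested from a source system (all fields are hypothetical)."""
    source: str       # which silo the record came from, e.g. "sales"
    entity_id: str    # shared key across silos, e.g. a customer id
    fields: dict      # the record's payload


class CoarseGrainedService:
    """Black-box endpoint: callers get one precomputed answer and nothing else."""

    def __init__(self, records):
        self._records = list(records)

    def total_spend(self, entity_id):
        return sum(
            r.fields.get("amount", 0)
            for r in self._records
            if r.entity_id == entity_id
        )


class DataHub:
    """Hub sketch: keeps the granular records and lets callers query them."""

    def __init__(self):
        self._records = []

    def ingest(self, record):
        self._records.append(record)

    def query(self, entity_id=None, source=None):
        """Return all records matching the given entity and/or source."""
        return [
            r for r in self._records
            if (entity_id is None or r.entity_id == entity_id)
            and (source is None or r.source == source)
        ]


records = [
    Record("sales", "c1", {"amount": 120}),
    Record("marketing", "c1", {"campaign": "spring"}),
    Record("sales", "c2", {"amount": 40}),
]

service = CoarseGrainedService(records)
hub = DataHub()
for rec in records:
    hub.ingest(rec)

# The service can only report a total; the hub shows where the data came from.
c1_total = service.total_spend("c1")
c1_sources = {r.source for r in hub.query(entity_id="c1")}
```

In this toy setup the service can say how much customer `c1` spent, but only the hub can show that both sales and marketing hold data about that customer.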

“It’s a nice concept to say, ‘hey, we could abstract everything from everything, and we don’t know what happens to the data, and we don’t care what happens to the data,’ but ignorance is not bliss,” says Krupa. “The whole point of gathering and analyzing data is to harmonize what’s happening across your business and to see what’s happening across your business, and that requires access to integrated data.”

What about data governance? In many companies, silos exist as a result of efforts to implement data governance—sealing off customer account data from marketing data, for instance. Krupa points out that silos don’t really address the problem; two pieces of data with different security profiles might be combined to create an insight with an even more sensitive security profile. The best response, says Krupa, is to “embrace governance from the get-go,” and apply new policies incrementally.
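The point about combined data can be sketched as a small policy check. This is a hedged illustration under stated assumptions, not a method described in the podcast: the `Sensitivity` levels, the data category names, and the `ESCALATIONS` rule table are all hypothetical. The idea is that a derived insight is at least as sensitive as its most sensitive input, and certain combinations are escalated above either input.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical escalation rules: combinations of data categories whose
# result is treated as more sensitive than any single input.
ESCALATIONS = {
    frozenset({"customer_account", "marketing"}): Sensitivity.RESTRICTED,
}


def combined_sensitivity(inputs):
    """Compute the sensitivity of an insight derived from several inputs.

    `inputs` maps a data category name to the Sensitivity of that input.
    """
    # Baseline: never less sensitive than the most sensitive input.
    level = max(inputs.values(), default=Sensitivity.PUBLIC)
    # Escalate when a listed combination of categories is present.
    for categories, escalated in ESCALATIONS.items():
        if categories.issubset(inputs.keys()):
            level = max(level, escalated)
    return level


joined = combined_sensitivity({
    "customer_account": Sensitivity.CONFIDENTIAL,
    "marketing": Sensitivity.INTERNAL,
})
plain = combined_sensitivity({
    "marketing": Sensitivity.INTERNAL,
    "sales": Sensitivity.INTERNAL,
})
```

Because the rules live in one table, new policies can be added one entry at a time, which fits the incremental approach Krupa recommends.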

This post and podcast are a collaboration between O’Reilly and MarkLogic. See our statement of editorial independence.