Chapter 5. Key Components of a DataOps Ecosystem

There are many ways to think about the potential components of a next-generation data ecosystem for the enterprise. Our friends at DataKitchen have done a good job with a post on this topic, which refers to some solid work by the Eckerson Group. To simplify the question of what you might consider buying versus building, and which vendors you might consider, I’ve tried to lay out the primary components of a next-generation enterprise data ecosystem. This breakdown is based on the environments I’ve seen people configure over the past 8 to 10 years and on the tools (new and old) that are available. We can summarize these components as follows:

  • Catalog/Registry

  • Movement/ETL

  • Alignment/Unification

  • Storage

  • Publishing

  • Feedback

I provide a brief summary of each of these components in the sections that follow.

Catalog/Registry

Over the past 5 to 10 years, a key function has emerged as a critical starting point in the development of a functional DataOps ecosystem: the data catalog/registry. A number of free and open source software (FOSS) and commercial projects attempt to provide tools that enable large enterprises to answer the simple question, “What data do you have?” Apache Atlas, Alation, and Waterline are the projects that I see most often in my work at Tamr and in discussions with my chief data officer friends. I’ve always believed that the best data catalog/registry is a “vendor neutral” system that crawls all tabular data and registers ...
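The crawl-and-register idea can be illustrated with a minimal Python sketch. It assumes that the tabular data lives in CSV files under a single directory tree and that "registering" a dataset means recording its path, column names, and row count. The names `DatasetEntry`, `crawl_and_register`, and `find_datasets_with_column` are hypothetical and are not part of Apache Atlas, Alation, Waterline, or any other product.

```python
from __future__ import annotations

import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DatasetEntry:
    """One registered dataset (hypothetical schema): location, columns, size."""
    path: str
    columns: list[str]
    row_count: int


def crawl_and_register(root: str, registry: dict[str, DatasetEntry] | None = None) -> dict[str, DatasetEntry]:
    """Recursively walk `root` and register every CSV file found.

    Assumption for this sketch: only CSV files count as tabular data, and the
    registry is an in-memory dict keyed by absolute file path.
    """
    if registry is None:
        registry = {}
    for path in sorted(Path(root).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, [])       # first row is treated as column names
            rows = sum(1 for _ in reader)   # count the remaining data rows
        key = str(path.resolve())
        registry[key] = DatasetEntry(path=key, columns=header, row_count=rows)
    return registry


def find_datasets_with_column(registry: dict[str, DatasetEntry], column: str) -> list[str]:
    """A simple "What data do you have?" query: which datasets contain `column`?"""
    return [entry.path for entry in registry.values() if column in entry.columns]
```

In this sketch, the crawler reads files directly rather than going through any one vendor's storage API, which is one way to keep the registry "vendor neutral." A production catalog would presumably add connectors for databases and other file formats and persist the registry instead of holding it in memory.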
