Foreword
I’ve been involved in developing software for large corporations for several decades, and managing data has always been a major architectural issue. In the early days of my career, there was a lot of enthusiasm for a single enterprise-wide data model, often stored in a single enterprise-wide database. But we soon learned that having a plethora of applications accessing a shared data store was a disaster of ad hoc coupling. Even without that coupling, deeper problems existed. Concepts central to an enterprise, such as a “customer,” required different data models in different business units. Corporate acquisitions further muddied the waters.
In response, wiser enterprises have decentralized their data, pushing data storage, models, and management into different business units. That way, the people who best understand the data in their domain are responsible for managing it. They collaborate with other domains through well-defined APIs. Since these APIs can contain behavior, we have more flexibility in how that data is shared and, more importantly, in how we evolve data management over time.
While this has increasingly become the way to go for day-to-day operations, data analytics has remained a more centralized activity. Data warehouses aimed to provide an enterprise repository of curated critical information. But such a centralized group struggled with the work and its conflicting customers, particularly since they didn’t have a good understanding of the data or the needs of its ...