Chapter 5. Data Movement Service
In the journey of developing insights to solve business problems, we have discussed how to discover existing datasets and their metadata, as well as reusable artifacts and features that can be used to develop the insights. Often, data attributes from different data warehouses or application databases must be aggregated to build insights. For example, a revenue dashboard requires attributes from billing, product codes, and special offers to be moved into a common datastore, which is then queried and joined to update the dashboard every few hours or in real time. Data users spend 16% of their time moving data. Today, data movement involves three pain points: orchestrating the movement across heterogeneous data sources, verifying data correctness between the source and target on an ongoing basis, and adapting to the schema or configuration changes that commonly occur on the data source.
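To make the idea of verifying correctness between source and target concrete, here is a minimal sketch that compares row counts and an order-independent checksum. The function names `fingerprint` and `tables_match` and the sample rows are hypothetical, introduced only for illustration. A real service would more likely compute such checks inside each datastore than pull rows into memory.

```python
import hashlib
from typing import Iterable, Tuple


def fingerprint(rows: Iterable[tuple]) -> Tuple[int, int]:
    """Return (row count, order-independent checksum) for a set of rows.

    Assumption: repr(row) is a stable serialization on both sides, which
    holds only if source and target use the same types for each value.
    """
    count = 0
    checksum = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Summing per-row hashes modulo 2**64 makes row order irrelevant.
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % (1 << 64)
        count += 1
    return count, checksum


def tables_match(source_rows: Iterable[tuple], target_rows: Iterable[tuple]) -> bool:
    """True when both sides have the same row count and checksum."""
    return fingerprint(source_rows) == fingerprint(target_rows)


# Illustrative billing rows: (invoice_id, product_code, amount).
source = [(1, "A-100", 10.0), (2, "B-200", 5.5)]
target_ok = [(2, "B-200", 5.5), (1, "A-100", 10.0)]  # same rows, different order
target_missing = [(1, "A-100", 10.0)]                # one row was not moved

print(tables_match(source, target_ok))       # True
print(tables_match(source, target_missing))  # False
```

A check of this kind can be rerun after every scheduled move. It shows that the two sides differ but does not identify which rows differ.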
Ensuring that data attributes from the different sources are available in a timely fashion is one of the major pain points. The time spent making data available reduces productivity and lengthens the overall time to insight. Ideally, moving data should be self-service, such that data users select a source, a target, and a schedule to move data. The success criterion for such a service is a reduction in the time to data availability.
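As a rough illustration of what such a self-service request might capture, the sketch below models the three user choices (source, target, and schedule) and the time-to-data-availability metric. The `MovementRequest` class, the helper functions, and the dataset names are all hypothetical and do not describe any particular product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MovementRequest:
    """Hypothetical self-service request: what to move, where, and how often."""
    source: str          # illustrative name, e.g. "billing_db.invoices"
    target: str          # illustrative name, e.g. "warehouse.revenue_invoices"
    interval: timedelta  # how often the move should run


def next_run(request: MovementRequest, last_run: datetime) -> datetime:
    """Return when the next scheduled move is due."""
    return last_run + request.interval


def time_to_availability(produced_at: datetime, available_at: datetime) -> timedelta:
    """Lag between data being produced at the source and usable at the target."""
    if available_at < produced_at:
        raise ValueError("data cannot be available before it is produced")
    return available_at - produced_at


request = MovementRequest(
    source="billing_db.invoices",
    target="warehouse.revenue_invoices",
    interval=timedelta(hours=3),
)
last_run = datetime(2024, 1, 1, 0, 0)

print(next_run(request, last_run))  # 2024-01-01 03:00:00
print(time_to_availability(
    datetime(2024, 1, 1, 0, 10),
    datetime(2024, 1, 1, 3, 5),
))  # 2:55:00
```

Tracking the lag per request gives a direct measure of the success criterion: a shorter lag means data reaches its users sooner.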
Journey Map
This section describes the scenarios in the data scientist’s journey map where data movement is required.
Aggregating Data ...