Creating a Data Pipeline for Consistent Data Collection, Processing, and Dissemination

In a data-intensive application, data travels in two directions and in two different forms. One form is the data returned to end users as part of a request. Gathering that data is usually a synchronous process, and in a distributed system it typically draws on a variety of data sources. Imagine we are building a context service that tells us everything we know about a given IP address attempting to access our secured network. The use case is to block any IP address that we believe belongs to a known malicious user. To keep the example simple, we would typically do the following:
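
As a minimal sketch of the synchronous, fan-out style of gathering described above (the source names, fields, and helper functions here are illustrative assumptions, not the book's exact steps), a context lookup that queries several sources and merges the results before answering the request might look like this:

```python
import concurrent.futures

# Hypothetical data sources; in a real deployment each of these would be a
# separate service or database queried over the network.
def geo_lookup(ip: str) -> dict:
    return {"country": "US"}           # placeholder result

def reputation_lookup(ip: str) -> dict:
    return {"known_malicious": False}  # placeholder result

def history_lookup(ip: str) -> dict:
    return {"previous_logins": 3}      # placeholder result

def build_ip_context(ip: str) -> dict:
    """Synchronously gather context about an IP from several sources."""
    sources = (geo_lookup, reputation_lookup, history_lookup)
    context = {"ip": ip}
    # Fan the lookups out in parallel, but block until all of them return,
    # because the caller's request cannot be answered without the full context.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        for result in pool.map(lambda fn: fn(ip), sources):
            context.update(result)
    return context

if __name__ == "__main__":
    ctx = build_ip_context("203.0.113.7")
    if ctx["known_malicious"]:
        print(f"Blocking {ctx['ip']}")
    else:
        print(f"Allowing {ctx['ip']}: {ctx}")
```

The key property is that the request path waits for every source before responding, which is what makes this direction of data flow synchronous.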
