Chapter 5. Real-World Systems
Fast data architectures raise the bar for the “ilities” of distributed data processing. Whereas batch jobs seldom last more than a few hours, a streaming pipeline is designed to run for weeks, months, even years. If you wait long enough, even the most obscure problem is likely to happen.
The umbrella term reactive systems embodies the qualities that real-world systems must have. These systems must be:
- Responsive
  The system can always respond in a timely manner, even when it’s necessary to respond that full service isn’t available due to some failure.
- Resilient
  The system is resilient against failure of any one component, such as a server crash, hard drive failure, or network partition. Leveraging replication prevents data loss and enables a service to keep going using the remaining instances. Leveraging isolation prevents cascading failures.
- Elastic
  You can expect the load to vary considerably over the lifetime of a service. It’s essential to implement dynamic, automatic scalability, both up and down, based on load.
- Message driven
  While fast data architectures are obviously focused on data, here we mean that all services respond to directed commands and queries. Furthermore, they use messages to send commands and queries to other services as well.
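Two of these qualities are easy to illustrate in a few lines. The following is a minimal sketch, not a production design: `respond` shows responsiveness by always answering within a time budget, reporting degraded service when the backend is slow or fails, and `desired_instances` shows the kind of load-based calculation an elastic service might use to scale up and down. All names, parameters, and default values here are hypothetical and chosen only for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical worker pool used to run backend calls off the caller's thread.
_POOL = ThreadPoolExecutor(max_workers=4)


def respond(query, backend, timeout_s=0.1):
    """Responsive: answer within roughly timeout_s, even if only to say
    that full service is not available."""
    future = _POOL.submit(backend, query)
    try:
        return {"status": "ok", "result": future.result(timeout=timeout_s)}
    except FutureTimeout:
        # The backend call keeps running in its thread; a real system
        # would also need cancellation or load shedding.
        return {"status": "degraded", "reason": "backend timed out"}
    except Exception as exc:
        # Isolate the failure so it does not propagate to the caller.
        return {"status": "degraded", "reason": f"backend failed: {exc}"}


def desired_instances(load_per_s, capacity_per_instance,
                      min_instances=1, max_instances=100):
    """Elastic: size the service to the current load, within fixed bounds.
    The per-instance capacity is an assumed, externally measured figure."""
    needed = math.ceil(load_per_s / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))
```

For example, `respond("q", some_backend)` returns a dictionary whose `status` is either `"ok"` or `"degraded"`, so the caller always gets a timely answer. `desired_instances(950, 100)` asks for ten instances, and the same function shrinks the service back toward `min_instances` as load falls.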
Batch-mode and interactive systems have traditionally had less stringent requirements for these qualities. Fast data architectures are just like other online systems where downtime and data ...