Appendix A Further Systems and Patterns

Throughout the book, we have touched on many subjects. Some of them would have been worth covering in depth but did not fit the flow of the book, so I have moved them to this appendix to give a rough idea of each. In this appendix, I discuss Lambda architecture, Apache Cassandra, and Apache Beam.

A.1 Lambda Architecture

Lambda architecture is a deployment model in which organizations complement batch processing with stream processing to solve real‐time big data problems. It arose from the difficulty of serving data in real time (Marz, 2011). Ideally, a system would scan the entire dataset to answer a query. In practice, this gets tricky because some queries would have to scan so much data that response times become unacceptable. Moreover, most organizations prefer their services to stay available, so they choose availability over consistency. This choice results in weaker consistency levels: a read after a write might not return the expected value, and without read repairs, the data can remain inconsistent. Human error also causes problems, since an update to a system can corrupt data in ways that cannot be recovered (Figure A.1).


Figure A.1 Lambda architecture.
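As a rough illustration of how batch and stream processing complement each other, the following Python sketch answers a query by merging a batch view, precomputed over the entire append-only dataset, with a small real-time view covering only events that arrived since the last batch run. This follows the batch layer, speed layer, and serving layer split commonly associated with Lambda architecture. The page-view counting task and the names `batch_view`, `SpeedLayer`, and `query` are hypothetical choices for illustration, not the API of any particular system.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event record: (user_id, page).
Event = Tuple[str, str]


def batch_view(master_dataset: Iterable[Event]) -> Counter:
    """Batch layer: recompute page-view counts over the entire master dataset."""
    return Counter(page for _, page in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by the batch view."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        _, page = event
        self.realtime_view[page] += 1

    def reset(self) -> None:
        # After a batch run absorbs the recent events, the real-time view is discarded.
        self.realtime_view.clear()


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the precomputed batch view with the real-time view."""
    return batch[page] + speed.realtime_view[page]


# The master dataset is append-only: events are added, never updated in place.
master: List[Event] = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart")]
batch = batch_view(master)
speed = SpeedLayer()

# A new event arrives between batch runs: append it and update the speed layer.
new_event = ("u3", "/home")
master.append(new_event)
speed.on_event(new_event)
before_rebatch = query("/home", batch, speed)

# The next batch run recomputes from all data, so the speed layer can be reset.
batch = batch_view(master)
speed.reset()
after_rebatch = query("/home", batch, speed)

print(before_rebatch, after_rebatch)  # 3 3
```

In this sketch, the query returns the same answer before and after the batch run, because the speed layer only bridges the gap until the batch view catches up. Since the batch view is always recomputed from the unmodified master dataset, a bug in the processing code can be fixed and the batch layer rerun to produce correct views.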

To address these problems, the Lambda architecture uses an immutable stream of data and ...
