September 2017
Intermediate to advanced
360 pages
9h 43m
English
The batch layer is where the data-management systems that store historical data sit in our architecture. Incoming data streams from the speed layer commonly land in a NoSQL database. Initially, a variety of NoSQL key-value databases were commonly used and were scaled through a technique called sharding: the horizontal partitioning of a database across multiple database servers. Some organizations use this type of NoSQL database in proofs of concept (PoCs) for rapid deployment, or as pre-processing engines in production solutions.
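To make sharding concrete, here is a minimal sketch, assuming hash-based partitioning of keys across a fixed number of shards. The names `shard_for` and `ShardedKeyValueStore` are hypothetical and chosen for illustration, and in-memory dictionaries stand in for separate database servers. Real NoSQL products each use their own partitioning scheme, which may differ from this one.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedKeyValueStore:
    """Hypothetical key-value store; each dict stands in for one database server."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        # Route the write to the single shard that owns this key.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default: object = None) -> object:
        # Route the read to the same shard the write went to.
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedKeyValueStore(num_shards=4)
for sensor_id in ("sensor-001", "sensor-002", "sensor-003"):
    store.put(sensor_id, {"status": "ok"})
```

Because the shard index depends only on the key, every read goes to the same shard that received the write. A simple modulo scheme like this one moves most keys when the number of shards changes, which is one reason production systems often use other approaches such as consistent hashing.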
In the past few years, Hadoop clusters have gained popularity as the landing spot and serve as the data lake in the Lambda architecture you saw in the diagram. They can support large volumes of historical data ...