Chapter 5. Big Data Storage Concepts

Image

Clusters

File Systems and Distributed File Systems

NoSQL

Sharding

Replication

Sharding and Replication

CAP Theorem

ACID

BASE

Data acquired from external sources is often not in a format or structure that can be directly processed. To overcome these incompatibilities and prepare data for storage and processing, data wrangling is necessary. Data wrangling includes steps to filter, cleanse and otherwise prepare the data for downstream analysis. From a storage perspective, a copy of the data is first stored in its acquired format, and, after wrangling, the prepared data needs to be stored again. Typically, storage ...

Get Big Data Fundamentals: Concepts, Drivers & Techniques now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.