Chapter 5. Big Data Storage Concepts


Clusters

File Systems and Distributed File Systems

NoSQL

Sharding

Replication

Sharding and Replication

CAP Theorem

ACID

BASE

Data acquired from external sources is often not in a format or structure that can be processed directly. To overcome these incompatibilities and prepare the data for storage and processing, data wrangling is necessary. Data wrangling includes steps to filter, cleanse and otherwise prepare the data for downstream analysis. From a storage perspective, a copy of the data is first stored in its acquired format; after wrangling, the prepared data must be stored again. Typically, storage ...
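The store-wrangle-store sequence described in the paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the dictionary-backed `storage`, the `raw/` and `prepared/` key prefixes, and the helper names `store` and `wrangle` are hypothetical, and the specific filtering rule (drop records missing a required field) and cleansing rule (trim whitespace, lower-case values) are only examples of what wrangling might involve.

```python
from typing import Dict, List

Record = Dict[str, str]


def store(storage: Dict[str, List[Record]], key: str, records: List[Record]) -> None:
    """Persist a copy of the records under `key` (hypothetical in-memory store)."""
    # Copy each record so later changes to the input do not alter the stored copy.
    storage[key] = [dict(r) for r in records]


def wrangle(records: List[Record], required: List[str]) -> List[Record]:
    """Filter out incomplete records, then cleanse the remaining ones (example rules)."""
    prepared = []
    for record in records:
        # Filter: drop records whose required fields are missing or blank.
        if any(not record.get(field, "").strip() for field in required):
            continue
        # Cleanse: trim surrounding whitespace and lower-case every value.
        prepared.append({k: v.strip().lower() for k, v in record.items()})
    return prepared


storage: Dict[str, List[Record]] = {}
acquired = [
    {"id": " 1 ", "city": " Toronto "},
    {"id": "2", "city": ""},
    {"id": "3", "city": "LONDON"},
]

store(storage, "raw/customers", acquired)               # 1. keep the data as acquired
prepared = wrangle(acquired, required=["id", "city"])   # 2. wrangle it
store(storage, "prepared/customers", prepared)          # 3. store the prepared copy

print(len(storage["raw/customers"]), len(storage["prepared/customers"]))
```

Keeping the raw copy untouched means the wrangling rules can be changed and rerun later without reacquiring the data from the external source.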

Get Big Data Fundamentals: Concepts, Drivers & Techniques now with the O’Reilly learning platform.
