Hadoop Application Architectures

by Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira
July 2015
Content level: Intermediate to advanced
250 pages
10h 47m
English
O'Reilly Media, Inc.

Chapter 1. Data Modeling in Hadoop

At its core, Hadoop is a distributed data store that provides a platform for implementing powerful parallel processing frameworks. The reliability of this data store when storing massive volumes of data, coupled with its flexibility in running multiple processing frameworks, makes it an ideal choice for your data hub. This flexibility means that you can store any type of data as is, without placing any constraints on how that data is processed.

A common term one hears in the context of Hadoop is Schema-on-Read. This simply refers to the fact that raw, unprocessed data can be loaded into Hadoop, with the structure imposed at processing time based on the requirements of the processing application.
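As a rough illustration of the idea, the following Python sketch keeps some raw comma-separated records exactly as they arrived and lets each reader impose its own structure at processing time. The sample records, the field names, and the `read_with_schema` helper are all hypothetical and exist only for this example; nothing here is a Hadoop API.

```python
import csv
import io

# Hypothetical raw data, stored as is: no schema was applied when it was loaded.
RAW_LINES = """\
2015-07-01,alice,login,12
2015-07-01,bob,purchase,250
2015-07-02,alice,purchase,40
"""


def read_with_schema(raw, schema):
    """Apply a list of (field_name, converter) pairs to each raw record at read time.

    Fields beyond the end of the schema are ignored, so a reader only
    interprets the columns it cares about.
    """
    for row in csv.reader(io.StringIO(raw)):
        yield {name: convert(value) for (name, convert), value in zip(schema, row)}


# Two processing applications impose different structures on the same raw data.
AUDIT_SCHEMA = [("date", str), ("user", str), ("action", str)]
SALES_SCHEMA = [("date", str), ("user", str), ("action", str), ("amount", int)]

audit_rows = list(read_with_schema(RAW_LINES, AUDIT_SCHEMA))
sales_total = sum(
    record["amount"]
    for record in read_with_schema(RAW_LINES, SALES_SCHEMA)
    if record["action"] == "purchase"
)
```

The stored data never changes; only the schema passed in at read time does, which is why a new requirement can be met by writing a new reader rather than reloading the data.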

This is different from Schema-on-Write, which is generally used with traditional data management systems. Such systems require the schema of the data store to be defined before the data can be loaded. This leads to lengthy cycles of analysis, data modeling, data transformation, loading, testing, and so on before data can be accessed. Furthermore, if a wrong decision is made or requirements change, this cycle must start again. When the application or the structure of the data is not well understood, the agility of the Schema-on-Read pattern can yield invaluable insights into data that was not previously accessible.
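For contrast, here is a minimal sketch of the Schema-on-Write behavior, using Python's built-in `sqlite3` module as a stand-in for a traditional data management system. The `events` table and its columns are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-on-Write: the table structure must be declared before any row is loaded.
conn.execute("CREATE TABLE events (event_date TEXT, user_name TEXT, action TEXT)")
conn.execute(
    "INSERT INTO events VALUES (?, ?, ?)",
    ("2015-07-01", "alice", "login"),
)

# A record carrying a new field does not fit the declared schema and is rejected.
try:
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        ("2015-07-01", "bob", "purchase", 250),
    )
    rejected = False
except sqlite3.Error:
    rejected = True

# The schema has to be changed first; only then can the new data be loaded.
conn.execute("ALTER TABLE events ADD COLUMN amount INTEGER")
conn.execute(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    ("2015-07-01", "bob", "purchase", 250),
)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
conn.close()
```

In a real warehouse the schema change is rarely a single statement: it usually triggers the modeling, transformation, and testing cycle described above.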

Relational databases and data warehouses are often a good fit for well-understood and frequently accessed queries and reports ...


Publisher Resources

ISBN: 9781491910313