Data Lake for Enterprises

by Vivek Mishra, Tomcy John, Pankaj Misra
May 2017
Beginner to intermediate
596 pages
15h 2m
English
Packt Publishing

When not to use Hadoop

Not every use case requires Hadoop, and running it where it isn't needed can turn into a serious maintenance burden.

Hadoop should not be used if you need to do any of the following:

  • Do graph-based data processing. You might have to bring in another Hadoop ecosystem product (say, Apache Tez) to do this.
  • Process data in real time. Several products in the Hadoop ecosystem can handle this, but the options have to be analysed before deciding. Apache Flink or Spark on top of HDFS are options worth considering.
  • Process data stored in relational databases. Hive over HDFS is an option that could be considered, though.
  • Access shared state while processing data. Hadoop works by splitting ...
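
The shared-state point can be illustrated with a small plain-Python sketch. It is not Hadoop code and does not come from the book; it assumes the familiar MapReduce style, in which the input is divided into independent chunks, each chunk is processed using only local state, and partial results are merged afterwards. All function names here (`split_input`, `map_split`, `reduce_counts`, `word_count`) are hypothetical.

```python
from collections import Counter
from functools import reduce


def split_input(records, num_splits):
    """Divide the records into independent chunks (illustrative only)."""
    return [records[i::num_splits] for i in range(num_splits)]


def map_split(chunk):
    """Count words in one chunk; touches only its own local Counter."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Merge the per-chunk results once the map step has finished."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(records, num_splits=2):
    """Run the split, map, and merge steps in sequence."""
    chunks = split_input(records, num_splits)
    return reduce_counts(map_split(chunk) for chunk in chunks)


lines = ["big data lake", "data lake", "big lake"]
totals = word_count(lines, num_splits=2)
```

Because no chunk reads or writes anything outside itself, the result does not depend on how the input is split. A job whose steps must consult a common, mutable value while running does not fit this shape, which is the kind of workload the last bullet warns about.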

Publisher Resources

ISBN: 9781787281349