Big Data

by James Warren, Nathan Marz
April 2015
Beginner to intermediate
328 pages
11h 1m
English
Manning Publications
Content preview from Big Data

Chapter 6. Batch layer

This chapter covers

  • Computing functions on the batch layer
  • Splitting a query into precomputed and on-the-fly components
  • Recomputation versus incremental algorithms
  • The meaning of scalability
  • The MapReduce paradigm
  • A higher-level way of thinking about MapReduce

The goal of a data system is to answer arbitrary questions about your data. Any question you could ask of your dataset can be implemented as a function that takes all of your data as input. Ideally, you could run these functions on the fly whenever you query your dataset. Unfortunately, a function that uses your entire dataset as input will take a very long time to run. You need a different strategy if you want your queries answered quickly.
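The idea of a query as a function over the whole dataset, and why a different strategy helps, can be sketched roughly as follows. This is an illustrative sketch only, not code from the book: the `Pageview` record, the function names, and the pageview-count query are all hypothetical, and the precomputed-view approach is assumed from the chapter's "precomputed and on-the-fly components" bullet.

```python
from collections import Counter
from typing import Iterable, Mapping, NamedTuple


class Pageview(NamedTuple):
    """Hypothetical record type, used only for this illustration."""
    url: str
    hour: int


def pageviews_on_the_fly(all_data: Iterable[Pageview], url: str) -> int:
    # Query as a function of ALL the data: every call rescans the dataset,
    # so the cost grows with the total number of records.
    return sum(1 for pv in all_data if pv.url == url)


def precompute_view(all_data: Iterable[Pageview]) -> Mapping[str, int]:
    # One (assumed) alternative strategy: scan the dataset once ahead of
    # time and store the per-URL counts.
    return Counter(pv.url for pv in all_data)


def pageviews_from_view(view: Mapping[str, int], url: str) -> int:
    # Query against the precomputed result: a single lookup,
    # independent of the dataset size.
    return view[url]


DATA = [Pageview("/home", 0), Pageview("/about", 0), Pageview("/home", 1)]
```

Both query functions return the same answer for the same data; the difference is where the expensive scan happens, at query time or ahead of it.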

In the Lambda ...

Publisher Resources

ISBN: 9781617290343