Big Data by Hai Jiang, Laurence T. Yang, Alfredo Cuzzocrea, Kuan-Ching Li

Chapter 2

Scalability and Cost Evaluation of Incremental Data Processing Using Amazon’s Hadoop Service

Xing Wu, Yan Liu, and Ian Gorton

Abstract

Built on the MapReduce model and the Hadoop Distributed File System (HDFS), Hadoop enables the distributed processing of large data sets across clusters, with scalability and fault tolerance. Many data-intensive applications involve continuous, incremental updates of data. Understanding how well a Hadoop platform scales, and what it costs, when handling small and independent updates to data sets sheds light on the design of scalable and cost-effective data-intensive applications. In this chapter, we introduce a motivating movie recommendation application implemented in the MapReduce model and deployed on Amazon Elastic ...