Hadoop Operations and Cluster Management Cookbook by Shumin Guo

Defining a Big Data problem

Big Data is generally defined as data sets so large that they exceed the ability of commonly used software tools to collect, manage, and process them within a tolerable elapsed time. A more formal definition, however, should look beyond size and include other properties of the data. In this recipe, we will outline the properties that formally define Big Data.
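To make the "tolerable elapsed time" criterion concrete, the following sketch estimates how long a single machine would need just to read a data set once. It is a back-of-the-envelope Python illustration, not a method from this book: the 100 MB/s sequential throughput and the function names `scan_hours` and `exceeds_tolerable_time` are hypothetical assumptions.

```python
# Hypothetical single-machine sequential read throughput (assumption);
# real values depend heavily on disks, network, and file format.
ASSUMED_THROUGHPUT_BYTES_PER_SEC = 100 * 10**6  # 100 MB/s


def scan_hours(volume_bytes: int,
               throughput_bytes_per_sec: int = ASSUMED_THROUGHPUT_BYTES_PER_SEC) -> float:
    """Hours needed to read the whole data set once at the given throughput."""
    return volume_bytes / throughput_bytes_per_sec / 3600


def exceeds_tolerable_time(volume_bytes: int, tolerable_hours: float,
                           throughput_bytes_per_sec: int = ASSUMED_THROUGHPUT_BYTES_PER_SEC) -> bool:
    """True if a single sequential pass would take longer than the tolerable window."""
    return scan_hours(volume_bytes, throughput_bytes_per_sec) > tolerable_hours


if __name__ == "__main__":
    ten_tb = 10 * 10**12
    # 10e12 bytes / 1e8 bytes per second = 1e5 seconds
    print(f"{scan_hours(ten_tb):.1f} hours")  # prints "27.8 hours"
    print(exceeds_tolerable_time(ten_tb, tolerable_hours=1.0))  # prints "True"
```

Under these assumptions, one pass over 10 TB takes roughly 28 hours, far beyond a one-hour window; that gap between data size and what a common tool can handle in time is exactly what the definition describes.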

Getting ready

Big Data is typically characterized by three important properties: volume, velocity, and variety. In this book, we treat value as a fourth important property, because value also explains why the Big Data problem exists in the first place.
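When sizing a problem, it can help to record the four properties side by side. The following is a minimal Python sketch, not the book's method; the class name `BigDataProfile`, its fields, and the constant-velocity projection are hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BigDataProfile:
    """Hypothetical record of the four properties: volume, velocity, variety, value."""
    volume_bytes: int                                # how much data is stored today
    velocity_bytes_per_day: int                      # how fast new data arrives
    variety: set[str] = field(default_factory=set)   # data formats, e.g. {"csv", "json"}
    value: str = ""                                  # what the data is expected to be worth (free text)

    def projected_volume(self, days: int) -> int:
        """Volume after `days` more days, assuming velocity stays constant."""
        return self.volume_bytes + self.velocity_bytes_per_day * days


if __name__ == "__main__":
    profile = BigDataProfile(
        volume_bytes=2 * 10**12,               # 2 TB today
        velocity_bytes_per_day=50 * 10**9,     # 50 GB arriving per day
        variety={"log", "json", "image"},
        value="detect fraudulent transactions",
    )
    # 2e12 + 50e9 * 30 = 3.5e12 bytes
    print(profile.projected_volume(30))  # prints 3500000000000
```

Keeping velocity separate from volume makes it easy to see how quickly a data set that fits today will outgrow the tools used to process it.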

How to do it…

Defining a Big Data problem involves the following steps:

  1. Estimate ...