From a standalone machine to a bunch of nodes

Handling big data is not just a matter of size; it is a multifaceted problem. According to the 3V model (volume, velocity, and variety), systems operating on big data can be classified using three orthogonal criteria:

  • The first criterion is velocity: the speed at which the system processes the data. A few years ago, speed indicated how quickly a system could process a batch; nowadays, velocity indicates whether a system can provide real-time outputs on streaming data.
  • The second criterion is volume; that is, how much information is available to be processed. It can be expressed as the number of rows or features, or just a bare count of ...
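
The two criteria above can be made concrete with a minimal sketch in plain Python. It is only an illustration, not the book's own code: the toy dataset, the names `batch_mean` and `streaming_mean`, and the choice of counting total cells as a volume measure are all assumptions made here. The sketch contrasts a batch computation (all data available up front) with a streaming one (an updated output after each incoming value), and expresses volume as rows and features.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch style: the whole dataset is available before processing starts."""
    return sum(values) / len(values)


def streaming_mean(stream: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated mean after every incoming value."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


# Hypothetical toy dataset: 4 rows (observations) and 3 features (columns).
rows = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]

# Volume expressed as rows and features; the total cell count is an
# illustrative extra measure assumed for this sketch.
n_rows = len(rows)
n_features = len(rows[0])
n_values = n_rows * n_features

first_feature = [row[0] for row in rows]

print(n_rows, n_features, n_values)         # 4 3 12
print(batch_mean(first_feature))            # 5.5
print(list(streaming_mean(first_feature)))  # [1.0, 2.5, 4.0, 5.5]
```

The streaming version never holds the full dataset in memory, which is why it can keep producing outputs while data is still arriving; the batch version must wait for the complete list.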
