Chapter 13

Parallelizing Operations

IN THIS CHAPTER

  • Understanding why simply bigger, larger, and faster isn’t always the right solution
  • Looking inside the storage and computational approaches of Internet companies
  • Figuring out how using clusters of commodity hardware reduces costs
  • Reducing complex algorithms into separable parallel operations by MapReduce

Streaming and sampling strategies (discussed in Chapter 12) have clear advantages when you have to process massive amounts of data. They help you obtain a result even when your computational power is limited (for instance, when using your own computer). However, some costs are associated with these approaches:

  • Streaming: Handles potentially unbounded amounts of data. Yet your algorithms run slowly because they process the data one piece at a time, and the speed of the stream sets the pace.
  • Sampling: Lets you apply any algorithm on any machine. Yet the obtained result is imprecise because you have only a probability, not a certainty, of getting the right answer. Most often, ...
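
To make the two trade-offs concrete, here is a minimal Python sketch, not taken from the book. It computes the mean of the same numbers in two ways: a streaming pass that touches one value at a time, and an estimate based on a random sample. The function names (streaming_mean, sampled_mean), the data set, the sample size, and the seed are all assumptions chosen only for illustration.

```python
import random


def streaming_mean(stream):
    """Return the mean of a stream, reading one value at a time.

    Only a counter and the running mean are kept in memory, so the
    input can be arbitrarily long, but the loop can go no faster than
    the stream delivers its values.
    """
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: shift the mean toward the new value.
        mean += (value - mean) / count
    return mean


def sampled_mean(data, sample_size, seed=0):
    """Estimate the mean from a random sample drawn without replacement.

    The work depends on sample_size rather than on len(data), but the
    result is only an approximation of the true mean.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    sample = rng.sample(data, sample_size)
    return sum(sample) / sample_size


# Hypothetical data set: the integers from 1 to 100,000.
data = list(range(1, 100_001))

exact = streaming_mean(iter(data))                # reads every value
estimate = sampled_mean(data, sample_size=1_000)  # reads 1 percent of the values

print(f"streaming mean: {exact:.2f}")
print(f"sampled mean:   {estimate:.2f}")
```

The streaming version agrees with the true mean up to floating-point rounding, yet it has to wait for every value to arrive. The sampled version looks at only a fraction of the data, and its answer changes with the seed and the sample size, which is the imprecision the second bullet describes.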
