Chapter 13

Parallelizing Operations

IN THIS CHAPTER

  • Understanding why simply bigger, larger, and faster isn’t always the right solution
  • Looking inside the storage and computational approaches of Internet companies
  • Figuring out how using clusters of commodity hardware reduces costs
  • Reducing complex algorithms into separable parallel operations by MapReduce

Streaming and sampling strategies (discussed in Chapter 12) have clear advantages when you must process massive amounts of data. They help you obtain a result even when your computational power is limited (for instance, when you work on your own computer). However, these approaches come with costs:

  • Streaming: Handles potentially infinite amounts of data. Yet your algorithms run slowly because they process the data one piece at a time, and the speed of the stream sets the pace.
  • Sampling: Lets you apply any algorithm on any machine. Yet the result is imprecise because you have only a probability, not a certainty, of getting the right answer. Most often, ...
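
The trade-off between the two approaches can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names (streaming_mean, sampled_mean), the data set, and the sample size are all assumptions chosen for illustration. The streaming version sees every value but only one at a time, while the sampling version looks at a small random subset and returns an estimate.

```python
import random


def streaming_mean(stream):
    """Exact mean computed one item at a time, using constant memory."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: shift the running mean toward the new value.
        mean += (value - mean) / count
    return mean


def sampled_mean(data, k, seed=0):
    """Approximate mean estimated from a random sample of k items."""
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    sample = rng.sample(data, k)    # k distinct items chosen at random
    return sum(sample) / k


# Hypothetical data set: the integers from 1 to 100,000.
data = list(range(1, 100_001))

exact = streaming_mean(iter(data))     # touches all 100,000 items in order
approx = sampled_mean(data, k=1_000)   # touches only 1,000 items

print("streaming (exact):", exact)
print("sampling (estimate):", approx)
```

The streaming result matches the true mean up to floating-point rounding, but the loop can go no faster than the items arrive. The sampled result is obtained from one percent of the data and will usually be close to the true mean, with no guarantee of being exact.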
