IN THIS CHAPTER
Understanding why simply bigger and faster isn't always the right solution
Looking inside the storage and computational approaches of Internet companies
Figuring out how using clusters of commodity hardware reduces costs
Breaking complex algorithms into separable parallel operations by using MapReduce

Managing immense amounts of data using streaming or sampling strategies has clear advantages (as discussed in Chapter 12) when you have to deal with massive data processing. Streaming and sampling algorithms help you obtain a result even when your computational power is limited (for instance, when you use only your own computer). However, these approaches come with costs:
- Streaming: Handles potentially infinite amounts of data. Yet your algorithms run at low speed because they process individual pieces of data, and the speed of the stream dictates the pace.
- Sampling: Lets you apply any algorithm on any machine. Yet the result you obtain is imprecise because you have only a probability, not a certainty, of getting the right answer. Most often, ...