IN THIS CHAPTER
Understanding how computers can store information in order to save space
Creating efficient and smart encodings
Leveraging statistics and building Huffman trees
Compressing and decompressing on the fly using the Lempel-Ziv-Welch (LZW) algorithm
The last decade has seen the world flooded by data. In fact, data is the new oil, and specialists of all sorts hope to extract new knowledge and riches from it. As a result, you find data piled up everywhere, often archived as soon as it arrives. This sometimes careless storage of data comes from an increased capacity to store information; buying larger drives to hold everything, useful or not, has become cheap. Chapters 12 and 13 discuss the drivers behind this data deluge, how to deal with massive data streams, methods used to distribute data over clusters of connected computers, and techniques you can use to process data rapidly and efficiently.
Data hasn’t always been readily available, however. In previous decades, storing data required large investments in expensive mass-storage devices (hard ...