Chapter 5. The Evolving, Maturing Marketplace of Big Data Components

Where the last chapter looked at data from the data scientist’s point of view, this chapter explores data science from the other side: the hardware and software that store, sort, and operate on the 1’s and 0’s. Andy Oram’s exploration of Flash and its impact on databases leads off. He argues that the benefits of solid-state memory are much greater when it is baked into the database design than when flash drives are simply swapped in for spinning magnetic media. Following sections tackle Hadoop 2.0 (a notable release this past year), the growth of Spark, and the provocative proposition that “the data center needs an operating system.”

How Flash changes the design of database storage engines

High-performing memory throws many traditional decisions overboard

by Andy Oram

Over the past decade, solid-state drives (SSDs, popularly known as Flash) have radically changed computing at two levels. At the consumer level, USB sticks have effectively replaced CDs for transporting files. At the server level, Flash offers a price/performance ratio very different from that of both RAM and disk drives. Databases, however, have only begun to catch up in the past few years, and most still depend on internal data structures and storage management fine-tuned for spinning disks.
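To make the point about storage engines "fine-tuned for spinning disks" more concrete, here is a back-of-envelope sketch. It is an illustration only: the latency and throughput figures are hypothetical round numbers, not measurements of any real drive, and the function names are invented for this example. The model charges every read request a fixed access cost (seek plus rotational delay on a disk, a much smaller lookup cost on flash) plus a transfer cost. It then compares reading 1 MiB as one contiguous request with reading the same data as 256 scattered 4 KiB pages.

```python
# Hypothetical, illustrative device parameters (round numbers, not measurements).
ACCESS_MS = {"hdd": 10.0, "ssd": 0.1}           # fixed cost per random request
TRANSFER_MB_PER_S = {"hdd": 100.0, "ssd": 500.0}  # streaming throughput


def read_time_ms(device: str, num_requests: int, bytes_per_request: int) -> float:
    """Estimated time for num_requests independent reads of bytes_per_request each."""
    transfer_ms = bytes_per_request / (TRANSFER_MB_PER_S[device] * 1e6) * 1000
    return num_requests * (ACCESS_MS[device] + transfer_ms)


def random_read_penalty(device: str, total_bytes: int, page_bytes: int) -> float:
    """How many times slower scattered page reads are than one contiguous read."""
    contiguous = read_time_ms(device, 1, total_bytes)
    scattered = read_time_ms(device, total_bytes // page_bytes, page_bytes)
    return scattered / contiguous


if __name__ == "__main__":
    for dev in ("hdd", "ssd"):
        penalty = random_read_penalty(dev, 1_048_576, 4096)
        print(f"{dev}: scattered reads are {penalty:.1f}x slower")
```

Under these assumed numbers, scattering the reads is roughly 125 times slower than one contiguous read on the disk but only about 13 times slower on flash. Disk-oriented engines invest heavily in avoiding the first penalty (large sequential layouts, careful page clustering), which suggests why those design choices pay off less once the storage is flash.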

Citing price and performance, one author advised a wide range of database vendors to move to Flash. Certainly, a database administrator can speed up old databases just by swapping out disk drives and inserting ...

Get Big Data Now: 2014 Edition now with O’Reilly online learning.