Chapter 4
Understanding the Storage Architecture
WHAT’S IN THIS CHAPTER?
- Introducing the column-oriented database storage scheme
- Reviewing document store internals
- Peeking into key/value caches and key/value stores on disk
- Working with schemas that support eventual consistency of column-oriented data sets
Column-oriented databases are among the most popular types of non-relational databases. Made famous by the venerable Google engineering efforts and popularized by the growth of social networking giants like Facebook, LinkedIn, and Twitter, they could rightly be called the flag bearers of the NoSQL revolution. Although column databases have existed in many forms in academia for years, they were introduced to the wider developer community with the publication of the following Google research papers:
- The Google File System — http://labs.google.com/papers/gfs.html (October 2003)
- MapReduce: Simplified Data Processing on Large Clusters — http://labs.google.com/papers/mapreduce.html (December 2004)
- Bigtable: A Distributed Storage System for Structured Data — http://labs.google.com/papers/bigtable.html (November 2006)
These publications provided a view into the world of Google’s search engine success and shed light on the mechanics of large-scale and big data efforts like Google Earth, Google Analytics, and Google Maps. It was established beyond a doubt that a cluster of inexpensive hardware can be leveraged to hold huge amounts of data, far more than a single machine can hold, ...