NoSQL for Mere Mortals®

Book Description

NoSQL was developed to overcome the limitations of relational databases in the largest Web applications at companies such as Google, Yahoo, and Facebook. As it is applied more widely, developers are finding that it can simplify scalability while requiring far less coding and management overhead. However, NoSQL requires fundamentally different approaches to database design and modeling, and many conventional relational techniques lead to suboptimal results.

NoSQL for Mere Mortals is an easy, practical guide to succeeding with NoSQL in your environment. Following the classic, best-selling format pioneered in SQL Queries for Mere Mortals, enterprise database expert Dan Sullivan guides you step-by-step through choosing technologies, designing high-performance databases, and planning for long-term maintenance.

Sullivan introduces each type of NoSQL database, shows how to install and manage it, and demonstrates how to leverage its features while avoiding common mistakes that lead to poor performance and unmet requirements. He uses four popular NoSQL databases as reference models: MongoDB, a document database; Cassandra, a column family data store; Redis, a key-value database; and Neo4j, a graph database. You'll find explanations of each database's structure and capabilities, practical guidelines for choosing among them, and expert guidance on designing databases with them.

Packed with examples, NoSQL for Mere Mortals is today's best way to master NoSQL—whether you're a DBA, developer, user, or student.

Table of Contents

  1. About This eBook
  2. Title Page
  3. Copyright Page
  4. Dedication Page
  5. About the Author
  6. Contents
  7. Preface
  8. Acknowledgments
  9. Introduction
    1. Who Should Read This Book?
    2. The Purpose of This Book
    3. How to Read This Book
    4. How This Book Is Organized
      1. Part I: Introduction
      2. Part II: Key-Value Databases
      3. Part III: Document Databases
      4. Part IV: Column Family Databases
      5. Part V: Graph Databases
      6. Part VI: Choosing a Database for Your Application
      7. Part VII: Appendices
  10. Part I: Introduction
    1. 1. Different Databases for Different Requirements
      1. Relational Database Design
        1. E-commerce Application
      2. Early Database Management Systems
        1. Flat File Data Management Systems
        2. Hierarchical Data Model Systems
        3. Network Data Management Systems
        4. Summary of Early Database Management Systems
      3. The Relational Database Revolution
        1. Relational Database Management Systems
      4. Motivations for Not Just/No SQL (NoSQL) Databases
        1. Scalability
        2. Cost
        3. Flexibility
        4. Availability
      5. Summary
      6. Case Study
      7. Review Questions
      8. References
      9. Bibliography
    2. 2. Variety of NoSQL Databases
      1. Data Management with Distributed Databases
        1. Store Data Persistently
        2. Maintain Data Consistency
        3. Ensure Data Availability
        4. Balancing Response Times, Consistency, and Durability
        5. Consistency, Availability, and Partitioning: The CAP Theorem
      2. ACID and BASE
        1. ACID: Atomicity, Consistency, Isolation, and Durability
        2. BASE: Basically Available, Soft State, Eventually Consistent
        3. Types of Eventual Consistency
      3. Four Types of NoSQL Databases
        1. Key-Value Pair Databases
        2. Document Databases
        3. Column Family Databases
        4. Graph Databases
      4. Summary
      5. Review Questions
      6. References
      7. Bibliography
  11. Part II: Key-Value Databases
    1. 3. Introduction to Key-Value Databases
      1. From Arrays to Key-Value Databases
        1. Arrays: Key Value Stores with Training Wheels
        2. Associative Arrays: Taking Off the Training Wheels
        3. Caches: Adding Gears to the Bike
        4. In-Memory and On-Disk Key-Value Database: From Bikes to Motorized Vehicles
      2. Essential Features of Key-Value Databases
        1. Simplicity: Who Needs Complicated Data Models Anyway?
        2. Speed: There Is No Such Thing as Too Fast
        3. Scalability: Keeping Up with the Rush
      3. Keys: More Than Meaningless Identifiers
        1. How to Construct a Key
        2. Using Keys to Locate Values
      4. Values: Storing Just About Any Data You Want
        1. Values Do Not Require Strong Typing
        2. Limitations on Searching for Values
      5. Summary
      6. Review Questions
      7. References
      8. Bibliography
    2. 4. Key-Value Database Terminology
      1. Key-Value Database Data Modeling Terms
        1. Key
        2. Value
        3. Namespace
        4. Partition
        5. Partition Key
        6. Schemaless
      2. Key-Value Architecture Terms
        1. Cluster
        2. Ring
        3. Replication
      3. Key-Value Implementation Terms
        1. Hash Function
        2. Collision
        3. Compression
      4. Summary
      5. Review Questions
      6. References
    3. 5. Designing for Key-Value Databases
      1. Key Design and Partitioning
        1. Keys Should Follow a Naming Convention
        2. Well-Designed Keys Save Code
        3. Dealing with Ranges of Values
        4. Keys Must Take into Account Implementation Limitations
        5. How Keys Are Used in Partitioning
      2. Designing Structured Values
        1. Structured Data Types Help Reduce Latency
        2. Large Values Can Lead to Inefficient Read and Write Operations
      3. Limitations of Key-Value Databases
        1. Look Up Values by Key Only
        2. Key-Value Databases Do Not Support Range Queries
        3. No Standard Query Language Comparable to SQL for Relational Databases
      4. Design Patterns for Key-Value Databases
        1. Time to Live (TTL) Keys
        2. Emulating Tables
        3. Aggregates
        4. Atomic Aggregates
        5. Enumerable Keys
        6. Indexes
      5. Summary
      6. Case Study: Key-Value Databases for Mobile Application Configuration
      7. Review Questions
      8. References
  12. Part III: Document Databases
    1. 6. Introduction to Document Databases
      1. What Is a Document?
        1. Documents Are Not So Simple After All
        2. Documents and Key-Value Pairs
        3. Managing Multiple Documents in Collections
      2. Avoid Explicit Schema Definitions
      3. Basic Operations on Document Databases
        1. Inserting Documents into a Collection
        2. Deleting Documents from a Collection
        3. Updating Documents in a Collection
        4. Retrieving Documents from a Collection
      4. Summary
      5. Review Questions
      6. References
    2. 7. Document Database Terminology
      1. Document and Collection Terms
        1. Document
        2. Collection
        3. Embedded Document
        4. Schemaless
        5. Polymorphic Schema
      2. Types of Partitions
        1. Vertical Partitioning
        2. Horizontal Partitioning or Sharding
      3. Data Modeling and Query Processing
        1. Normalization
        2. Denormalization
        3. Query Processor
      4. Summary
      5. Review Questions
      6. References
    3. 8. Designing for Document Databases
      1. Normalization, Denormalization, and the Search for Proper Balance
        1. One-to-Many Relations
        2. Many-to-Many Relations
        3. The Need for Joins
        4. Executing Joins: The Heavy Lifting of Relational Databases
        5. What Would a Document Database Modeler Do?
      2. Planning for Mutable Documents
        1. Avoid Moving Oversized Documents
      3. The Goldilocks Zone of Indexes
        1. Read-Heavy Applications
        2. Write-Heavy Applications
      4. Modeling Common Relations
        1. One-to-Many Relations in Document Databases
        2. Many-to-Many Relations in Document Databases
        3. Modeling Hierarchies in Document Databases
      5. Summary
      6. Case Study: Customer Manifests
        1. Embed or Not Embed?
        2. Choosing Indexes
        3. Separate Collections by Type?
      7. Review Questions
      8. References
  13. Part IV: Column Family Databases
    1. 9. Introduction to Column Family Databases
      1. In the Beginning, There Was Google BigTable
        1. Utilizing Dynamic Control over Columns
        2. Indexing by Row, Column Name, and Time Stamp
        3. Controlling Location of Data
        4. Reading and Writing Atomic Rows
        5. Maintaining Rows in Sorted Order
      2. Differences and Similarities to Key-Value and Document Databases
        1. Column Family Database Features
        2. Column Family Database Similarities to and Differences from Document Databases
        3. Column Family Database Versus Relational Databases
      3. Architectures Used in Column Family Databases
        1. HBase Architecture: Variety of Nodes
        2. Cassandra Architecture: Peer-to-Peer
        3. Getting the Word Around: Gossip Protocol
        4. Thermodynamics and Distributed Database: Why We Need Anti-Entropy
        5. Hold This for Me: Hinted Handoff
      4. When to Use Column Family Databases
      5. Summary
      6. Review Questions
      7. References
    2. 10. Column Family Database Terminology
      1. Basic Components of Column Family Databases
        1. Keyspace
        2. Row Key
        3. Column
        4. Column Families
      2. Structures and Processes: Implementing Column Family Databases
        1. Internal Structures and Configuration Parameters of Column Family Databases
        2. Old Friends: Clusters and Partitions
        3. Taking a Look Under the Hood: More Column Family Database Components
      3. Processes and Protocols
        1. Replication
        2. Anti-Entropy
        3. Gossip Protocol
        4. Hinted Handoff
      4. Summary
      5. Review Questions
      6. References
    3. 11. Designing for Column Family Databases
      1. Guidelines for Designing Tables
        1. Denormalize Instead of Join
        2. Make Use of Valueless Columns
        3. Use Both Column Names and Column Values to Store Data
        4. Model an Entity with a Single Row
        5. Avoid Hotspotting in Row Keys
        6. Keep an Appropriate Number of Column Value Versions
        7. Avoid Complex Data Structures in Column Values
      2. Guidelines for Indexing
        1. When to Use Secondary Indexes Managed by the Column Family Database System
        2. When to Create and Manage Secondary Indexes Using Tables
      3. Tools for Working with Big Data
        1. Extracting, Transforming, and Loading Big Data
        2. Analyzing Big Data
        3. Tools for Monitoring Big Data
      4. Summary
      5. Case Study: Customer Data Analysis
        1. Understanding User Needs
      6. Review Questions
      7. References
  14. Part V: Graph Databases
    1. 12. Introduction to Graph Databases
      1. What Is a Graph?
      2. Graphs and Network Modeling
        1. Modeling Geographic Locations
        2. Modeling Infectious Diseases
        3. Modeling Abstract and Concrete Entities
        4. Modeling Social Media
      3. Advantages of Graph Databases
        1. Query Faster by Avoiding Joins
        2. Simplified Modeling
        3. Multiple Relations Between Entities
      4. Summary
      5. Review Questions
      6. References
    2. 13. Graph Database Terminology
      1. Elements of Graphs
        1. Vertex
        2. Edge
        3. Path
        4. Loop
      2. Operations on Graphs
        1. Union of Graphs
        2. Intersection of Graphs
        3. Graph Traversal
      3. Properties of Graphs and Nodes
        1. Isomorphism
        2. Order and Size
        3. Degree
        4. Closeness
        5. Betweenness
      4. Types of Graphs
        1. Undirected and Directed Graphs
        2. Flow Network
        3. Bipartite Graph
        4. Multigraph
        5. Weighted Graph
      5. Summary
      6. Review Questions
      7. References
    3. 14. Designing for Graph Databases
      1. Getting Started with Graph Design
        1. Designing a Social Network Graph Database
        2. Queries Drive Design (Again)
      2. Querying a Graph
        1. Cypher: Declarative Querying
        2. Gremlin: Query by Graph Traversal
      3. Tips and Traps of Graph Database Design
        1. Use Indexes to Improve Retrieval Time
        2. Use Appropriate Types of Edges
        3. Watch for Cycles When Traversing Graphs
        4. Consider the Scalability of Your Graph Database
      4. Summary
      5. Case Study: Optimizing Transportation Routes
        1. Understanding User Needs
        2. Designing a Graph Analysis Solution
      6. Review Questions
      7. References
  15. Part VI: Choosing a Database for Your Application
    1. 15. Guidelines for Selecting a Database
      1. Choosing a NoSQL Database
        1. Criteria for Selecting Key-Value Databases
        2. Use Cases and Criteria for Selecting Document Databases
        3. Use Cases and Criteria for Selecting Column Family Databases
        4. Use Cases and Criteria for Selecting Graph Databases
      2. Using NoSQL and Relational Databases Together
      3. Summary
      4. Review Questions
      5. References
  16. Part VII: Appendices
    1. A. Answers to Chapter Review Questions
      1. Chapter 1
      2. Chapter 2
      3. Chapter 3
      4. Chapter 4
      5. Chapter 5
      6. Chapter 6
      7. Chapter 7
      8. Chapter 8
      9. Chapter 9
      10. Chapter 10
      11. Chapter 11
      12. Chapter 12
      13. Chapter 13
      14. Chapter 14
      15. Chapter 15
    2. B. List of NoSQL Databases
  17. Glossary
  18. Index
  19. Code Snippets