CockroachDB: The Definitive Guide

Book description

Get the lowdown on CockroachDB, the distributed SQL database built to handle the demands of today's data-driven cloud applications. In this hands-on guide, software developers, architects, and DevOps/SRE teams will learn how to use CockroachDB to create applications that scale elastically and provide seamless delivery for end users while remaining indestructible. Teams will also learn how to migrate existing applications to CockroachDB's performant, cloud native data architecture.

If you're familiar with distributed systems, you'll quickly discover the benefits of strong data correctness and consistency guarantees as well as optimizations for delivering ultra low latencies to globally distributed end users.

You'll learn how to:

  • Design and build applications for distributed infrastructure, including data modeling and schema design
  • Migrate data into CockroachDB
  • Read and write data and run ACID transactions across distributed infrastructure
  • Plan a CockroachDB deployment for resiliency across single region and multi-region clusters
  • Secure, monitor, and optimize your CockroachDB deployment

Publisher resources

View/Submit Errata

Table of contents

  1. Preface
    1. Why Cockroach?
    2. Building CockroachDB
    3. Next Steps
    4. Why We Wrote This Book
    5. Who This Book Is For
    6. How This Book Is Organized
    7. Conventions Used in This Book
    8. Using Code Examples
    9. O’Reilly Online Learning
    10. How to Contact Us
    11. Acknowledgments
  2. I. Introduction to CockroachDB
  3. 1. Introduction to CockroachDB
    1. A Brief History of Databases
      1. Pre-Relational Databases
      2. The Relational Model
      3. Implementing the Relational Model
      4. Transactions
      5. The SQL Language
      6. The RDBMS Hegemony
      7. Enter the Internet
      8. The NoSQL Movement
      9. The Emergence of Distributed SQL
    2. The Advent of CockroachDB
      1. CockroachDB Design Goals
      2. CockroachDB Releases
    3. CockroachDB in Action
      1. CockroachDB at DevSisters
      2. CockroachDB at DoorDash
      3. CockroachDB at Bose
    4. Summary
  4. 2. CockroachDB Architecture
    1. The CockroachDB Cluster Architecture
      1. Ranges and Replicas
    2. The CockroachDB Software Stack
    3. The CockroachDB SQL Layer
    4. From SQL to Key-Values
      1. Tables as Represented in the KV Store
      2. Column Families
      3. Indexes in the KV Store
      4. Inverted Indexes
      5. The STORING Clause
      6. Table Definitions and Schema Changes
    5. The CockroachDB Transaction Layer
      1. MVCC Principles
      2. Transaction Workflow
      3. Write Intents
      4. Parallel Commits
      5. Transaction Cleanup
      6. Overview of Transaction Flow
      7. Read/Write Conflicts
      8. Clock Synchronization and Clock Skew
    6. The CockroachDB Distribution Layer
      1. Meta Ranges
      2. Gossip
      3. Leaseholders
      4. Range Splits
      5. Multiregion Distribution
    7. The CockroachDB Replication Layer
      1. Raft
      2. Raft and Leaseholders
      3. Closed Timestamps and Follower Reads
    8. The CockroachDB Storage Layer
      1. Log-Structured Merge Trees
      2. SSTables and Bloom Filters
      3. Deletes and Updates
      4. MultiVersion Concurrency Control
      5. The Block Cache
    9. Summary
  5. 3. Getting Started
    1. Installation
      1. Installing CockroachDB Software
      2. Creating a CockroachDB Serverless Instance
      3. Starting a Local Single-Node Server
      4. Starting Up CockroachDB in a Docker Container
      5. Starting Up a Secure Server
      6. Shutting Down the Server
      7. Remote Connection
      8. Creating a Kubernetes Cluster
    2. Using a GUI Client
    3. Exploring CockroachDB
      1. Adding Some Data
      2. Databases and Tables
      3. Issuing SQL
      4. The DB Console
    4. Working with Programming Languages
      1. Connecting to CockroachDB from Node.js
      2. Connecting to CockroachDB from Java
      3. Connecting to CockroachDB from Python
      4. Connecting to CockroachDB from Go
    5. Summary
  6. 4. CockroachDB SQL
    1. SQL Language Compatibility
    2. Querying Data with SELECT
      1. The SELECT List
      2. The FROM Clause
      3. JOINS
      4. Anti-Joins
      5. Cross Joins
      6. Set Operations
      7. Group Operations
      8. Subqueries
      9. Correlated Subquery
      10. Lateral Subquery
      11. The WHERE Clause
      12. Common Table Expressions
      13. ORDER BY
      14. Window Functions
      15. Other SELECT Clauses
      16. CockroachDB Arrays
      17. Working with JSON
      18. Summary of SELECT
    3. Creating Tables and Indexes
      1. Column Definitions
      2. Computed Columns
      3. Data Types
      4. Primary Keys
      5. Constraints
      6. Indexes
      7. CREATE TABLE AS SELECT
      8. Altering Tables
      9. Dropping Tables
      10. Views
    4. Inserting Data
    5. UPDATE
    6. UPSERT
    7. DELETE
    8. TRUNCATE
    9. IMPORT/IMPORT INTO
    10. Transactional Statements
      1. BEGIN Transaction
      2. SAVEPOINT
      3. COMMIT
      4. ROLLBACK
      5. SELECT FOR UPDATE
      6. AS OF SYSTEM TIME
    11. Other Data Definition Language Targets
    12. Administrative Commands
    13. The Information Schema
    14. Summary
  7. II. Developing Applications with CockroachDB
  8. 5. CockroachDB Schema Design
    1. Logical Data Modeling
      1. Normalization
      2. Don’t Go Too Far
      3. Primary Key Choices
      4. Special-Purpose Designs
    2. Physical Design
      1. Entities to Tables
      2. Attributes to Columns
      3. Primary Key Design
      4. Foreign Key Constraints
    3. Denormalization
      1. Replicating Columns to Avoid Joins
      2. Summary Tables
      3. Vertical Partitioning
      4. Horizontal Partitioning
      5. Repeating Groups
    4. JSON Document Models
      1. JSON Document Antipatterns
      2. Indexing JSON Attributes
      3. Using JSON or Arrays to Avoid Joins
    5. Indexes
      1. Index Selectivity
      2. Index Break-Even Point
      3. Index Overhead
      4. Composite Indexes
      5. Covering Indexes
      6. Composite and Covering Index Performance
      7. Guidelines for Composite Indexes
      8. Indexes and Null Values
      9. Inverted Indexes
      10. Partial Indexes
      11. Sort-Optimizing Indexes
      12. Expression Indexes
      13. Spatial Indexes
      14. Hash-Sharded Indexes
      15. Measuring Index Effectiveness
    6. Summary
  9. 6. Application Design and Implementation
    1. CockroachDB Programming
      1. Performing CRUD Operations
      2. Connection Pools
      3. Prepared and Parameterized Statements
      4. Batch Inserts
      5. Pagination of Results
      6. Projections
      7. Client-Side Caching
    2. Managing Transactions
      1. Transaction Retry Errors
      2. Implementing Transaction Retries
      3. Automatic Transaction Retries
      4. Avoiding Transaction Retry Errors with FOR UPDATE
      5. Reducing Contention by Eliminating Hot Rows
      6. Reducing Transaction Elapsed Time
      7. Reordering Statements
      8. Time Travel Queries
      9. Ambiguous Transactions Errors
      10. Deadlocks
      11. Transaction Priorities
    3. Working with ORM Frameworks
    4. Summary
  10. 7. Application Migration and Integration
    1. Loading Data
      1. File Locations
      2. Importing Files
      3. Importing from Cloud Storage
      4. Import Performance
    2. Migrating from Another Database
      1. Extracting and Converting DDL
      2. General Considerations When Converting DDL
      3. Exporting Data
      4. Loading Data Into CockroachDB
      5. Directly Importing PostgreSQL or MySQL Dumps
      6. Synchronizing and Switching Over
      7. Updating Application Code
    3. Exporting CockroachDB Data
    4. Change Data Capture
      1. Core Change Data Capture
      2. Using the Changefeed Programmatically
      3. Enterprise Change Data Capture
      4. Change Data Capture to Kafka
      5. Change Data Capture to Snowflake
    5. Summary
  11. 8. SQL Tuning
    1. Finding Slow SQL
    2. Explaining and Tracing SQL
      1. EXPLAIN ANALYZE
      2. EXPLAIN Options
      3. EXPLAIN DEBUG
    3. Changing SQL Execution
      1. Optimizing Table Lookups
      2. Optimizing Joins
      3. Join Methods
      4. Optimizing Sorting and Aggregation
      5. Optimizing DML
    4. Optimizing the Optimizer
      1. Optimizer Statistics
      2. Viewing Statistics
      3. Automatic Statistics
      4. Manually Collecting Statistics
    5. Summary
  12. III. Deploying and Administering CockroachDB
  13. 9. Planning a Deployment
    1. Know Your Requirements
    2. Comparison of Deployment Options
    3. Serverless Deployments
    4. Single-Region Dedicated Deployments
      1. Common Planning Tasks—Dedicated Deployments
      2. Benchmarking and Capacity Planning
      3. CockroachDB Cloud Deployments
      4. Self-Hosted on a Cloud Platform
      5. Self-Hosted “Bare-Metal” On-Premise
      6. Other Self-Hosted Considerations
      7. Self-Hosted Kubernetes
    5. Configuring for Self-Hosted High Availability
      1. Disk Failure
      2. Node Failures
      3. Network Failure
      4. Zone and Region Topologies
    6. Summary
  14. 10. Single-Region Deployment
    1. Deploying On-Premise or On-Cloud
      1. Firewall Configuration
      2. Operating System Configuration
      3. Clock Synchronization On-Premise
      4. Clock Synchronization on Cloud Platforms
      5. Creating Certificates
      6. Configuring the Nodes
      7. Creating a Ballast File
      8. Initializing the Cluster
      9. Creating the First User
      10. Installing a Load Balancer (On-Premise)
      11. Cloud Load Balancers
      12. Configuring Regions and Zones
    2. Deploying on Kubernetes
      1. Initializing the Operator
      2. Initializing the Cluster
      3. Creating a Client Pod
      4. Load Balancing
      5. Other Kubernetes Tasks
    3. Summary
  15. 11. Multiregion Deployment
    1. Multiregion Concepts
      1. Regions and Zones
      2. Survival Goals
      3. Locality Rules
      4. Planning Your Mutliregion Deployment
    2. Deploying in Multiregion
      1. Converting to a Multiregion Database
      2. Configuring Regional by Row
      3. Setting Regional Survival Goal
      4. Placement Restricted Databases
    3. Summary
  16. 12. Backup and Disaster Recovery
    1. Backups
      1. The BACKUP Command
      2. Backup Destinations
      3. Full Backup
      4. Table- and Database-Level Backups
      5. Incremental Backups
      6. AS OF SYSTEM TIME Backup
      7. WITH REVISION HISTORY
      8. SHOW BACKUP
      9. Managing Backup Jobs
      10. Scheduling Backups
      11. Locality-Aware Backups
    2. Restoring Data
    3. Exporting Data
    4. Disaster Recovery Best Practices
      1. Backup Scheduling and Configuration
      2. Recovering from Human Errors
    5. Summary
  17. 13. Security
    1. Firewall Configuration
      1. IP Allowlist with CockroachDB Dedicated
      2. VPC Peering and PrivateLink with CockroachDB Dedicated
      3. Native Linux Firewall
      4. Configuring a Firewall in GCP
      5. Configuring a Firewall in AWS
      6. Configuring Ports for Microsoft Azure
    2. Encryption and Server Certificates
    3. Encryption at Rest
    4. Authentication Mechanisms
      1. Standard Authentication
      2. Advanced Authentication
    5. Authorization
      1. Managing Users
      2. Managing Privileges
      3. Fine-Grained Access Control with Views
    6. Logging and Auditing
    7. Security Best Practices
    8. Summary
  18. 14. Administration and Troubleshooting
    1. Monitoring
      1. CockroachDB Dedicated Alerts
      2. CockroachDB Serverless Alerts
      3. Availability Monitoring
      4. The Cluster API
      5. Monitoring and Alerting with Prometheus
      6. Monitoring and Alerting with Datadog
    2. Log Configuration
      1. Log Channels
      2. Log Format
      3. Filter Levels
      4. Log Destinations
      5. Logging to Fluentd
      6. Redaction
      7. Logs in Cloud Deployments
    3. Cluster Management
      1. Upgrading the Cluster Version
      2. Adding Nodes to a Cluster
      3. Decommissioning Nodes
    4. Troubleshooting
      1. Clock Synchronization Errors
      2. Node Liveness
      3. Networking Issues
      4. Loss of Client Connectivity
      5. Running Out of Disk Space
      6. Working with CockroachDB Support Resources
    5. Summary
  19. 15. Cluster Optimization
    1. Tuning Versus Firefighting
    2. Workload Optimization
      1. Detecting Problem Workloads
      2. Review of Workload Optimization Strategies
      3. Ad Hoc or Analytic Queries
    3. Cluster Balance
      1. Causes of Imbalance
      2. Hot Ranges
      3. Load Balancing
      4. Changes in Cluster Topology
      5. Admission Control
    4. Network
    5. Memory Optimization
    6. Key-Value Cache
      1. max-sql-memory
      2. Host Memory
    7. Disk I/O
    8. Scaling Out
    9. Summary
  20. Index
  21. About the Authors

Product information

  • Title: CockroachDB: The Definitive Guide
  • Author(s): Guy Harrison, Jesse Seldess, Ben Darnell
  • Release date: April 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098100247