ScyllaDB in Action

Book description

Build, maintain, and run databases that are easy to scale and quick to query—all with ScyllaDB.

ScyllaDB in Action is your guide to everything you need to know about ScyllaDB, from your very first queries to running it in a production environment. It starts with the basics of creating, reading, and deleting data and builds your knowledge from there, so that you can build, maintain, and run an effective and efficient database.

Inside ScyllaDB in Action you’ll learn how to:

  • Read, write, and delete data in ScyllaDB
  • Design database schemas for ScyllaDB
  • Write performant queries against ScyllaDB
  • Connect and query a ScyllaDB cluster from an application
  • Configure, monitor, and operate ScyllaDB in production

This book teaches you ScyllaDB the best way—through hands-on examples. Dive into the node-based architecture of ScyllaDB to understand how its distributed systems work, how you can troubleshoot problems, and how you can constantly improve performance.

About the Technology
ScyllaDB is a versatile NoSQL database that can move large volumes of data fast. Very, very, very fast. This drop-in replacement for Cassandra takes full advantage of modern multi-core hardware and scales to handle large real-time data workloads with incredibly low latency. It features built-in monitoring and management tools, and its efficient use of computing resources can save a lot of money on high-volume applications.

About the Book
ScyllaDB in Action demonstrates how to integrate ScyllaDB into data-intensive applications. You’ll work through a hands-on project step by step as you use ScyllaDB to store data and learn to configure, monitor, and safely operate a distributed database. Along the way, you’ll discover how ScyllaDB’s unique “shard per core” approach helps you deliver impressive performance in real-time systems.

What’s Inside
  • Design schemas for ScyllaDB
  • Write performant queries
  • Get an instant speed boost over Cassandra

About the Reader
For backend and infrastructure engineers who know the basics of SQL.

About the Author
Bo Ingram is a staff software engineer at Discord working in database infrastructure. He has extensive experience working with ScyllaDB as an operator and developer.

The technical editor on this book was Piotr Wiktor Sarna.

Quotes
If you plan to run ScyllaDB at scale, read this book before going to production! Bo Ingram captures years of high scalability practices in a friendly and fun package.
- Dor Laor, Co-Founder and CEO, ScyllaDB

Working with a distributed database without a proper understanding of how it works is insane. This all-in-one guide to ScyllaDB is the path to avoiding insanity.
- Avi Kivity, Co-Founder and CTO, ScyllaDB

Bo combines deep tech knowledge with hard-won insights from the trenches, keeping it engaging with his signature humor. This book will save you a ton of headaches.
- Sahn Lam, Coauthor of System Design Interview Series

Table of contents

  copyright
  dedication
  contents
  preface
  acknowledgments
  about this book
    Who should read this book
    How this book is organized: A road map
    About the code
    liveBook discussion forum
  about the author
  about the cover illustration
  Part 1 Getting started
  1 Introducing ScyllaDB
    1.1 ScyllaDB, a different database
      1.1.1 Hypothetical databases
      1.1.2 Real-world databases
      1.1.3 Unpacking the definition
    1.2 ScyllaDB, a distributed database
      1.2.1 Distributing data
      1.2.2 ScyllaDB vs. relational databases
      1.2.3 ScyllaDB vs. Cassandra
      1.2.4 ScyllaDB vs. Amazon Aurora, Amazon DynamoDB, Google Cloud Spanner, and Google AlloyDB
      1.2.5 ScyllaDB vs. document stores
      1.2.6 ScyllaDB vs. distributed relational databases
      1.2.7 When to prefer other databases
    1.3 ScyllaDB, a practical database
      1.3.1 Fault tolerance
      1.3.2 Scalability
      1.3.3 Production usage
    Summary
  2 Touring ScyllaDB
    2.1 Launching your first cluster
      2.1.1 The first node
      2.1.2 Your new friend, nodetool
      2.1.3 Building the cluster
    2.2 Creating your first table
      2.2.1 Keyspaces and tables
      2.2.2 Creating a schema
    2.3 Running your first queries
      2.3.1 Inserting data
      2.3.2 Reading data
      2.3.3 Updating data
      2.3.4 Deleting data
    2.4 Handling failures
      2.4.1 Shutting down a node
      2.4.2 Experimenting with consistency
    Summary
  Part 2 Query-first design
  3 Data modeling in ScyllaDB
    3.1 Application design before schema design
      3.1.1 Your query-first design toolbox
      3.1.2 The sample application requirements
      3.1.3 Determining the queries
    3.2 Identifying tables
      3.2.1 Denormalization
      3.2.2 Extracting tables
    3.3 Distributing data efficiently on the hash ring
      3.3.1 The hash ring
      3.3.2 Making good partitions
    Summary
  4 Data types in ScyllaDB
    4.1 Preparing yourself
      4.1.1 Data-type playground
      4.1.2 Identifying the fields
    4.2 The most common types: Text and numbers
      4.2.1 Text
      4.2.2 Numbers
    4.3 Dates and times
      4.3.1 Working with dates and times
      4.3.2 Durations
      4.3.3 When to use timestamps, dates, and times
    4.4 IDs
      4.4.1 UUIDs
      4.4.2 Picking an ID type
    4.5 Collections
      4.5.1 Lists
      4.5.2 Sets
      4.5.3 Maps
      4.5.4 User-defined types
      4.5.5 Frozen collections
      4.5.6 Storing images
    4.6 A few other types to know
      4.6.1 Blobs
      4.6.2 IP addresses
      4.6.3 Counters
    Summary
  5 Tables in ScyllaDB
    5.1 Completing your query-first design
      5.1.1 Reviewing restaurant reviews
      5.1.2 The final two questions
      5.1.3 Bucketing
      5.1.4 Finishing the design
    5.2 Keyspace configuration
      5.2.1 SimpleStrategy
      5.2.2 NetworkTopologyStrategy
    5.3 Creating your application’s tables
      5.3.1 Articles
      5.3.2 Article summaries
      5.3.3 Authors
    Summary
  Part 3 Querying the database
  6 Writing data to ScyllaDB
    6.1 Inserting and updating data
      6.1.1 Writing data
      6.1.2 Concurrent operations
    6.2 Deleting data
      6.2.1 Executing deletes
      6.2.2 Tombstones
      6.2.3 Compaction
      6.2.4 Deleting multiple rows
    6.3 Time to live
      6.3.1 Expiring temporary data
      6.3.2 The difference between inserts and updates
      6.3.3 Table TTLs
    6.4 Batching data
      6.4.1 Executing a batch
      6.4.2 Logged vs. unlogged batches
    6.5 Lightweight transactions
      6.5.1 The power of IF
      6.5.2 Not lightweight
    Summary
  7 Reading data from ScyllaDB
    7.1 Selecting
      7.1.1 The basics
      7.1.2 Limiting results
      7.1.3 Paginating queries
      7.1.4 Ordering results
      7.1.5 Counting
      7.1.6 Grouping rows in your queries
    7.2 Read performance
      7.2.1 What does a read do?
      7.2.2 Avoiding slow queries
      7.2.3 Allowing filtering
    7.3 Materialized views
      7.3.1 Constructing a view
      7.3.2 Easier denormalization
      7.3.3 Indexes
    Summary
  Part 4 Operating the database
  8 ScyllaDB’s architecture
    8.1 Scylla’s design goals
    8.2 Distributed systems in Scylla
      8.2.1 Revisiting the hash ring
      8.2.2 Consistency
      8.2.3 Communication protocols
      8.2.4 Gossip
      8.2.5 Consensus
    8.3 On-node architecture
      8.3.1 The memtable and the commit log
      8.3.2 Shards
      8.3.3 SSTables
      8.3.4 Tablets: The future
    8.4 Cluster operations
      8.4.1 Compaction
      8.4.2 Repairs
      8.4.3 Hinted handoff
    Summary
  9 Running ScyllaDB in production
    9.1 Building a production cluster
      9.1.1 The config file
      9.1.2 Seeds
      9.1.3 Addresses
      9.1.4 Authentication
      9.1.5 Authorization
      9.1.6 Snitches
    9.2 Building your cluster
      9.2.1 Designing your cluster topology
      9.2.2 Computing your nodes
      9.2.3 Testing the cluster
    9.3 Managing the cluster
      9.3.1 Repairing a node
      9.3.2 Backing up your cluster
      9.3.3 Compacting a node
      9.3.4 Troubleshooting tables
    9.4 Managing the node lifecycle
      9.4.1 Stopping and starting a node
      9.4.2 Replacing a node
      9.4.3 Adding a node
      9.4.4 Removing a node
    Summary
  10 Application development with ScyllaDB
    10.1 Your application
      10.1.1 Python
      10.1.2 Virtual environments
      10.1.3 Flask
    10.2 Querying Scylla
      10.2.1 A new Scylla cluster
      10.2.2 Connecting to the cluster
      10.2.3 Your first application query
    10.3 Reading data
      10.3.1 Prepared statements
      10.3.2 Reading articles
    10.4 Writing data
      10.4.1 The necessary data
      10.4.2 Laying the write groundwork
      10.4.3 Batch-writing articles
      10.4.4 Working with user-defined types
    10.5 Configuring the driver
      10.5.1 Consistency
      10.5.2 Load balancing
      10.5.3 Retrying queries
    10.6 Authentication and authorization
      10.6.1 Enabling authentication and authorization
      10.6.2 Implementing role-based access control
      10.6.3 Authenticating via the app
    Summary
  11 Monitoring ScyllaDB
    11.1 The monitoring stack
      11.1.1 Deploying monitoring
      11.1.2 Prometheus
      11.1.3 Grafana
      11.1.4 Alertmanager
      11.1.5 Other monitoring needs
    11.2 Causing stress with cassandra-stress
      11.2.1 Setting up cassandra-stress
      11.2.2 Examining performance
    11.3 Common incidents
      11.3.1 A hot partition
      11.3.2 An overwhelmed database
      11.3.3 Failing to meet consistency requirements
    Summary
  12 Moving data in bulk with ScyllaDB
    12.1 Extracting data from ScyllaDB
      12.1.1 Using token ranges
      12.1.2 Change data capture
    12.2 Migrating to ScyllaDB
      12.2.1 Dual writing
      12.2.2 SSTableLoader
      12.2.3 Spark Migrator
      12.2.4 Writing a migrator
      12.2.5 Validating migrations
    Summary
  appendix Docker
    A.1 Linux
    A.2 macOS
    A.3 Windows
    A.4 Running ScyllaDB on Docker
  index

Product information

  • Title: ScyllaDB in Action
  • Author(s): Bo Ingram
  • Release date: October 2024
  • Publisher(s): Manning Publications
  • ISBN: 9781633437265