Learning Apache Cassandra

Book description

Build an efficient, scalable, fault-tolerant, and highly available data layer into your application using Cassandra

In Detail

Cassandra is a distributed database that stands out for its robust feature set and intuitive interface, while still providing the high availability and scalability of a distributed store.

From installing Cassandra and creating your first keyspace, through mastering the different table structures Cassandra offers, to working with the latest and most powerful features of the Cassandra Query Language, CQL3, this book explores each topic through the lens of a real-world example application. With plenty of examples, tips, and clear explanations, you'll master compound primary keys, collection columns, lightweight transactions, and many other advanced aspects of Cassandra.

By the end of the book, you'll be fully equipped to build powerful, scalable Cassandra database layers for your applications.

What You Will Learn

  • Install Cassandra and create your first keyspace
  • Choose the right table structure for the task at hand in a variety of scenarios
  • Use range slice queries for efficient data access
  • Effortlessly handle concurrent updates with collection columns
  • Ensure data integrity with lightweight transactions and logged batches
  • Understand eventual consistency and use the right consistency level for your situation
  • Implement best practices for data modeling and access

Table of contents

  1. Learning Apache Cassandra
    1. Table of Contents
    2. Learning Apache Cassandra
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Getting Up and Running with Cassandra
      1. What Cassandra offers, and what it doesn't
        1. Horizontal scalability
        2. High availability
        3. Write optimization
        4. Structured records
        5. Secondary indexes
        6. Efficient result ordering
        7. Immediate consistency
        8. Discretely writable collections
        9. Relational joins
        10. MapReduce
        11. Comparing Cassandra to the alternatives
      2. Installing Cassandra
        1. Installing on Mac OS X
        2. Installing on Ubuntu
        3. Installing on Windows
      3. Bootstrapping the project
        1. CQL – the Cassandra Query Language
        2. Interacting with Cassandra
      4. Creating a keyspace
        1. Selecting a keyspace
      5. Summary
    9. 2. The First Table
      1. Creating the users table
        1. Structuring of tables
        2. Table and column options
        3. The type system
          1. Strings
          2. Integers
          3. Floating point and decimal numbers
          4. Dates and times
          5. UUIDs
          6. Booleans
          7. Blobs
          8. The purpose of types
      2. Inserting data
        1. Writing data does not yield feedback
        2. Partial inserts
      3. Selecting data
        1. Missing rows
        2. Selecting more than one row
        3. Retrieving all the rows
          1. Paginating through results
      4. Developing a mental model for Cassandra
      5. Summary
    10. 3. Organizing Related Data
      1. A table for status updates
        1. Creating a table with a compound primary key
        2. The structure of the status updates table
          1. UUIDs and timestamps
      2. Working with status updates
        1. Extracting timestamps
        2. Looking up a specific status update
        3. Automatically generating UUIDs
      3. Anatomy of a compound primary key
        1. Anatomy of a single-column primary key
      4. Beyond two columns
      5. Compound keys represent parent-child relationships
      6. Coupling parents and children using static columns
        1. Defining static columns
        2. Working with static columns
        3. Interacting only with the static columns
          1. Static-only inserts
          2. Static columns act like predefined joins
          3. When to use static columns
      7. Refining our mental model
      8. Summary
    11. 4. Beyond Key-Value Lookup
      1. Looking up rows by partition
        1. The limits of the WHERE keyword
          1. Restricting by clustering column
          2. Restricting by part of a partition key
      2. Retrieving status updates for a specific time range
        1. Creating time UUID ranges
        2. Selecting a slice of a partition
      3. Paginating over rows in a partition
        1. Counting rows
      4. Reversing the order of rows
        1. Reversing clustering order at query time
        2. Reversing clustering order in the schema
      5. Paginating over multiple partitions
      6. Building an autocomplete function
      7. Summary
    12. 5. Establishing Relationships
      1. Modeling follow relationships
        1. Outbound follows
        2. Inbound follows
      2. Storing follow relationships
        1. Designing around queries
        2. Denormalization
      3. Looking up follow relationships
      4. Unfollowing users
      5. Using secondary indexes to avoid denormalization
        1. The form of the single table
        2. Adding a secondary index
        3. Other uses of secondary indexes
        4. Limitations of secondary indexes
          1. Secondary indexes can only have one column
          2. Secondary indexes can only be tested for equality
          3. Secondary index lookup is not as efficient as primary key lookup
      6. Summary
    13. 6. Denormalizing Data for Maximum Performance
      1. A normalized approach
        1. Generating the timeline
        2. Ordering and pagination
        3. Multiple partitions and read efficiency
      2. Partial denormalization
        1. Displaying the home timeline
        2. Read performance and write complexity
      3. Fully denormalizing the home timeline
        1. Creating a status update
        2. Displaying the home timeline
      4. Write complexity and data integrity
      5. Summary
    14. 7. Expanding Your Data Model
      1. Viewing a table schema in cqlsh
      2. Adding columns to tables
      3. Deleting columns
      4. Updating the existing rows
        1. Updating multiple columns
        2. Updating multiple rows
      5. Removing a value from a column
        1. Missing columns in Cassandra
        2. Deleting specific columns
        3. Syntactic sugar for deletion
      6. Inserts, updates, and upserts
        1. Inserts can overwrite existing data
        2. Checking before inserting isn't enough
        3. Another advantage of UUIDs
        4. Conditional inserts and lightweight transactions
        5. Updates can create new rows
        6. Optimistic locking with conditional updates
          1. Optimistic locking in action
          2. Optimistic locking and accidental updates
      7. Lightweight transactions have a cost
        1. When lightweight transactions aren't necessary
      8. Summary
    15. 8. Collections, Tuples, and User-defined Types
      1. The problem with concurrent updates
        1. Serializing the collection
        2. Introducing concurrency
      2. Collection columns and concurrent updates
        1. Defining collection columns
        2. Reading and writing sets
          1. Advanced set manipulation
          2. Removing values from a set
          3. Sets and uniqueness
          4. Collections and upserts
      3. Using lists for ordered, nonunique values
        1. Defining a list column
        2. Writing a list
        3. Discrete list manipulation
          1. Writing data at a specific index
          2. Removing elements from the list
      4. Using maps to store key-value pairs
        1. Writing a map
        2. Updating discrete values in a map
          1. Removing values from maps
      5. Collections in inserts
      6. Collections and secondary indexes
        1. Secondary indexes on map columns
      7. The limitations of collections
        1. Reading discrete values from collections
          1. Collection size limit
        2. Reading a collection column from multiple rows
        3. Performance of collection operations
      8. Working with tuples
        1. Creating a tuple column
        2. Writing to tuples
        3. Indexing tuples
      9. User-defined types
        1. Creating a user-defined type
        2. Assigning a user-defined type to a column
        3. Adding data to a user-defined column
        4. Indexing and querying user-defined types
        5. Partial selection of user-defined types
      10. Choosing between tuples and user-defined types
      11. Comparing data structures
      12. Summary
    16. 9. Aggregating Time-Series Data
      1. Recording discrete analytics observations
        1. Using discrete analytics observations
        2. Slicing and dicing our data
      2. Recording aggregate analytics observations
        1. Answering the right question
        2. Precomputation versus read-time aggregation
        3. The many possibilities for aggregation
          1. The role of discrete observations
      3. Recording analytics observations
        1. Updating a counter column
        2. Counters and upserts
        3. Setting and resetting counter columns
        4. Counter columns and deletion
        5. Counter columns need their own table
      4. Summary
    17. 10. How Cassandra Distributes Data
      1. Data distribution in Cassandra
        1. Cassandra's partitioning strategy: partition key tokens
          1. Distributing partition tokens
          2. Partition keys group data on the same node
          3. Virtual nodes
          4. Virtual nodes facilitate redistribution
      2. Data replication in Cassandra
        1. Masterless replication
          1. Replication without a master
      3. Consistency
        1. Immediate and eventual consistency
        2. Consistency in Cassandra
          1. The anatomy of a successful request
        3. Tuning consistency
          1. Eventual consistency with ONE
          2. Immediate consistency with ALL
          3. Fault-tolerant immediate consistency with QUORUM
        4. Comparing consistency levels
          1. Choosing the right consistency level
        5. The CAP theorem
      4. Handling conflicting data
        1. Last-write-wins conflict resolution
        2. Introspecting write timestamps
        3. Overriding write timestamps
      5. Distributed deletion
        1. Stumbling on tombstones
        2. Expiring columns with TTL
      6. Summary
    18. A. Peeking Under the Hood
      1. Using cassandra-cli
      2. The structure of a simple primary key table
        1. Exploring cells
        2. A model of column families: RowKey and cells
      3. Compound primary keys in column families
        1. A complete mapping
        2. The wide row data structure
        3. The empty cell
      4. Collection columns in column families
        1. Set columns in column families
        2. Map columns in column families
        3. List columns in column families
          1. Appending and prepending values to lists
        4. Other list operations
      5. Summary
    19. B. Authentication and Authorization
      1. Enabling authentication and authorization
        1. Authentication, authorization, and fault tolerance
        2. Authentication with cqlsh
        3. Authentication in your application
      2. Setting up a user
        1. Changing a user's password
        2. Viewing user accounts
      3. Controlling access
        1. Viewing permissions
        2. Revoking access
      4. Authorization in action
        1. Authorization as a hedge against mistakes
      5. Security beyond authentication and authorization
        1. Security protects against vulnerabilities
      6. Summary
      7. Wrapping up
    20. Index

Product information

  • Title: Learning Apache Cassandra
  • Author(s): Mat Brown
  • Release date: February 2015
  • Publisher(s): Packt Publishing
  • ISBN: 9781783989201