Cassandra: The Definitive Guide, 3rd Edition

by Jeff Carpenter, Eben Hewitt

April 2020

Intermediate to advanced

426 pages

12h 4m

English

O'Reilly Media, Inc.

Foreword
Preface
Why Apache Cassandra?; Is This Book for You?; What’s in This Book?; New for the Third Edition; Conventions Used in This Book; Using Code Examples; O’Reilly Interactive Katacoda Scenarios; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. Beyond Relational Databases
What’s Wrong with Relational Databases?; A Quick Review of Relational Databases; Transactions, ACID-ity, and Two-Phase Commit; Schema; Sharding and Shared-Nothing Architecture; Web Scale; The Rise of NoSQL; Summary
2. Introducing Cassandra
The Cassandra Elevator Pitch; Cassandra in 50 Words or Less; Distributed and Decentralized; Elastic Scalability; High Availability and Fault Tolerance; Tuneable Consistency; Brewer’s CAP Theorem; Row-Oriented; High Performance; Where Did Cassandra Come From?; Is Cassandra a Good Fit for My Project?; Large Deployments; Lots of Writes, Statistics, and Analysis; Geographical Distribution; Hybrid Cloud and Multicloud Deployment; Getting Involved; Summary
3. Installing Cassandra
Installing the Apache Distribution; Extracting the Download; What’s in There?; Building from Source; Additional Build Targets; Running Cassandra; Setting the Environment; Starting the Server; Stopping Cassandra; Other Cassandra Distributions; Running the CQL Shell; Basic cqlsh Commands; cqlsh Help; Describing the Environment in cqlsh; Creating a Keyspace and Table in cqlsh; Writing and Reading Data in cqlsh; Running Cassandra in Docker; Summary
4. The Cassandra Query Language
The Relational Data Model; Cassandra’s Data Model; Clusters; Keyspaces; Tables; Columns; CQL Types; Numeric Data Types; Textual Data Types; Time and Identity Data Types; Other Simple Data Types; Collections; Tuples; User-Defined Types; Summary
5. Data Modeling
Conceptual Data Modeling; RDBMS Design; Design Differences Between RDBMS and Cassandra; Defining Application Queries; Logical Data Modeling; Hotel Logical Data Model; Reservation Logical Data Model; Physical Data Modeling; Hotel Physical Data Model; Reservation Physical Data Model; Evaluating and Refining; Calculating Partition Size; Calculating Size on Disk; Breaking Up Large Partitions; Defining Database Schema; Cassandra Data Modeling Tools; Summary
6. The Cassandra Architecture
Data Centers and Racks; Gossip and Failure Detection; Snitches; Rings and Tokens; Virtual Nodes; Partitioners; Replication Strategies; Consistency Levels; Queries and Coordinator Nodes; Hinted Handoff; Anti-Entropy, Repair, and Merkle Trees; Lightweight Transactions and Paxos; Memtables, SSTables, and Commit Logs; Bloom Filters; Caching; Compaction; Deletion and Tombstones; Managers and Services; Cassandra Daemon; Storage Engine; Storage Service; Storage Proxy; Messaging Service; Stream Manager; CQL Native Transport Server; System Keyspaces; Summary
7. Designing Applications with Cassandra
Hotel Application Design; Cassandra and Microservice Architecture; Microservice Architecture for a Hotel Application; Identifying Bounded Contexts; Identifying Services; Designing Microservice Persistence; Extending Designs; Secondary Indexes; Materialized Views; Reservation Service: A Sample Microservice; Design Choices for a Java Microservice; Deployment and Integration Considerations; Services, Keyspaces, and Clusters; Data Centers and Load Balancing; Interactions Between Microservices; Summary
8. Application Development with Drivers
DataStax Java Driver; Development Environment Configuration; Connecting to a Cluster; Statements; Simple Statements; Prepared Statements; Query Builder; Object Mapper; Asynchronous Execution; Driver Configuration; Metadata; Debugging and Monitoring; Other Cassandra Drivers; Summary
9. Writing and Reading Data
Writing; Write Consistency Levels; The Cassandra Write Path; Writing Files to Disk; Lightweight Transactions; Batches; Reading; Read Consistency Levels; The Cassandra Read Path; Read Repair; Range Queries, Ordering and Filtering; Paging; Deleting; Summary
10. Configuring and Deploying Cassandra
Cassandra Cluster Manager; Creating a Cluster; Adding Nodes to a Cluster; Dynamic Ring Participation; Node Configuration; Seed Nodes; Snitches; Partitioners; Tokens and Virtual Nodes; Network Interfaces; Data Storage; Startup and JVM Settings; Planning a Cluster Deployment; Cluster Topology and Replication Strategies; Sizing Your Cluster; Selecting Instances; Storage; Network; Cloud Deployment; Amazon Web Services; Google Cloud Platform; Microsoft Azure; Summary
11. Monitoring
Monitoring Cassandra with JMX; Cassandra’s MBeans; Database MBeans; Cluster-Related MBeans; Internal MBeans; Monitoring with nodetool; Getting Cluster Information; Getting Statistics; Virtual Tables; System Virtual Schema; System Views; Metrics; Logging; Examining Log Files; Full Query Logging; Summary
12. Maintenance
Health Check; Common Maintenance Tasks; Flush; Cleanup; Repair; Rebuilding Indexes; Moving Tokens; Adding Nodes; Adding Nodes to an Existing Data Center; Adding a Data Center to a Cluster; Handling Node Failure; Repairing Failed Nodes; Replacing Nodes; Removing Nodes; Upgrading Cassandra; Backup and Recovery; Taking a Snapshot; Clearing a Snapshot; Enabling Incremental Backup; Restoring from Snapshot; SSTable Utilities; Maintenance Tools; Netflix Priam; DataStax OpsCenter; Cassandra Sidecars; Cassandra Kubernetes Operators; Summary
13. Performance Tuning
Managing Performance; Setting Performance Goals; Benchmarking and Stress Testing; Monitoring Performance; Analyzing Performance Issues; Tracing; Tuning Methodology; Caching; Key Cache; Row Cache; Chunk Cache; Counter Cache; Saved Cache Settings; Memtables; Commit Logs; SSTables; Hinted Handoff; Compaction; Concurrency and Threading; Networking and Timeouts; JVM Settings; Memory; Garbage Collection; Summary
14. Security
Authentication and Authorization; Password Authenticator; Using CassandraAuthorizer; Role-Based Access Control; Encryption; SSL, TLS, and Certificates; Node-to-Node Encryption; Client-to-Node Encryption; JMX Security; Securing JMX Access; Security MBeans; Audit Logging; Summary
15. Migrating and Integrating
Knowing When to Migrate; Adapting the Data Model; Translating Entities; Translating Relationships; Adapting the Application; Refactoring Data Access; Maintaining Consistency; Migrating Stored Procedures; Planning the Deployment; Migrating Data; Zero-Downtime Migration; Bulk Loading; Common Integrations; Managing Data Flow with Apache Kafka; Searching with Apache Lucene, SOLR, and Elasticsearch; Analyzing Data with Apache Spark; Summary
Index

Content preview from Cassandra: The Definitive Guide, 3rd Edition

Chapter 4. The Cassandra Query Language

In this chapter, you’ll gain an understanding of Cassandra’s data model and how that data model is implemented by the Cassandra Query Language (CQL). We’ll show how CQL supports Cassandra’s design goals and look at some general behavior characteristics.

For developers and administrators coming from the relational world, the Cassandra data model can be difficult to understand initially. Some terms, such as keyspace, are completely new, and some, such as column, exist in both worlds but have slightly different meanings. The syntax of CQL is similar in many ways to SQL, but with some important differences. For those familiar with NoSQL technologies such as Dynamo or Bigtable, it can also be confusing, because although Cassandra may be based on those technologies, its own data model is significantly different.

So in this chapter, we start from relational database terminology and introduce Cassandra’s view of the world. Along the way you’ll get more familiar with CQL and learn how it implements this data model.

The Relational Data Model

In a relational database, the database itself is the outermost container that might correspond to a single application. The database contains tables. Tables have names and contain one or more columns, which also have names. When you add data to a table, you specify a value for every column defined; if you don’t have a value for a particular column, you use null. This new entry adds a row to the table, which you ...
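
The relationship between a database, its tables, named columns, and null values can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module as a stand-in relational database; the `hotel` table, its columns, and the sample values are hypothetical and do not come from the book.

```python
import sqlite3

# The database is the outermost container; here it lives only in memory.
conn = sqlite3.connect(":memory:")

# A named table with named columns (hypothetical names, for illustration only).
conn.execute("CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Adding data supplies a value for every column; with no phone number known,
# Python's None is stored as SQL NULL.
conn.execute(
    "INSERT INTO hotel (id, name, phone) VALUES (?, ?, ?)",
    (1, "Example Hotel", None),
)

# The insert added one row to the table.
row = conn.execute("SELECT id, name, phone FROM hotel").fetchone()
print(row)  # (1, 'Example Hotel', None)
```

The missing phone number comes back as `None`, which is how sqlite3 represents the null placeholder described above.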

Publisher Resources

ISBN: 9781098115159
