Cassandra: The Definitive Guide, (Revised) Third Edition, 3rd Edition

by Jeff Carpenter, Eben Hewitt

January 2022

Intermediate to advanced

430 pages

12h 10m

English

O'Reilly Media, Inc.

Foreword
Preface
Why Apache Cassandra?; Is This Book for You?; What’s in This Book?; New for the Third Edition; Note on the Revised Third Edition; Conventions Used in This Book; Using Code Examples; O’Reilly Interactive Katacoda Scenarios; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. Beyond Relational Databases
What’s Wrong with Relational Databases?; A Quick Review of Relational Databases; Transactions, ACID-ity, and Two-Phase Commit; Schema; Sharding and Shared-Nothing Architecture; Web Scale; The Rise of NoSQL; Summary
2. Introducing Cassandra
The Cassandra Elevator Pitch; Cassandra in 50 Words or Less; Distributed and Decentralized; Elastic Scalability; High Availability and Fault Tolerance; Tuneable Consistency; Brewer’s CAP Theorem; Row-Oriented; High Performance; Where Did Cassandra Come From?; Is Cassandra a Good Fit for My Project?; Large Deployments; Lots of Writes, Statistics, and Analysis; Geographical Distribution; Hybrid Cloud and Multicloud Deployment; Getting Involved; Summary
3. Installing Cassandra
Installing the Apache Distribution; Extracting the Download; What’s in There?; Building from Source; Additional Build Targets; Running Cassandra; Setting the Environment; Starting the Server; Stopping Cassandra; Other Cassandra Distributions; Running the CQL Shell; Basic cqlsh Commands; cqlsh Help; Describing the Environment in cqlsh; Creating a Keyspace and Table in cqlsh; Writing and Reading Data in cqlsh; Running Cassandra in Docker; Summary
4. The Cassandra Query Language
The Relational Data Model; Cassandra’s Data Model; Clusters; Keyspaces; Tables; Columns; CQL Types; Numeric Data Types; Textual Data Types; Time and Identity Data Types; Other Simple Data Types; Collections; Tuples; User-Defined Types; Summary
5. Data Modeling
Conceptual Data Modeling; RDBMS Design; Design Differences Between RDBMS and Cassandra; Defining Application Queries; Logical Data Modeling; Hotel Logical Data Model; Reservation Logical Data Model; Physical Data Modeling; Hotel Physical Data Model; Reservation Physical Data Model; Evaluating and Refining; Calculating Partition Size; Calculating Size on Disk; Breaking Up Large Partitions; Defining Database Schema; Cassandra Data Modeling Tools; Summary
6. The Cassandra Architecture
Data Centers and Racks; Gossip and Failure Detection; Snitches; Rings and Tokens; Virtual Nodes; Partitioners; Replication Strategies; Consistency Levels; Queries and Coordinator Nodes; Hinted Handoff; Anti-Entropy, Repair, and Merkle Trees; Lightweight Transactions and Paxos; Memtables, SSTables, and Commit Logs; Bloom Filters; Caching; Compaction; Deletion and Tombstones; Managers and Services; Cassandra Daemon; Storage Engine; Storage Service; Storage Proxy; Messaging Service; Stream Manager; CQL Native Transport Server; System Keyspaces; Summary
7. Designing Applications with Cassandra
Hotel Application Design; Cassandra and Microservice Architecture; Microservice Architecture for a Hotel Application; Identifying Bounded Contexts; Identifying Services; Designing Microservice Persistence; Extending Designs; Secondary Indexes; Materialized Views; Reservation Service: A Sample Microservice; Design Choices for a Java Microservice; Deployment and Integration Considerations; Services, Keyspaces, and Clusters; Data Centers and Load Balancing; Interactions Between Microservices; Summary
8. Application Development with Drivers
DataStax Java Driver; Development Environment Configuration; Connecting to a Cluster; Statements; Simple Statements; Prepared Statements; Query Builder; Object Mapper; Asynchronous Execution; Driver Configuration; Metadata; Debugging and Monitoring; DataStax Python Driver; DataStax Node.js Driver; DataStax C# Driver; Other Cassandra Drivers; Summary

9. Writing and Reading Data
Writing; Write Consistency Levels; The Cassandra Write Path; Writing Files to Disk; Lightweight Transactions; Batches; Reading; Read Consistency Levels; The Cassandra Read Path; Read Repair; Range Queries, Ordering and Filtering; Paging; Deleting; Summary
10. Configuring and Deploying Cassandra
Cassandra Cluster Manager; Creating a Cluster; Adding Nodes to a Cluster; Dynamic Ring Participation; Node Configuration; Seed Nodes; Snitches; Partitioners; Tokens and Virtual Nodes; Network Interfaces; Data Storage; Startup and JVM Settings; Planning a Cluster Deployment; Cluster Topology and Replication Strategies; Sizing Your Cluster; Selecting Instances; Storage; Network; Cloud Deployment; Amazon Web Services; Google Cloud Platform; Microsoft Azure; Summary
11. Monitoring
Monitoring Cassandra with JMX; Cassandra’s MBeans; Database MBeans; Cluster-Related MBeans; Internal MBeans; Monitoring with nodetool; Getting Cluster Information; Getting Statistics; Virtual Tables; System Virtual Schema; System Views; Metrics; Logging; Examining Log Files; Full Query Logging; Summary
12. Maintenance
Health Check; Common Maintenance Tasks; Flush; Cleanup; Repair; Rebuilding Indexes; Moving Tokens; Adding Nodes; Adding Nodes to an Existing Data Center; Adding a Data Center to a Cluster; Handling Node Failure; Repairing Failed Nodes; Replacing Nodes; Removing Nodes; Upgrading Cassandra; Backup and Recovery; Taking a Snapshot; Clearing a Snapshot; Enabling Incremental Backup; Restoring from Snapshot; SSTable Utilities; Maintenance Tools; Netflix Priam; DataStax OpsCenter; Cassandra Sidecars; Cassandra Kubernetes Operators; Summary
13. Performance Tuning
Managing Performance; Setting Performance Goals; Benchmarking and Stress Testing; Monitoring Performance; Analyzing Performance Issues; Tracing; Tuning Methodology; Caching; Key Cache; Row Cache; Chunk Cache; Counter Cache; Saved Cache Settings; Memtables; Commit Logs; SSTables; Hinted Handoff; Compaction; Concurrency and Threading; Networking and Timeouts; JVM Settings; Memory; Garbage Collection; Summary
14. Security
Authentication and Authorization; Password Authenticator; Using CassandraAuthorizer; Role-Based Access Control; Encryption; SSL, TLS, and Certificates; Node-to-Node Encryption; Client-to-Node Encryption; JMX Security; Securing JMX Access; Security MBeans; Audit Logging; Summary
15. Migrating and Integrating
Knowing When to Migrate; Adapting the Data Model; Translating Entities; Translating Relationships; Adapting the Application; Refactoring Data Access; Maintaining Consistency; Migrating Stored Procedures; Planning the Deployment; Migrating Data; Zero-Downtime Migration; Bulk Loading; Common Integrations; Managing Data Flow with Apache Kafka; Searching with Apache Lucene, SOLR, and Elasticsearch; Analyzing Data with Apache Spark; Summary
Index
About the Authors

Content preview from Cassandra: The Definitive Guide, (Revised) Third Edition, 3rd Edition

Chapter 11. Monitoring

The term observability is often used to describe a desirable attribute of distributed systems. Observability means having visibility into the various components of a system in order to detect, predict, and perhaps even prevent the complex failures that can occur in distributed systems. Failures in individual components can affect other components in turn, and multiple failures can interact in unforeseen ways, leading to system-wide outages. Common elements of an observability strategy for a system include metrics, logging, and tracing.

In this chapter, you’ll learn how Cassandra supports these elements of observability and how to use available tools to monitor and understand important events in the life cycle of your Cassandra cluster. We’ll look at some simple ways to see what’s going on, such as changing the logging levels and understanding the output.

To begin, let’s discuss how Cassandra uses the Java Management Extensions (JMX) to expose information about its internal operations and allow the dynamic configuration of some of its behavior. That will give you a basis to learn how to monitor Cassandra with various tools.

Monitoring Cassandra with JMX

Cassandra makes use of JMX to enable remote management of your nodes. JMX was originally specified in Java Specification Request (JSR) 3, with remote access added by JSR 160, and has been a core part of Java since version 5.0. You can read more about the JMX implementation in Java by examining the java.lang.management package.
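To get a feel for the java.lang.management package before pointing a tool at Cassandra, the following minimal sketch reads two values from the JVM it runs in. It does not connect to a Cassandra node; the class name JmxPeek and its method names are illustrative assumptions, while the ManagementFactory, MemoryMXBean, and MBeanServer calls are standard JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper class, not part of Cassandra or the book's sample code.
public class JmxPeek {

    // Heap bytes currently in use, read through the typed MemoryMXBean.
    public static long heapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    // The same JVM also exposes its beans by name through the platform
    // MBean server; here the Runtime bean's VmName attribute is looked up.
    public static String vmName() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName runtime = new ObjectName(ManagementFactory.RUNTIME_MXBEAN_NAME);
        return (String) server.getAttribute(runtime, "VmName");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Heap used (bytes): " + heapUsedBytes());
        System.out.println("VM name: " + vmName());
    }
}
```

The second method shows the name-and-attribute style of access that generic JMX clients rely on: the caller needs only an object name and an attribute name, not the bean's Java interface.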

JMX is a Java API that provides ...

ISBN: 9781492097136
