Cassandra: The Definitive Guide, 2nd Edition

by Jeff Carpenter, Eben Hewitt

July 2016

Intermediate to advanced

367 pages

10h 15m

English

O'Reilly Media, Inc.

Why Apache Cassandra?; Is This Book for You?; What’s in This Book?; New for the Second Edition; Conventions Used in This Book; Using Code Examples; O’Reilly Safari; How to Contact Us; Acknowledgments
What’s Wrong with Relational Databases?; A Quick Review of Relational Databases; RDBMSs: The Awesome and the Not-So-Much; Web Scale; The Rise of NoSQL; Summary
The Cassandra Elevator Pitch; Cassandra in 50 Words or Less; Distributed and Decentralized; Elastic Scalability; High Availability and Fault Tolerance; Tuneable Consistency; Brewer’s CAP Theorem; Row-Oriented; High Performance; Where Did Cassandra Come From?; Release History; Is Cassandra a Good Fit for My Project?; Large Deployments; Lots of Writes, Statistics, and Analysis; Geographical Distribution; Evolving Applications; Getting Involved; Summary
Installing the Apache Distribution; Extracting the Download; What’s In There?; Building from Source; Additional Build Targets; Running Cassandra; On Windows; On Linux; Starting the Server; Stopping Cassandra; Other Cassandra Distributions; Running the CQL Shell; Basic cqlsh Commands; cqlsh Help; Describing the Environment in cqlsh; Creating a Keyspace and Table in cqlsh; Writing and Reading Data in cqlsh; Summary
The Relational Data Model; Cassandra’s Data Model; Clusters; Keyspaces; Tables; Columns; CQL Types; Numeric Data Types; Textual Data Types; Time and Identity Data Types; Other Simple Data Types; Collections; User-Defined Types; Secondary Indexes; Summary
Conceptual Data Modeling; RDBMS Design; Design Differences Between RDBMS and Cassandra; Defining Application Queries; Logical Data Modeling; Hotel Logical Data Model; Reservation Logical Data Model; Physical Data Modeling; Hotel Physical Data Model; Reservation Physical Data Model; Materialized Views; Evaluating and Refining; Calculating Partition Size; Calculating Size on Disk; Breaking Up Large Partitions; Defining Database Schema; DataStax DevCenter; Summary
Data Centers and Racks; Gossip and Failure Detection; Snitches; Rings and Tokens; Virtual Nodes; Partitioners; Replication Strategies; Consistency Levels; Queries and Coordinator Nodes; Memtables, SSTables, and Commit Logs; Caching; Hinted Handoff; Lightweight Transactions and Paxos; Tombstones; Bloom Filters; Compaction; Anti-Entropy, Repair, and Merkle Trees; Staged Event-Driven Architecture (SEDA); Managers and Services; Cassandra Daemon; Storage Engine; Storage Service; Storage Proxy; Messaging Service; Stream Manager; CQL Native Transport Server; System Keyspaces; Summary
Cassandra Cluster Manager; Creating a Cluster; Seed Nodes; Partitioners; Murmur3 Partitioner; Random Partitioner; Order-Preserving Partitioner; ByteOrderedPartitioner; Snitches; Simple Snitch; Property File Snitch; Gossiping Property File Snitch; Rack Inferring Snitch; Cloud Snitches; Dynamic Snitch; Node Configuration; Tokens and Virtual Nodes; Network Interfaces; Data Storage; Startup and JVM Settings; Adding Nodes to a Cluster; Dynamic Ring Participation; Replication Strategies; SimpleStrategy; NetworkTopologyStrategy; Changing the Replication Factor; Summary

Hector, Astyanax, and Other Legacy Clients; DataStax Java Driver; Development Environment Configuration; Clusters and Contact Points; Sessions and Connection Pooling; Statements; Policies; Metadata; Debugging and Monitoring; DataStax Python Driver; DataStax Node.js Driver; DataStax Ruby Driver; DataStax C# Driver; DataStax C/C++ Driver; DataStax PHP Driver; Summary
Writing; Write Consistency Levels; The Cassandra Write Path; Writing Files to Disk; Lightweight Transactions; Batches; Reading; Read Consistency Levels; The Cassandra Read Path; Read Repair; Range Queries, Ordering and Filtering; Functions and Aggregates; Paging; Speculative Retry; Deleting; Summary
Logging; Tailing; Examining Log Files; Monitoring Cassandra with JMX; Connecting to Cassandra via JConsole; Overview of MBeans; Cassandra’s MBeans; Database MBeans; Networking MBeans; Metrics MBeans; Threading MBeans; Service MBeans; Security MBeans; Monitoring with nodetool; Getting Cluster Information; Getting Statistics; Summary
Health Check; Basic Maintenance; Flush; Cleanup; Repair; Rebuilding Indexes; Moving Tokens; Adding Nodes; Adding Nodes to an Existing Data Center; Adding a Data Center to a Cluster; Handling Node Failure; Repairing Nodes; Replacing Nodes; Removing Nodes; Upgrading Cassandra; Backup and Recovery; Taking a Snapshot; Clearing a Snapshot; Enabling Incremental Backup; Restoring from Snapshot; SSTable Utilities; Maintenance Tools; DataStax OpsCenter; Netflix Priam; Summary
Managing Performance; Setting Performance Goals; Monitoring Performance; Analyzing Performance Issues; Tracing; Tuning Methodology; Caching; Key Cache; Row Cache; Counter Cache; Saved Cache Settings; Memtables; Commit Logs; SSTables; Hinted Handoff; Compaction; Concurrency and Threading; Networking and Timeouts; JVM Settings; Memory; Garbage Collection; Using cassandra-stress; Summary
Authentication and Authorization; Password Authenticator; Using CassandraAuthorizer; Role-Based Access Control; Encryption; SSL, TLS, and Certificates; Node-to-Node Encryption; Client-to-Node Encryption; JMX Security; Securing JMX Access; Security MBeans; Summary
Planning a Cluster Deployment; Sizing Your Cluster; Selecting Instances; Storage; Network; Cloud Deployment; Amazon Web Services; Microsoft Azure; Google Cloud Platform; Integrations; Apache Lucene, SOLR, and Elasticsearch; Apache Hadoop; Apache Spark; Summary

Content preview from Cassandra: The Definitive Guide, 2nd Edition

Foreword

Cassandra was open-sourced by Facebook in July 2008. This original version of Cassandra was written primarily by an ex-employee from Amazon and one from Microsoft. It was strongly influenced by Dynamo, Amazon’s pioneering distributed key/value database. Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model.

I became involved in December of that year, when Rackspace asked me to build them a scalable database. This was good timing, because all of today’s important open source scalable databases were available for evaluation. Despite initially having only a single major use case, Cassandra’s underlying architecture was the strongest, and I directed my efforts toward improving the code and building a community.

Cassandra was accepted into the Apache Incubator, and by the time it graduated in March 2010, it had become a true open source success story, with committers from Rackspace, Digg, Twitter, and other companies that wouldn’t have written their own database from scratch, but together built something important.

Today’s Cassandra is much more than the early system that powered (and still powers) Facebook’s inbox search; it has become “the hands-down winner for transaction processing performance,” to quote Tony Bain, with a deserved reputation for reliability and performance at scale.

As Cassandra matured and began attracting more mainstream users, it became clear that there was a need for commercial ...


Publisher Resources

ISBN: 9781491933657
