Mastering Apache Cassandra - Second Edition

by Nishant Neeraj

March 2015

Beginner to intermediate

350 pages

7h 54m

English

Packt Publishing

Support files, eBooks, discount offers, and more; Why subscribe?; Free access for Packt account holders
What this book covers

Downloading the example code; Errata; Piracy; Questions
Introduction to Cassandra; A distributed database; High availability; Replication; Multiple data centers
Modeling data; Writing code; Setting up; Inserting records; Retrieving data; Writing your application; Getting the connection; Executing queries; Object mapping
Problems in the RDBMS world
The CAP theorem; Consistency; Availability; Partition-tolerance; The significance of the CAP theorem
Ring representation; Virtual nodes; How Cassandra works; Write in action; Read in action; The components of Cassandra; The messaging service; Gossip; Failure detection; Gossip and failure detection; Partitioner; Replication; The notorious R + W > N inequality; LSM tree; Commit log; MemTable; SSTable; The bloom filter; Index files; Data files; Compaction; Tombstones; Hinted handoff; Read repair and anti-entropy; Merkle tree
The Cassandra data model; The counter column (cell); The expiring cell; The column family; Keyspaces; Data types; The primary index; CQL3; Creating a keyspace; SimpleStrategy; NetworkTopologyStrategy; Altering a keyspace; Creating a table; Table properties; Altering a table; Adding a column; Renaming a column; Changing the data type; Dropping a column; Updating the table properties; Dropping a table; Creating an index; Dropping an index; Creating a data type; Altering a custom type; Dropping a custom type; Creating triggers; Dropping a trigger; Creating a user; Altering a user; Dropping a user; The granting permission; Revoking permission using REVOKE; Inserting data; Collections in CQL; Lists; Sets; Maps; Lightweight transactions; Updating a row; Deleting a row; Executing the BATCH statement; Other CQL commands; USE; TRUNCATE; LIST USERS; LIST PERMISSIONS
DESCRIBE; TRACING; CONSISTENCY; COPY; CAPTURE; ASSUME; SOURCE; SHOW; EXIT
Evaluating requirements; Hard disk capacity; RAM; CPU; Is node a server?; Network
Optimizing user limits; Swapping memory; Clock synchronization; Disk readahead
Installing Oracle Java 7; RHEL and CentOS systems; Debian and Ubuntu systems; Installing the Java Native Access library
Installing from a tarball; Installing from ASF Repository for Debian or Ubuntu; Anatomy of the installation; Cassandra binaries; Configuration files; Setting up data and commitlog directories
The cluster name; The seed node; Listen, broadcast, and RPC addresses; num_tokens versus initial_token; num_tokens; initial_token; Partitioners; The Random partitioner; The Byte-ordered partitioner; The Murmur3 partitioner; Snitches; SimpleSnitch; PropertyFileSnitch; GossipingPropertyFileSnitch; RackInferringSnitch; EC2Snitch; EC2MultiRegionSnitch; Replica placement strategies; SimpleStrategy; NetworkTopologyStrategy; Multiple data center setups; Launching a cluster with a script; Creating a keyspace
Stress testing; Database schema; Data distribution; Write pattern; Read queries
Write performance; Read performance; Choosing the right compaction strategy; Size-tiered compaction strategy; Leveled compaction; Row cache; Key cache; Cache settings; Enabling compression; Tuning the bloom filter; More tuning via cassandra.yaml; commitlog_sync; column_index_size_in_kb; commitlog_total_space_in_mb; Tweaking JVM; Java heap; Garbage collection; Other JVM options; Scaling horizontally and vertically; Network
Scaling; Adding nodes to a cluster; Adding new nodes in vnode-enabled clusters; Adding a new node to a cluster without vnodes; Removing nodes from a cluster; Removing a live node; Removing a dead node
Using the Cassandra bulk loader to restore the data
Cassandra's JMX interface; Accessing MBeans using JConsole
Monitoring with nodetool; cfstats; netstats; status; ring and describering; tpstats; compactionstats; info; Managing administration with nodetool; drain; decommission; removenode; move; repair; upgradesstable; snapshot
The OpsCenter features; Installing OpsCenter and an agent; Prerequisites; Running a Cassandra cluster; Installing OpsCenter from tarball; Setting up an OpsCenter agent; Monitoring and administrating with OpsCenter; Other features of OpsCenter
Installing Nagios; Prerequisites; Preparation; Installation; Installing Nagios; Configuring Apache httpd; Installing Nagios plugins; Setting up Nagios as a service; Nagios plugins; Nagios plugins for Cassandra; Executing remote plugins via the NRPE plugin; Installing NRPE on host machines; Installing the NRPE plugin on a Nagios machine; Setting up things to monitor; Monitoring and notification using Nagios
Enabling Java options for GC logging
High CPU usage; High memory usage; Hotspots; Open JDK's erratic behavior; Disk performance; Slow snapshots; Getting help from the mailing list
Using Hadoop
Introduction to Hadoop; HDFS; Data management; NameNode; DataNodes; Hadoop MapReduce; JobTracker; TaskTracker; Reliability of data and processes in Hadoop; Setting up local Hadoop; Testing the installation
Preparing Cassandra for Hadoop; ColumnFamilyInputFormat; ColumnFamilyOutputFormat; CqlOutputFormat and CqlInputFormat; ConfigHelper; Wide row support; Bulk loading; Secondary index support
Executing, debugging, monitoring, and looking at results
Cassandra filesystem
Installing Pig; Integrating Pig and Cassandra; Integration with other analytical tools

Content preview from Mastering Apache Cassandra - Second Edition

Cassandra with Hadoop MapReduce

Cassandra provides built-in support for Hadoop. If you have ever written a MapReduce program, you will find that writing a MapReduce task against Cassandra is much like writing one for data stored in HDFS. Cassandra supplies input to Hadoop through the ColumnFamilyInputFormat class and accepts output through the ColumnFamilyOutputFormat class. In addition, you pass Cassandra-specific settings to Hadoop via ConfigHelper. These three classes are enough to get you started. Another class worth looking at is BulkOutputFormat. All of these classes are under the org.apache.cassandra.hadoop.* package.
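
To make the similarity with an HDFS-based job concrete, here is a minimal sketch of the map and reduce steps of a word count over Cassandra-style rows. It is a plain-Java, in-memory stand-in and uses neither Hadoop nor Cassandra classes. It assumes the mapper sees each row as a ByteBuffer row key plus a sorted map of column names to values, which is roughly the shape ColumnFamilyInputFormat is understood to hand to a mapper. The class name CassandraStyleWordCount, the helper methods, and the sample rows are all hypothetical and chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class CassandraStyleWordCount {

    // Encode a string as UTF-8 bytes, the raw form in which keys and values are assumed to arrive.
    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a buffer without consuming it (duplicate() leaves the original position untouched).
    static String string(ByteBuffer b) {
        return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
    }

    // Map step: emit (word, 1) for every word found in the column values of one row.
    // The row key is not needed for a word count, so it is ignored here.
    static List<Map.Entry<String, Integer>> map(ByteBuffer rowKey,
                                                SortedMap<ByteBuffer, ByteBuffer> columns) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (ByteBuffer value : columns.values()) {
            for (String word : string(value).toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }
        }
        return out;
    }

    // Reduce step: sum all the counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Drive map, group-by-key (a stand-in for Hadoop's shuffle), and reduce in memory.
    static Map<String, Integer> run(Map<ByteBuffer, SortedMap<ByteBuffer, ByteBuffer>> rows) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<ByteBuffer, SortedMap<ByteBuffer, ByteBuffer>> row : rows.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(row.getKey(), row.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    // Two hypothetical rows, each with a single "text" column.
    static Map<ByteBuffer, SortedMap<ByteBuffer, ByteBuffer>> sampleRows() {
        Map<ByteBuffer, SortedMap<ByteBuffer, ByteBuffer>> rows = new LinkedHashMap<>();
        SortedMap<ByteBuffer, ByteBuffer> first = new TreeMap<>();
        first.put(bytes("text"), bytes("Cassandra with Hadoop"));
        rows.put(bytes("row1"), first);
        SortedMap<ByteBuffer, ByteBuffer> second = new TreeMap<>();
        second.put(bytes("text"), bytes("Hadoop reads Cassandra rows"));
        rows.put(bytes("row2"), second);
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(run(sampleRows()));
    }
}
```

In a real job, the input and output format classes named above would take the place of sampleRows() and the in-memory result map, and Hadoop would perform the grouping between the map and reduce steps. The mapper and reducer logic would keep much the same shape.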

To be able to compile the MapReduce code that uses Cassandra as ...