Seven NoSQL Databases in a Week

Book Description

A beginner's guide to getting up and running with Cassandra, DynamoDB, HBase, InfluxDB, MongoDB, Neo4j, and Redis

About This Book

  • Covers the basics of seven NoSQL databases and how they are used in the enterprise
  • Quick introduction to MongoDB, DynamoDB, Redis, Cassandra, Neo4j, InfluxDB, and HBase
  • Includes effective techniques for database querying and management

Who This Book Is For

If you are a budding DBA or a developer who wants to get started with the fundamentals of NoSQL databases, this book is for you. Relational DBAs who want insight into the offerings of popular NoSQL databases will also find this book very useful.

What You Will Learn

  • Understand how MongoDB provides high performance, high availability, and automatic scaling
  • Interact with your Neo4j instances via database queries, Python scripts, and Java application code
  • Get familiar with common querying and programming methods to interact with Redis
  • Study the different types of problems Cassandra can solve
  • Work with HBase components to support common operations such as creating tables and reading/writing data
  • Discover data models and work with CRUD operations using DynamoDB
  • Discover what makes InfluxDB a great choice for working with time-series data

In Detail

This is the golden age of open source NoSQL databases. As enterprises work with ever larger amounts of unstructured data and move away from expensive monolithic architectures, the adoption of NoSQL databases is rapidly increasing. Being familiar with the popular NoSQL databases and knowing how to use them is a must for budding DBAs and developers.

This book introduces you to the different types of NoSQL databases and gets you started with seven of the most popular NoSQL databases used by enterprises today. We start off with a brief overview of what NoSQL databases are, followed by an explanation of why and when to use them. The book then covers seven popular databases spanning these categories: MongoDB, Amazon DynamoDB, Redis, HBase, Cassandra, InfluxDB, and Neo4j. It doesn't go into great detail about each database, but it teaches you enough to get started with them.

By the end of this book, you will have a solid understanding of the different NoSQL databases and their functionality, enabling you to select and use the right database for your needs.

Style and approach

This book is a quick-start guide with short and simple introductory content on the seven popular databases.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Seven NoSQL Databases in a Week
  3. Dedication
  4. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  5. Contributors
    1. About the authors
    2. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  7. Introduction to NoSQL Databases
    1. Consistency versus availability
    2. ACID guarantees
    3. Hash versus range partition
    4. In-place updates versus appends
    5. Row versus column versus column-family storage models
    6. Strongly versus loosely enforced schemas
    7. Summary
  8. MongoDB
    1. Installing MongoDB
    2. MongoDB data types
      1. The MongoDB database
        1. MongoDB collections
        2. MongoDB documents
      2. The create operation
      3. The read operation
        1. Applying filters on fields
        2. Applying conditional and logical operators on the filter parameter
      4. The update operation
      5. The delete operation
    3. Data models in MongoDB
      1. The references document data model
      2. The embedded data model
    4. Introduction to MongoDB indexing
      1. The default _id index
    5. Replication
      1. Replication in MongoDB
      2. Automatic failover in replication
      3. Read operations
    6. Sharding
      1. Sharded clusters
      2. Advantages of sharding
    7. Storing large data in MongoDB
    8. Summary
  9. Neo4j
    1. What is Neo4j?
    2. How does Neo4j work?
    3. Features of Neo4j
      1. Clustering
      2. Neo4j Browser
      3. Cache sharding
      4. Help for beginners
    4. Evaluating your use case
      1. Social networks
      2. Matchmaking
      3. Network management
      4. Analytics
      5. Recommendation engines
    5. Neo4j anti-patterns
      1. Applying relational modeling techniques in Neo4j
      2. Using Neo4j for the first time on something mission-critical
      3. Storing entities and relationships within entities
      4. Improper use of relationship types
      5. Storing binary large object data
      6. Indexing everything
    6. Neo4j hardware selection, installation, and configuration
      1. Random access memory
      2. CPU
      3. Disk
      4. Operating system
      5. Network/firewall
      6. Installation
        1. Installing JVM
      7. Configuration
        1. High-availability clustering
        2. Causal clustering
    7. Using Neo4j
      1. Neo4j Browser
      2. Cypher
        1. Python
        2. Java
      3. Taking a backup with Neo4j
        1. Backup/restore with Neo4j Enterprise
        2. Backup/restore with Neo4j Community
      4. Differences between the Neo4j Community and Enterprise Editions
    8. Tips for success
    9. Summary
    10. References 
  10. Redis
    1. Introduction to Redis
    2. What are the key features of Redis?
      1. Performance
      2. Tunable data durability
      3. Publish/Subscribe
      4. Useful data types
      5. Expiring data over time
      6. Counters
      7. Server-side Lua scripting
    3. Appropriate use cases for Redis
      1. Data fits into RAM
      2. Data durability is not a concern
      3. Data at scale
      4. Simple data model
      5. Features of Redis matching part of your use case
    4. Data modeling and application design with Redis
      1. Taking advantage of Redis' data structures
      2. Queues
      3. Sets
      4. Notifications
      5. Counters
      6. Caching
    5. Redis anti-patterns
      1. Dataset cannot fit into RAM
      2. Modeling relational data
      3. Improper connection management
      4. Security
      5. Using the KEYS command
      6. Unnecessary trips over the network
      7. Not disabling THP
    6. Redis setup, installation, and configuration
      1. Virtualization versus on-the-metal
      2. RAM
      3. CPU
      4. Disk
      5. Operating system
      6. Network/firewall
      7. Installation
        1. Configuration files
    7. Using Redis
      1. redis-cli
      2. Lua
      3. Python
      4. Java
      5. Taking a backup with Redis
      6. Restoring from a backup
    8. Tips for success
    9. Summary
    10. References
  11. Cassandra
    1. Introduction to Cassandra
    2. What problems does Cassandra solve?
    3. What are the key features of Cassandra?
      1. No single point of failure
      2. Tunable consistency
      3. Data center awareness
      4. Linear scalability
      5. Built on the JVM
    4. Appropriate use cases for Cassandra
      1. Overview of the internals
      2. Data modeling in Cassandra
        1. Partition keys
        2. Clustering keys
        3. Putting it all together
      3. Optimal use cases
    5. Cassandra anti-patterns
      1. Frequently updated data
      2. Frequently deleted data
      3. Queues or queue-like data
      4. Solutions requiring query flexibility
      5. Solutions requiring full table scans
      6. Incorrect use of BATCH statements
        1. Using Byte Ordered Partitioner
        2. Using a load balancer in front of Cassandra nodes
        3. Using a framework driver
    6. Cassandra hardware selection, installation, and configuration
      1. RAM
      2. CPU
      3. Disk
      4. Operating system
      5. Network/firewall
      6. Installation using apt-get
      7. Tarball installation
      8. JVM installation
    7. Node configuration
    8. Running Cassandra
      1. Adding a new node to the cluster
    9. Using Cassandra
      1. Nodetool
      2. CQLSH
      3. Python
      4. Java
      5. Taking a backup with Cassandra
      6. Restoring from a snapshot
    10. Tips for success
      1. Run Cassandra on Linux
      2. Open ports 7199, 7000, 7001, and 9042
      3. Enable security
      4. Use solid state drives (SSDs) if possible
      5. Configure only one or two seed nodes per data center
      6. Schedule weekly repairs
      7. Do not force a major compaction
      8. Remember that every mutation is a write
      9. The data model is key
      10. Consider a support contract
      11. Cassandra is not a general purpose database
    11. Summary
    12. References
  12. HBase
    1. Architecture
      1. Components in the HBase stack
        1. Zookeeper
        2. HDFS
        3. HBase master
        4. HBase RegionServers
    2. Reads and writes
      1. The HBase write path
        1. HBase writes – design motivation
      2. The HBase read path
      3. HBase compactions
    3. System trade-offs
    4. Logical and physical data models
    5. Interacting with HBase – the HBase shell
    6. Interacting with HBase – the HBase Client API
      1. Interacting with secure HBase clusters
    7. Advanced topics
      1. HBase high availability
        1. Replicated reads
        2. HBase in multiple regions
      2. HBase coprocessors
      3. SQL over HBase
    8. Summary
  13. DynamoDB
    1. The difference between SQL and DynamoDB
    2. Setting up DynamoDB
      1. Setting up locally
      2. Setting up using AWS
      3. The difference between downloadable DynamoDB and DynamoDB web services
    3. DynamoDB data types and terminology
      1. Tables, items, and attributes
      2. Primary key
      3. Secondary indexes
      4. Streams
      5. Queries
      6. Scan
      7. Data types
    4. Data models and CRUD operations in DynamoDB
    5. Limitations of DynamoDB
    6. Best practices
    7. Summary
  14. InfluxDB
    1. Introduction to InfluxDB
      1. Key concepts and terms of InfluxDB
      2. Data model and storage engine
      3. Storage engine
    2. Installation and configuration
      1. Installing InfluxDB
      2. Configuring InfluxDB
      3. Production deployment considerations
    3. Query language and API
      1. Query language
      2. Query pagination
      3. Query performance optimizations
      4. Interaction via REST API
      5. InfluxDB API client
      6. InfluxDB with Java client
      7. InfluxDB with a Python client
      8. InfluxDB with Go client
    4. InfluxDB ecosystem
      1. Telegraf
        1. Telegraf data management
      2. Kapacitor
    5. InfluxDB operations
      1. Backup and restore
        1. Backups
        2. Restore
      2. Clustering and HA
      3. Retention policy
      4. Monitoring
    6. Summary
  15. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think