Mastering MongoDB 3.x

Book Description

An expert's guide to building fault-tolerant MongoDB applications

About This Book

  • Master the advanced modeling, querying, and administration techniques in MongoDB and become a MongoDB expert
  • Covers the latest updates and Big Data features frequently used by professional MongoDB developers and administrators
  • If your goal is to become a certified MongoDB professional, this book is your perfect companion

Who This Book Is For

Mastering MongoDB is a book for database developers, architects, and administrators who want to learn how to use MongoDB more effectively and productively.

If you have experience in, and are interested in working with, NoSQL databases to build apps and websites, then this book is for you.

What You Will Learn

  • Get hands-on with advanced querying techniques such as indexing, expressions, arrays, and more
  • Configure, monitor, and maintain a highly scalable MongoDB environment like an expert
  • Master replication and data sharding to optimize read/write performance
  • Design secure and robust applications based on MongoDB
  • Administer MongoDB-based applications on-premise or in the cloud
  • Scale MongoDB to achieve your design goals
  • Integrate MongoDB with big data sources to process huge amounts of data

In Detail

MongoDB has grown to become the de facto NoSQL database with millions of users, from small startups to Fortune 500 companies. Addressing the limitations of SQL schema-based databases, MongoDB pioneered a shift of focus for DevOps and offered sharding and replication maintainable by DevOps teams. The book is based on MongoDB 3.x and covers topics ranging from database querying using the shell, built-in drivers, and popular ODMs to more advanced topics such as sharding, high availability, and integration with big data sources.

You will get an overview of MongoDB and how to play to its strengths, with relevant use cases. After that, you will learn how to query MongoDB effectively and make use of indexes as much as possible. The next part deals with the administration of MongoDB installations on-premise or in the cloud. We deal with database internals in the next section, explaining storage systems and how they can affect performance. The last section of this book deals with replication and MongoDB scaling, along with integration with heterogeneous data sources. By the end of this book, you will be equipped with all the required industry skills and knowledge to become a certified MongoDB developer and administrator.

Style and approach

This book takes a practical, step-by-step approach to explaining the concepts of MongoDB. Practical use cases involving real-world examples are used throughout the book to clearly explain theoretical concepts.

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Errata
      3. Piracy
      4. Questions
  2. MongoDB – A Database for the Modern Web
    1. Web history
      1. Web 1.0
      2. Web 2.0
      3. Web 3.0
    2. SQL and NoSQL evolution
      1. MongoDB evolution
        1. Major feature set for versions 1.0 and 1.2
        2. Version 2
        3. Version 3
        4. Version 3+
      2. MongoDB for SQL developers
      3. MongoDB for NoSQL developers
    3. MongoDB key characteristics and use cases
      1. Key characteristics
      2. What is the use case for MongoDB?
      3. MongoDB criticism
    4. MongoDB configuration and best practices
      1. Operational best practices
      2. Schema design best practices
      3. Best practices for write durability
      4. Best practices for replication
      5. Best practices for sharding
      6. Best practices for security
      7. Best practices for AWS
    5. Reference documentation
      1. MongoDB documentation
      2. Packt references
      3. Further reading 
    6. Summary
  3. Schema Design and Data Modeling
    1. Relational schema design
      1. MongoDB schema design
        1. Read-write ratio
      2. Data modeling
        1. Data types
        2. Comparing different data types
          1. Date type
          2. ObjectId
      3. Modeling data for atomic operations
        1. Write isolation
        2. Read isolation and consistency
      4. Modeling relationships
        1. One-to-one
        2. One-to-many, many-to-many
        3. Modeling data for keyword searches
      5. Connecting to MongoDB
        1. Connecting using Ruby
        2. Mongoid ODM
        3. Inheritance with Mongoid models
      6. Connecting using Python
        1. PyMODM ODM
        2. Inheritance with PyMODM models
      7. Connecting using PHP
        1. Doctrine ODM
        2. Inheritance with Doctrine
    2. Summary
  4. MongoDB CRUD Operations
    1. CRUD using the shell
      1. Scripting for the mongo shell
        1. Differences between scripting for the mongo shell and using it directly
        2. Batch inserts using the shell
        3. Batch operations using the mongo shell
      2. Administration
        1. fsync
        2. compact
        3. currentOp/killOp
        4. collMod
        5. touch
      3. MapReduce in the mongo shell
        1. MapReduce concurrency
        2. Incremental MapReduce
        3. Troubleshooting MapReduce
      4. Aggregation framework
        1. SQL to aggregation
        2. Aggregation versus MapReduce
      5. Securing the shell
        1. Authentication and authorization
        2. Authorization with MongoDB
        3. Security tips for MongoDB
          1. Encrypting communication using TLS/SSL
          2. Encrypting data
          3. Limiting network exposure
          4. Firewalls and VPNs
          5. Auditing
          6. Use secure configuration options
      6. Authentication with MongoDB
        1. Enterprise Edition
          1. Kerberos authentication
          2. LDAP authentication
    2. Summary
  5. Advanced Querying
    1. MongoDB CRUD operations
      1. CRUD using the Ruby driver
        1. Creating documents
        2. Read
        3. Chaining operations in find()
        4. Nested operations
        5. Update
        6. Delete
        7. Batch operations
      2. CRUD in Mongoid
        1. Read
        2. Scoping queries
        3. Create, update, and delete
      3. CRUD using the Python driver
        1. Create and delete
        2. Finding documents
        3. Updating documents
      4. CRUD using PyMODM
        1. Creating documents
        2. Updating documents
        3. Deleting documents
        4. Querying documents
      5. CRUD using the PHP driver
        1. Create and delete
        2. Bulk write
        3. Read
        4. Update
      6. CRUD using Doctrine
        1. Create, update, and delete
        2. Read
        3. Best practices
      7. Comparison operators
      8. Update operators
      9. Smart querying
        1. Using regular expressions
        2. Query results and cursors
        3. Storage considerations on delete
    2. Summary
  6. Aggregation
    1. Why aggregation?
    2. Aggregation operators
      1. Aggregation stage operators
      2. Expression operators
        1. Expression Boolean operators
        2. Expression comparison operators
        3. Set expression and array operators
        4. Expression date operators
        5. Expression string operators
        6. Expression arithmetic operators
        7. Aggregation accumulators
        8. Conditional expressions
        9. Other operators
          1. Text search
          2. Variable
          3. Literal
          4. Parsing data type
    3. Limitations
    4. Aggregation use case
    5. Summary
  7. Indexing
    1. Index internals
      1. Index types
        1. Single field indexes
          1. Indexing embedded fields
          2. Indexing embedded documents
          3. Background indexes
        2. Compound indexes
          1. Sorting using compound indexes
          2. Reusing compound indexes
        3. Multikey indexes
        4. Special types of index
          1. Text
          2. Hashed
          3. TTL
          4. Partial
          5. Sparse
          6. Unique
          7. Case-insensitive
          8. Geospatial
      2. Building and managing indexes
        1. Forcing index usage
          1. Hint and sparse indexes
          2. Building indexes on replica sets
        2. Managing indexes
          1. Naming indexes
          2. Special considerations
      3. Using indexes efficiently
        1. Measuring performance
          1. Improving performance
          2. Index intersection
    2. References
    3. Summary
  8. Monitoring, Backup, and Security
    1. Monitoring
      1. What should we monitor?
        1. Page faults
        2. Resident memory
        3. Virtual and mapped memory
        4. Working set
      2. Monitoring memory usage in WiredTiger
      3. Tracking page faults
      4. Tracking B-tree misses
        1. I/O wait
        2. Read and write queues
        3. Lock percentage
        4. Background flushes
        5. Tracking free space
        6. Monitoring replication
        7. Oplog size
      5. Working set calculations
      6. Monitoring tools
        1. Hosted tools
        2. Open source tools
    2. Backups
      1. Backup options
        1. Cloud-based solutions
        2. Backups with file system snapshots
        3. Taking a backup of a sharded cluster
        4. Backups using mongodump
        5. Backups by copying raw files
        6. Backups using queueing
      2. EC2 backup and restore
      3. Incremental backups
    3. Security
      1. Authentication
      2. Authorization
        1. User roles
        2. Database administration roles
        3. Cluster administration roles
        4. Backup restore roles
        5. Roles across all databases
          1. Superuser
      3. Network level security
      4. Auditing security
      5. Special cases
      6. Overview
    4. Summary
  9. Storage Engines
    1. Pluggable storage engines
      1. WiredTiger
        1. Document-level locking
        2. Snapshots and checkpoints
        3. Journaling
        4. Data compression
        5. Memory usage
        6. readConcern
        7. WiredTiger collection-level options
        8. WiredTiger performance strategies
        9. WiredTiger B-tree versus LSM indexes
      2. Encrypted
      3. In-memory
      4. MMAPv1
        1. MMAPv1 storage optimization
      5. Mixed usage
      6. Other storage engines
        1. RocksDB
        2. TokuMX
    2. Locking in MongoDB
      1. Lock reporting
      2. Lock yield
      3. Commonly used commands and locks
      4. Commands requiring a database lock
    3. References
    4. Summary
  10. Harnessing Big Data with MongoDB
    1. What is big data?
      1. Big data landscape
      2. Message queuing systems
        1. Apache ActiveMQ
        2. RabbitMQ
        3. Apache Kafka
      3. Data warehousing
        1. Apache Hadoop
        2. Apache Spark
        3. Spark comparison with Hadoop MapReduce
      4. MongoDB as a data warehouse
    2. Big data use case
      1. Kafka setup
      2. Hadoop setup
        1. Steps
      3. Hadoop to MongoDB pipeline
      4. Spark to MongoDB
    3. References
    4. Summary
  11. Replication
    1. Replication
      1. Logical or physical replication
      2. Different high availability types
    2. Architectural overview
    3. How do elections work?
    4. What is the use case for a replica set?
    5. Setting up a replica set
      1. Converting a standalone server to a replica set
      2. Creating a replica set
      3. Read preference
      4. Write concern
        1. Custom write concern
      5. Priority settings for replica set members
        1. Priority zero replica set members
        2. Hidden replica set members
        3. Delayed replica set members
      6. Production considerations
    6. Connecting to a replica set
    7. Replica set administration
      1. How to perform maintenance on replica sets
      2. Resyncing a member of a replica set
      3. Changing the oplog size
      4. Reconfiguring a replica set when we have lost the majority of our servers
      5. Chained replication
    8. Cloud options for a replica set
      1. mLab
      2. MongoDB Atlas
    9. Replica set limitations
    10. Summary
  12. Sharding
    1. Advantages of sharding
    2. Architectural overview
      1. Development, continuous deployment, and staging environments
      2. Planning ahead on sharding
    3. Sharding setup
      1. Choosing the shard key
        1. Changing the shard key
      2. Choosing the correct shard key
        1. Range-based sharding
        2. Hash-based sharding
        3. Coming up with our own key
        4. Location-based data
    4. Sharding administration and monitoring
      1. Balancing data – how to track and keep our data balanced
      2. Chunk administration
        1. Moving chunks
        2. Changing the default chunk size
        3. Jumbo chunks
        4. Merging chunks
        5. Adding and removing shards
      3. Sharding limitations
    5. Querying sharded data
      1. The query router
        1. Find
        2. Sort/limit/skip
        3. Update/remove
      2. Querying using Ruby
      3. Performance comparison with replica sets
    6. Sharding recovery
      1. Mongos
      2. Mongod process
      3. Config server
      4. A shard goes down
      5. The entire cluster goes down
    7. References
    8. Summary
  13. Fault Tolerance and High Availability
    1. Application design
      1. Schema-less doesn't mean schema design-less
      2. Read performance optimization
        1. Consolidating read querying
      3. Defensive coding
        1. Monitoring integrations
    2. Operations
    3. Security
      1. Enabling security by default
      2. Isolating our servers
      3. Checklists
    4. References
    5. Summary