Mastering Ceph - Second Edition

Book description

Discover the unified, distributed storage system and improve the performance of applications

Key Features

  • Explore the latest features of Ceph's Mimic release
  • Get to grips with advanced disaster recovery practices for your storage
  • Harness the power of Reliable Autonomic Distributed Object Store (RADOS) to help you optimize storage systems

Book Description

Ceph is an open source distributed storage system that scales to exabyte deployments. This second edition of Mastering Ceph takes you a step closer to becoming an expert on Ceph.

You'll get started by understanding the design goals and planning steps that should be undertaken to ensure successful deployments. In the next sections, you'll be guided through setting up and deploying a Ceph cluster with the help of orchestration tools. This will allow you to see Ceph's scalability, its erasure coding (data protection) mechanism, and its automated data backup features across multiple servers. You'll then discover more about key areas of Ceph, including BlueStore, erasure coding, and cache tiering, with the help of examples. Next, you'll learn some of the ways to export Ceph into non-native environments and understand some of the pitfalls that you may encounter. The book features a section on tuning that will take you through the process of optimizing both Ceph and its supporting infrastructure. You'll also learn to develop applications that use librados and to perform distributed computation with shared object classes. In the concluding chapters, you'll learn to troubleshoot issues and handle scenarios where Ceph is unlikely to recover on its own.

By the end of this book, you'll be able to master storage management with Ceph and generate solutions for managing your infrastructure.

What you will learn

  • Plan, design and deploy a Ceph cluster
  • Become well versed in Ceph's different features and storage methods
  • Carry out regular maintenance and daily operations with ease
  • Tune Ceph for improved ROI and performance
  • Recover Ceph from a range of issues
  • Upgrade clusters to BlueStore

Who this book is for

If you are a storage professional, system administrator, or cloud engineer looking for guidance on building powerful storage solutions for your cloud and on-premises infrastructure, this book is for you.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Mastering Ceph Second Edition
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Section 1: Planning And Deployment
  7. Planning for Ceph
    1. What is Ceph?
    2. How Ceph works
    3. Ceph use cases
      1. Specific use cases
        1. OpenStack or KVM based virtualization
        2. Large bulk block storage
        3. Object storage
        4. Object storage with custom applications
        5. Distributed filesystem – web farm
        6. Distributed filesystem – NAS or fileserver replacement
        7. Big data
    4. Infrastructure design
      1. SSDs
        1. Enterprise SSDs
          1. Enterprise – read-intensive
          2. Enterprise – general usage
          3. Enterprise – write-intensive
      2. Memory
      3. CPU
      4. Disks
      5. Networking
        1. 10 G requirement
        2. Network design
      6. OSD node sizes
        1. Failure domains
        2. Price
      7. Power supplies
    5. How to plan a successful Ceph implementation
      1. Understanding your requirements and how they relate to Ceph
      2. Defining goals so that you can gauge whether the project is a success
      3. Joining the Ceph community
      4. Choosing your hardware
      5. Training yourself and your team to use Ceph
      6. Running a PoC to determine whether Ceph has met the requirements
      7. Following best practices to deploy your cluster
      8. Defining a change management process
      9. Creating a backup and recovery plan
    6. Summary
    7. Questions
  8. Deploying Ceph with Containers
    1. Technical requirements
    2. Preparing your environment with Vagrant and VirtualBox
      1. How to install VirtualBox
      2. How to set up Vagrant
      3. Ceph-deploy
    3. Orchestration
    4. Ansible
      1. Installing Ansible
      2. Creating your inventory file
      3. Variables
      4. Testing
    5. A very simple playbook
    6. Adding the Ceph Ansible modules
      1. Deploying a test cluster with Ansible
    7. Change and configuration management
    8. Ceph in containers
      1. Containers
      2. Kubernetes
        1. Deploying a Ceph cluster with Rook
    9. Summary
    10. Questions
  9. BlueStore
    1. What is BlueStore?
    2. Why was it needed?
      1. Ceph's requirements
        1. Filestore limitations
      2. Why is BlueStore the solution?
    3. How BlueStore works
      1. RocksDB
      2. Compression
      3. Checksums
      4. BlueStore cache tuning
      5. Deferred writes
      6. BlueFS
      7. ceph-volume
    4. How to use BlueStore
      1. Strategies for upgrading an existing cluster to BlueStore
      2. Upgrading an OSD in your test cluster
    5. Summary
    6. Questions
  10. Ceph and Non-Native Protocols
    1. Block
    2. File
    3. Examples
      1. Exporting Ceph RBDs via iSCSI
      2. Exporting CephFS via Samba
      3. Exporting CephFS via NFS
    4. ESXi hypervisor
    5. Clustering
      1. Split brain
      2. Fencing
      3. Pacemaker and corosync
      4. Creating a highly available NFS share backed by CephFS
    6. Summary
    7. Questions
  11. Section 2: Operating and Tuning
  12. RADOS Pools and Client Access
    1. Pools
      1. Replicated pools
      2. Erasure code pools
        1. What is erasure coding?
        2. K+M
        3. How does erasure coding work in Ceph?
        4. Algorithms and profiles
          1. Jerasure
          2. ISA
          3. LRC
          4. SHEC
        5. Overwrite support in erasure-coded pools
        6. Creating an erasure-coded pool
        7. Troubleshooting the 2147483647 error
        8. Reproducing the problem
      3. Scrubbing
    2. Ceph storage types
      1. RBD
        1. Thin provisioning
        2. Snapshots and clones
        3. Object maps
        4. Exclusive locking
      2. CephFS
        1. MDSes and their states
        2. Creating a CephFS filesystem
        3. How is data stored in CephFS?
        4. File layouts
        5. Snapshots
        6. Multi-MDS
      3. RGW
        1. Deploying RGW
    3. Summary
    4. Questions
  13. Developing with Librados
    1. What is librados?
    2. How to use librados
    3. Example librados application
      1. Example of the librados application with atomic operations
      2. Example of the librados application that uses watchers and notifiers
    4. Summary
    5. Questions
  14. Distributed Computation with Ceph RADOS Classes
    1. Example applications and the benefits of using RADOS classes
    2. Writing a simple RADOS class in Lua
    3. Writing a RADOS class that simulates distributed computing
      1. Preparing the build environment
      2. RADOS classes
      3. Client librados applications
        1. Calculating MD5 on the client
        2. Calculating MD5 on the OSD via the RADOS class
      4. Testing
    4. RADOS class caveats
    5. Summary
    6. Questions
  15. Monitoring Ceph
    1. Why it is important to monitor Ceph
    2. What should be monitored
      1. Ceph health
      2. Operating system and hardware
      3. Smart stats
      4. Network
      5. Performance counters
    3. The Ceph dashboard
    4. PG states – the good, the bad, and the ugly
      1. The good ones
        1. The active state
        2. The clean state
        3. Scrubbing and deep scrubbing
      2. The bad ones
        1. The inconsistent state
        2. The backfilling, backfill_wait, recovering, and recovery_wait states
        3. The degraded state
        4. Remapped
        5. Peering
      3. The ugly ones
        1. The incomplete state
        2. The down state
        3. The backfill_toofull and recovery_toofull state
    5. Monitoring Ceph with collectd
      1. Graphite
      2. Grafana
      3. collectd
      4. Deploying collectd with Ansible
      5. Sample Graphite queries for Ceph
        1. Number of Up and In OSDs
        2. Showing the most deviant OSD usage
        3. Total number of IOPs across all OSDs
        4. Total MBps across all OSDs
        5. Cluster capacity and usage
        6. Average latency
      6. Custom Ceph collectd plugins
    6. Summary
    7. Questions
  16. Tuning Ceph
    1. Latency
      1. Client to Primary OSD
      2. Primary OSD to Replica OSD(s)
      3. Primary OSD to Client
    2. Benchmarking
      1. Benchmarking tools
      2. Network benchmarking
      3. Disk benchmarking
      4. RADOS benchmarking
      5. RBD benchmarking
    3. Recommended tunings
      1. CPU
      2. BlueStore
        1. WAL deferred writes
      3. Filestore
        1. VFS cache pressure
        2. WBThrottle and/or nr_requests
        3. Throttling filestore queues
          1. filestore_queue_low_threshhold
          2. filestore_queue_high_threshhold
          3. filestore_expected_throughput_ops
          4. filestore_queue_high_delay_multiple
          5. filestore_queue_max_delay_multiple
        4. Splitting PGs
      4. Scrubbing
      5. OP priorities
      6. The network
      7. General system tuning
      8. Kernel RBD
        1. Queue depth
        2. readahead
      9. Tuning CephFS
      10. RBDs and erasure-coded pools
      11. PG distributions
    4. Summary
    5. Questions
  17. Tiering with Ceph
    1. Tiering versus caching
      1. How Ceph's tiering functionality works
    2. What is a bloom filter?
    3. Tiering modes
      1. Writeback
      2. Forward
        1. Read-forward
      3. Proxy
        1. Read-proxy
    4. Use cases
    5. Creating tiers in Ceph
    6. Tuning tiering
      1. Flushing and eviction
        1. Promotions
    7. Promotion throttling
      1. Monitoring parameters
      2. Alternative caching mechanisms
    8. Summary
    9. Questions
  18. Section 3: Troubleshooting and Recovery
  19. Troubleshooting
    1. Repairing inconsistent objects
    2. Full OSDs
    3. Ceph logging
    4. Slow performance
      1. Causes
        1. Increased client workload
        2. Down OSDs
        3. Recovery and backfilling
        4. Scrubbing
        5. Snaptrimming
        6. Hardware or driver issues
      2. Monitoring
        1. iostat
        2. htop
        3. atop
      3. Diagnostics
    5. Extremely slow performance or no IO
      1. Flapping OSDs
      2. Jumbo frames
      3. Failing disks
      4. Slow OSDs
      5. Out of capacity
    6. Investigating PGs in a down state
    7. Large monitor databases
    8. Summary
    9. Questions
  20. Disaster Recovery
    1. What is a disaster?
    2. Avoiding data loss
    3. What can cause an outage or data loss?
    4. RBD mirroring
      1. The journal
      2. The rbd-mirror daemon
      3. Configuring RBD mirroring
      4. Performing RBD failover
    5. RBD recovery
      1. Filestore
      2. BlueStore
      3. RBD assembly – filestore
      4. RBD assembly – BlueStore
      5. Confirmation of recovery
    6. RGW Multisite
    7. CephFS recovery
      1. Creating the disaster
      2. CephFS metadata recovery
    8. Lost objects and inactive PGs
    9. Recovering from a complete monitor failure
    10. Using the Ceph object-store tool
    11. Investigating asserts
      1. Example assert
    12. Summary
    13. Questions
  21. Assessments
    1. Chapter 1, Planning for Ceph
    2. Chapter 2, Deploying Ceph with Containers
    3. Chapter 3, BlueStore
    4. Chapter 4, Ceph and Non-Native Protocols
    5. Chapter 5, RADOS Pools and Client Access
    6. Chapter 6, Developing with Librados
    7. Chapter 7, Distributed Computation with Ceph RADOS Classes
    8. Chapter 8, Monitoring Ceph
    9. Chapter 9, Tuning Ceph
    10. Chapter 10, Tiering with Ceph
    11. Chapter 11, Troubleshooting
    12. Chapter 12, Disaster Recovery
  22. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Mastering Ceph - Second Edition
  • Author(s): Nick Fisk
  • Release date: March 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781789610703