Stuck inactive incomplete PGs in Ceph

If a PG gets stuck after an OSD or node failure and becomes unhealthy, the cluster can become inaccessible because requests are blocked for more than 32 seconds. In that case, try the following:

  1. Set noout to prevent data rebalancing:
        # ceph osd set noout
  1. Query the PG to find out which OSDs are probing it:
        # ceph pg xx.x query
  1. On each probing OSD, delete the PG's head folder:
        /var/lib/ceph/osd/ceph-X/current/xx.x_head/
  1. Restart all OSDs. 
  1. Query the PG again to confirm that it no longer exists. The output should show something like an ENOENT error.
  1. Force-create the PG:
        # ceph pg force_create_pg xx.x
  1. Restart the OSDs that hold the PG.
Warning: follow this procedure only if all attempts to restore the placement group (PG) have failed. This ...
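The steps above can be sketched as a small Bash function. This is a minimal sketch, not the book's method: it only prints the command sequence for one PG and its probing OSDs (which you read from the `ceph pg xx.x query` output yourself) instead of running it, because the delete and restart steps must be done on the nodes that host those OSDs. It assumes FileStore OSDs (the `current/<pgid>_head` layout from step 3), systemd-managed OSD daemons (the `ceph-osd.target` and `ceph-osd@<id>` units), and the older `ceph pg force_create_pg` command; newer Ceph releases use `ceph osd force-create-pg` instead, so check the documentation for your version. `print_recovery_plan` is a hypothetical helper name, and `ceph pg map` is used here only to list the OSDs currently mapped to the PG for the final restart.

```shell
#!/usr/bin/env bash
# Prints (does NOT execute) the stuck-PG recovery steps for review.
# Assumptions, not from the text: FileStore OSD layout, systemd-managed
# OSDs, and the older "ceph pg force_create_pg" syntax.
set -euo pipefail

# Hypothetical helper: print_recovery_plan <pgid> <probing-osd-id>...
print_recovery_plan() {
  if [ "$#" -lt 2 ]; then
    echo "usage: print_recovery_plan <pgid> <probing-osd-id>..." >&2
    return 1
  fi
  local pgid="$1"; shift
  local osd

  # Step 1: stop CRUSH from marking OSDs out and rebalancing data
  echo "ceph osd set noout"
  # Step 2: the "probing_osds" entries in this output are the OSD ids to pass in
  echo "ceph pg $pgid query"
  # Step 3: remove the PG's head folder on every probing OSD
  for osd in "$@"; do
    echo "# on the node hosting osd.$osd:"
    echo "rm -rf /var/lib/ceph/osd/ceph-$osd/current/${pgid}_head"
  done
  # Step 4: restart all OSDs
  echo "# on every Ceph node:"
  echo "systemctl restart ceph-osd.target"
  # Step 5: the query should now report that the PG does not exist
  echo "ceph pg $pgid query   # should now fail with ENOENT"
  # Step 6: recreate the PG
  echo "ceph pg force_create_pg $pgid"
  # Step 7: find the OSDs now mapped to the PG and restart them
  echo "ceph pg map $pgid     # note the up/acting OSD ids"
  echo "# on each node hosting one of those OSDs:"
  echo "systemctl restart ceph-osd@<id>"
}
```

For example, `print_recovery_plan 1.2f 3 5` prints the plan for a hypothetical PG 1.2f whose probing OSDs are 3 and 5. Printing rather than executing keeps the destructive `rm -rf` under manual control, so each line can be checked and run on the right node.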