Ceph operation, maintenance and repair

Important links and documentation

How to see the status of the cluster

On any cluster node, you can run ceph health, ceph health detail or ceph status to get a progressively more detailed overview of the cluster's status.

Important: read the "Placement group states" page (linked above) for what status strings like "active", "backfilling", etc., mean.

ceph health

ceph health gives a very condensed status of the cluster. Ideally, its output will look like this:

root@storage00:~# ceph health
HEALTH_OK
root@storage00:~#

Less ideally, it will report a HEALTH_WARN or HEALTH_ERR status, which might look like this:

[root@vmfram1 ~]# ceph health
HEALTH_WARN Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull
[root@vmfram1 ~]#

The above warning indicates that the cluster cannot move some objects between OSDs (backfilling) because of a lack of disk space. This state may be temporary: other backfill operations that are still in progress may free up some space when they finish. More likely, the "too full" state will persist and you will need to either add storage (best) or make architectural or OSD weight changes (less than ideal) to force space to become available.
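For unattended checks, the first word of the ceph health output is the part that matters. The following is a minimal sketch, not an established tool: the function name health_to_exit_code and the 0/1/2/3 mapping (borrowed from the usual monitoring-plugin convention) are assumptions made for illustration. It reads the health text on standard input, so it can be tried without a cluster.

```shell
# Map the first word of `ceph health` output to a monitoring-style code.
# health_to_exit_code is a hypothetical helper; the 0/1/2/3 mapping is an
# assumed convention, not something Ceph itself defines.
health_to_exit_code() {
    # Only the first word of the first line carries the overall status.
    read -r status _rest || true
    case "$status" in
        HEALTH_OK)   echo 0 ;;
        HEALTH_WARN) echo 1 ;;
        HEALTH_ERR)  echo 2 ;;
        *)           echo 3 ;;   # empty or unrecognised output
    esac
}
```

On a cluster node this would be used as: ceph health | health_to_exit_code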

ceph health detail

Below is the more detailed ceph health detail output for the same warning shown above.

[root@vmfram1 ~]# ceph health detail
HEALTH_WARN Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull
PG_BACKFILL_FULL Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull
    pg 9.5 is active+remapped+backfill_wait+backfill_toofull, acting [305,406,103]
    pg 9.e is active+remapped+backfill_wait+backfill_toofull, acting [306,406,104]
[root@vmfram1 ~]#

ceph status

root@storage00:~# ceph status
  cluster:
    id:     db5b6a5a-1080-46d2-974a-80fe8274c8ba
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum storage00,storage01,compute01 (age 12d)
    mgr: storage01(active, since 12d), standbys: storage00
    mds: vm:1 {0=storage01=up:active} 1 up:standby
    osd: 10 osds: 8 up (since 12d), 8 in (since 3M)
 
  data:
    pools:   4 pools, 448 pgs
    objects: 15.60k objects, 59 GiB
    usage:   177 GiB used, 14 TiB / 14 TiB avail
    pgs:     448 active+clean
 
  io:
    client:   341 B/s wr, 0 op/s rd, 0 op/s wr

root@storage00:~#
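Note that the sample above reports 10 OSDs with only 8 up and 8 in, even though the overall health is HEALTH_OK, so the osd line is worth reading on its own. The sketch below extracts those three counts from the plain-text status output; osd_counts is a hypothetical helper name, and the parsing assumes the line layout shown above.

```shell
# Print the total/up/in OSD counts from `ceph status` text output.
# osd_counts is a hypothetical helper; it assumes a line of the form
#   osd: <N> osds: <U> up (...), <I> in (...)
osd_counts() {
    awk '
        $1 == "osd:" {
            gsub(/[,;]/, "")                  # tolerate "8 up," or "8 in;"
            for (i = 2; i <= NF; i++) {
                if ($i == "osds:") total   = $(i - 1)
                if ($i == "up")    up      = $(i - 1)
                if ($i == "in")    in_osds = $(i - 1)
            }
            printf "total=%s up=%s in=%s\n", total, up, in_osds
        }'
}
```

On a cluster node this would be used as: ceph status | osd_counts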

Error states

Unfound objects

My development Ceph cluster (vmfarm) hit an unrecoverable unfound object error over the weekend. "ceph health detail" showed the error in pool 7 (the RBD block device pool used for my VMs). "ceph pg repair" couldn't repair it, but it did show that the primary OSD for that placement group was OSD 205 (which is on vmfram2). Running "dmesg" on that server showed a physical sector error on the underlying disk device.

The fix is a bit counterintuitive: turn off the OSD. This fails the OSD, and Ceph then starts rebuilding from the remaining good OSDs (on which the object can still be found).

============================================================
Mon Jul 10 06:15:02 AEST 2023
HEALTH_ERR 1/1353953 objects unfound (0.000%); Possible data damage: 1 pg recovery_unfound; Degraded data redundancy: 3/4061859 objects degraded (0.000%), 1 pg degraded
OBJECT_UNFOUND 1/1353953 objects unfound (0.000%)
    pg 7.b has 1 unfound objects
PG_DAMAGED Possible data damage: 1 pg recovery_unfound
    pg 7.b is active+recovery_unfound+degraded+repair, acting [205,103,304], 1 unfound
PG_DEGRADED Degraded data redundancy: 3/4061859 objects degraded (0.000%), 1 pg degraded
    pg 7.b is active+recovery_unfound+degraded+repair, acting [205,103,304], 1 unfound
============================================================
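
To find which OSD to look at, the affected placement group and its primary OSD can be read straight out of a report like the one above. This is only a sketch: unfound_pg_primaries is a made-up helper name, and it assumes the primary is the first entry of the acting set, which matches this incident (primary 205, acting [205,103,304]).

```shell
# Print "<pgid> osd.<primary>" for each PG reported as recovery_unfound.
# unfound_pg_primaries is a hypothetical helper. Assumption: the first OSD
# in the acting set is the primary, as it was in the incident above.
unfound_pg_primaries() {
    awk '
        $1 == "pg" && $3 == "is" && $4 ~ /recovery_unfound/ {
            for (j = 5; j < NF; j++)
                if ($j == "acting") {
                    acting = $(j + 1)
                    gsub(/\[|\]|,/, " ", acting)   # "[205,103,304]," -> " 205 103 304 "
                    split(acting, osds, " ")
                    if (!($2 in seen)) {           # the same PG is listed more than once
                        seen[$2] = 1
                        print $2, "osd." osds[1]
                    }
                    break
                }
        }'
}
```

On a cluster node this would be used as: ceph health detail | unfound_pg_primaries. How the OSD is then stopped depends on the deployment; on a package-based systemd install it might be systemctl stop ceph-osd@205 on the host that carries it, but that is an assumption and not something recorded in these notes.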