Ceph OSDs not starting due to unmet condition #76

Closed
opened 2024-08-28 12:58:50 +02:00 by rarias · 2 comments
Owner

There is a problem enabling some OSDs:

Aug 28 11:46:57 lake2 systemd[1]: Ceph OSD daemon 5 was skipped because of an unmet condition check (ConditionPathExists=/var/lib/ceph/osd/ceph-5/keyring).
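To see at a glance which OSDs would hit this same `ConditionPathExists` check, something like the following could be run. This is only a sketch: the function name `missing_keyrings` is made up for illustration, and it assumes the OSD directories follow the `/var/lib/ceph/osd/ceph-<id>` layout seen in the log line above and that the directory itself exists even when the keyring is absent.

```shell
# Print every OSD directory under the base path that has no keyring file.
# The default base path is taken from the log message; pass another one to override.
missing_keyrings() {
    base="${1:-/var/lib/ceph/osd}"
    for dir in "$base"/ceph-*; do
        # Skip the literal, unexpanded glob when no OSD directory exists.
        [ -d "$dir" ] || continue
        # systemd skips the OSD unit when this file is absent.
        [ -e "$dir/keyring" ] || echo "$dir"
    done
}
```

Calling `missing_keyrings` with no argument checks the default path. An OSD whose directory was never created will not be listed, so an empty result does not prove that every OSD is healthy.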

The ceph-volume service has only started OSD 4:

lake2% sudo systemctl status ceph-volume.service
○ ceph-volume.service - Ceph Volume activation
     Loaded: loaded (/etc/systemd/system/ceph-volume.service; enabled; preset: enabled)
     Active: inactive (dead) since Wed 2024-08-28 11:46:54 CEST; 1h 9min ago
   Duration: 1.505s
   Main PID: 1251 (code=exited, status=0/SUCCESS)
        CPU: 744ms

Aug 28 11:46:56 lake2 sh[1253]: Running command: /nix/store/zvb73z94sbh5js93yliqn18bzmzlbayk-ceph-18.2.1/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-b6b9047d-8696-4>
Aug 28 11:46:56 lake2 sh[1253]: Running command: /nix/store/dhv5gh89him9a7ddr56cqg87zfkmjihp-coreutils-9.5/bin/ln -snf /dev/ceph-b6b9047d-8696-43ca-bbd3-9be42aac3151/osd-block-8f3a21b8-e9b0>
Aug 28 11:46:56 lake2 sh[1253]: Running command: /nix/store/dhv5gh89him9a7ddr56cqg87zfkmjihp-coreutils-9.5/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-4/block
Aug 28 11:46:56 lake2 sh[1253]: Running command: /nix/store/dhv5gh89him9a7ddr56cqg87zfkmjihp-coreutils-9.5/bin/chown -R ceph:ceph /dev/dm-0
Aug 28 11:46:56 lake2 sh[1253]: Running command: /nix/store/dhv5gh89him9a7ddr56cqg87zfkmjihp-coreutils-9.5/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
Aug 28 11:46:56 lake2 sh[1253]: --> ceph-volume lvm activate successful for osd ID: 4
Aug 28 11:46:53 lake2 systemd[1]: Started Ceph Volume activation.
Aug 28 11:46:54 lake2 systemd[1]: ceph-volume.service: Deactivated successfully.
Aug 28 11:46:57 lake2 systemd[1]: /etc/systemd/system/ceph-volume.service:4: Unknown key name 'Type' in section 'Unit', ignoring.
Aug 28 11:46:57 lake2 systemd[1]: /etc/systemd/system/ceph-volume.service:13: Unit uses KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service

WTF, there are disks missing:

lake2% ls /dev/nvme*
/dev/nvme0  /dev/nvme0n1
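A check like the one below could catch this before Ceph tries to activate the OSDs. It is a rough sketch, not something used on this machine: `count_nvme_namespaces` is a hypothetical helper, and the number of disks the host should have is an assumption the caller has to supply.

```shell
# Count NVMe namespace nodes (nvme0n1, nvme1n1, ...) in a device directory.
# Controller nodes (nvme0) and partitions (nvme0n1p1) are not counted.
count_nvme_namespaces() {
    devdir="${1:-/dev}"
    count=0
    for node in "$devdir"/nvme*n*; do
        # Skip the literal, unexpanded glob when nothing matches.
        [ -e "$node" ] || continue
        case "${node##*/}" in
            # Ignore partition nodes such as nvme0n1p1.
            *p[0-9]*) continue ;;
        esac
        count=$((count + 1))
    done
    echo "$count"
}
```

For example, `[ "$(count_nvme_namespaces)" -ge 4 ] || echo "NVMe disks missing" >&2` would warn on the output above, taking 4 as the expected number of disks for this host.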
rarias added the hw label 2024-08-28 13:04:04 +02:00
Author
Owner

Let's see if rebooting makes the disk appear again.

Author
Owner

Right...

lake2% ls /dev/nvme*
/dev/nvme0  /dev/nvme0n1  /dev/nvme1  /dev/nvme1n1  /dev/nvme2  /dev/nvme2n1  /dev/nvme3  /dev/nvme3n1

Now ceph is working fine. Closing.

Reference: rarias/jungle#76