Ceph fails to enable OSD monitoring #83

Closed
opened 2025-01-17 13:18:58 +01:00 by rarias · 6 comments
Owner

We are lacking throughput and latency metrics. I suspect it is due to the failed Python import:

Jan 17 13:11:01 bay ceph-mgr[1492]: 2025-01-17T13:11:01.610+0100 7f39aeeb4a00  1 mgr[py] Loading python module 'telegraf'
Jan 17 13:11:01 bay ceph-mgr[1492]: 2025-01-17T13:11:01.856+0100 7f39aeeb4a00 -1 mgr[py] Module telegraf has missing NOTIFY_TYPES member
Jan 17 13:11:01 bay ceph-mgr[1492]: 2025-01-17T13:11:01.856+0100 7f39aeeb4a00  1 mgr[py] Loading python module 'diskprediction_local'
Jan 17 13:11:02 bay ceph-mgr[1492]: 2025-01-17T13:11:02.201+0100 7f39aeeb4a00 -1 mgr[py] Module not found: 'diskprediction_local'
Jan 17 13:11:02 bay ceph-mgr[1492]: 2025-01-17T13:11:02.202+0100 7f39aeeb4a00 -1 mgr[py] Traceback (most recent call last):
Jan 17 13:11:02 bay ceph-mgr[1492]:   File "/nix/store/8vs1vinssy472ymcm4h09641cjc9allp-ceph-19.2.0-lib/lib/ceph/mgr/diskprediction_local/__init__.py", line 2, in <module>
Jan 17 13:11:02 bay ceph-mgr[1492]:     from .module import Module
Jan 17 13:11:02 bay ceph-mgr[1492]:   File "/nix/store/8vs1vinssy472ymcm4h09641cjc9allp-ceph-19.2.0-lib/lib/ceph/mgr/diskprediction_local/module.py", line 16, in <module>
Jan 17 13:11:02 bay ceph-mgr[1492]:     import scipy  # noqa: ignore=F401
Jan 17 13:11:02 bay ceph-mgr[1492]:     ^^^^^^^^^^^^
Jan 17 13:11:02 bay ceph-mgr[1492]:   File "/nix/store/3cknpi0h2mjhv0hfk46bgkz6z3fj1dn7-python3-3.11.11-env/lib/python3.11/site-packages/scipy/__init__.py", line 47, in <module>
Jan 17 13:11:02 bay ceph-mgr[1492]:     from numpy import __version__ as __numpy_version__
Jan 17 13:11:02 bay ceph-mgr[1492]:   File "/nix/store/3cknpi0h2mjhv0hfk46bgkz6z3fj1dn7-python3-3.11.11-env/lib/python3.11/site-packages/numpy/__init__.py", line 157, in <module>
Jan 17 13:11:02 bay ceph-mgr[1492]:     from . import random
Jan 17 13:11:02 bay ceph-mgr[1492]:   File "/nix/store/3cknpi0h2mjhv0hfk46bgkz6z3fj1dn7-python3-3.11.11-env/lib/python3.11/site-packages/numpy/random/__init__.py", line 180, in <module>
Jan 17 13:11:02 bay ceph-mgr[1492]:     from . import _pickle
Jan 17 13:11:02 bay ceph-mgr[1492]:   File "/nix/store/3cknpi0h2mjhv0hfk46bgkz6z3fj1dn7-python3-3.11.11-env/lib/python3.11/site-packages/numpy/random/_pickle.py", line 1, in <module>
Jan 17 13:11:02 bay ceph-mgr[1492]:     from .mtrand import RandomState
Jan 17 13:11:02 bay ceph-mgr[1492]: ImportError: Interpreter change detected - this module can only be loaded into one interpreter per process.
Jan 17 13:11:02 bay ceph-mgr[1492]: 2025-01-17T13:11:02.204+0100 7f39aeeb4a00 -1 mgr[py] Class not found in module 'diskprediction_local'
Jan 17 13:11:02 bay ceph-mgr[1492]: 2025-01-17T13:11:02.204+0100 7f39aeeb4a00 -1 mgr[py] Error loading module 'diskprediction_local': (2) No such file or directory
Jan 17 13:11:02 bay ceph-mgr[1492]: 2025-01-17T13:11:02.204+0100 7f39aeeb4a00  1 mgr[py] Loading python module 'stats'
Jan 17 13:11:02 bay ceph-mgr[1492]: 2025-01-17T13:11:02.448+0100 7f39aeeb4a00  1 mgr[py] Loading python module 'rbd_support'

That module is probably a dependency of the OSD statistics.
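
To pull the failing modules out of a long ceph-mgr log like the one above, a small filter can help. This is a minimal sketch, assuming only the `mgr[py]` message formats visible in this log; the names `summarize_module_loads` and `SAMPLE` are made up for illustration, and the sample lines are copied from the log above.

```python
import re

# Patterns for the two mgr[py] messages seen in the log above (assumed stable).
LOAD_RE = re.compile(r"mgr\[py\] Loading python module '([^']+)'")
ERROR_RE = re.compile(r"mgr\[py\] Error loading module '([^']+)': (.*)$")


def summarize_module_loads(lines):
    """Return (loaded, failed): module names in load order, and a name -> error map."""
    loaded, failed = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if m := LOAD_RE.search(line):
            loaded.append(m.group(1))
        elif m := ERROR_RE.search(line):
            failed[m.group(1)] = m.group(2)
    return loaded, failed


# Four lines taken from the log in this issue.
SAMPLE = [
    "Jan 17 13:11:01 bay ceph-mgr[1492]: 2025-01-17T13:11:01.610+0100 7f39aeeb4a00  1 mgr[py] Loading python module 'telegraf'",
    "Jan 17 13:11:01 bay ceph-mgr[1492]: 2025-01-17T13:11:01.856+0100 7f39aeeb4a00  1 mgr[py] Loading python module 'diskprediction_local'",
    "Jan 17 13:11:02 bay ceph-mgr[1492]: 2025-01-17T13:11:02.204+0100 7f39aeeb4a00 -1 mgr[py] Error loading module 'diskprediction_local': (2) No such file or directory",
    "Jan 17 13:11:02 bay ceph-mgr[1492]: 2025-01-17T13:11:02.204+0100 7f39aeeb4a00  1 mgr[py] Loading python module 'stats'",
]

loaded, failed = summarize_module_loads(SAMPLE)
print(loaded)  # ['telegraf', 'diskprediction_local', 'stats']
print(failed)  # {'diskprediction_local': '(2) No such file or directory'}
```

A module that appears in `failed` was attempted and rejected, so it is worth checking whether anything that did load depends on it before blaming it for missing metrics.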

rarias added the ceph label 2025-01-17 13:18:58 +01:00
Author
Owner

We are importing:

Jan 17 13:10:46 bay ceph-mgr[1492]: 2025-01-17T13:10:46.074+0100 7fc8678dca00  1 mgr[py] Loading python module 'dashboard'

Which has:

bay% rg wsgi /nix/store/8vs1vinssy472ymcm4h09641cjc9allp-ceph-19.2.0-lib/lib/ceph/mgr/
/nix/store/8vs1vinssy472ymcm4h09641cjc9allp-ceph-19.2.0-lib/lib/ceph/mgr/dashboard/cherrypy_backports.py
46:        from cherrypy.wsgiserver.wsgiserver2 import CP_fileobject, HTTPConnection
72:# cherrypy.wsgiserver was extracted wsgiserver into cheroot in cherrypy v9.0.0
75:        from cherrypy.wsgiserver.ssl_builtin import BuiltinSSLAdapter as builtin_ssl

See: https://modwsgi.readthedocs.io/en/develop/user-guides/application-issues.html#multiple-python-sub-interpreters

Author
Owner

From: https://github.com/ricardoasmarques/ceph-dev-docker/issues/34

Another combination which does not trigger the issue:

WITH_MGR_DASHBOARD_FRONTEND=OFF

Also: https://bugs.archlinux.org/task/68726

Author
Owner

Gentoo removed the function that checks for a single interpreter:

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=84f2bdae597f394425b96329fc8305d9b729f782

Author
Owner
FFS: https://github.com/ceph/ceph/blob/4f27a0ff0ed093366b4a6d5b70d8b036a82f161c/src/mgr/PyModule.cc#L294
https://github.com/pyca/cryptography/issues/12080#issuecomment-2510243984
Author
Owner

The above problem is unrelated to the OSD statistics. Here is the actual problem: https://github.com/ceph/ceph/commit/5a058b8fb376541f1313f25e96a11606c7fef1d4

Fixed with:

sudo ceph config set mgr mgr/prometheus/exclude_perf_counters false
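
To confirm that the perf counters come back after changing the option, the text served by the mgr prometheus module can be scanned for the expected metric families. This is a rough sketch, not the module's API: the helper names are hypothetical, the prefixes `ceph_osd_op_r_latency` and `ceph_osd_op_w_latency` are assumed names for the OSD latency counters, and `SAMPLE` is hand-written illustrative input, not output captured from this cluster.

```python
def metric_names(exposition_text):
    """Collect metric names from Prometheus text-format output, skipping comments."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' (labels) or the first space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_prefixes(exposition_text, prefixes):
    """Return the prefixes for which no metric name in the text starts with them."""
    names = metric_names(exposition_text)
    return [p for p in prefixes if not any(n.startswith(p) for n in names)]


# Illustrative input only: hand-written lines, NOT real cluster output.
SAMPLE = """\
# HELP ceph_osd_up OSD status up
# TYPE ceph_osd_up untyped
ceph_osd_up{ceph_daemon="osd.0"} 1.0
ceph_osd_op_r_latency_sum{ceph_daemon="osd.0"} 12.5
"""

# Assumed metric prefixes; adjust to whatever the dashboards actually query.
print(missing_prefixes(SAMPLE, ["ceph_osd_op_r_latency", "ceph_osd_op_w_latency"]))
# ['ceph_osd_op_w_latency']
```

An empty list would mean every expected family is present in the scraped text.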
Author
Owner

Finally working:

image

Reference: rarias/jungle#83