Native SUSE Virtualization Node Monitoring Profile for Checkmk
When monitoring SUSE Virtualization nodes via SSH using
a shell-based Checkmk agent script, service discovery
produces an unmanageable number of irrelevant services:
ephemeral container overlay mounts, KubeVirt internal
devices, transient filesystems, and similar artifacts
inherent to the platform. Some of these services even
come up as CRITICAL by default, so reaching a clean and
meaningful monitoring baseline takes significant manual
effort.
The problem is compounded because SUSE Virtualization
runs on SLE Micro, an immutable operating
system using transactional updates and snapshot-based
rollbacks. Any manual configuration applied to the node
can be lost after an OS update, forcing administrators
to repeat the cleanup process from scratch.
The result is a lack of stable monitoring baselines,
persistent false alerts, and ongoing operational overhead
with every update cycle.
Proposed solution: an official SUSE Virtualization Node
Monitoring Profile shipped with Checkmk, defining:
- Which mountpoints and devices to exclude by default
on SLE Micro-based nodes
- Meaningful default thresholds appropriate for
SUSE Virtualization workloads
- A curated service discovery ruleset that produces a
clean baseline on any fresh node
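To illustrate what the exclusion and threshold parts
of such a profile might encode, here is a minimal
Python sketch. It does not use any Checkmk API. The
function names, mountpoint patterns, filesystem types
and the 80/90 percent levels are hypothetical
placeholders that have not been verified against a
live SUSE Virtualization node; the real values would
have to be agreed with SUSE.

```python
import re

# HYPOTHETICAL exclusion patterns, for illustration only.
# The actual list would be defined together with SUSE.
EXCLUDED_MOUNTPOINT_PATTERNS = [
    r"^/run/containerd/",        # assumed: container runtime state
    r"^/var/lib/kubelet/pods/",  # assumed: per-pod volume mounts
    r"^/var/lib/rancher/",       # assumed: container overlay roots
]
# HYPOTHETICAL filesystem types that are never turned into services.
EXCLUDED_FS_TYPES = {"overlay", "tmpfs", "squashfs"}

_COMPILED = [re.compile(p) for p in EXCLUDED_MOUNTPOINT_PATTERNS]


def should_discover(mountpoint: str, fstype: str) -> bool:
    """Return True if a filesystem service should be created for this mount."""
    if fstype in EXCLUDED_FS_TYPES:
        return False
    return not any(rx.search(mountpoint) for rx in _COMPILED)


def fs_state(used_percent: float, warn: float = 80.0, crit: float = 90.0) -> str:
    """Map filesystem usage to a state; warn/crit are placeholder levels."""
    if used_percent >= crit:
        return "CRIT"
    if used_percent >= warn:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    mounts = [
        ("/", "btrfs"),
        ("/var/lib/kubelet/pods/abc/volumes/x", "ext4"),
        ("/run/containerd/io.containerd.runtime.v2.task/k8s.io/123/rootfs", "overlay"),
        ("/boot/efi", "vfat"),
    ]
    print([m for m, fs in mounts if should_discover(m, fs)])
    # prints ['/', '/boot/efi']
    print(fs_state(85.0))  # prints WARN
```

Keeping the exclusions as data rather than as logic
is the point of the proposal: a shipped profile could
update the lists centrally, so nothing has to be
reapplied by hand on the node after a transactional
update.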
Developing this profile requires collaboration with SUSE
to define what a healthy SUSE Virtualization node looks
like from a monitoring perspective.
We have established relationships on the SUSE side
and are confident the right people can be brought
to the table should Checkmk be interested in
pursuing this.
Who benefits: any organization using Checkmk to
monitor SUSE Virtualization nodes.