Checkmk service dependencies across hosts

374 votes

It would be great to have service dependencies across hosts for Checkmk, just like the host parent/child relationships.

If the parent service fails, all child services could go stale and state that the parent service is down, instead of alerting for every dependent service.

In case of a downtime, the parent service could, on request, pass its downtime on to all child services, as can be done with host downtimes.

As an example:
If a web server is no longer running, the central active HTTP checks on the Checkmk server do not need to alert.
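
To make the request more concrete, here is a minimal Python sketch of the suppression rule described above. It is not Checkmk code: `Service`, `failed_ancestor`, `should_notify` and the host and service names are all hypothetical and only illustrate the idea that a failing service stays quiet while any service it depends on is failing.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical model for illustration only; none of these names exist in Checkmk.
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so services can go into a set
class Service:
    host: str
    name: str
    state: int = OK
    parents: list[Service] = field(default_factory=list)


def failed_ancestor(service: Service) -> Service | None:
    """Return some direct or indirect parent that is not OK, or None if all are OK."""
    seen = set()
    stack = list(service.parents)
    while stack:
        parent = stack.pop()
        if parent in seen:  # guards against dependency cycles
            continue
        seen.add(parent)
        if parent.state != OK:
            return parent
        stack.extend(parent.parents)
    return None


def should_notify(service: Service) -> bool:
    """Notify only for a failing service whose parents are all OK."""
    return service.state != OK and failed_ancestor(service) is None


# The web server example: only the parent service alerts.
web = Service("webserver01", "Apache process", state=CRIT)
http = Service("checkmk-server", "HTTP webserver01", state=CRIT, parents=[web])
print(should_notify(web), should_notify(http))
```

In this sketch the child is still checked and keeps its own state; only the notification is held back, and the failing parent returned by `failed_ancestor` could be shown as the reason.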

Under consideration Checks&Agents Notifications Setup Suggested by: Lars Sörensen (06 May, '22) • Upvoted: 2 days ago • Comments: 19

Comments: 19

18 Jun, '22
Overlord
I absolutely recommend this feature request
23 Jun, '22
Ian Barry
This would be very useful. For example, today, when a network interface service on host A goes CRIT, the VPN service on host B also goes CRIT, and we get 2 alerts for 1 issue. A dependency between the services would solve this.
13 Jul, '22
Paulo Adriano
It's possible to model that relationship using the Business Intelligence module.
19 Jul, '22
Lars Sörensen
The main task of the BI module is to aggregate the status of multiple services, not to distinguish between different causes.

To use Ian's example:
In case of a network fault, the network team must be alerted, and in case of a VPN fault, the VPN team must be alerted. But if the network service it depends on is the cause, the VPN service and all other dependent services could become "stale", and the VPN team should not be alerted.

Another advantage of dependencies:
If a downtime is defined for a parent service, it could optionally be applied to all child services as well, so that their service owners are not notified during this time. This is particularly useful in larger organizations with different responsibilities.
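
As a rough sketch of what the optional downtime inheritance could look like (plain Python, not a Checkmk API; `Service`, `Downtime`, `schedule_downtime` and the `include_children` flag are all made-up names):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


# Hypothetical model for illustration only; none of these names exist in Checkmk.
@dataclass(eq=False)
class Downtime:
    start: datetime
    end: datetime
    comment: str


@dataclass(eq=False)  # identity-based hashing so services can be tracked in a set
class Service:
    host: str
    name: str
    children: list[Service] = field(default_factory=list)
    downtimes: list[Downtime] = field(default_factory=list)


def schedule_downtime(service: Service, downtime: Downtime,
                      include_children: bool = False) -> list[Service]:
    """Attach a downtime to a service and, on request, to all of its descendants.

    Returns the services that received the downtime.
    """
    affected = []
    seen = set()
    stack = [service]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against dependency cycles
            continue
        seen.add(current)
        current.downtimes.append(downtime)
        affected.append(current)
        if include_children:
            stack.extend(current.children)
    return affected


# Ian's example: the interface on host A is the parent of the VPN on host B.
nic = Service("hostA", "Interface eth0")
vpn = Service("hostB", "VPN tunnel")
nic.children.append(vpn)
maintenance = Downtime(datetime(2024, 1, 1, 22, 0), datetime(2024, 1, 1, 23, 0),
                       "switch maintenance")
for svc in schedule_downtime(nic, maintenance, include_children=True):
    print(svc.host, svc.name)
```

Keeping the inheritance behind a flag matches the "optionally" above: the owner of the parent service decides per downtime whether the child services are covered as well.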
07 Sep, '22
Max Voit
I support this feature request.

As a usage example: NFS-mounts need not be marked critical when the NFS-server providing the respective shares is down.
27 Oct, '22
Pascal Warnecke
This would be useful for us, too.

We are monitoring a lot of switch ports (e.g. for performance data).
Example: If a connected ESXi host is rebooted while it is in downtime, we don't want the network team to be alerted, because the host is in downtime.
06 Dec, '22
Fabrice Le Dorze
I'm coming from Nagios. This feature exists in Nagios 3 itself:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/dependencies.html
So I guess it is also available in Checkmk Raw with the Nagios 3 engine. Am I right?
14 Feb, '23
Daniel
It was there in the beginning but some people decided there is no need for it anymore ;-)
Maybe in the Raw Edition, I don't know, but I guess most of the users here are not Raw Edition users, and there it is clearly missing.
15 Feb, '23
Sven Ruben
I would like that feature too, because applications use services from different servers. If one of the servers providing those services goes down, there is no need to disturb the application guys with a notification, since there is nothing they can do. But they do need to be informed about planned downtimes so they can inform their customers.
25 Mar, '23
Julião Duartenn
I also support this.
We have primary API services and distributed mirrors that call the upstream service.
If the API service misbehaves, all the mirrors will show an error.
We would like to suppress alerts on the mirrors in case the primary is down.
But we do want the mirrors to be monitored because they themselves may misbehave.
18 Apr, '23
Alz
This functionality must be implemented.
Just to avoid multiple and unnecessary alerts from child services.
24 Apr, '23
JPH
Please let us work together to get this solution done.
This feature is very important! It would give an idea and a real picture of the complexity of a business service:
from the user front end to each host, host-to-host communication, application communication, etc. and their monitoring services, down to the bare metal in the computer room.
Furthermore, the known data of a company, such as changes, problems, incidents, CMDB and discovery data, etc., could be shown in one view to get the best root cause analysis with Checkmk.
Please implement something innovative, dynamic, automatic, new and future proof which is comparable to other tools, like:
BMC, Instana, Nutanix, AppDynamics, New Relic, DataDog, Micro Focus, Dynatrace, Nexthing, eG Enterprise
Checkmk has so much data of each host and application, just use it and bring all the data in relation to a dynamic view.
That would make it possible to drill down to the main problem: a perfect root cause analysis with Checkmk.
Just bring the good Checkmk a step further up.
25 Apr, '23
Gerd Stolz
yes :)

There are so many dependencies that Checkmk could also detect automatically:
e.g. if the vCenter is down or its "Check_MK" service is CRIT, no alerts for all VMs/ESXi hosts missing piggyback data (same for all other hosts missing piggyback data, if the host delivering the piggyback data is not available)

Also maybe things like:
NTP Servers are down -> all NTP Time Services will alert and they really shouldn't
Or "Multipath" services on physical systems alerting when some of their paths are missing because a SAN switch is offline.
(Of course both examples only work if Checkmk knows about the NTP servers and the SAN switches, and some user involvement might be necessary, but a dependency that has to be defined manually is still better than no dependency at all.)
27 Jul, '23
Ivan Lago
Absolutely needed.
For example, if a DB goes down I do not need to receive dozens of notifications from dependent websites.
19 Oct, '23
Christian
Well, you could do this with BI and build nice stuff. But it seems that CMK does not really want to put effort into it; there are a lot of things broken or complicated there.
19 Oct, '23
Spex
I strongly support this request.

Since our VMware guys introduced power management, our vCenter shuts down ESX hosts automatically.
As a result, a lot of services go into an unknown or warning state.
01 Dec, '23
Niklas Pulina Admin
Hello, thank you for your contributions to this idea.

We had an initial look at it and believe that it should be possible to implement it for single-site environments. In distributed setups however, the complexity rises massively. We will evaluate the technical feasibility, particularly with regard to the impact on performance.

@Christian: If you encounter any technical issues in our software, please contact our support team or report them in our forum (https://forum.checkmk.com/). Thank you!

Warm regards,
Your Checkmk team
17 Apr
Nathan
Have there been any new updates on this?
18 Apr
Niklas Pulina Admin
Hi Nathan,

Not yet, sorry. We'll post an update here once we have any news in this regard.

Thank you for your patience.

Warm regards,
Your Checkmk team