Checkmk Services Dependencies across hosts
It would be great to have service dependencies across hosts for Checkmk, just like the host parent/child relationships.
If the parent service fails, all the child services could go stale and state that the parent service is down, instead of alerting for every dependent service.
In case of a downtime, the parent service could pass its downtime on to all the child services on request, as can be done with host downtimes.
As an example:
If a web server is no longer running, the central active HTTP checks on the Checkmk server do not need to alert.
Comments: 26
-
23 Jun, '22
Ian Barry
This would be very useful. For example, today, when a network interface service on host A goes CRIT, the VPN service on host B also goes CRIT, and we get two alerts for one issue. A dependency between the services would solve this.
-
19 Jul, '22
Lars Sörensen
The main task of the BI module is to aggregate the status of multiple services, not to distinguish between different causes.
To use Ian's example:
In case of a network fault, the network team must be alerted, and in case of a VPN fault, the VPN team must be alerted. But if the network service it depends on is the cause, the VPN service and all other dependent services could become "stale", and the VPN team must not be alerted.
Another advantage of dependencies:
If a downtime is defined for a parent service, it could optionally be applied to all child services as well, so that their service owners are not notified during this time. This is particularly useful in larger organizations with different responsibilities.
-
07 Sep, '22
Max Voit
I support this feature request.
As a usage example: NFS mounts need not be marked critical when the NFS server providing the respective shares is down.
-
27 Oct, '22
Pascal Warnecke
This would be useful for us, too.
We are monitoring a lot of switch ports (e.g. for performance data).
Example: If a connected ESXi host is rebooted while in downtime, we don't want an alert to go to the network team, because the host is in downtime.
-
06 Dec, '22
Fabrice Le Dorze
I'm coming from Nagios. This feature exists in Nagios 3 itself:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/dependencies.html
So I guess it is also available in the Checkmk Raw Edition with its Nagios 3 engine. Am I right?
-
14 Feb, '23
Daniel
It was there in the beginning, but some people decided there is no need for it anymore ;-)
Maybe in the Raw Edition, I don't know. But I guess most of the users here are not Raw Edition users, and there it is clearly missing.
-
15 Feb, '23
Sven Ruben
I would like that feature too, because applications use services from different servers. If one of the servers providing those services goes down, there is no need to disturb the application team with a notification, since there is nothing they can do. They do, however, need to be informed about planned downtime so that they can inform their customers.
-
25 Mar, '23
Julião Duarte
I also support this.
We have primary API services and distributed mirrors that call the upstream service.
If the API service misbehaves, all the mirrors will show an error.
We would like to suppress alerts on the mirrors in case the primary is down.
But we do want the mirrors to be monitored because they themselves may misbehave.
-
18 Apr, '23
Alz
This functionality must be implemented.
Just to avoid multiple and unnecessary alerts from child services.
-
24 Apr, '23
JPH
Please let us work together to get this solution done.
This feature is very important for getting an idea and a real picture of the complexity of a business service:
from the user front end to each host, to host communication, application communication, etc. and their monitoring services, down to the bare metal in the computer room.
Furthermore, it would bring the known data of a company, such as changes, problems, incidents, CMDB and discovery data, etc., into one view to get the best root cause analysis with Checkmk.
Please implement something innovative, dynamic, automatic, new and future-proof which is comparable to other tools, like:
BMC, Instana, Nutanix, AppDynamics, New Relic, DataDog, Micro Focus, Dynatrace, Nexthing, eG Enterprise
Checkmk has so much data about each host and application. Just use it and relate all the data to each other in a dynamic view.
That would make it possible to drill down to the main problem: a perfect root cause analysis with Checkmk.
Just bring the good Checkmk a step further up.
-
25 Apr, '23
Gerd Stolz
Yes :)
There are so many dependencies that Checkmk could also detect automatically:
e.g. if the vCenter is down or its "Check_MK" service is CRIT, no alerts for all VMs/ESXi hosts missing piggyback data (the same applies to all other hosts missing piggyback data if the host delivering that data is not available).
Also maybe things like:
NTP servers are down -> all NTP time services will alert, and they really shouldn't.
Or "Multipath" services on physical systems alerting that some of their paths are missing when a SAN switch is offline.
(Of course, both examples only work if Checkmk knows about the NTP servers and the SAN switches, and some user involvement might be necessary, but a dependency that has to be defined manually is still better than no dependency at all.)
-
27 Jul, '23
Ivan Lago
Absolutely needed.
For example, if a DB goes down I do not need to receive dozens of notifications from dependent websites.
-
19 Oct, '23
Christian
Well, you could do this with BI and build nice stuff. But it seems that CMK does not really want to put effort into it; a lot of things there are broken or complicated.
-
19 Oct, '23
Spex
I strongly support this request.
Since our VMware team introduced power management, our vCenter shuts down ESX hosts automatically.
As a result, a lot of services go into an unknown or warning state.
-
01 Dec, '23
Niklas Pulina (Admin)
Hello, thank you for your contributions to this idea.
We had an initial look at it and believe that it should be possible to implement it for single-site environments. In distributed setups, however, the complexity rises massively. We will evaluate the technical feasibility, particularly with regard to the impact on performance.
@Christian: If you encounter any technical issues in our software, please contact our support team or report them in our forum (https://forum.checkmk.com/). Thank you!
Warm regards,
Your Checkmk team
-
18 Apr, '24
Niklas Pulina (Admin)
Hi Nathan,
Not yet, sorry. We'll post an update here once we have any news in this regard.
Thank you for your patience.
Warm regards,
Your Checkmk team
-
25 Jun, '24
Mohamed Saleh (Admin)
Dear Checkmk users,
We wanted to give you an update on this highly voted idea.
We understand that it helps with several use cases and solves a pain point shared by many of you. At the same time, we know that implementing this capability within the product would require a significant amount of analysis and development work in our architecture, particularly to support the use case in distributed setups.
Due to this complexity, we will not be able to work on it within the 2.4 roadmap, where we have already planned the initiatives that will help us achieve our product strategy and business goals for this release.
Please see the next comment.
-
25 Jun, '24
Mohamed Saleh (Admin)
We therefore recommend that you look at the Automated Downtimes package from our partner SVA, a "service dependencies light" feature, which provides the ability to automate the setting of downtimes on hosts and services depending on the state of other hosts/services. You can find details and download the MKP here, and we also encourage you to watch SVA's talk about the functionality in their presentation at our Checkmk Conference #10 here.
Warm regards,
Your Checkmk team
-
24 Nov, '24
M
Hi, as so many want this, it would definitely be something to put on the next roadmap. I can understand that there are other things you are also working on, but it sounded like you don't even want to put it into future releases or on your to-do list. But maybe I misunderstood. (I still cannot understand the priority of a mobile app, but OK.)
It is not just about setting downtimes but also about visualising things. It would also be cool if dependencies were discovered automatically. For example, maybe it would be possible to put a description on a switch port containing the FQDN of a host, which is then automatically matched with that host and creates a dependency only on that service and not on the whole switch. Yes, I know that it is not straightforward, because the relation to another switch then also has to be checked, and putting all that manually into BI would be too much. Just an idea.
-
13 Jan, '25
Leon
Thought I would bump this to show my support for it. This is something we would make good use of, and I'm sure it would apply to other organisations.
An example would be to have a parent/child relationship between an ESXi host and its associated services monitored on the vCenter. Presently we have an issue where we still see the vCenter services alert during host patching. While I have explained to the team how these can be suppressed, the time taken on a per-host basis to do this is too much for such a regular and already time-consuming task. It would be great to have the downtimes extend from the host to a defined set of services to prevent this and reduce the number of false alerts we see!
Fingers crossed for 2.5?
-
14 Feb, '25
Alex Florea
This feature has been in demand since 2022. Three years later, it's not even on the roadmap.
BIG disappointment, tribe!!!
-
05 Dec, '25
Wayne
Any update on this? My routers never go down, but the WAN circuits do, which causes everything at the site to alert.