This uses the built-in up metric, which Prometheus generates automatically for every scrape target. It reports whether the most recent scrape of that target succeeded (1) or failed (0).
When everything is healthy, the query returns an empty result; any returned series means a scrape is failing. Note that the query filters out the pgbouncer scrape job, because that job is incorrectly configured via the Mimir prometheus-agent ScrapeConfig.
This alert is intended to fire regardless of a host’s power state. Because of this, create a silence in Alertmanager before powering off any instance to avoid unwanted alerts.
This alert fires if any single scrape target on a host fails to be scraped. To identify the specific failing scrape jobs, remove the min() aggregator from the Prometheus query.
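As a rough illustration, the two queries below sketch the aggregated and unaggregated forms. The exact alert expression is not reproduced in this section, so the instance grouping label, the job!="pgbouncer" matcher, and the == 0 comparison are all assumptions. Substitute the expression from the actual alert rule.

```promql
# Assumed shape of the alert query: one series per host whose
# worst scrape result is 0 (at least one target failed).
min by (instance) (up{job!="pgbouncer"}) == 0

# Same selector without the min() aggregator: one series per
# failing target, with its job label intact.
up{job!="pgbouncer"} == 0
```

The second form keeps the job label on each result, which the min by (instance) aggregation drops, so it shows which scrape job is failing on the host.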
If this fires and you aren’t intentionally powering down a VM, always treat it as a high-severity alert.
When this fires, either a node has failed in a way that could directly impact our customers, or metrics collection has stopped and we are blind to any further issues on that host.