NATS monitoring
A dashboard for the NATS servers can be found here.
Server availability can also be verified with the following PromQL query:
(count by(type, env) (nats_healthz_js_enabled_only_status_value{value="ok"}) == bool count by(type, env) (nats_healthz_js_enabled_only_status_value)) == 1
There is also a NATS SLI dashboard that covers the rate and error metrics for requests to NATS servers. Slow consumers and messages redelivered to consumers are currently the error indicators there.
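To make the availability query easier to read, here is a minimal Python sketch of the comparison it appears to perform: for each (type, env) group, count the series whose value label is "ok", compare that with the total number of series in the group, and keep only the groups where the two counts match. It assumes each server exposes one series whose value label carries its health status; the sample label values ("core", "leaf", "error") and the function name are hypothetical and only serve the illustration.

```python
from collections import Counter


def fully_healthy_groups(series):
    """Return the (type, env) groups in which every series reports value="ok"."""
    # Mirrors: count by(type, env) (metric{value="ok"})
    ok = Counter((s["type"], s["env"]) for s in series if s["value"] == "ok")
    # Mirrors: count by(type, env) (metric)
    total = Counter((s["type"], s["env"]) for s in series)
    # `== bool` yields 1 where the two counts match, and `== 1` keeps only those.
    # Groups with no "ok" series never appear on the left-hand side, so they
    # drop out of the result entirely rather than showing up as 0.
    return {group for group, n in ok.items() if n == total[group]}


# Hypothetical sample data, not taken from a real environment.
sample = [
    {"type": "core", "env": "prod", "value": "ok"},
    {"type": "core", "env": "prod", "value": "ok"},
    {"type": "core", "env": "staging", "value": "ok"},
    {"type": "core", "env": "staging", "value": "error"},
    {"type": "leaf", "env": "prod", "value": "error"},
]

print(fully_healthy_groups(sample))  # {('core', 'prod')}
```

Read this way, a (type, env) pair that is missing from the query result has at least one server not reporting "ok", so an alert or panel built on the query would need to treat absence as the failure signal.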
The NATS monitoring docs are a good reference for the other metrics the system exposes. We rely on the NATS Prometheus exporter to export these metrics into our environment.