
GET Monitoring Setup

This document describes how to set up the staging-ref environment to work with GitLab infrastructure monitoring.

  • A private cluster is preferred for setting up Alertmanager.

GET sets up Prometheus and Grafana on a VM, and the GitLab Helm chart also enables its own Prometheus and Grafana by default. Neither set is used in this setup, so both can be disabled. You can view examples of how to do this in the following MRs:

global:
  # Disable Grafana
  grafana:
    enabled: false
  ...
# Disable built-in Prometheus
prometheus:
  install: false

Labels help organize metrics by service. They can be added through the GitLab Helm chart.

  • Labels need to be added to the GitLab Helm values:
    global:
      common:
        labels:
          stage: main
          shard: default
          tier: sv
  • Deployment labels also need to be added. For an up-to-date list, see gitlab_charts.yml.j2 in the staging-ref repository.

Prometheus is an open-source monitoring and alerting tool used to monitor all services within GitLab infrastructure. You can read more about the project's technical details here.

prometheus-stack is a Helm chart that bundles cluster monitoring with Prometheus using the Prometheus Operator. We use this chart to deploy Prometheus.

  • Deploy the chart with Helm to the GET cluster, in the prometheus namespace. In staging-ref, this is managed by CI jobs that validate and apply any changes to the Helm chart. You can view the setup of this chart in this directory.

Scrape targets are configured in the values.yaml file in the prometheus-stack directory. Relabeling is applied to scrape targets so that their labels match those used in staging and production.

  1. Kubernetes targets. These scrape targets are defined in additionalPodMonitors and additionalServiceMonitors in values.yaml.

  2. Omnibus targets. These scrape targets are defined under additionalScrapeConfigs in values.yaml.
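For illustration, one target of each kind might look like the following values.yaml sketch. It assumes the kube-prometheus-stack layout (prometheus.additionalServiceMonitors and prometheus.prometheusSpec.additionalScrapeConfigs); the monitor and job names, namespace, Service label, port name, VM address, and label value are hypothetical placeholders, not the actual staging-ref configuration.

```yaml
prometheus:
  # Kubernetes target: the chart renders this entry as a ServiceMonitor object
  additionalServiceMonitors:
    - name: gitlab-webservice              # hypothetical monitor name
      namespaceSelector:
        matchNames:
          - gitlab                         # assumed GitLab namespace
      selector:
        matchLabels:
          app: webservice                  # assumed Service label
      endpoints:
        - port: http-metrics               # assumed Service port name
          path: /metrics
          relabelings:
            # Add a static "tier" label so series match staging/production naming
            - targetLabel: tier
              replacement: sv
  prometheusSpec:
    # Omnibus target: raw Prometheus scrape config, so keys are snake_case
    additionalScrapeConfigs:
      - job_name: omnibus-node-exporter    # hypothetical job name
        static_configs:
          - targets:
              - 10.0.0.10:9100             # placeholder VM address
        relabel_configs:
          # Same static label as the Kubernetes target above
          - target_label: tier
            replacement: sv
```

Note the key style differs between the two: ServiceMonitor relabelings use the operator's camelCase fields, while additionalScrapeConfigs is passed through as plain Prometheus configuration.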

Exporters expose existing metrics from an application or service in a format that Prometheus can scrape. A few of them are disabled by default and must be enabled before they can be used. Exporters that need to be enabled manually in the GitLab Helm values are:

Alerting rules are defined in Prometheus, which sends firing alerts to Alertmanager. Alertmanager then groups and routes those alerts and sends notifications, for example to a Slack channel. We do not use the Alertmanager bundled with prometheus-stack; instead, Prometheus is configured to send alerts to the existing Alertmanager cluster.
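A minimal sketch of the corresponding kube-prometheus-stack values, assuming the external Alertmanager is reachable over HTTPS at a placeholder hostname. The rule group, alert name, and severity label are hypothetical and only show where alerting rules are defined.

```yaml
# Do not deploy the Alertmanager bundled with the chart
alertmanager:
  enabled: false

prometheus:
  prometheusSpec:
    # Send alerts to the existing Alertmanager cluster instead
    additionalAlertManagerConfigs:
      - scheme: https
        static_configs:
          - targets:
              - alertmanager.example.com:443   # placeholder address

# Alerting rules evaluated by Prometheus (hypothetical example)
additionalPrometheusRulesMap:
  example-rules:
    groups:
      - name: example.rules
        rules:
          - alert: ExampleTargetDown
            expr: up == 0                      # scrape target unreachable
            for: 10m
            labels:
              severity: s4                     # hypothetical severity value
            annotations:
              summary: A scrape target has been down for 10 minutes
```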

Note: If you use a public cluster, you need to configure the IP masquerade agent in your cluster. See the example configuration.
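As a sketch, assuming GKE: the IP masquerade agent reads its settings from a ConfigMap named ip-masq-agent in the kube-system namespace. Traffic to destinations outside nonMasqueradeCIDRs (such as the Alertmanager cluster) is then sent from the node's IP address instead of the pod's. The CIDR below is a placeholder; substitute the cluster's actual in-cluster ranges.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ip-masq-agent
  namespace: kube-system
data:
  # The agent reads the "config" key; its value is itself YAML
  config: |
    nonMasqueradeCIDRs:
      - 10.0.0.0/8        # placeholder: destinations that keep the pod source IP
    masqLinkLocal: false
    resyncInterval: 60s
```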

  1. Configure Alertmanager:
      1. Configure Dead Man’s Snitch for Alertmanager. Alertmanager should continuously send dead man’s switch notifications to the configured notification provider, which confirms that communication between Alertmanager and the provider is working (example merge request).
      2. Configure routing to Slack channels (example merge request).
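The two Alertmanager steps above could look roughly like the following plain Alertmanager configuration. It assumes Prometheus fires an always-on Watchdog alert (the kube-prometheus-stack default rules include one); the snitch token, Slack webhook URL, and channel name are placeholders, and the shared Alertmanager cluster's real configuration may be generated differently.

```yaml
route:
  receiver: slack-default                # fallback receiver for all alerts
  group_by: ['alertname']
  routes:
    # Always-firing Watchdog alert is forwarded to Dead Man's Snitch as a heartbeat
    - matchers:
        - alertname="Watchdog"
      receiver: deadmanssnitch
      group_wait: 0s
      repeat_interval: 5m
receivers:
  - name: deadmanssnitch
    webhook_configs:
      - url: https://nosnch.in/REPLACE_WITH_SNITCH_TOKEN   # placeholder
        send_resolved: false
  - name: slack-default
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE_ME   # placeholder
        channel: '#staging-ref-alerts'   # hypothetical channel
        send_resolved: true
```

The heartbeat design means Dead Man's Snitch raises its own alert when notifications stop arriving, so a broken path between Alertmanager and the provider is detected even though no regular alert can get through.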
  • TBA.

Dashboards for staging-ref can be found in Grafana under the staging-ref folder. Additional dashboards can be added either through the runbooks or manually.

If you add a dashboard manually, add its UID to the protected dashboards list; otherwise, the automated cleanup that runs every 24 hours deletes it.