Upgrading Monitoring Components

Upgrading monitoring components requires changes in a few different places, but the process is the same from release to release.

Links to releases:

Links to various exporter releases:

Monitoring components meta-monitor each other, but some care is needed to ensure we don’t have gaps in observability.

Most services expose a SERVICE_build_info metric that can be used to monitor the progress of the rollout. For example, prometheus_build_info.

Similarly, most services expose process_start_time_seconds.

It’s also worth checking the standard up metric.
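As a rough illustration, the three metrics above can be turned into PromQL expressions for watching a rollout. This is a minimal sketch: the `rollout_queries` helper and the `job="prometheus"` selector are hypothetical, and it assumes the service's build_info metric carries a `version` label (as prometheus_build_info does).

```python
def rollout_queries(service: str, job: str) -> dict[str, str]:
    """Build PromQL expressions for watching the rollout of one service.

    The job label value is an assumption; adjust the selector to match
    how the service is actually scraped.
    """
    selector = f'{{job="{job}"}}'
    return {
        # Number of instances reporting each version.
        "versions": f"count by (version) ({service}_build_info{selector})",
        # Seconds since each instance started; small values mean a recent restart.
        "uptime_seconds": f"time() - process_start_time_seconds{selector}",
        # Instances that Prometheus currently fails to scrape.
        "down": f"up{selector} == 0",
    }


queries = rollout_queries("prometheus", job="prometheus")
for name, expr in queries.items():
    print(f"{name}: {expr}")
```

During a rollout, the `versions` expression should show the count for the old version falling to zero while the count for the new version rises.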

The monitoring-overview dashboard has a lot of details about Thanos and Prometheus metrics.

Create an infrastructure issue if there isn’t one yet.

The issue should detail:

  • The components being upgraded.
  • Any breaking changes from the release notes.
  • Any significant features/improvements being rolled out.

Prepare upgrade MRs

Don’t forget to bump cookbook versions when submitting cookbook changes.
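A cookbook's version lives on the `version` line of its `metadata.rb`. The sketch below shows one way to increment it, assuming a plain `MAJOR.MINOR.PATCH` version string and a patch-level bump; the `bump_patch` helper and the sample cookbook name are hypothetical, and the appropriate component to bump depends on the change.

```python
import re

# Matches a line such as: version '1.2.3'
VERSION_LINE = re.compile(
    r"^(version\s+['\"])(\d+)\.(\d+)\.(\d+)(['\"])", re.MULTILINE
)


def bump_patch(metadata: str) -> str:
    """Return metadata.rb text with the patch component of `version` incremented."""

    def _bump(match: re.Match) -> str:
        prefix, major, minor, patch, quote = match.groups()
        return f"{prefix}{major}.{minor}.{int(patch) + 1}{quote}"

    updated, count = VERSION_LINE.subn(_bump, metadata, count=1)
    if count == 0:
        raise ValueError("no version line found in metadata")
    return updated


# Hypothetical metadata.rb content.
example = "name 'example-cookbook'\nversion '1.2.9'\n"
print(bump_patch(example))
```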

  • Merge Chef MRs to the relevant cookbook.
  • Wait for the cookbook publisher to post MRs to chef-repo.
  • Merge non-prod chef-repo MR and wait for Chef to deploy.
  • Verify new versions are deployed.
  • Merge prod chef-repo MR and wait for Chef to deploy.
  • Verify new versions are deployed.
  • Merge Helmfile/Tanka MRs.
  • Verify new versions are deployed.
  • Verify services are operating and no alerts are firing.
  • Verify the service metrics are healthy.
  • Prepare and submit rollback MRs for Chef/Helmfile/Tanka.
  • Verify service returns to normal.
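The "verify new versions are deployed" steps above can be partly automated by inspecting the result of a build_info query. The following is a minimal sketch, assuming the response has the shape returned by the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and that the metric carries a `version` label. The helper names, instance names, and version numbers are hypothetical.

```python
from collections import Counter


def versions_from_response(response: dict) -> Counter:
    """Count instances per version in a Prometheus instant-query response."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    return Counter(
        sample["metric"].get("version", "unknown")
        for sample in response["data"]["result"]
    )


def rollout_complete(response: dict, expected_version: str) -> bool:
    """True when at least one instance reports and all report expected_version."""
    counts = versions_from_response(response)
    return bool(counts) and set(counts) == {expected_version}


# Hypothetical response for a prometheus_build_info query, mid-rollout.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "prom-01", "version": "2.45.0"}, "value": [0, "1"]},
            {"metric": {"instance": "prom-02", "version": "2.44.0"}, "value": [0, "1"]},
        ],
    },
}

print(versions_from_response(sample_response))
print(rollout_complete(sample_response, "2.45.0"))
```

The same check works after a rollback by passing the previous version as `expected_version`.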