Mimir Onboarding

Mimir is a multi-tenanted system.

This helps us to create soft boundaries by tenant and introduces a few key benefits:

Improved visibility into metric ownership
Query boundaries and isolated workloads via shuffle sharding
Per tenant limits and cardinality management
Reduced failure domains in the event of a query of death

Endpoints

Region	Endpoint	Internal Endpoint
us-east1	mimir.ops-gitlab-gke.us-east1.gitlab-ops.gke.gitlab.net	mimir-internal.ops-gitlab-gke.us-east1.gitlab-ops.gke.gitlab.net

Data Retention

By default, Mimir keeps all data for 1 year. Retention is configurable per tenant through the tenant limits configuration.

Historical Data

Historical data from Thanos is still available through the Thanos datasource in Grafana, Thanos - Historical Data. Because we currently have limited use cases for this data, and because importing it into Mimir carries risk, we have opted to keep it accessible through Thanos for now.

At a later date we will take this offline completely, but the data can still be brought back online if the need arises.

Creating a tenant

Tenants are provisioned through config-mgmt.

Check the README for creating a new tenant.

This helps us to centralise tenants across future observability backends, as well as provide a way to review limit increases or changes.

Checking Tenant Limits

The Mimir - Tenants dashboard will show you tenant specific data, as well as active series and how close you are to limits.

Additionally, there is a Mimir - Overrides dashboard which shows all of the configured default limits and any overrides applied to a given tenant.

Bumping tenant limits is also done through config-mgmt and you can see an example here as well as an example MR.

A full list of tenant overrides is documented here.

The primary limits tenants will face are:

ingestion_rate - Allowed samples per second
max_global_series_per_user - Maximum in-memory series allowed per tenant across the cluster, before replication
max_label_names_per_series - Maximum label names per sent series

Accessing metrics through the Mimir API

Mimir provides an HTTP endpoint which exposes a Prometheus query API.

In order to access the API programmatically, the following steps are necessary:

Specify a tenant scope through the X-Scope-OrgID header. For example, use X-Scope-OrgID: gitlab-gprd for production metrics.
Use HTTP basic auth to authenticate, see vault k8s/shared/observability/tenants/runbooks for secrets.
The API is exposed through https://mimir-internal.ops.gke.gitlab.net/prometheus, which is also available through port-forwarding.

For team members without cluster-level network access, consider using the SOCKS-proxy-based solution below:

ssh -D "18202" "lb-bastion.gstg.gitlab.com" 'echo "Connected! Press Enter to disconnect."; read disconnect' >&2

Full example with authentication and using above proxy:

username=$(vault kv get -field=username -mount="k8s" "shared/observability/tenants/runbooks")
password=$(vault kv get -field=password -mount="k8s" "shared/observability/tenants/runbooks")

curl \
  -x socks5://localhost:18202 \
  --user "${username}:${password}" \
  -H "X-Scope-OrgID: gitlab-gstg" \
  "https://mimir-internal.ops.gke.gitlab.net/prometheus/api/v1/query?query=up"

Please exercise caution when specifying more than one tenant in X-Scope-OrgID: tenant1|tenant2|.... This drastically increases the data scope for a given query and hence the load on the system.
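The access steps above also apply to range queries. The following is a minimal sketch, not an official helper: the function names single_tenant and mimir_query_range and the MIMIR_URL and MIMIR_PROXY variables are assumptions made for illustration, and username and password are expected to be set as in the full example above. It refuses any scope containing | so that a query cannot fan out across tenants by accident.

```shell
# Hypothetical helpers; names and environment variables are illustrative only.
MIMIR_URL="${MIMIR_URL:-https://mimir-internal.ops.gke.gitlab.net/prometheus}"

# Succeeds only for a non-empty tenant scope that names exactly one tenant.
single_tenant() {
  case "$1" in
    ""|*"|"*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage: mimir_query_range <tenant> <promql> <start> <end> [step]
# start/end are Unix timestamps or RFC 3339 times; step defaults to 60s.
mimir_query_range() {
  local tenant="$1" query="$2" start="$3" end="$4" step="${5:-60s}"
  if ! single_tenant "$tenant"; then
    echo "refusing empty or multi-tenant scope: '$tenant'" >&2
    return 2
  fi
  # --get sends the url-encoded parameters as a query string,
  # so the PromQL expression needs no manual escaping.
  curl --silent --show-error --get \
    ${MIMIR_PROXY:+--proxy "$MIMIR_PROXY"} \
    --user "${username}:${password}" \
    -H "X-Scope-OrgID: ${tenant}" \
    --data-urlencode "query=${query}" \
    --data-urlencode "start=${start}" \
    --data-urlencode "end=${end}" \
    --data-urlencode "step=${step}" \
    "${MIMIR_URL}/api/v1/query_range"
}

# Example (through the SOCKS proxy shown earlier):
#   MIMIR_PROXY=socks5://localhost:18202 \
#     mimir_query_range gitlab-gstg 'sum(up)' 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z 5m
```

Using --data-urlencode avoids hand-escaping characters such as ?, = and { in the query. Keeping the time range short and the step coarse limits the load a single query places on the tenant.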

Sending Metrics To Mimir

After you have set up the tenant (or chosen an existing one), you can set up your Prometheus client to remote-write metrics. In Prometheus, this is configured through the remote_write config.

The following example uses the prometheus-operator in Kubernetes:

remoteWrite:
  - url: <replace_with_mimir_endpoint>
    name: mimir
    basicAuth:
      username:
        name: remote-write-auth
        key: username
      password:
        name: remote-write-auth
        key: password

Unfortunately, Prometheus does not support environment variable substitution in its config file. When it is deployed via prometheus-operator, however, credentials can be supplied through a Kubernetes secret reference. In the above example, the auth points to a secret named remote-write-auth and to the keys within it that hold the username and password.
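The secret referenced above has to exist in the same namespace as the Prometheus resource before remote write can authenticate. A minimal sketch of creating it follows. The function name create_remote_write_secret, the KUBECTL override and the example namespace are assumptions for illustration, and the credential values are placeholders to be replaced with the tenant's own credentials.

```shell
# Hypothetical helper: create the Secret that the remoteWrite block refers to.
# Set KUBECTL=echo to preview the command instead of running it.
# Usage: create_remote_write_secret <namespace> <username> <password>
create_remote_write_secret() {
  local namespace="$1" user="$2" pass="$3"
  "${KUBECTL:-kubectl}" create secret generic remote-write-auth \
    --namespace "$namespace" \
    --from-literal=username="$user" \
    --from-literal=password="$pass"
}

# Example (placeholders, not real values):
#   create_remote_write_secret monitoring "<tenant-username>" "<tenant-password>"
```

Note that --from-literal places the values on the command line, where they may end up in shell history. Where that matters, consider creating the secret through your usual secret management tooling instead.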

Here is an example config

Note that the current usage of htpasswd/basicAuth will be replaced in a future iteration.

For the url setting see the endpoints list.

Exploring Metrics

Unlike Thanos, Mimir does not have a query UI. Instead it relies on Grafana as its UI for querying.

Within Grafana you can use the Explore UI to run queries.

Select the Explore menu item in Grafana:

[Screenshot: explore-ui]

Ensure you have selected the correct datasource for your tenant:

[Screenshot: explore-ui-datasource-selector]

Query away.

For more information on using the explore UI, you can reference the Grafana official docs.

VM Metrics Ingestion

Mimir uses a different method to scrape virtual machines than Thanos did. It no longer depends on Chef relabelling and instead scrapes VMs based on two GCP labels: gitlab_com_service and gitlab_com_type. The service label is used to select hosts for a specific service (for example, all hosts running Postgres have the service label postgres). The type label is passed through to the metrics and becomes the type label on the metrics themselves.

These labels should be set in config-mgmt as part of the Terraform configuration. An example MR that sets them is https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/7714

These labels are set either in variables or in the terraform configuration directly.
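To check which VMs already carry the labels, the instances can be listed with a label filter. This is a sketch under assumptions: the function name list_vms_for_service and the GCLOUD override are made up for illustration, and the project name in the example is a placeholder.

```shell
# Hypothetical helper: list VMs in a project that carry a given
# gitlab_com_service label, together with their gitlab_com_type label.
# Set GCLOUD=echo to preview the command instead of running it.
# Usage: list_vms_for_service <gcp-project> <service>
list_vms_for_service() {
  local project="$1" service="$2"
  "${GCLOUD:-gcloud}" compute instances list \
    --project "$project" \
    --filter "labels.gitlab_com_service=${service}" \
    --format "table(name,zone.basename(),labels.gitlab_com_service,labels.gitlab_com_type)"
}

# Example (placeholder project):
#   list_vms_for_service "<gcp-project>" postgres
```

A host missing from this output, or shown with an empty type column, would not be picked up or typed as expected by the label-based scrape configuration described above.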

The label scrape configs are managed via scrapeConfig CRD objects as part of the GitLab Helmfiles and are not yet self-service. To add a new service, please open an issue in the Scalability issue tracker.