Secrets Manager GKE (OpenBao) Service
- Service Overview
- Alerts: https://alerts.gitlab.net/#/alerts?filter=%7Btype%3D%22secrets-manager-gke%22%2C%20tier%3D%22sv%22%7D
- Label: gitlab-com/gl-infra/production~"Service::RunwayOpenBaoGKE"
Logging
Audit Logging
We suggest the following filters to focus on relevant project audit logs in Grafana or GCP Logs Explorer:
resource.type="k8s_container"
resource.labels.namespace_name="secrets-manager-gke"
resource.labels.container_name="app"
jsonPayload.type="response"
OpenBao emits audit events as structured JSON on the app container’s stdout/stderr; Cloud Logging surfaces them under jsonPayload.
Filters
- jsonPayload.request.namespace.path="org_<org_id>/group_<root_namespace_id>/<obj_type>_<obj_id>/" can be used to filter audit logs to a particular project or group.
  - org_id is the organization ID.
  - root_namespace_id is the ID of the top-level group.
  - obj_type is group or project.
  - obj_id is the ID of the group or project where the secrets manager lives.
  - Example: jsonPayload.request.namespace.path="org_1/group_2377064/project_74977306/".
- jsonPayload.request.path =~ "secrets/kv/data/explicit/.*" can be used to filter to just secret value read operations.
  - An explicit secret name can also be given with jsonPayload.request.path = "secrets/kv/data/explicit/<SECRET-NAME>".
  - This is best used in conjunction with the above.
- jsonPayload.auth.display_name=~"pipeline_jwt" selects runner-initiated requests; jsonPayload.auth.display_name=~"gitlab_rails_jwt" selects Rails-initiated requests.
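
The filters above can be assembled programmatically when the same lookup is repeated for many projects. The following is a minimal sketch, not part of the service tooling: the function names `namespace_path` and `audit_filter` are hypothetical, and it relies on Logs Explorer treating clauses on separate lines as an implicit AND.

```python
from typing import Optional


def namespace_path(org_id: int, root_namespace_id: int, obj_type: str, obj_id: int) -> str:
    """Build the OpenBao namespace path for a group or project."""
    if obj_type not in ("group", "project"):
        raise ValueError("obj_type must be 'group' or 'project'")
    return f"org_{org_id}/group_{root_namespace_id}/{obj_type}_{obj_id}/"


def audit_filter(
    org_id: int,
    root_namespace_id: int,
    obj_type: str,
    obj_id: int,
    secret_name: Optional[str] = None,
) -> str:
    """Return a Logs Explorer query scoped to one group or project.

    When secret_name is given, the query is narrowed further to reads of
    that one secret value.
    """
    path = namespace_path(org_id, root_namespace_id, obj_type, obj_id)
    clauses = [
        'resource.type="k8s_container"',
        'resource.labels.namespace_name="secrets-manager-gke"',
        'resource.labels.container_name="app"',
        'jsonPayload.type="response"',
        f'jsonPayload.request.namespace.path="{path}"',
    ]
    if secret_name is not None:
        clauses.append(
            f'jsonPayload.request.path = "secrets/kv/data/explicit/{secret_name}"'
        )
    # One clause per line; Logs Explorer ANDs them together.
    return "\n".join(clauses)


if __name__ == "__main__":
    print(audit_filter(1, 2377064, "project", 74977306))
```

Calling `audit_filter(1, 2377064, "project", 74977306)` reproduces the base audit filter plus the example namespace path shown in the list.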
Service Logging
We suggest the following filters to focus on service logs (non-audit) in Grafana or GCP Logs Explorer:
resource.type="k8s_container"
resource.labels.namespace_name="secrets-manager-gke"
resource.labels.container_name="app"
-jsonPayload.request.remote_address:*
OpenBao writes all output (including audit events) to stderr on GKE, so GCP marks every log entry with ERROR severity regardless of the [INFO]/[WARN] level inside the message body. Treat the level inside the message as authoritative.
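
When post-processing exported service logs, the level can be recovered from the message text rather than from the GCP severity field. This is a minimal sketch under two assumptions: the level appears as a bracketed token such as `[INFO]` or `[WARN]` in the message, and the caller has already pulled the message string out of the log entry. The function name `message_level` and the sample line are hypothetical.

```python
import re

# Bracketed level token as it appears inside the message body.
_LEVEL_RE = re.compile(r"\[(TRACE|DEBUG|INFO|WARN|ERROR)\]")


def message_level(message: str, default: str = "UNKNOWN") -> str:
    """Return the level embedded in the message, ignoring GCP severity."""
    match = _LEVEL_RE.search(message)
    return match.group(1) if match else default


if __name__ == "__main__":
    # Hypothetical log line, for illustration only.
    sample = "2025-01-01T00:00:00.000Z [INFO]  core: example message"
    print(message_level(sample))
```

Entries with no bracketed level fall back to the `default` value so they can be reviewed separately.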
OpenBao Caller Logs
When debugging a Secrets Manager incident, it is useful to check the caller side to see what was sent to OpenBao.
Rails web
For create, update, and delete operations on Secrets and associated Permissions, use Kibana with the data view pubsub-rails-inf-gprd-*:
json.controller : ("Projects::SecretsController" or "Groups::SecretsController") or json.meta.caller_id : graphql\:*Secret* or json.path : "/api/v4/internal/secrets_manager/audit_logs"
The three OR clauses cover the user-facing surfaces: HTML UI controllers, GraphQL mutations and resolvers, and the Grape endpoint for the OpenBao→Rails audit callback. The filter excludes other code declaring feature_category :secrets_management (CI Secure Files, CI job-token logging), which is unrelated to OpenBao Secrets Manager.
Sidekiq
Provisioning, deprovisioning, and rotation reminder workers, all under the SecretsManagement::* namespace. Use Kibana with the data view pubsub-sidekiq-inf-gprd-*:
json.class : "SecretsManagement::*"
Runner
gitlab.com Shared Runners. Use Kibana with the data view pubsub-runner-inf-gprd:
json.msg : ("resolving secrets" or "reading from Vault" or "creating vault client" or "inline auth JWT")
Narrow with json.job : <job_id> or json.runner : <runner_id> once the affected job or runner is identified. json.correlation_id is often empty on runner-side Vault errors (the Vault SDK emits them outside any request context), so cross-trace to Rails/Sidekiq via json.job + timestamp instead.
Summary
GitLab Secrets Manager is a built-in secrets management solution for CI pipelines. Secrets are created and managed using the GitLab UI, and consumed by CI jobs.
GitLab Secrets Manager relies on the secrets-manager-gke Runway service.
The service is configured and deployed using the
secrets-manager-runway project.
secrets-manager-gke runs OpenBao, which is a fork of HashiCorp Vault.
The source code of OpenBao lives in
openbao-internal,
a build project that is intended to modify the upstream OpenBao releases.
Architecture
The Rails backend and runners connect to the secrets-manager-gke service (running OpenBao)
through the CloudFlare WAF and the Runway-managed GKE Gateway.
Both Rails and runners use the same external URL (https://secrets.gitlab.com); there is no separate internal Runway URL on GKE.
OpenBao stores data on the Cloud SQL instance provided by Runway, and gets the unseal key from Google KMS via GCP Workload Identity (no Vault secret is needed for KMS auth on GKE).
OpenBao is configured with two audit devices that fan out every audit event in parallel:
- file device writing JSON to the app container’s stdout (surfaced in Cloud Logging; see the Audit Logging section)
- http device POSTing the same events to the Rails backend at https://gitlab.com/api/v4/internal/secrets_manager/audit_logs
The GitLab Secrets Manager design docs provide request flow diagrams.
flowchart TB
CloudFlare(CloudFlare: secrets.gitlab.com)
KMS[GCP KMS]
PostgreSQL[GCP CloudSQL from Runway]
Gateway[Runway GKE Gateway]
Rails-- Manage OpenBao -->CloudFlare
Runner-- Fetch Pipeline Secrets -->CloudFlare
CloudFlare-->Gateway
Gateway-->OpenBao
OpenBao-- Decrypt Unseal Key -->KMS
OpenBao-- Storage -->PostgreSQL
The service runs multiple OpenBao pods:
- a single active pod
- one or more standby pods
Pods connect to the PostgreSQL backend to store data and to acquire a lock.
On GKE, pods coordinate directly via cluster port 8201 (pod-to-pod, no LB involvement).
flowchart TD
Ingress
Service_OB([HTTP API])
subgraph OpenBao
OB_1[Primary]
OB_2[Standby A]
OB_3[Standby B]
Service_Primary([Primary gRPC])
end
Ingress --> Service_OB
Service_OB --> OB_1
Service_OB --> OB_2
Service_OB --> OB_3
OB_2 -. forward .-> Service_Primary
OB_3 -. forward .-> Service_Primary
Service_Primary --> OB_1
OB_1 -->Service_DB
OB_1 -. lock maintenance .->Service_DB
OB_2 -. lock monitor .->Service_DB
OB_3 -. lock monitor .->Service_DB
Service_DB([PostgreSQL]) --> DB[(PostgreSQL)]
OB_1 -- auto-unseal --> KMS
OB_2 -- auto-unseal --> KMS
OB_3 -- auto-unseal --> KMS
Performance
Benchmarking and sizing recommendations are covered by gitlab#589411.
Scalability
The service is deployed on Runway GKE.
Replicas are fixed at min_instances: 2 / max_instances: 2 — no autoscaling.
Two pods provide HA: one active and one standby, coordinating leadership via the PostgreSQL lock.
Scalability is configured in default-values.yaml.
Availability
GitLab Secrets Manager is limited to the Premium and Ultimate tiers. The feature needs to be enabled in a group or project.
The service is currently deployed in a single region: us-east1 (both staging and production).
Per-environment Runway configuration lives in gke-service-staging.yaml and gke-service-production.yaml.
Durability
Runway provisions and manages the Cloud SQL instance backing OpenBao. On Runway GKE, backups are always on for the Cloud SQL instance.
Runway performs backup and backup restore validation as configured for the secrets-manager-gke service. See the Runway restore validation documentation for details.
Backup procedure:
- Back up the Cloud SQL PostgreSQL database (runway-db-secrets-manager-gke).
- Back up the unseal key material stored on Google Cloud KMS. See runbooks for our internal Vault service, which similarly relies on Google Cloud KMS.
For restore, we suggest the following steps:
- Scale OpenBao down to zero pods.
- Perform the PostgreSQL restore.
- Scale OpenBao back up.
Security/Compliance
The Cloud SQL PostgreSQL database only contains encrypted data, and the unseal key is stored on Google KMS.
On Runway GKE, KMS authentication uses GCP Workload Identity tied to the pod’s Kubernetes service account — there is no long-lived credential or Vault secret for KMS access.
Monitoring/Alerting
The service comes with built-in Runway observability:
- secrets-manager-gke dashboard
- runway-db-secrets-manager-gke runbook: dashboard, alerts, and logs for the Cloud SQL instance
Metrics
The service comes with built-in Runway metrics. Additionally, the OpenBao container exposes its own metrics.
OpenBao metrics for this service use the secrets_manager_gke prefix.
Note: SLIs and alerts for secrets-manager-gke are currently driven by Runway load-balancer metrics only (see metrics-catalog/services/secrets-manager-gke.jsonnet).
The secrets_manager_gke_* metrics are emitted by the OpenBao container and can be queried directly in Mimir, but they are not bound to any SLI for this service.
See OpenBao telemetry docs for the full list. The table below lists the metrics most relevant for operating the service.
| Metric | Description |
|---|---|
| secrets_manager_gke_audit_log_request_failure | Number of audit log request failures |
| secrets_manager_gke_audit_device_log_response_failure | Number of audit log response failures |
| secrets_manager_gke_barrier_delete | Time taken to delete an entry from the barrier |
| secrets_manager_gke_barrier_get | Time taken to get an entry from the barrier |
| secrets_manager_gke_barrier_list | Time taken to list entries in the barrier |
| secrets_manager_gke_barrier_put | Time taken to put an entry in the barrier |
| secrets_manager_gke_cache_delete | Number of delete operations on the cache |
| secrets_manager_gke_cache_hit | Number of cache hits |
| secrets_manager_gke_cache_miss | Number of cache misses |
| secrets_manager_gke_cache_write | Number of cache writes |
| secrets_manager_gke_core_active | Whether the node is active (1) or standby (0) |
| secrets_manager_gke_core_unsealed | Whether the node is unsealed (1) or sealed (0) |
| secrets_manager_gke_core_leadership_lost | Number of times leadership was lost |
| secrets_manager_gke_core_leadership_setup_failed | Number of times leadership setup failed |
| secrets_manager_gke_core_in_flight_requests | Number of concurrent requests currently being processed |
| secrets_manager_gke_rollback_inflight | Number of rollback operations currently in flight |
| secrets_manager_gke_postgres_delete | Time taken to delete an entry from the PostgreSQL storage backend |
| secrets_manager_gke_postgres_get | Time taken to get an entry from the PostgreSQL storage backend |
| secrets_manager_gke_postgres_list | Time taken to list entries in the PostgreSQL storage backend |
| secrets_manager_gke_postgres_put | Time taken to put an entry in the PostgreSQL storage backend |
| secrets_manager_gke_runtime_alloc_bytes | Number of bytes allocated by the OpenBao process |
Notes:
- Barrier and PostgreSQL metrics are summary metrics, exposing _count, _sum, and quantile series (0.5, 0.9, 0.99).
- PostgreSQL metrics are named postgres (not postgresql) in the telemetry output, despite the documentation listing them as postgresql.
- OpenBao is configured to exclude high-cardinality metrics.
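
Because summary metrics expose cumulative _sum and _count series, the mean duration over an interval is the change in _sum divided by the change in _count. The sketch below works through that arithmetic on two scraped samples; the function name `mean_duration` and the sample numbers are hypothetical, and the result is in whatever unit the metric itself reports.

```python
from typing import Optional


def mean_duration(
    sum_start: float, count_start: float, sum_end: float, count_end: float
) -> Optional[float]:
    """Mean observation between two samples of a summary metric.

    Takes the cumulative _sum and _count values at the start and end of
    the interval. Returns None when no observations were recorded or when
    the counters went backwards (for example after a pod restart).
    """
    delta_count = count_end - count_start
    delta_sum = sum_end - sum_start
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count


if __name__ == "__main__":
    # Hypothetical samples of secrets_manager_gke_postgres_get_sum/_count.
    print(mean_duration(100.0, 10, 160.0, 30))
```

With the illustrative samples, 20 new observations added 60.0 to the sum, giving a mean of 3.0 per operation.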
Excluded metrics:
- usage_gauge_period is set to 0 to exclude the following metrics:
  - token.count
  - token.count.by_policy
  - token.count.by_auth
  - token.count.by_ttl
  - expire.leases.by_expiration
  - secret.kv.count
  - identity.entity.count
  - identity.entity.alias.count
- prefix_filter is set to exclude the following metrics:
  - audit.*: excluded except for audit.log_request_failure, audit.log_request, audit.log_response_failure, and audit.log_response
  - rollback.attempt.*: per-mount rollback counters
  - route.*: per-route request timers
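
The "excluded except for" behaviour of a prefix filter can be modelled as longest-prefix matching over allow (+) and block (-) rules. The sketch below is an illustration only and rests on several assumptions: the rule list `RULES` is hypothetical and is not copied from the deployed configuration, the longest matching prefix is assumed to win, metrics matching no rule are assumed to be emitted, and the short metric names from the list above are used instead of fully prefixed ones. Check the service configuration for the real rules.

```python
from typing import List

# Hypothetical rules mirroring the exclusions described above.
RULES: List[str] = [
    "-audit.",
    "+audit.log_request",   # also covers audit.log_request_failure
    "+audit.log_response",  # also covers audit.log_response_failure
    "-rollback.attempt.",
    "-route.",
]


def is_emitted(metric: str, rules: List[str]) -> bool:
    """Return True if the metric passes the filter.

    Each rule is '+prefix' (allow) or '-prefix' (block). The rule with the
    longest prefix matching the metric decides; with no match the metric
    is emitted.
    """
    best_len = -1
    allowed = True
    for rule in rules:
        sign, prefix = rule[:1], rule[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with '+' or '-': {rule!r}")
        if metric.startswith(prefix) and len(prefix) > best_len:
            best_len = len(prefix)
            allowed = sign == "+"
    return allowed


if __name__ == "__main__":
    for name in ("audit.log_request_failure", "audit.file.log_request", "route.read.secret"):
        print(name, is_emitted(name, RULES))
```

Under these assumptions, the four audit.log_* metrics pass because their allow rules are longer than the blanket audit. block rule, while any other audit.*, rollback.attempt.*, or route.* metric is dropped.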