
Container Registry Database Load Balancing

The Container Registry supports database load balancing. This feature is implemented as described in the technical specification.

You can follow Container Registry: Database Load Balancing (DLB) (&8591) for more updates. The rollout plan being followed is detailed here.

| Alert | Condition | Duration | Severity |
|---|---|---|---|
| ContainerRegistryDBHighReplicaPoolChurnRate | DNS add/remove rate > 0.1/sec | 15m | s3 |
| ContainerRegistryDBHighReplicaConnectivityQuarantineRate | Connectivity quarantine rate > 0.05/sec | 10m | s3 |
| ContainerRegistryDBHighReplicaLagQuarantineRate | Lag quarantine rate > 0.05/sec | 5m | s3 |
| ContainerRegistryDBReplicaPoolSizeInstability | Pool size stddev > 1 | 5m | s3 |
| ContainerRegistryDBReplicaPoolDegraded | Pool < 50% of 1-day avg | 5m | s3 |
| ContainerRegistryDBNoReplicasAvailable | Pool size == 0 | 2m | s2 |
| ContainerRegistryDBLoadBalancerReplicaPoolSize | Pool below minimum threshold | 5m | s3/s4 |
| PatroniRegistryServiceDnsLookupsApdexSLOViolation | DNS lookup latency SLO violation | - | s3 |
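The pool-size conditions in the table can be illustrated with a small Go sketch. The real alerts are evaluated by Prometheus, not by code like this; `PoolCheck` and `EvaluatePool` are hypothetical names, the input is assumed to be a slice of pool-size samples plus a precomputed 1-day average, and the Duration column is not modelled. Using the population standard deviation is also an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// PoolCheck is a hypothetical result type; each field mirrors one row of the alerts table.
type PoolCheck struct {
	NoReplicas bool // ContainerRegistryDBNoReplicasAvailable: pool size == 0
	Degraded   bool // ContainerRegistryDBReplicaPoolDegraded: pool < 50% of 1-day average
	Unstable   bool // ContainerRegistryDBReplicaPoolSizeInstability: pool size stddev > 1
}

// stddev returns the population standard deviation of xs (an assumed form).
func stddev(xs []float64) float64 {
	var sum float64
	for _, x := range xs {
		sum += x
	}
	mean := sum / float64(len(xs))
	var sq float64
	for _, x := range xs {
		sq += (x - mean) * (x - mean)
	}
	return math.Sqrt(sq / float64(len(xs)))
}

// EvaluatePool checks the latest sample and the whole sample window against
// the table's pool-size conditions. The Duration ("for") column is ignored.
func EvaluatePool(samples []float64, dayAvg float64) PoolCheck {
	if len(samples) == 0 {
		return PoolCheck{}
	}
	current := samples[len(samples)-1]
	return PoolCheck{
		NoReplicas: current == 0,
		Degraded:   current < 0.5*dayAvg,
		Unstable:   stddev(samples) > 1,
	}
}

func main() {
	// Hypothetical data: the pool held 4 replicas, then dropped to 1.
	fmt.Printf("%+v\n", EvaluatePool([]float64{4, 4, 4, 1}, 4))
	// prints {NoReplicas:false Degraded:true Unstable:true}
}
```

A sudden drop like this trips both the degraded and the instability conditions at once, which is why the two alerts often fire together.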

The first six alerts monitor the replica connectivity tracking and quarantine mechanism introduced in MR !2596. The mechanism protects the load balancer from unstable replicas through:

  1. Consecutive Failure Detection: Quarantines a replica after 3 consecutive connectivity failures.
  2. Flapping Detection: Quarantines a replica after 5 add/remove events within a 60-second window.

Quarantined replicas are automatically reintegrated after a 5-minute cooldown period.

The list of log entries emitted by the registry is documented here.

To find all relevant log entries, filter on the json.msg field for "replica", "replicas", or "LSN" (example).

The list of Prometheus metrics emitted by the registry is documented here.

All relevant metrics are graphed in the "registry: Database Detail" dashboard, under a dedicated Load Balancing row.