AiGatewayServiceRunwayIngressTrafficCessationRegional

  • This alert is designed to detect an abnormal cessation of traffic to the runway_ingress component of the ai-gateway service in any US region. The conditions ensure that the service was receiving a non-zero amount of traffic one hour ago and has since dropped to zero traffic over the last 30 minutes.

  • This could be caused by a change to the metrics used in the SLI, or by the service not receiving traffic.

  • This alert uses the metric gitlab_regional_sli_ops:rate_30m, which measures the rate of HTTP requests to the runway_ingress component of the ai-gateway service. The unit is HTTP requests per second: the 5-minute request rate, averaged over 30 minutes. Link to metric catalogue
  • To silence the alert, visit the Alert Manager Dashboard
  • Until recently this was a high-volume alert, mainly because zero traffic is not unusual in non-US regions. Now that only US regions are tracked, this alert is expected to be rare.
  • Historical trends of the alert firing here
  • This alert might create S3 incidents.
  • There might be some impact on GitLab.com users.
  • Review the Incident Severity Handbook page to identify the required Severity Level
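
As a rough illustration of the condition described above (non-zero traffic one hour ago, zero traffic over the last 30 minutes), the check can be sketched in Python. This is not the actual alert rule. TrafficSample, traffic_ceased, firing_regions, and the region names in the usage note are hypothetical, and the two rate values are assumed to be readings of gitlab_regional_sli_ops:rate_30m taken now and one hour earlier.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TrafficSample:
    """Hypothetical pair of readings of the 30-minute request rate."""
    rate_now: float           # requests/second, current 30m rate
    rate_one_hour_ago: float  # same metric, read one hour earlier


def traffic_ceased(sample: TrafficSample) -> bool:
    """True when traffic was non-zero an hour ago and is zero now."""
    had_traffic = sample.rate_one_hour_ago > 0
    no_traffic_now = sample.rate_now == 0
    return had_traffic and no_traffic_now


def firing_regions(readings: Dict[str, TrafficSample]) -> List[str]:
    """Return the sorted names of regions whose readings match the condition."""
    return sorted(
        region for region, sample in readings.items() if traffic_ceased(sample)
    )
```

For example, calling firing_regions with a region whose rate dropped from 12.5 to 0.0 and another that stayed at 3.2 would return only the first. A region that had no traffic an hour ago never matches, which is why quiet regions do not trigger the alert.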

It can also help to check for recent changes to the service or for ongoing issues. A quick look at the dashboard can show whether a recent deployment caused the problem.

If a recent deployment or change caused this issue, consider rolling back: either revert the MR or re-run the previous deployment job.

  • AI Gateway uses capacity planning provided by Runway for long-term forecasting of saturation resources. To view forecasts, refer to the Tamland page.
  • Consider rolling back to a previous working version of the AI Gateway