DuoWorkflowSvcServiceServerApdexSLOViolation
Overview
- This alert fires when the apdex (latency) of gRPC requests to the Duo Workflow Service server component falls below the SLO threshold.
- The alert indicates that the service is experiencing higher-than-acceptable latency, which impacts user experience.
- Apdex measures the proportion of requests that complete within acceptable time thresholds (satisfied: < 3s, tolerated: < 5s).
- Possible user impacts
- Users will see delayed responses from Duo Agent Platform.
Services
- Duo Workflow Service overview
- Team that owns the service: Agent Foundations
Metrics
- The metrics used are gitlab_component_apdex:confidence:ratio_1h and gitlab_component_apdex:confidence:ratio_6h for the server component of duo-workflow-svc.
- These metrics measure the apdex score (0-1 scale, where 1.0 is perfect).
- Satisfied threshold: < 3 seconds time to the first response
- Tolerated threshold: < 5 seconds time to the first response
- The SLO threshold is 95% apdex, meaning the alert fires when apdex drops below this threshold.
- Link to metric catalogue
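To make the thresholds above concrete, here is a minimal sketch of how an apdex score could be computed from raw time-to-first-response samples. It assumes the standard Apdex formula (satisfied requests count fully, tolerated requests count half). The production metric comes from recording rules and may differ in detail, so the function name and sample data here are purely illustrative.

```python
from typing import Iterable

SATISFIED_S = 3.0  # satisfied threshold stated in this runbook
TOLERATED_S = 5.0  # tolerated threshold stated in this runbook
SLO = 0.95         # SLO threshold stated in this runbook


def apdex(latencies_s: Iterable[float]) -> float:
    """Standard Apdex: (satisfied + tolerating / 2) / total.

    Assumption: this mirrors the textbook formula, not necessarily the
    exact recording rule behind gitlab_component_apdex.
    """
    satisfied = tolerating = total = 0
    for latency in latencies_s:
        total += 1
        if latency < SATISFIED_S:
            satisfied += 1
        elif latency < TOLERATED_S:
            tolerating += 1
    if total == 0:
        return 1.0  # assumption: no traffic is treated as no violation
    return (satisfied + tolerating / 2) / total


# Hypothetical sample: 18 fast requests, 1 tolerated, 1 too slow.
sample = [1.0] * 18 + [4.0, 6.0]
score = apdex(sample)  # (18 + 1/2) / 20 = 0.925
print(f"apdex={score:.3f} violates_slo={score < SLO}")
```

In this hypothetical sample a single slow request and a single tolerated request out of 20 are enough to push the score below the 95% threshold.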
Alert Behavior
- To silence the alert, visit the Alert Manager Dashboard.
- This alert is expected to be rare under normal conditions. High frequency indicates performance degradation.
Severities
- This alert creates S2 incidents (High severity, pages on-call).
- All gitlab.com, self-managed and dedicated customers (other than those using self-hosted DAP) using Duo Workflow features are potentially impacted.
- Review the Incident Severity Handbook page to identify the required severity level.
Verification
- Prometheus link to query that triggered the alert
- Duo Workflow Service Overview Dashboard
- See the latency graphs under the “SLI Detail: server” section of the Duo Workflow Service Overview Dashboard for further information.
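If the dashboard links are unavailable, the two apdex series can also be inspected directly. The sketch below builds instant-query URLs for the Prometheus HTTP API (`/api/v1/query`). The label names `type` and `component` and the base URL are assumptions for illustration; check the metric catalogue for the selectors actually used.

```python
from urllib.parse import urlencode

# Metric names taken from the Metrics section of this runbook.
METRICS = (
    "gitlab_component_apdex:confidence:ratio_1h",
    "gitlab_component_apdex:confidence:ratio_6h",
)


def build_query(metric: str,
                service: str = "duo-workflow-svc",
                component: str = "server") -> str:
    # Assumption: the series are labelled with `type` and `component`.
    return f'{metric}{{type="{service}", component="{component}"}}'


def query_url(base_url: str, metric: str) -> str:
    # /api/v1/query is the Prometheus instant-query endpoint.
    params = urlencode({"query": build_query(metric)})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


if __name__ == "__main__":
    # Hypothetical Prometheus address; replace with the real one.
    for name in METRICS:
        print(query_url("http://prometheus.example:9090", name))
```

Comparing the 1h and 6h values shows whether the latency regression is recent or has been building over a longer window.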
Recent changes
Troubleshooting
- Check the Duo Workflow Service logs.
- Check for recent changes:
- Review the changes listed under the Recent changes section.
- Check if a recent deployment affected server latency.
- If a recent change caused the issue, consider rolling back.
Possible Resolutions
- N.A. We don’t have historical data on this alert’s resolutions.
Dependencies
- AI Gateway / Duo Workflow Service
Escalation
- For investigation and resolution assistance, reach out to #g_agent_foundations on Slack.