Code Suggestions
About Code Suggestions
Contact Information
- Group: Create:Code Creation
- Handbook: Code Creation
- Slack: #g_code_creation
Core Functionality
GitLab Duo Code Suggestions provides two distinct AI-powered coding assistance functions:
- Code Completion:
- Powered by Vertex AI-hosted Codestral (GitLab 17.11 and earlier) and Fireworks AI-hosted Codestral (GitLab 18.0 and later)
- Response time: Satisfied < 1s, Tolerated < 10s
- Activated automatically while typing
- Code Generation:
- Powered by Anthropic Claude 3.7 Sonnet
- Response time: Satisfied < 5s, Tolerated < 30s; complex algorithms can exceed 5 seconds
- Triggered by a natural language comment followed by Enter, or by an empty function
- Supports streaming in JetBrains and Visual Studio IDEs
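The split between the two trigger paths above can be sketched as a small routing function. The function and parameter names below are invented for illustration and are not the actual IDE extension API:

```python
# Hypothetical sketch of how a client might route between the two Code
# Suggestions functions. Names are illustrative, not the real extension code.

def choose_suggestion_type(current_line: str, pressed_enter: bool,
                           inside_empty_function: bool) -> str:
    """Return which Code Suggestions path a client would invoke."""
    is_comment = current_line.lstrip().startswith(("#", "//"))
    if (is_comment and pressed_enter) or inside_empty_function:
        # A natural-language comment followed by Enter, or an empty
        # function body, triggers the slower code *generation* path.
        return "generation"
    # Ordinary typing triggers the low-latency code *completion* path.
    return "completion"
```

For example, `choose_suggestion_type("# parse a CSV file", True, False)` would route to generation, while typing mid-statement routes to completion.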
IDE Integration
Available in:
- VS Code (GitLab Workflow extension v6.2.2+)
- JetBrains IDEs (GitLab extension v3.6.5+)
- Visual Studio (GitLab extension v0.51.0+)
- Neovim (GitLab plugin v1.1.0+)
- GitLab Web IDE
Connectivity
Primary Connection Method:
- Direct to AI Gateway: Almost all users (including most GitLab Self-Managed customers) connect directly from their IDE to the AI Gateway at cloud.gitlab.com
Alternative Connection Method:
- Through Self-Managed Instance: GitLab Self-Managed customers can optionally configure their installation to route Code Suggestions requests through their local GitLab Rails application instead of connecting directly. This alternative is configured by the GitLab administrator but is less commonly used.
Authentication Flow:
- Users authenticate using personal access tokens for secure API connections
- For Self-Managed users calling the AI Gateway directly, authentication follows the same pattern as SaaS users
- Detailed authentication and authorization flows are documented in the AI Gateway Architecture Design
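To make the direct connection concrete, the sketch below builds (but does not send) a completion request to the AI Gateway. The URL path, header names, and payload shape are assumptions for illustration only; the AI Gateway Architecture Design documents the real contract.

```python
import json
from urllib.request import Request

def build_completion_request(token: str, above: str, below: str) -> Request:
    """Assemble a hypothetical direct IDE -> AI Gateway request."""
    payload = {
        "current_file": {
            "content_above_cursor": above,   # code before the cursor
            "content_below_cursor": below,   # code after the cursor
        }
    }
    return Request(
        "https://cloud.gitlab.com/ai/v2/code/completions",  # assumed URL
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```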
Additional Resources:
- Complete connectivity diagrams and technical details: Code Creation Engineering Overview
Requirements
- Premium or Ultimate subscription with GitLab Duo Pro or Enterprise add-on
- Assigned seat in GitLab Duo subscription
- GitLab 17.2+ for optimal experience
- Personal access token for secure API connection
Usage Patterns
- We typically see more usage Monday to Friday and less on the weekends.
- Traffic tends to be highest during traditional working hours in each region.
Documentation
- Code Suggestions Engineering Overview
- Interaction diagrams
- Dependencies
- Code Suggestion Documentation - GitLab Documentation
Service Level Indicators (SLIs)
Our monitoring is built around two key SLIs that align with our core functionality:
Code Completions SLI (server_code_completions)
- Target: Response time < 1 second
- Tolerated: Response time < 10 seconds
- Failure: 5XX errors on /v2/code/completions or /v2/completions endpoints
- User Impact: When errors occur, users don’t see any completions in their editor. This fails silently; no error is presented. In practice, users can retry fetching the completions by continuing to write code.
- Models Used: Vertex AI-hosted or Fireworks AI-hosted Codestral
  - inference_vertex - Tracks Vertex performance
  - inference_other - Tracks Fireworks performance
Code Generation SLI (server_code_generations)
- Target: Response time < 5 seconds
- Tolerated: Response time < 30 seconds
- Failure: 5XX errors on /v2/code/generation endpoint
- User Impact: When errors occur, users don’t receive generated code from their comments. This fails silently; no error is presented. In practice, users can retry generating the code by pressing Enter again.
- Models Used: Anthropic Claude 3.7 Sonnet
  - inference_anthropic - Tracks Anthropic performance
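Both SLIs follow the standard Apdex formula: satisfied requests count fully, tolerated requests count half, and anything slower counts as zero. A minimal sketch using the thresholds above (whether the recorded metrics use exactly these bucket boundaries is an assumption):

```python
def apdex(latencies_s, satisfied, tolerated):
    """Standard Apdex score: (satisfied + tolerated/2) / total."""
    ok = sum(1 for t in latencies_s if t <= satisfied)
    meh = sum(1 for t in latencies_s if satisfied < t <= tolerated)
    return (ok + meh / 2) / len(latencies_s)

# server_code_completions: satisfied < 1s, tolerated < 10s
completions = apdex([0.4, 0.8, 3.0, 12.0], satisfied=1, tolerated=10)  # 0.625
# server_code_generations: satisfied < 5s, tolerated < 30s
generations = apdex([2.0, 6.0, 40.0], satisfied=5, tolerated=30)       # 0.5
```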
Note: The alerts and dashboards in the Initial Triage section below are organized by these SLIs.
Initial Triage
Alerting
Code Suggestions alerts are surfaced through AI Gateway alerts. You can find more about that in the AI Gateway Runbook / Monitoring-Alerting Section.
The specific alerts for Code Suggestions are:
- AiGatewayServiceServerCodeCompletions…
- AiGatewayServiceServerCodeCompletionsApdexSLOViolation
- AiGatewayServiceServerCodeCompletionsApdexSLOViolationRegional
- AiGatewayServiceServerCodeCompletionsErrorSLOViolation
- AiGatewayServiceServerCodeCompletionsErrorSLOViolationRegional
- AiGatewayServiceServerCodeGenerations…
- AiGatewayServiceServerCodeGenerationsApdexSLOViolation
- AiGatewayServiceServerCodeGenerationsApdexSLOViolationRegional
- AiGatewayServiceServerCodeGenerationsErrorSLOViolation
- AiGatewayServiceServerCodeGenerationsErrorSLOViolationRegional
Alert: ApdexSLOViolations
This could be caused by an increase in latency or an increase in errors. The user impact will be slower response times when generating code suggestions.
Client Behavior During Slow Requests:
- Loading Indicator: Users will see a loading indicator in their IDE extension while waiting for suggestions
- If users wait without typing or navigating, slow suggestions will eventually appear in their editor once the request completes
- If users continue typing, moving the cursor, or navigating to different files while waiting, the delayed suggestion will be discarded when it finally returns (as it’s no longer contextually relevant)
- User Experience Impact: During SLO violations, users may experience:
- More frequent “loading” states in their editor
- Reduced suggestion frequency as they continue working while requests are pending
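The discard behavior described above can be modeled as a version check: every edit invalidates in-flight requests, so only responses matching the current editor state are shown. The names below are illustrative, not the real extension code.

```python
class SuggestionSession:
    """Toy model of stale-suggestion handling in an IDE extension."""

    def __init__(self):
        self.version = 0  # increments on every edit

    def start_request(self) -> int:
        # Snapshot the editor state when the request is sent.
        return self.version

    def on_edit(self):
        # Typing, cursor moves, or file switches invalidate pending requests.
        self.version += 1

    def on_response(self, request_version: int, suggestion: str):
        # Show the suggestion only if the editor hasn't changed since.
        if request_version == self.version:
            return suggestion
        return None  # discarded: no longer contextually relevant
```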
Alert: ErrorSLOViolation
This is caused by an increase in 5XX errors. When this happens, the user will not see code suggestions appear in their IDE.
AI Gateway Apdex Error
Step 1: Determine which AI Service is affected
Go to the AI Gateway Dashboard and identify if the issue is related to code completions, code generation, or some other service.
If this is a more general AI Gateway problem, refer to the AI Gateway Runbook.
Step 2: Investigate Code Completions Issues
If the dashboard indicates a code completions problem:
- Check SLI Apdex metrics
  - server_code_completions SLI Apdex: Grafana Link
  - Apdex attribution for server_code_completions: Grafana Link
- Analyze Error Rates
  - View server_code_completions Errors: Grafana Link
  - Investigate error details in Log dashboard or search in Elastic (data view = pubsub-mlops-inf-gprd-*)
- Review Latency Issues
  - Check p95 server_code_completions Latency: Grafana Link
  - Examine charts in Grafana (AI Gateway Overview) for inference_vertex or inference_other (Fireworks)
- Consider these factors:
- Is the issue isolated to one region or affecting all regions?
- Is it specific to a particular model?
- Is it related to a specific provider (vertex/other/anthropic)?
- Has there been an increase in requests (RPS)?
- Review the log dashboard
- Has there been an increase in prompt length? This (also called input tokens) can lead to slower response times
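The region/model/provider questions above can be answered mechanically once you have log records exported. A small sketch, with field names loosely mirroring the log dashboard (they are assumptions, not the exact log schema):

```python
from collections import Counter

def error_breakdown(records, field):
    """Count 5XX errors grouped by a log field (region, provider, model)."""
    return Counter(r[field] for r in records if r["status"] >= 500)

sample = [
    {"status": 500, "region": "us-east4", "provider": "vertex"},
    {"status": 503, "region": "us-east4", "provider": "vertex"},
    {"status": 200, "region": "europe-west2", "provider": "other"},
]
# Errors concentrated in one region/provider suggest an isolated issue
# rather than a global outage.
by_region = error_breakdown(sample, "region")
by_provider = error_breakdown(sample, "provider")
```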
Step 3: Investigate Code Generation Issues
If the dashboard indicates a code generation problem:
- Check SLI Apdex metrics
  - server_code_generations SLI Apdex: Grafana Link
  - Apdex attribution for server_code_generations: Grafana Link
- Analyze Error Rates
  - View server_code_generations Errors: Grafana Link
  - Investigate error details in Log dashboard or search in Elastic (data view = pubsub-mlops-inf-gprd-*)
- Review Latency Issues
  - Check p95 server_code_generations Latency: Grafana Link
  - Examine charts in Grafana (AI Gateway Overview) for inference_anthropic
- Consider these factors:
- Is the issue isolated to one region or affecting all regions?
- Is it specific to a particular model?
- Is it specific to a provider (vertex/other/anthropic)?
- Has there been an increase in requests (RPS)?
- Review the log dashboard
- Has there been an increase in prompt length? This (also called input tokens) can lead to slower response times
Common Resolution Steps
High Error Rates
When experiencing high error rates, the most common cause is quota or rate limit issues with our LLM providers. Follow these steps to diagnose and resolve:
Step 1: Check Provider Quota Utilization
Different providers have different methods for checking quota usage:
- In the saturation panel of the AI Gateway service dashboard, which lists all quotas as measured client-side.
- Anthropic: Check usage and rate limits at console.anthropic.com
- Vertex AI (Google Cloud): Check quota usage at console.cloud.google.com
- Fireworks: we use dedicated deployments, so there are no quota limitations, but a deployment can be overwhelmed by too many requests. Fireworks provides some visibility into this in their console.
Step 2: Correlate Quota Issues with Error Patterns
After checking quotas, correlate findings with error logs:
- Review the Log dashboard for HTTP 429 (rate limit) or 403 (quota exceeded) errors
- Look for error patterns that align with the provider experiencing quota issues
- Check if errors are concentrated during peak usage hours
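That correlation step can be sketched as computing the share of failures that are 429/403 and the hours they cluster in. The record shape below is illustrative:

```python
from collections import Counter

def quota_error_summary(records):
    """Share of quota/rate-limit errors, and their distribution by hour."""
    quota_errors = [r for r in records if r["status"] in (403, 429)]
    share = len(quota_errors) / max(len(records), 1)
    by_hour = Counter(r["hour"] for r in quota_errors)
    return share, by_hour

sample = [
    {"status": 429, "hour": 14},
    {"status": 429, "hour": 14},
    {"status": 403, "hour": 15},
    {"status": 500, "hour": 3},
]
# A high share concentrated in peak hours points at provider quota limits.
share, by_hour = quota_error_summary(sample)
```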
Step 3: Escalation and Resolution
If quota/rate limit issues are confirmed:
- Immediate: Document the affected provider, quota type, and current utilization percentage
- Contact Provider: Reach out through the appropriate channel:
- Google Cloud/Vertex: #ext-google-cloud slack channel
- Anthropic: #ext-anthropic slack channel
- Fireworks: #ext-gitlab-fireworks slack channel (internal access required)
- Include Details: When contacting providers, include:
- Current quota utilization percentage
- Time range when issues began
- Expected traffic patterns requiring higher limits
- Business impact summary
Step 4: Monitor Resolution
- Continue monitoring the AI Gateway Dashboard error rates
- Verify quota increases take effect by re-checking provider consoles
- Confirm error rates return to normal baseline levels
Latency Issues
- An increase in traffic can lead to latency issues. This could be caused by saturation of the LLM, which then takes longer to respond.
- If there is an increase in tokens sent, then the requests could take longer.
- Check out the AI Gateway Scalability Runbook
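When eyeballing a latency regression, the p95 figure shown in the dashboards can be reproduced from raw samples with the nearest-rank method (a sketch; the dashboards may use a different estimator):

```python
import math

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a list of latency samples."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# With samples of 1..100 ms, the p95 is 95 ms.
value = p95(list(range(1, 101)))
```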
Provider-Specific Problems
- If there are problems with a specific provider, we will need to work directly with them to resolve the problem. Here are some ways to reach out in Slack:
- #ext-google-cloud
- #ext-anthropic
- #ext-gitlab-fireworks (not currently public)
Prolonged Provider Outages - Model Failover
When a provider experiences extended outages or degraded performance that cannot be quickly resolved, Code Suggestions has a failover system to switch traffic to alternative model providers using feature flags.
When to Consider Failover:
- Provider outage expected to last more than 30 minutes
- Sustained high error rates (>10%) from a specific provider
- Severe latency issues affecting user experience across a provider
- Provider communication indicates extended maintenance windows
Failover Process:
- Failover procedures require coordination with on-call engineers and must follow established protocols
- Complete failover documentation and procedures: Code Suggestion Failover Runbook
- Feature flag changes require appropriate approvals and should be coordinated through incident management procedures
Important Notes:
- Failover is a significant operational change that affects all users
- Always document the business justification and expected duration before initiating failover
- Monitor closely after failover to ensure the alternative provider can handle the traffic load
- Plan for failback once the primary provider issues are resolved
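Conceptually, the failover mechanism behaves like a feature-flag override on provider selection. The flag name and provider identifiers below are invented for illustration; the actual procedure lives in the Code Suggestion Failover Runbook and must go through incident management.

```python
# Default (primary) providers per function; identifiers are illustrative.
PRIMARY = {"completions": "fireworks", "generations": "anthropic"}

def select_provider(function: str, flags: dict) -> str:
    """Route to the failover provider when the flag is set, else primary."""
    override = flags.get(f"code_{function}_failover_provider")
    return override or PRIMARY[function]
```

Setting the (hypothetical) flag for one function redirects only that traffic, which is why monitoring the alternative provider's load after failover matters.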
Dashboards
Logging
Be sure the data source (data view) is “pubsub-mlops-inf-gprd-*”.
Look for a json.jsonPayload.path value like “/v2/code/completions”; there may be other versions or variations on the path, such as “/v3/code/completions” or “/v4/code/suggestions”.
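When filtering those paths programmatically rather than in Kibana, a single pattern covers the variants mentioned above (a sketch; extend it if new versions or endpoints appear):

```python
import re

# Matches /v2/code/completions, /v3/code/completions, /v4/code/suggestions,
# and the older /v2/completions form.
PATH_RE = re.compile(r"^/v\d+/(?:code/)?(?:completions|generations|suggestions)$")

paths = [
    "/v2/code/completions",
    "/v3/code/completions",
    "/v4/code/suggestions",
    "/v2/completions",
    "/v1/chat/agent",
]
matched = [p for p in paths if PATH_RE.match(p)]
```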
Code Suggestions Overview Dashboard
- For both code completion and code generation
- Request rates
- Error counts by error code
- User counts
- Latency
- Prompt lengths
- Latency for code completion (not code generation)
- Broken down by region, provider, model name
There are specific filtered versions as well:
- Fireworks: https://log.gprd.gitlab.net/app/r/s/8igQR
- Vertex: https://log.gprd.gitlab.net/app/r/s/VciTn
- Codestral: https://log.gprd.gitlab.net/app/r/s/7MASa
- Codestral on Fireworks: https://log.gprd.gitlab.net/app/r/s/bi6fU
- Codestral in europe-west-2: https://log.gprd.gitlab.net/app/r/s/yER2Q
- Codestral in us-east-4: https://log.gprd.gitlab.net/app/r/s/5y4Dt
- Qwen: https://log.gprd.gitlab.net/app/r/s/PbBOj
Grafana Dashboards
Since all Code Suggestions traffic flows through the AI Gateway, this dashboard is the best place to look. It also has information about other services (like Duo Chat).
SLI Details: inference_*
Details on the various model providers:
- Fireworks can be found in inference_other. We currently use this for code completion with the text-completion-fireworks_ai/codestral-2501 or text-completion-fireworks_ai/qwen2p5-coder-7b models
- Vertex/GCP can be found in inference_vertex. We currently use vertex_ai/codestral-2501 for code completions
- Anthropic can be found in inference_anthropic. We use Claude for code generation, but so do other Duo features.
SLI Details: server_code_completions or SLI Details: server_code_generations
- Details on latency, requests per second (RPS), and errors
- Overall breakdown, per API endpoint, per region
Code Suggestions Error Budget Details or Code Suggestions Group Dashboard
These have much less valuable information than the AI Gateway Overview dashboard.
Tableau
This is a good source of historic data, but it is not updated in real time. Most of these charts can be filtered by model, provider, deployment type (SaaS, SM, etc.), and more.
Sentry
Limited alerting data can be found in Sentry.