Secret Detection Partner Token Verification Troubleshooting
Overview
This runbook covers troubleshooting for the Secret Detection partner token verification system.
Metrics Dashboard
Secret Detection Partner Token Verification Dashboard
Common Alerts
SecretDetectionPartnerAPIHighErrorRate
Severity: S3
High error rate (>10%) when verifying tokens with partner APIs.
Investigation Steps
- Check the dashboard to identify which partner is failing
- Review the error breakdown by error_type:
  - network_error: Connectivity issues
  - rate_limit: Rate limit exceeded
  - response_error: Invalid/unparseable responses
- Check recent deployments or configuration changes
- Review the status page of the affected partner
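The error breakdown step can be sketched as a small tally. This is a minimal illustration, not the dashboard's actual query: the `Outcome` struct, the sample data and the helper names are hypothetical, and only the 10% threshold and the `error_type` values come from this runbook.

```ruby
# Hypothetical record of one verification attempt; error_type is nil on success.
Outcome = Struct.new(:partner, :error_type)

ALERT_THRESHOLD = 0.10 # the >10% error rate from the alert definition

# Returns { partner => { rate: Float, by_type: { error_type => count } } }.
def error_breakdown(outcomes)
  outcomes.group_by(&:partner).transform_values do |rows|
    errors = rows.reject { |row| row.error_type.nil? }
    {
      rate: errors.size.to_f / rows.size,
      by_type: errors.map(&:error_type).tally
    }
  end
end

# Partners whose error rate is above the alert threshold.
def failing_partners(outcomes)
  error_breakdown(outcomes).select { |_, stats| stats[:rate] > ALERT_THRESHOLD }.keys
end

# Made-up sample data for illustration only.
sample = [
  Outcome.new('AWS', nil), Outcome.new('AWS', :network_error),
  Outcome.new('AWS', :network_error), Outcome.new('AWS', nil),
  Outcome.new('GCP', nil), Outcome.new('GCP', nil)
]

p error_breakdown(sample)
p failing_partners(sample)
```

Grouping by partner first makes it easy to tell a single failing partner apart from a problem that affects all of them.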
Resolution
- Temporary (< 1 hour): If the partner has a known incident, wait for recovery
- Disable partner (1–24 hours): Edit ee/lib/security/secret_detection/partner_tokens/registry.rb:

    'AWS' => {
      client_class: ::Security::SecretDetection::PartnerTokens::AwsClient,
      rate_limit_key: :partner_aws_api,
      enabled: false # ← Set to false
    }
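As a rough mental model of what the `enabled` flag does, the sketch below shows a lookup that skips disabled partners. It is an assumption-laden stand-in: the `PARTNERS` hash, `enabled_partner` and `verify_with_partner` are hypothetical names, and the real Registry may consult the flag differently.

```ruby
# Hypothetical stand-in for the registry. The entry shape mirrors the runbook
# snippet, but the lookup logic below is an assumption for illustration.
PARTNERS = {
  'AWS' => { rate_limit_key: :partner_aws_api, enabled: false },
  'GCP' => { rate_limit_key: :partner_gcp_api, enabled: true }
}.freeze

# Returns the partner config only when the partner exists and is enabled.
def enabled_partner(name, partners = PARTNERS)
  config = partners[name]
  config if config && config[:enabled]
end

# Skips verification entirely for disabled or unknown partners,
# so no request reaches the partner API.
def verify_with_partner(name)
  config = enabled_partner(name)
  return :skipped unless config

  :verification_attempted
end

p verify_with_partner('AWS')
p verify_with_partner('GCP')
```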
SecretDetectionPartnerAPIHighLatency
Severity: S3
P95 latency exceeds 5 seconds for partner API calls.
Investigation Steps
- Check the dashboard to see whether the latency is systemic or partner-specific
- Review partner status pages (they may show degraded performance)
- Look for regional issues (AWS/GCP might have region-specific problems)
Resolution
- Temporary (< 6 hours): If P95 < 10s, monitor; partners are slow but functional
- Increase timeout (6–24 hours): Edit base_client.rb:

    DEFAULT_TIMEOUT = 10.seconds # Was 5.seconds

- Disable partner (> 24 hours): Set enabled: false in the Registry (see above)
- Post-incident: File an issue to investigate why the partner is consistently slow
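For checking the P95 thresholds against a sample of request durations, a nearest-rank percentile is enough. This sketch only covers the latency thresholds (5 s alert, 10 s monitoring ceiling); the time-based escalation steps still apply. The function names and return symbols are hypothetical.

```ruby
ALERT_P95_SECONDS = 5.0    # the alert fires above this value
MONITOR_P95_SECONDS = 10.0 # below this, the runbook says to monitor

# Nearest-rank percentile: the smallest sample with at least pct percent
# of all samples at or below it. pct is an integer from 0 to 100.
def percentile(samples, pct)
  raise ArgumentError, 'no samples' if samples.empty?

  sorted = samples.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Maps a latency sample (in seconds) to a coarse next step.
def latency_action(samples)
  p95 = percentile(samples, 95)
  return :ok if p95 <= ALERT_P95_SECONDS
  return :monitor if p95 < MONITOR_P95_SECONDS

  :escalate # follow the timeout or disable steps, depending on duration
end

# Made-up durations: mostly fast, with a slow tail.
durations = Array.new(18, 1.0) + [6.0, 12.0]
p percentile(durations, 95)
p latency_action(durations)
```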
SecretDetectionPartnerAPIRateLimitHit
Severity: S4
Rate limits are being hit (>0.1 req/s sustained for 5 minutes).
Investigation Steps
- Check the dashboard to see which partner is hitting limits
- Verify the current rate limit settings in application_rate_limiter.rb:

    partner_aws_api: { threshold: -> { 400 }, interval: 1.second },
    partner_gcp_api: { threshold: -> { 500 }, interval: 1.second },
    partner_postman_api: { threshold: -> { 4 }, interval: 1.second }

- Check the Sidekiq queue depth from a Rails console using Teleport:

    Sidekiq::Queue.new('security_secret_detection_partner_token_verification').size

- Look for burst traffic patterns (large pipeline, multiple projects)
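To see why a burst trips the Postman limit so quickly, the threshold/interval pairs above can be modelled with a simple fixed-window counter. This is only an illustration and is not GitLab's ApplicationRateLimiter; the `FixedWindowLimiter` class is hypothetical, and the real limiter may use a different windowing strategy.

```ruby
# Minimal fixed-window limiter: allows `threshold` calls per `interval` seconds.
class FixedWindowLimiter
  def initialize(threshold:, interval:)
    @threshold = threshold
    @interval = interval
    @window_start = nil
    @count = 0
  end

  # Records a call at time `now` (in seconds) and returns true when that call
  # exceeds the threshold for the current window.
  def throttled?(now)
    if @window_start.nil? || now - @window_start >= @interval
      @window_start = now
      @count = 0
    end
    @count += 1
    @count > @threshold
  end
end

# Postman-like settings from the runbook: 4 requests per second.
postman = FixedWindowLimiter.new(threshold: 4, interval: 1)

# Six calls arriving in the same instant: the first four pass.
burst = Array.new(6) { postman.throttled?(0.0) }

# One second later a new window starts and calls pass again.
next_window = postman.throttled?(1.0)

p burst
p next_window
```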
Resolution
- Normal operation: Some rate limiting is expected with Postman (4 req/s). If there are fewer than 100 rate-limit hits per hour, no action is needed
- High traffic burst: The queue will self-regulate with exponential backoff. Monitor queue depth:
  - If queue < 10k jobs: Normal, will clear in ~1 hour
  - If queue > 50k jobs: Consider temporarily disabling the partner
- Persistent issue: The partner may have changed its rate limits. Check their API docs and update application_rate_limiter.rb
- Last resort: Disable the partner, process the queue, then re-enable with lower rate limits
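The backoff and queue-depth guidance can be sketched as two small helpers. The doubling schedule below is a generic example of exponential backoff, not necessarily the schedule the worker uses, and the `:keep_monitoring` result for depths between 10k and 50k is an assumption because the runbook does not cover that range.

```ruby
# Generic exponential backoff: the delay doubles per attempt, up to a cap.
# base and cap are in seconds and are illustrative defaults.
def backoff_delay(attempt, base: 1, cap: 300)
  [base * (2**attempt), cap].min
end

# Maps the queue depth to the runbook guidance.
def queue_action(depth)
  return :normal if depth < 10_000
  return :consider_disabling_partner if depth > 50_000

  :keep_monitoring # assumption: the 10k to 50k range is not specified
end

p (0..5).map { |attempt| backoff_delay(attempt) }
p queue_action(8_000)
p queue_action(75_000)
```

Capping the delay keeps retries from being postponed indefinitely once a partner recovers.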
SecretDetectionPartnerAPINetworkErrors
Severity: S3
Network connectivity issues to partner APIs (>0.5 errors/sec).
Investigation Steps
- Check the dashboard to identify affected partner(s)
- Verify GitLab.com can reach partner APIs from a Rails console using Teleport:

    # Run in Rails console
    uri = URI('https://sts.amazonaws.com')
    Net::HTTP.get_response(uri)

- Check for firewall/networking changes in #infrastructure
- Look for DNS issues: run dig sts.amazonaws.com from GitLab runners
- Review recent SSL certificate renewals
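When probing from a console, it helps to tell DNS, timeout, TLS and routing failures apart, since each points at a different item in the list above. The sketch below is one way to do that with the standard library. The helper names and the category symbols are hypothetical and are not the `error_type` values the service reports.

```ruby
require 'net/http'
require 'openssl'
require 'socket'

# Runs the given block (expected to perform an HTTP request and return the
# response) and maps common failures to a coarse category.
def classify_probe
  response = yield
  "http_#{response.code}".to_sym
rescue SocketError
  :dns_or_socket_failure # e.g. hostname could not be resolved
rescue Net::OpenTimeout, Net::ReadTimeout
  :timeout
rescue OpenSSL::SSL::SSLError
  :tls_failure # e.g. certificate verification failed
rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, Errno::ENETUNREACH
  :unreachable
end

# Sends a HEAD request with explicit timeouts so a hung connection
# does not block the console.
def probe(url, timeout: 5)
  uri = URI(url)
  classify_probe do
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                    open_timeout: timeout, read_timeout: timeout) do |http|
      http.head(uri.path.empty? ? '/' : uri.path)
    end
  end
end

# Example (performs a real network call, so it is left commented out):
# probe('https://sts.amazonaws.com')
```

Any HTTP status, even a 4xx, shows that DNS, routing and TLS are working; only the rescued categories indicate a connectivity problem.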
Resolution
- Single partner affected: Likely a partner-side issue. Disable the partner (see above) and monitor the partner's status page
- Multiple partners affected: Likely a GitLab network issue
  - Check with the SRE team in #production
  - Review recent network changes
  - Verify egress rules haven’t changed
- SSL/TLS errors: Check certificate validity; the CA bundle may need updating
- Temporary workaround: Disable affected partners until the network issue is resolved
Manual Verification
If you need to manually verify a specific token in development or using Teleport:

    # Rails console
    finding = Vulnerabilities::Finding.find(FINDING_ID)
    token_type = finding.identifiers.find { |i| i['external_type'] == 'gitleaks_rule_id' }&.dig('external_id')

    # Get partner config from token_type
    partner_config = Security::SecretDetection::PartnerTokens::Registry.partner_for(token_type)

    # Verify token
    client = partner_config[:client_class].new
    result = client.verify_token(finding.metadata['raw_source_code_extract'])

    puts "Valid: #{result.valid}, Metadata: #{result.metadata}"

Escalation
- Team: Secret Detection (@gitlab-org/secure/secret-detection)
- Slack: #g_ast-secret-detection