Create:Code Review Group Runbook
About us
- Handbook: Create:Code Review Group
- Slack channels:
Create:Code Review Group is responsible for all features related to the code review workflow (feature_category is code_review_workflow) and GitLab CLI. The list of features can be found in this handbook page.
Services used
- Web and API for serving Rails and API requests
- Redis for caching
- Sidekiq for asynchronous jobs
- Postgres for database
- Object storage for storing external diffs
- Gitaly for getting data from git
Dashboards and links
Grafana
- Code Review Group error budget detail - this can help see which component (Rails, GraphQL, or Sidekiq) owned by Create:Code Review is failing
- web: Rails Controller - for getting information about specific Rails controller/actions
- api: Rails Controller - for getting information about specific API endpoints
- sidekiq: Worker Detail - for getting information about specific Sidekiq workers
Kibana
These logs show all failed Rails requests and jobs. They can be filtered by:
- Specific action/endpoint, using json.meta.caller_id
- Specific job class, using json.class
- Correlation ID, using json.correlation_id
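As a rough illustration, the filters above can be combined into a single query string. The sketch below assumes the Kibana query bar is in KQL mode; the helper name build_kibana_filter and the example values are hypothetical and only show the shape of the query.

```python
def build_kibana_filter(caller_id=None, job_class=None, correlation_id=None):
    """Build a KQL query string from whichever filter values are given.

    Hypothetical helper: only the three field names come from the runbook.
    """
    fields = {
        "json.meta.caller_id": caller_id,
        "json.class": job_class,
        "json.correlation_id": correlation_id,
    }
    clauses = []
    for field, value in fields.items():
        if value is None:
            continue  # skip filters that were not requested
        # Escape backslashes and double quotes so the value stays one quoted phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}: "{escaped}"')
    return " and ".join(clauses)


if __name__ == "__main__":
    # Illustrative value only: narrow the logs to one Sidekiq worker class.
    print(build_kibana_filter(job_class="UpdateMergeRequestsWorker"))
```

Pasting the printed string into the query bar narrows the failed-request logs to the matching entries. Filtering by correlation ID alone is usually enough when a specific failing request has been reported.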
Sentry
Errors can be found in Sentry.
GitLab CLI changelog
Information about the changes in each GitLab CLI release can be found on the project's releases page.
Debugging
Here are some debugging steps for scenarios we have encountered before.
Delayed or no updates on merge request page
Some updates shown on the merge request page, such as commits, diffs, and mergeability status, rely on Sidekiq workers. If Sidekiq workers are slow to pick up jobs from the queue, or jobs are failing, the page can show outdated information.
The following workers are responsible for updating these states:
- UpdateMergeRequestsWorker
- MergeRequestMergeabilityCheckWorker
- MergeRequests::MergeabilityCheckBatchWorker
To check how these workers are performing, look at these Grafana dashboards:
- UpdateMergeRequestsWorker
- MergeRequestMergeabilityCheckWorker
- MergeRequests::MergeabilityCheckBatchWorker
In these dashboards, check whether apdex is dropping, or whether error ratio and queue length are rising, compared to normal levels. Look for sharp changes or sustained degradation rather than minor fluctuations.
If apdex is going down, it could be a sign that errors are up or the job is just too slow. If queue length is up, it could mean that Sidekiq workers can’t pick up jobs for some reason.
If jobs are too slow or queue length is up, check whether it is a widespread issue. Please refer to the Sidekiq runbook.
If errors are up, check Sentry for errors from those specific workers. Determine whether the errors are caused by another service failing or by a bug in application code. Here are links to filter errors on Sentry for those specific workers:
- UpdateMergeRequestsWorker
- MergeRequestMergeabilityCheckWorker
- MergeRequests::MergeabilityCheckBatchWorker
When errors seem to be caused by another service failure, please refer to that service’s runbook. Otherwise, reach out to Create:Code Review engineers for assistance.
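The triage flow described above can be summarized as a small decision function. This is a minimal sketch, not tooling that exists in the runbook: the WorkerSignals type, the next_steps function, and the fallback "keep monitoring" message are assumptions for illustration, and the boolean inputs stand for what an engineer reads off the Grafana dashboards.

```python
from dataclasses import dataclass


@dataclass
class WorkerSignals:
    """Hypothetical summary of one worker's dashboard, relative to normal levels."""
    apdex_down: bool
    error_ratio_up: bool
    queue_length_up: bool


def next_steps(signals):
    """Map dashboard observations to the follow-up actions named in this runbook."""
    steps = []
    if signals.error_ratio_up:
        # Errors are up: inspect the worker's errors before anything else.
        steps.append("Check Sentry for errors from this worker")
    # Apdex dropping without errors suggests the job itself is slow.
    jobs_slow = signals.apdex_down and not signals.error_ratio_up
    if jobs_slow or signals.queue_length_up:
        steps.append("Check the Sidekiq runbook for a widespread issue")
    if not steps:
        # Assumption: the runbook does not say what to do when nothing is degraded.
        steps.append("No degradation detected; keep monitoring")
    return steps


if __name__ == "__main__":
    for step in next_steps(WorkerSignals(True, True, False)):
        print(step)
```

Both steps can apply at once, for example when errors and queue length rise together, so the function returns a list rather than a single action.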
Web/API requests failing with HTTP 500
The Create:Code Review group owns a number of Rails controllers and API endpoints. They can return errors if a service they depend on has issues or if there is a bug in application code.
Check Sentry for errors for the reported action/endpoint. Filter by transaction or by correlation ID to focus on the specific failing action/endpoint.
If the error seems to be caused by another service failing, please refer to the runbook of that service. If it looks like a bug in application code, reach out to Create:Code Review engineers for assistance.