GitLab Runbooks
Overview
This folder contains a documentation directory for each service defined in the service catalog. Documentation that does not belong to a specific service can be found in the uncategorized directory.
For each service, there is an autogenerated <service-name>/README.md. You can add additional content to it outside of the section marked for autogeneration.
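As a minimal sketch of why manual content outside the marked section survives regeneration, the snippet below swaps only the text between two marker comments. The marker strings and the function name `replace_generated` are hypothetical and chosen for illustration; the actual generator may use different markers and logic.

```python
# Hypothetical marker comments; the real generator may use different ones.
BEGIN = "<!-- BEGIN AUTOGENERATED -->"
END = "<!-- END AUTOGENERATED -->"


def replace_generated(readme: str, generated: str) -> str:
    """Replace the text between BEGIN and END, leaving everything else untouched."""
    start = readme.find(BEGIN)
    end = readme.find(END)
    if start == -1 or end == -1 or end < start:
        raise ValueError("autogeneration markers not found")
    head = readme[: start + len(BEGIN)]  # manual content above, plus the opening marker
    tail = readme[end:]                  # closing marker, plus manual content below
    return f"{head}\n{generated.strip()}\n{tail}"
```

Anything written above the opening marker or below the closing marker is carried over unchanged, which is the behavior the paragraph above relies on.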
Please do not add top-level directories that do not match a service name in the service catalog. Put them under uncategorized/ instead.
Suggested Runbook Structure
Runbooks should be the main source of truth for maintaining a service, so all necessary information about a service needs to be documented here. At the same time, runbooks should be a concise list of actions that does not distract with too much information. Therefore, the following structure is suggested for each service directory, in accordance with the structure suggested by the service readiness review template.
docs/
  <service-name>/
    README.md        # partly autogenerated and structured as defined below
    <runbook-1>.md
    <runbook-2>.md
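The layout rules can be checked mechanically. The following is a rough sketch, not part of the repository's tooling: it assumes the service names have already been read from the service catalog into a set, and the function name `check_layout` is hypothetical.

```python
from pathlib import Path


def check_layout(docs_root: Path, service_names: set[str]) -> list[str]:
    """Return a list of layout problems found directly under docs_root."""
    problems = []
    for entry in sorted(docs_root.iterdir()):
        # Only directories matter; skip hidden ones and the catch-all directory.
        if not entry.is_dir() or entry.name.startswith("."):
            continue
        if entry.name == "uncategorized":
            continue
        if entry.name not in service_names:
            problems.append(
                f"{entry.name}/: not in the service catalog, move it under uncategorized/"
            )
        elif not (entry / "README.md").is_file():
            problems.append(f"{entry.name}/: missing README.md")
    return problems
```

Calling `check_layout(Path("docs"), names)` returns an empty list when every top-level directory is either uncategorized/ or a catalog service with a README.md.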
README.md
The README.md for each service should give background information helpful for understanding and operating it. It should contain the sections required by the service readiness review template (see there for details).
Runbooks
Runbooks are instructions for executing a manual task. Several runbooks can be contained in one document, or each runbook can be in its own file. The main points to consider are readability, whether the runbook is easy to find (naming of the file or runbook header), and whether the runbook can be linked to from an alert.
General principles: runbooks should
- be as short and concise as possible
- avoid duplicating information; link to README.md for general information
- be complete enough to be executed without further research
- for service runbooks, follow the service overview template, to make them easy to navigate
- for alert playbooks, follow the alert playbook template, to make them easy to navigate
Suggested Runbook Layout
Runbooks are most often used to mitigate issues or react to alerts. Runbooks should therefore have a “Troubleshooting” section, structured simply by symptom, cause (use the alertname if possible), and solution, and a “Maintenance” section for other general maintenance tasks.
## Symptom: CPU Saturation Alert
You got a CPU saturation alert...
### Cause A: User is making too many requests
A user is making too many requests.
Identifiable by:
* List of alerts
* List of metrics
* List of log searches
* ...
#### Solution
Block the user:
* steps
* to
* block
* the user
### Cause B: DDOS Attack
Identifiable by:
* List of alerts
* List of metrics
* List of log searches
* ...
#### Solution
<Cloudflare DDOS runbook link>
## Execute a failover
[...]
## Restore from backup
[...]
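The symptom, cause, and solution nesting shown above lends itself to a simple lint. The sketch below is an illustration under stated assumptions, not an existing tool: it assumes the heading levels and prefixes from the example layout ("## Symptom:", "### Cause", "#### Solution"), and `lint_runbook` is a hypothetical name. It does not skip headings that appear inside fenced code blocks.

```python
import re

# Matches markdown headings of level 2 to 4, e.g. "## Symptom: ...".
HEADING = re.compile(r"^(#{2,4})\s+(.*)$")


def lint_runbook(markdown: str) -> list[str]:
    """Report symptoms without a cause and causes without a solution."""
    headings = [
        (len(m.group(1)), m.group(2).strip())
        for line in markdown.splitlines()
        if (m := HEADING.match(line))
    ]
    problems = []
    for i, (level, title) in enumerate(headings):
        # Collect headings nested below this one: everything up to the
        # next heading of the same or a higher level.
        children = []
        for child_level, child_title in headings[i + 1:]:
            if child_level <= level:
                break
            children.append((child_level, child_title))
        if level == 2 and title.startswith("Symptom:"):
            if not any(l == 3 and t.startswith("Cause") for l, t in children):
                problems.append(f"'{title}' has no '### Cause' subsection")
        elif level == 3 and title.startswith("Cause"):
            if not any(l == 4 and t.startswith("Solution") for l, t in children):
                problems.append(f"'{title}' has no '#### Solution' subsection")
    return problems
```

Maintenance sections such as "## Execute a failover" are ignored by this check, since only headings that start with "Symptom:" are expected to carry causes and solutions.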