Responding to SIRT Incidents on Dedicated Hosted Runners (DHR)
In a security incident, follow the Security Incident Response Guide, and prioritize what is written there and the instructions of the SIRT team over any instructions in this runbook.
This runbook is a collection of skills that may be useful during SIRT incidents for DHR.
Rotating Runner Tokens for DHR
- Create a new Runner Token for a Runner Stack.
- Use the Hosted Runners Overview dashboard to confirm that the new active shard is actually processing jobs.
- Ask the Dedicated EOC to use the Rails Console on the customer’s Dedicated GitLab instance to revoke the old Runner Token, OR ask the customer to delete the old Runner Token.
Rotating the aws_iam_access_key for the fleeting_service_account (aws_iam_user named after the Runner Shard)
The aws_iam_user used by a runner shard has the same name as the shard itself, e.g. blue-abc123 will have an IAM user named blue-abc123.
- Identify inactive shard using the Grafana Dashboard
- Breakglass into AMP pod for provision
- Init Terraform state for inactive shard
- Confirm access key for inactive shard
    terraform state show module.grit_iam.aws_iam_access_key.fleeting_service_account_key
- Destroy the access key for inactive shard
    terraform destroy -target=module.grit_iam.aws_iam_access_key.fleeting_service_account_key
- Run hosted_runner_deploy to do a ZDD onto the inactive shard. Terraform will automatically recreate the access key
- It is likely wise to do this on both shards of a given runner stack.
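The confirm and destroy steps above share one resource address, so it can help to generate both commands from a single value and review them before running anything. The following is a minimal sketch, not part of the official tooling: the helper name `rotation_commands` is hypothetical, it only builds the argument lists, and it assumes the commands will later be run from the already-initialised Terraform working directory for the inactive shard.

```python
from typing import List

# Resource address taken from the steps in this runbook.
ACCESS_KEY_ADDRESS = "module.grit_iam.aws_iam_access_key.fleeting_service_account_key"


def rotation_commands(address: str = ACCESS_KEY_ADDRESS) -> List[List[str]]:
    """Return the Terraform commands for the confirm and destroy steps, in order.

    Hypothetical helper: it does not execute anything, so the commands can be
    reviewed (or pasted into the incident record) before they are run.
    """
    return [
        # Step 1: confirm the access key exists in state for this shard.
        ["terraform", "state", "show", address],
        # Step 2: destroy only the access key, leaving the rest of the shard intact.
        ["terraform", "destroy", f"-target={address}"],
    ]


if __name__ == "__main__":
    for argv in rotation_commands():
        print(" ".join(argv))
```

Keeping the address in one constant reduces the chance of confirming one resource and destroying another when the commands are typed under pressure.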
Ideally we will eventually move to using IAM roles instead of static access keys and IAM users for fleeting, which will mean this skill is unnecessary.
Identifying details about the job run on a specific ephemeral job machine using Opensearch
- Access the customer’s Opensearch
- Run the query
    fluentd_tag: "*-manager-logs" AND json.instance-id: "*{instance_id}*" AND json.project: "**"
  substituting in your instance_id.
- The log that returns should include
  - json.gitlab_user_id
  - json.instance-id
  - json.internal-address
  - json.job
  - json.namespace_id
  - json.organization_id
  - json.project
  - json.project_full_path
  - json.root_namespace_id
  - json.runner
  - json.runner_name
  - json.time
  - etc.
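Substituting the instance ID by hand is easy to get wrong when several machines are being investigated. A minimal sketch of a query builder follows; the function name `build_job_query` and the example instance ID are hypothetical, and the character check is an assumption about what is safe to place inside the quoted term rather than a rule taken from Opensearch.

```python
def build_job_query(instance_id: str) -> str:
    """Fill the runbook's Opensearch query template with one instance ID.

    Hypothetical helper. It rejects empty IDs and characters that would
    change the meaning of the quoted wildcard term (quotes, backslashes,
    wildcards, spaces).
    """
    if not instance_id or any(ch in instance_id for ch in '"\\*? '):
        raise ValueError(f"unexpected instance id: {instance_id!r}")
    return (
        'fluentd_tag: "*-manager-logs" '
        f'AND json.instance-id: "*{instance_id}*" '
        'AND json.project: "**"'
    )


if __name__ == "__main__":
    # Placeholder instance ID, for illustration only.
    print(build_job_query("i-0123456789abcdef0"))
```

The resulting string can be pasted into the Opensearch query bar as-is.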
Manually blocking a specific IP on a hosted runner VPC via the AWS console
- Breakglass into the customer’s DHR AWS account.
- Go to VPCs in AWS.
- Go to Network ACLs
- Find the Network ACL for the specific VPC
- Click on the Outbound Rules tab
- Edit Outbound Rules
- Add a new rule with a lower Rule Number than any existing rule
- Set Type: All traffic
- Set Destination: the specific IP to be blocked, in CIDR notation (e.g. 127.0.0.1/32)
- Set Allow/Deny = Deny
- Save
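Two details in the steps above are easy to get wrong in the console: the destination must be valid CIDR, and the rule number must be lower than every existing rule, because Network ACL rules are evaluated in ascending rule-number order. The sketch below only computes and validates the values to type in; it makes no AWS calls. All function names and the dictionary keys are hypothetical and simply mirror the console fields, and the fallback rule number 100 is an arbitrary assumption.

```python
import ipaddress
from typing import Dict, Iterable, Union


def to_block_cidr(ip: str) -> str:
    """Validate a single IPv4 address and return it as a /32 CIDR block."""
    return f"{ipaddress.IPv4Address(ip)}/32"


def lower_rule_number(existing: Iterable[int]) -> int:
    """Pick a rule number lower than every existing numbered rule."""
    numbers = list(existing)
    if not numbers:
        return 100  # arbitrary starting point when no numbered rules exist
    candidate = min(numbers) - 1
    if candidate < 1:
        raise ValueError("no free rule number below the existing rules")
    return candidate


def deny_outbound_rule(ip: str, existing: Iterable[int]) -> Dict[str, Union[int, str]]:
    """Describe the outbound deny rule using the console's field names."""
    return {
        "Rule number": lower_rule_number(existing),
        "Type": "All traffic",
        "Destination": to_block_cidr(ip),
        "Allow/Deny": "Deny",
    }


if __name__ == "__main__":
    # Example values only: 127.0.0.1 is the placeholder used in this runbook.
    print(deny_outbound_rule("127.0.0.1", existing=[100, 200]))
```

Printing the computed rule also gives a ready-made record of what was changed for the breakglass follow-up.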
It is very important to follow the breakglass procedure and record what was done to the Network ACLs for follow up and ideally codification.
Ideally we will eventually move to using AWS Firewall Rules which have the ability to block entire domains instead of just IP addresses.