security_incidents

Responding to SIRT Incidents on Dedicated Hosted Runners (DHR)

During a security incident, follow the Security Incident Response Guide. Its contents and the instructions of the SIRT team take priority over anything in this runbook.

This runbook is a collection of skills that may be useful during SIRT incidents for DHR.

Rotating Runner Tokens for DHR

Create a new Runner Token for the Runner Stack.
Use the Hosted Runners Overview dashboard to confirm that the new active shard is actually processing jobs.
Ask the Dedicated EOC to use the Rails Console on the customer's Dedicated GitLab instance to revoke the old Runner Token, OR ask the customer to delete the old Runner Token.

Rotating the `aws_iam_access_key` for the `fleeting_service_account` (`aws_iam_user` named after the Runner Shard)

The `aws_iam_user` used by a runner shard has the same name as the shard itself; for example, the shard `blue-abc123` has an IAM user named `blue-abc123`.

Identify the inactive shard using the Grafana dashboard.
Breakglass into the AMP pod for provision.
Init the Terraform state for the inactive shard.
Confirm the access key for the inactive shard: `terraform state show module.grit_iam.aws_iam_access_key.fleeting_service_account_key`
Destroy the access key for the inactive shard: `terraform destroy -target=module.grit_iam.aws_iam_access_key.fleeting_service_account_key`
Run `hosted_runner_deploy` to do a ZDD onto the inactive shard. Terraform will automatically recreate the access key.
It is likely wise to do this on both shards of a given runner stack.
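
The confirm and destroy steps above can be sketched as a small dry-run helper that prints the two Terraform commands for review before anyone runs them. This is only an illustration: the function name `rotation_commands` is hypothetical, the resource address is copied from the steps above, and the Terraform init step is left out because its exact arguments depend on the shard's backend configuration.

```python
from __future__ import annotations

import shlex

# Resource address copied from the steps above.
ACCESS_KEY_ADDRESS = "module.grit_iam.aws_iam_access_key.fleeting_service_account_key"


def rotation_commands(address: str = ACCESS_KEY_ADDRESS) -> list[list[str]]:
    """Return the confirm and destroy commands, in the order they should run."""
    return [
        # Confirm which access key is currently tracked in state.
        ["terraform", "state", "show", address],
        # Destroy only that key; the next deploy recreates it.
        ["terraform", "destroy", f"-target={address}"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in rotation_commands():
        print(shlex.join(argv))
```

Printing rather than executing keeps the operator in control inside the breakglass session, where each command should be reviewed and recorded.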

Ideally we will eventually move to using IAM roles instead of static access keys and IAM users for fleeting, which will mean this skill is unnecessary.

Identifying details about the job run on a specific ephemeral job machine using OpenSearch

Access the customer's OpenSearch.
Run the query `fluentd_tag: "*-manager-logs" AND json.instance-id: "*{instance_id}*" AND json.project: "**"`, substituting in your instance_id.
The returned log entry should include, among other fields:

json.gitlab_user_id
json.instance-id
json.internal-address
json.job
json.namespace_id
json.organization_id
json.project
json.project_full_path
json.root_namespace_id
json.runner
json.runner_name
json.time
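
To avoid typos when substituting the instance ID by hand, the query string can be generated. The following is a minimal sketch: the function name `manager_log_query` is hypothetical, and the query text is copied verbatim from the step above, including the `json.project` clause.

```python
def manager_log_query(instance_id: str) -> str:
    """Build the manager-log query for one ephemeral job machine."""
    instance_id = instance_id.strip()
    if not instance_id:
        raise ValueError("instance_id must not be empty")
    # Quotes or whitespace would break out of the quoted wildcard term.
    if '"' in instance_id or any(ch.isspace() for ch in instance_id):
        raise ValueError("instance_id must not contain quotes or whitespace")
    return (
        'fluentd_tag: "*-manager-logs" '
        f'AND json.instance-id: "*{instance_id}*" '
        'AND json.project: "**"'
    )


if __name__ == "__main__":
    # Hypothetical instance ID, for illustration only.
    print(manager_log_query("i-0123456789abcdef0"))
```

Paste the printed string into the OpenSearch query bar.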

Manually blocking a specific IP on a hosted runner VPC via the AWS console

Breakglass into the customer's DHR AWS account.
Go to VPCs in AWS.
Go to Network ACLs.
Find the Network ACL for the specific VPC.
Click on the Outbound Rules tab.
Edit Outbound Rules.
Add a new rule with a lower Rule Number than any existing rule. Network ACL rules are evaluated in ascending order of Rule Number, so the new rule must come first to take effect.
Set Type: All traffic.
Set Destination: the IP address to be blocked, in CIDR notation (e.g. 127.0.0.1/32).
Set Allow/Deny: Deny.
Save.
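
The rule described above can also be expressed as data, which may help when recording what was changed or when codifying the block later. The sketch below is an assumption-laden illustration, not part of the console procedure: `deny_egress_entry` is a hypothetical helper, the ACL ID and rule numbers in the example are made up, and it handles IPv4 only. The dictionary keys follow the parameters of boto3's EC2 `create_network_acl_entry` call, but nothing here calls AWS.

```python
from __future__ import annotations

import ipaddress


def deny_egress_entry(
    network_acl_id: str, blocked_ip: str, existing_rule_numbers: list[int]
) -> dict:
    """Describe an outbound deny rule that is evaluated before all existing rules."""
    # Raises ValueError if blocked_ip is not a valid IPv4 address.
    ip = ipaddress.IPv4Address(blocked_ip)
    # Rules are evaluated in ascending order, so pick a number below the lowest one.
    lowest = min(existing_rule_numbers, default=32767)
    rule_number = lowest - 1
    if rule_number < 1:
        raise ValueError("no free rule number below the existing rules")
    return {
        "NetworkAclId": network_acl_id,
        "RuleNumber": rule_number,
        "Protocol": "-1",  # all traffic
        "RuleAction": "deny",
        "Egress": True,  # outbound rule
        "CidrBlock": f"{ip}/32",  # a single address in CIDR notation
    }


if __name__ == "__main__":
    # Hypothetical ACL ID and rule numbers, for illustration only.
    print(deny_egress_entry("acl-0123456789abcdef0", "203.0.113.7", [100, 32767]))
```

Keeping the entry as a plain dictionary makes it easy to paste into the incident record alongside the breakglass notes.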

It is very important to follow the breakglass procedure and record what was done to the Network ACLs for follow-up and, ideally, codification.

Ideally we will eventually move to using AWS Firewall Rules which have the ability to block entire domains instead of just IP addresses.
