Responding to SIRT Incidents on Dedicated Hosted Runners (DHR)

In a security incident, follow the Security Incident Response Guide first. That guide and the instructions of the SIRT take priority over anything in this runbook.

This runbook is a collection of skills that may be useful during SIRT incidents for DHR.

  1. Create a new Runner Token for a Runner Stack.
  2. Use the Hosted Runners Overview dashboard to confirm that the new active shard is actually processing jobs.
  3. Ask the Dedicated EOC to use the Rails Console on the customer’s Dedicated GitLab instance to revoke the old Runner Token, or ask the customer to delete the old Runner Token.

Rotating the aws_iam_access_key for the fleeting_service_account (aws_iam_user named after the Runner Shard)

The aws_iam_user used by a runner shard will have the same name as the shard itself, e.g. blue-abc123 will have an IAM user named blue-abc123.

  1. Identify the inactive shard using the Grafana dashboard.
  2. Breakglass into the AMP pod for provision.
  3. Init the Terraform state for the inactive shard.
  4. Confirm the access key for the inactive shard: terraform state show module.grit_iam.aws_iam_access_key.fleeting_service_account_key
  5. Destroy the access key for the inactive shard: terraform destroy -target=module.grit_iam.aws_iam_access_key.fleeting_service_account_key
  6. Run hosted_runner_deploy to do a ZDD onto the inactive shard. Terraform will automatically recreate the access key.
  7. It is likely wise to do this on both shards of a given runner stack.
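
As a review aid, the Terraform commands from steps 3 to 5 can be assembled and printed before anything is run. This is a minimal sketch, not the team's tooling: the function names are hypothetical, the bare `terraform init` stands in for whatever backend configuration step 3 actually needs, and nothing here executes Terraform.

```python
import shlex

# Resource address taken from the steps above.
ACCESS_KEY_ADDRESS = "module.grit_iam.aws_iam_access_key.fleeting_service_account_key"


def rotation_commands(resource_address: str = ACCESS_KEY_ADDRESS) -> list[list[str]]:
    """Return the Terraform commands for steps 3-5, in order, as argv lists."""
    return [
        # Assumption: the real init may need backend options not shown here.
        ["terraform", "init"],
        ["terraform", "state", "show", resource_address],
        ["terraform", "destroy", f"-target={resource_address}"],
    ]


def render(commands: list[list[str]]) -> str:
    """Render argv lists as shell lines so they can be reviewed before use."""
    return "\n".join(shlex.join(argv) for argv in commands)


if __name__ == "__main__":
    print(render(rotation_commands()))
```

Printing the commands rather than running them keeps the destroy step a deliberate, manual action inside the breakglass session.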

Ideally we will eventually move to using IAM roles instead of static access keys and IAM users for fleeting, which will mean this skill is unnecessary.

Identifying details about the job run on a specific ephemeral job machine using Opensearch

  1. Access the customer’s Opensearch
  2. Run the following query, substituting your instance ID for {instance_id}: fluentd_tag: "*-manager-logs" AND json.instance-id: "*{instance_id}*" AND json.project: "**"
  3. The returned log entry should include the following fields:
  • json.gitlab_user_id
  • json.instance-id
  • json.internal-address
  • json.job
  • json.namespace_id
  • json.organization_id
  • json.project
  • json.project_full_path
  • json.root_namespace_id
  • json.runner
  • json.runner_name
  • json.time

Additional fields may also be present.
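
The query from step 2 can be filled in programmatically to avoid quoting mistakes when pasting an instance ID. The sketch below is illustrative only: the function names and the sample instance ID are hypothetical, and `summarize` assumes the log record is available as a flat dictionary keyed by the field names listed above, which may not match how your client returns results.

```python
def manager_log_query(instance_id: str) -> str:
    """Build the query from step 2 for one ephemeral job machine."""
    instance_id = instance_id.strip()
    # Reject characters that would break out of the quoted wildcard term.
    if not instance_id or any(ch in instance_id for ch in '" \\'):
        raise ValueError(f"unexpected instance id: {instance_id!r}")
    return (
        'fluentd_tag: "*-manager-logs" '
        f'AND json.instance-id: "*{instance_id}*" '
        'AND json.project: "**"'
    )


# A subset of the fields listed above that usually identifies the job.
FIELDS = (
    "json.gitlab_user_id",
    "json.job",
    "json.project_full_path",
    "json.runner_name",
    "json.time",
)


def summarize(record: dict) -> dict:
    """Pick the identifying fields from a flat log record (assumed shape)."""
    return {name: record.get(name) for name in FIELDS}


if __name__ == "__main__":
    # Hypothetical instance ID, for illustration only.
    print(manager_log_query("i-0123456789abcdef0"))
```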

Manually blocking a specific IP on a hosted runner VPC via the AWS console

  1. Breakglass into the customer’s DHR AWS account.
  2. Go to VPCs in AWS.
  3. Go to Network ACLs
  4. Find the Network ACL for the specific VPC
  5. Click on the Outbound Rules tab
  6. Edit Outbound Rules
  7. Add a new rule with a lower Rule Number than any existing rule
  8. Set Type: All traffic
  9. Set Destination: the IP address to be blocked, in CIDR notation (e.g. 127.0.0.1/32)
  10. Set Allow/Deny: Deny
  11. Save
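
For the record-keeping and later codification of this change, the rule created in steps 7 to 10 can be described as data. The sketch below is an assumption-laden illustration, not the documented procedure, which uses the console: the function name, the sample ACL ID, and the fallback rule number 100 are hypothetical. The dictionary keys mirror the parameters of boto3's EC2 `create_network_acl_entry` call, but no AWS call is made here.

```python
import ipaddress

MIN_RULE_NUMBER = 1  # network ACL rule numbers start at 1


def deny_egress_entry(network_acl_id: str, existing_rule_numbers: list[int], target: str) -> dict:
    """Describe an outbound deny rule that is evaluated before every existing rule."""
    # A bare address such as "198.51.100.7" becomes "198.51.100.7/32".
    # Input with host bits set (e.g. "198.51.100.7/24") raises ValueError
    # rather than silently widening the block.
    network = ipaddress.ip_network(target)

    if existing_rule_numbers:
        # Rules are evaluated lowest number first, so go below the minimum.
        rule_number = min(existing_rule_numbers) - 1
    else:
        rule_number = 100  # arbitrary choice when no numbered rules exist
    if rule_number < MIN_RULE_NUMBER:
        raise ValueError("no rule number is free below the existing rules")

    cidr_key = "CidrBlock" if network.version == 4 else "Ipv6CidrBlock"
    return {
        "NetworkAclId": network_acl_id,
        "Egress": True,      # outbound rule
        "RuleNumber": rule_number,
        "Protocol": "-1",    # all traffic
        "RuleAction": "deny",
        cidr_key: str(network),
    }


if __name__ == "__main__":
    # Hypothetical ACL ID and a documentation-range address.
    print(deny_egress_entry("acl-0123456789abcdef0", [100, 200], "198.51.100.7"))
```

Rejecting input with host bits set is a deliberate choice: during an incident, blocking a wider range than intended should be an explicit decision.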

Follow the breakglass procedure and record every change made to the Network ACLs, so that the change can be followed up on and, ideally, codified.

Ideally we will eventually move to using AWS Firewall Rules, which can block entire domains instead of just IP addresses.