Debugging MacOS Runners

This document provides a comprehensive guide for debugging issues with AWS MacOS runner instances. It covers monitoring instance health, accessing instances, debugging nested VMs, and handling common problems.

⚠️ IMPORTANT: MacOS dedicated hosts are fundamentally different from typical Linux VMs:

  • Bare Metal Hardware: The hosts are physical Mac machines (Dedicated Hosts on AWS), not virtualized instances, even though each job runs in a VM created on the host.
  • Provisioning Time: Can take hours to provision a new dedicated host.
  • Deprovisioning Time: Can take hours to fully release and clean up.
  • Limited Availability: Hardware availability is constrained by how AWS allocates Mac capacity across regions and Availability Zones.
  • Deployment Delays: Deployments can be significantly delayed waiting for available hardware.
  • Regional Constraints: MacOS machine types are limited to specific AWS regions.
  • Capacity Planning: Requires careful planning due to hardware limitations.
  • Cost Implications: Dedicated hosts incur costs even when not actively running jobs.
  • Hardware Shortages: Periodic unavailability of Mac hardware in AWS for specific zones or regions.
  • Extended Wait Times: Deployments have been delayed by hours or days waiting for capacity.
  • Scaling Limitations: ASG scaling events can fail due to insufficient dedicated host capacity.
To check recent scaling activity for an ASG:

  1. AWS Console ASG View: open the ASG and review its activity history.

  2. Common Scaling Activity Statuses

    • Successful: Green checkmark with completion timestamp
    • In Progress: Yellow icon with “Scaling” status
    • Failed: Red X with error description. For example, “Insufficient capacity”
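The same activity history can also be pulled with the AWS CLI, which helps when comparing several ASGs. The sketch below is a hypothetical helper (summarize_scaling_activities is not an existing tool). It assumes the AWS CLI is configured for the account that owns the ASG and that ASG_NAME is replaced with the real group name. It counts activities per status code and prints the message of each failed one.

```shell
# Hypothetical helper: count ASG scaling activities per status code and
# print the message of each failed activity.
# Reads "StatusCode<TAB>StatusMessage" lines on stdin, for example from:
#   aws autoscaling describe-scaling-activities \
#     --auto-scaling-group-name ASG_NAME --max-items 50 \
#     --query 'Activities[].[StatusCode,StatusMessage]' --output text
summarize_scaling_activities() {
  _activities=$(cat)
  # One "Status: count" line per status code, sorted by status name.
  printf '%s\n' "$_activities" | awk -F '\t' 'NF { print $1 }' | sort | uniq -c |
    awk '{ print $2 ": " $1 }'
  # Failure reasons, e.g. insufficient Dedicated Host capacity.
  printf '%s\n' "$_activities" | awk -F '\t' '$1 == "Failed" { print "FAILED: " $2 }'
}
```

Pipe the CLI command from the comment into summarize_scaling_activities. A burst of "FAILED" lines mentioning capacity usually points at the hardware shortages described earlier, not at the runner itself.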

Determining autoscaling group (ASG) health

  • Access via AWS Console
    • Medium Macs are in account 215928322474
    • Large Macs are in account 730335264460
  • Check for instances in unhealthy states
  • Check scaling activity history for unexpected terminations or failed operations
  • Check the EC2 Dedicated Hosts dashboard for abnormal states
  • Monitor for hosts in “pending” or “released” states that might indicate provisioning issues
  • Verify that every active host reports vCPU utilization

Useful log files on the host Mac:

  • Nesting logs: /Users/ec2-user/nesting.log
  • MacOS init logs (from the user-data script on the Mac host): /var/log/amazon/ec2/ec2-macos-init.log
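The Dedicated Host state checks can also be run from the AWS CLI. This is a minimal sketch, assuming credentials for the account being inspected (medium or large Macs). unhealthy_hosts is a hypothetical helper that filters the text output of aws ec2 describe-hosts and prints every host whose state is not "available".

```shell
# Hypothetical helper: list Dedicated Hosts that are not in the "available" state.
# Reads "HostId<TAB>State<TAB>AvailabilityZone" lines on stdin, for example from:
#   aws ec2 describe-hosts \
#     --query 'Hosts[].[HostId,State,AvailabilityZone]' --output text
# States such as "pending" or "released" deserve a closer look.
unhealthy_hosts() {
  awk -F '\t' 'NF && $2 != "available" { printf "%s %s (%s)\n", $1, $2, $3 }'
}
```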

Refer to access.md for information on how to access MacOS VMs.

Debugging connection between runner manager and host Mac

  • Confirm that the runner manager has an established connection to the host Mac

    # On runner manager
    ss -tn | grep HOST_MAC_PRIVATE_IP4
  • Confirm that the host Mac has established connections to the runner manager

    # On the host Mac
    sudo lsof -i@RUNNER_MANAGER_PRIVATE_IP4
  • List network connections between the host Mac and the nesting VMs. Every VM shown by nesting list should have a connection.

    # On the host Mac
    sudo lsof [email protected] | grep nesting
  • View the nesting settings

    # On the host Mac
    cat /Users/ec2-user/nesting.json
  • List available nesting images

    # On the host Mac
    ls /Volumes/VMData/images
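To check every VM at once, the sketch below compares nesting list output with a saved lsof listing. vms_without_connection is a hypothetical helper. It assumes nesting list prints "ID IMAGE HOST:PORT" lines, as in the example output in this document, and that a healthy VM's forwarded port appears somewhere in the lsof output. It prints each VM that has no matching socket.

```shell
# Hypothetical helper: print VMs from "nesting list" whose forwarded port
# does not appear in a saved lsof listing.
# Assumes each "nesting list" line looks like "ID IMAGE 127.0.0.1:PORT".
#   Usage on the host Mac:
#     sudo lsof -nP -i > /tmp/lsof.txt
#     nesting list | vms_without_connection /tmp/lsof.txt
vms_without_connection() {
  lsof_file=$1
  while read -r id image addr; do
    [ -n "$addr" ] || continue   # skip blank or malformed lines
    port=${addr##*:}              # keep only the text after the last ':'
    # Match ":PORT" not followed by another digit, so port 6083 does not match :60835.
    if ! grep -Eq ":${port}([^0-9]|\$)" "$lsof_file"; then
      echo "$id ($image) has no socket on port $port"
    fi
  done
}
```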
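
On the host Mac, check that the nesting service is loaded:

    sudo launchctl list | grep nesting

Run the nesting CLI without arguments, for example to confirm that it is installed and on the PATH:

    nesting

Check the installed nesting version:

    nesting version

List the running VMs:

    nesting list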

The output shows the ID, image, and localhost address (127.0.0.1:PORT) of each running VM. For example:

    bb9wtbcq macos-14-xcode-15 127.0.0.1:60835

The SSH username and password for the VMs are stored on the associated runner manager, in the [runners.autoscaler.vm_isolation.connector_config] section of /etc/gitlab-runner/config.toml.
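A hypothetical helper for reading the username from config.toml on the runner manager. It is a line-based sketch rather than a real TOML parser, and it assumes the key is named username and written as username = "value". The password is deliberately not printed.

```shell
# Hypothetical helper: print the username from each
# [runners.autoscaler.vm_isolation.connector_config] table in config.toml.
# Line-based sketch, not a TOML parser: assumes username = "value" lines.
#   Usage on the runner manager:
#     connector_username /etc/gitlab-runner/config.toml
connector_username() {
  awk '
    # Any table header: remember whether it is the VM connector table.
    /^[ \t]*\[/ {
      header = $0
      gsub(/^[ \t]+|[ \t]+$/, "", header)
      in_vm_connector = (header == "[runners.autoscaler.vm_isolation.connector_config]")
      next
    }
    # Inside that table, print the unquoted value of the username key.
    in_vm_connector && /^[ \t]*username[ \t]*=/ {
      value = $0
      sub(/^[^=]*=[ \t]*/, "", value)
      gsub(/"/, "", value)
      print value
    }
  ' "$1"
}
```

The helper ignores other connector_config tables (for example, the one the autoscaler uses to reach the host Mac), so only the VM credentials are reported.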

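From the host Mac, connect to a VM using the port shown by nesting list, replacing USERID with the username from config.toml:

    ssh -p PORT USERID@127.0.0.1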
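A small wrapper can combine the port lookup and the SSH call. This is a hypothetical sketch: ssh_to_vm is not part of nesting. It assumes the nesting list format shown in the example output, and that the VM_USER environment variable holds the username from connector_config.

```shell
# Hypothetical wrapper, run on the host Mac: SSH into a nesting VM by ID.
# Assumes "nesting list" prints "ID IMAGE 127.0.0.1:PORT" lines and that
# VM_USER holds the username from connector_config.
#   Usage: VM_USER=USERID ssh_to_vm bb9wtbcq
ssh_to_vm() {
  if [ -z "${VM_USER:-}" ]; then
    echo "set VM_USER to the VM username from config.toml" >&2
    return 1
  fi
  # Port = text after the last ':' of the address column, first matching ID only.
  port=$(nesting list | awk -v id="$1" '$1 == id { n = split($3, p, ":"); print p[n]; exit }')
  if [ -z "$port" ]; then
    echo "VM $1 not found in nesting list output" >&2
    return 1
  fi
  ssh -p "$port" "$VM_USER@127.0.0.1"
}
```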

Common problems:

  1. Service not starting

    # Restart the nesting server
    sudo launchctl unload /Library/LaunchDaemons/nesting.plist
    sudo launchctl load /Library/LaunchDaemons/nesting.plist
  2. Connection issues

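    # Check network connectivity: on the host Mac, watch SSH traffic with the runner manager.
    # -n prints raw IPs instead of resolved hostnames, so the address filter matches.
    sudo tcpdump -n -i any port 22 and host RUNNER_MANAGER_PRIVATE_IP4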
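Before capturing packets, it can help to confirm from the runner manager that the host Mac accepts TCP connections on port 22 at all. This is a minimal sketch, assuming a Linux runner manager with bash and the coreutils timeout command available. check_ssh_port is a hypothetical helper.

```shell
# Hypothetical helper, run on the runner manager: check whether a TCP port
# (default 22) on the host Mac accepts connections.
# Opens the connection with bash's /dev/tcp redirection; fails on refusal,
# on an unreachable host, or after 5 seconds.
#   Usage: check_ssh_port HOST_MAC_PRIVATE_IP4
check_ssh_port() {
  host=$1
  port=${2:-22}
  if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "TCP $host:$port reachable"
  else
    echo "TCP $host:$port NOT reachable" >&2
    return 1
  fi
}
```

If the port is not reachable, look at security groups and the host's own state before debugging nesting. If it is reachable but jobs still fail, the tcpdump capture above is the next step.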