Debugging MacOS Runners
This document provides a comprehensive guide for debugging issues with AWS MacOS runner instances. It covers monitoring instance health, accessing instances, debugging nested VMs, and handling common problems.
Known Issues
- Host Acquisition Problems: Extended wait times for host availability
- Job VM Performance Variance: Historical I/O performance issues (resolved)
- Extended Wait Times: Deployments have been delayed by hours or days waiting for capacity.
Performance Considerations
Historical performance issues, especially I/O variance, were traced to EBS lazy loading of large AMI images. The current architecture addresses this by:
- Using dedicated EBS volumes for job VM disks
- Pre-downloading images at startup
- Ensuring full EBS performance from the start
For detailed performance characteristics, see AWS EBS documentation.
Monitoring Scaling Progress and Failures
Auto Scaling Group Health Monitoring
Checking ASG Progress
- AWS Console ASG View
  - Navigate to AWS Console Auto Scaling Groups
  - Select the relevant ASG (medium or large Macs)
  - Check the Activity tab for scaling events
- Common Scaling Activity Statuses
  - Successful: Green checkmark with completion timestamp
  - In Progress: Yellow icon with “Scaling” status
  - Failed: Red X with error description, for example “Insufficient capacity”
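The activity history shown in the console can also be read from the AWS CLI, which may be quicker when checking more than one group. The following is a minimal sketch, assuming the AWS CLI is installed and authenticated against the relevant account. The function name and the group name passed to it are placeholders; real ASG names are not listed in this guide.

```shell
# Print failed scaling activities (start time, status, message) for one ASG.
# $1: Auto Scaling Group name (placeholder; look it up in the console first).
failed_scaling_activities() {
  aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name "$1" \
    --max-items 20 \
    --query "Activities[?StatusCode=='Failed'].[StartTime,StatusCode,StatusMessage]" \
    --output text
}

# Example (hypothetical group name):
# failed_scaling_activities MY_MAC_ASG_NAME
```

An empty result means none of the most recent activities failed; capacity errors such as “Insufficient capacity” would appear in the message column.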
Determining autoscaling group (ASG) health
Health Metrics and Monitoring
AWS ASG dashboard
- Access via AWS Console
  - Staging Macs are in account 251165465090
  - Medium Macs are in account 215928322474
  - Large Macs are in account 730335264460
- Check for instances in unhealthy states
- Check scaling activity history for unexpected terminations or failed operations
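The unhealthy-instance check above can be sketched from the command line as well. This assumes the AWS CLI is authenticated against one of the accounts listed above; the function name is illustrative. The filter is done in awk and compares the health status case-insensitively, so it does not depend on the exact casing the API returns.

```shell
# List ASG-managed instances that are not healthy or not in service.
# Output columns: instance ID, ASG name, health status, lifecycle state.
unhealthy_asg_instances() {
  aws autoscaling describe-auto-scaling-instances \
    --query "AutoScalingInstances[].[InstanceId,AutoScalingGroupName,HealthStatus,LifecycleState]" \
    --output text |
    awk -F'\t' 'tolower($3) != "healthy" || $4 != "InService"'
}

# Example:
# unhealthy_asg_instances
```

Instances that are still launching also show up here, since their lifecycle state is not yet InService; that is expected during a scale-out.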
Mac host metrics
- Check the EC2 Dedicated Hosts dashboard for abnormal states
- Monitor for hosts in “pending” or “released” states that might indicate provisioning issues
- Verify vCPU utilization is present for all active hosts
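Dedicated Host states can be listed without the dashboard. The sketch below assumes the AWS CLI is authenticated in the account and region that own the Mac hosts; both function names are illustrative. Any host whose state is not `available` is printed, which covers the “pending” and “released” cases mentioned above.

```shell
# Print host ID, state, availability zone, and instance type for every Dedicated Host.
mac_host_states() {
  aws ec2 describe-hosts \
    --query "Hosts[].[HostId,State,AvailabilityZone,HostProperties.InstanceType]" \
    --output text
}

# Keep only hosts that are not in the "available" state.
abnormal_mac_hosts() {
  mac_host_states | awk -F'\t' '$2 != "available"'
}

# Example:
# abnormal_mac_hosts
```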
Runner manager metrics
Logs Location
- Nesting logs: /Users/ec2-user/nesting.log
- MacOS init logs (from user-data script on Mac host): /var/log/amazon/ec2/ec2-macos-init.log
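When scanning these logs, a small helper can surface the most recent suspicious lines. This is a sketch only: the log formats are not described in this guide, so the case-insensitive `error|fail` pattern is an assumption and may need adjusting. The function name is illustrative.

```shell
# Print the last N lines of a log file that mention an error or failure.
# $1: log file path, $2: number of lines to show (default 20).
recent_log_errors() {
  log_file="$1"
  count="${2:-20}"
  grep -iE 'error|fail' "$log_file" | tail -n "$count"
}

# Examples, on the host Mac:
# recent_log_errors /Users/ec2-user/nesting.log
# recent_log_errors /var/log/amazon/ec2/ec2-macos-init.log 50
```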
Access to MacOS instances and job VMs
Refer to access.md for information on how to access MacOS instances and job VMs.
Debugging connection between runner manager and host Mac
- Confirm established connection with host Mac
    # On runner manager
    ss -tn | grep HOST_MAC_PRIVATE_IP4
- Confirm established connections from host Mac to runner manager
    # On the host Mac
    sudo lsof -i@RUNNER_MANAGER_PRIVATE_IP4
- List network connections between the MacOS host instance and nesting job VMs. For every job VM in nesting list there should be a connection.
    # On the host Mac
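The rule that every job VM in `nesting list` should have a connection can be checked mechanically. The sketch below is not the command from the original page. It assumes that `nesting list` prints whitespace-separated ID, image, and address:port columns (as in the example output later in this guide), and that connections are captured with `lsof -nP -iTCP -sTCP:ESTABLISHED`. The function name and file paths are illustrative.

```shell
# Print job VMs whose address:port does not appear in a saved connection listing.
# $1: file holding `nesting list` output
# $2: file holding connection output (for example from lsof)
vms_without_connection() {
  while read -r vm_id image endpoint; do
    # Skip blank or malformed lines.
    [ -n "$endpoint" ] || continue
    # Match the endpoint only when followed by a space or "->",
    # so that port 6083 does not match 60835.
    grep -qF -e "$endpoint " -e "$endpoint->" "$2" ||
      printf '%s %s\n' "$vm_id" "$endpoint"
  done < "$1"
}

# Example, on the host Mac:
# nesting list > /tmp/vms.txt
# sudo lsof -nP -iTCP -sTCP:ESTABLISHED > /tmp/conns.txt
# vms_without_connection /tmp/vms.txt /tmp/conns.txt
```

No output means every listed job VM has at least one established connection on its endpoint.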
Debugging Nesting Client/Server
Logs Locations
- Nesting settings
    # On the host Mac
    cat /Users/ec2-user/nesting.json
- List available nesting images
    # On the host Mac
    ls /Volumes/VMData/images
Managing Nested Job VMs with Nesting client
Confirm nesting service is running
    # On the host Mac
    launchctl list | grep nesting
Get nesting help
    # On the host Mac
    nesting
Get nesting version
    # On the host Mac
    nesting version
List running job VMs
    # On the host Mac
    nesting list
Output will provide the ID, image, and localhost:port of the running job VM. For example:
bb9wtbcq macos-14-xcode-15 127.0.0.1:60835
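For scripting, the port of a given job VM can be pulled out of this output. A minimal sketch, assuming the three whitespace-separated columns shown in the example above; the function name is illustrative.

```shell
# Read `nesting list` output on stdin and print the port of one job VM.
# $1: job VM ID (first column of the listing).
job_vm_port() {
  awk -v id="$1" '$1 == id { n = split($3, parts, ":"); print parts[n] }'
}

# Example, on the host Mac:
# nesting list | job_vm_port bb9wtbcq
```

For the example line above, this prints `60835`.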
SSH from MacOS instance into job VM
The SSH user ID and password for job VMs can be found on the associated runner manager instance, in the [runners.autoscaler.vm_isolation.connector_config] section of /etc/gitlab-runner/config.toml.
# On the host Mac
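The exact SSH command is not shown here. As a hedged sketch, assuming the job VM accepts a standard SSH connection on the address:port reported by `nesting list`, the command can be assembled as follows. `JOB_VM_USER` is a placeholder for the user ID taken from config.toml, and the function name is illustrative.

```shell
# Build an ssh command line for a job VM.
# $1: SSH user (placeholder; take it from connector_config in config.toml)
# $2: endpoint from `nesting list`, for example 127.0.0.1:60835
job_vm_ssh_command() {
  host="${2%:*}"    # everything before the last colon
  port="${2##*:}"   # everything after the last colon
  printf 'ssh -p %s %s@%s\n' "$port" "$1" "$host"
}

# Example, on the host Mac:
# job_vm_ssh_command JOB_VM_USER 127.0.0.1:60835
```

The function only prints the command, so it can be reviewed before being run; the password prompt is then answered with the value from the same config.toml section.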
Possible Nesting Issues
- Service not starting
    # Restart the nesting server
    sudo launchctl unload /Library/LaunchDaemons/nesting.plist
    sudo launchctl load /Library/LaunchDaemons/nesting.plist
- Connection issues
    # Check network connectivity (-n keeps addresses numeric so the grep on the IP can match; -l line-buffers output for the pipe)
    sudo tcpdump -n -l -i any port 22 | grep RUNNER_MANAGER_PRIVATE_IP4
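After restarting the service, it helps to confirm whether launchd actually has it loaded and running. The sketch below parses `launchctl list` output, whose columns are PID, last exit status, and label; a `-` in the PID column means the job is loaded but not currently running. The service label is not given in this guide, so the match on any label containing `nesting` is an assumption, as is the function name. Because the plist lives under /Library/LaunchDaemons, `sudo` may be needed for the daemon to appear in the listing.

```shell
# Read `launchctl list` output on stdin and summarize the nesting service state.
nesting_service_status() {
  awk '
    $3 ~ /nesting/ {
      found = 1
      if ($1 == "-") print "loaded, not running: " $3
      else print "running (pid " $1 "): " $3
    }
    END { if (!found) print "not loaded" }
  '
}

# Example, on the host Mac:
# sudo launchctl list | nesting_service_status
```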