Debugging MacOS Runners
This document provides a comprehensive guide for debugging issues with AWS MacOS runner instances. It covers monitoring instance health, accessing instances, debugging nested VMs, and handling common problems.
AWS MacOS Dedicated Host Characteristics
Critical Differences from Linux VMs
⚠️ IMPORTANT: MacOS dedicated hosts are fundamentally different from typical Linux VMs:
- Bare Metal Hardware: Although jobs run in VMs created on the MacOS hosts, the hosts themselves are physical Mac machines (Dedicated Hosts on AWS), not virtualized instances.
- Provisioning Time: Can take hours to provision a new dedicated host.
- Deprovisioning Time: Can take hours to fully release and clean up.
- Limited Availability: Hardware availability is constrained by AWS’s allocation by region and zones.
Hardware Availability Challenges
- Deployment Delays: Deployments can be significantly delayed waiting for available hardware.
- Regional Constraints: MacOS machine types are limited to specific AWS regions.
- Capacity Planning: Requires careful planning due to hardware limitations.
- Cost Implications: Dedicated hosts incur costs even when not actively running jobs.
Historical Issues
- Hardware Shortages: Periodic unavailability of Mac hardware in AWS for specific zones or regions.
- Extended Wait Times: Deployments have been delayed by hours or days waiting for capacity.
- Scaling Limitations: Auto Scaling group (ASG) scaling events can fail due to insufficient dedicated host capacity.
Monitoring Scaling Progress and Failures
Auto Scaling Group Health Monitoring
Checking ASG Progress
- AWS Console ASG View
  - Navigate to AWS Console Auto Scaling Groups
  - Select the relevant ASG (medium or large Macs)
  - Check Activity tab for scaling events
- Common Scaling Activity Statuses
  - Successful: Green checkmark with completion timestamp
  - In Progress: Yellow icon with “Scaling” status
  - Failed: Red X with error description. For example, “Insufficient capacity”
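The same activity history can also be read from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and authenticated against the relevant account; the function name `failed_scaling_activities` and the ASG name passed to it are illustrative placeholders, since this document does not list the actual ASG names.

```shell
#!/bin/sh
# Sketch: print start time and status message of failed scaling activities.
# The ASG name is a caller-supplied placeholder (not defined in this document).
failed_scaling_activities() {
    asg_name="$1"
    aws autoscaling describe-scaling-activities \
        --auto-scaling-group-name "$asg_name" \
        --query "Activities[?StatusCode=='Failed'].[StartTime,StatusMessage]" \
        --output text
}

# Usage (hypothetical ASG name):
#   failed_scaling_activities MY_MAC_ASG_NAME
```

A status message mentioning insufficient capacity here corresponds to the red "Failed" entries in the console Activity tab.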
Determining autoscaling group (ASG) health
Health Metrics and Monitoring
AWS ASG dashboard
- Access via AWS Console
  - Medium Macs are in account 215928322474
  - Large Macs are in account 730335264460
- Check for instances in unhealthy states
- Check scaling activity history for unexpected terminations or failed operations
Mac host metrics
- Check the EC2 Dedicated Hosts dashboard for abnormal states
- Monitor for hosts in “pending” or “released” states that might indicate provisioning issues
- Verify vCPU utilization is present for all active hosts
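As a complement to the dashboard, host states can be listed with the AWS CLI. This is a rough sketch, assuming the CLI is configured for the account and region that hold the Mac hosts; the helper names `abnormal_hosts` and `list_abnormal_hosts` are made up for illustration. It treats any state other than "available" as worth a look.

```shell
#!/bin/sh
# Filter lines of "HostId State AvailabilityZone", keeping hosts whose
# state is anything other than "available" (e.g. pending, released).
abnormal_hosts() {
    awk '$2 != "available" { print $1, $2, $3 }'
}

# Query Dedicated Hosts in the current account/region and filter them.
list_abnormal_hosts() {
    aws ec2 describe-hosts \
        --query 'Hosts[].[HostId,State,AvailabilityZone]' \
        --output text | abnormal_hosts
}

# Usage:
#   list_abnormal_hosts
```

An empty result means every host returned by the query is in the "available" state.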
Runner manager metrics
Logs Location
- Nesting logs: /Users/ec2-user/nesting.log
- MacOS init logs (from user-data script on Mac host): /var/log/amazon/ec2/ec2-macos-init.log
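When scanning these logs, a small helper can surface the most recent problem lines. The sketch below is only illustrative: the function name `recent_errors` is hypothetical, and the assumption that relevant entries contain "error" or "fail" is a guess about the log contents, not something this document specifies.

```shell
#!/bin/sh
# Print the last N lines of a log file that mention "error" or "fail"
# (case-insensitive). N defaults to 20.
recent_errors() {
    log_file="$1"
    max_lines="${2:-20}"
    grep -iE 'error|fail' "$log_file" | tail -n "$max_lines"
}

# Usage on the host Mac:
#   recent_errors /Users/ec2-user/nesting.log
#   recent_errors /var/log/amazon/ec2/ec2-macos-init.log 50
```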
Access to MacOS VMs
Refer to access.md for information on how to access MacOS VMs.
Debugging connection between runner manager and host Mac
- Confirm established connection with host Mac
    # On runner manager
    ss -tn | grep HOST_MAC_PRIVATE_IP4
- Confirm established connections from host Mac to runner manager
    # On the host Mac
    sudo lsof -i@RUNNER_MANAGER_PRIVATE_IP4
- List network connections between host Mac and nesting VMs. For every VM in nesting list there should be a connection.
    # On the host Mac
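One way to approach the per-VM check is to take the port that `nesting list` reports for each VM and ask `lsof` whether an established TCP connection exists on it. This is a sketch under assumptions, not the documented procedure: it assumes each `nesting list` line has the form `ID IMAGE 127.0.0.1:PORT`, and that an active VM connection shows up as ESTABLISHED on that port. The function names are hypothetical.

```shell
#!/bin/sh
# Extract "VM_ID PORT" pairs from `nesting list` output
# (assumed line format: ID IMAGE 127.0.0.1:PORT).
vm_ports() {
    awk 'NF >= 3 { n = split($NF, parts, ":"); print $1, parts[n] }'
}

# For each VM, report whether lsof sees an established TCP connection
# on the port listed for it. Run on the host Mac.
check_vm_connections() {
    nesting list | vm_ports | while read -r vm_id port; do
        if sudo lsof -nP -iTCP:"$port" -sTCP:ESTABLISHED >/dev/null 2>&1 </dev/null; then
            echo "$vm_id port $port: connected"
        else
            echo "$vm_id port $port: no established connection"
        fi
    done
}

# Usage on the host Mac:
#   check_vm_connections
```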
Debugging Nesting Client/Server
Logs Locations
- Nesting settings
    # On the host Mac
    cat /Users/ec2-user/nesting.json
- List available nesting images
    # On the host Mac
    ls /Volumes/VMData/images
Managing Nested VMs with Nesting client
Confirm nesting service is running
launchctl list | grep nesting
Get nesting help
nesting
Get nesting version
nesting version
List running VMs
nesting list
Output will provide the ID, image, and localhost:port of the running VM. For example:
bb9wtbcq macos-14-xcode-15 127.0.0.1:60835
SSH into VM
The SSH userid and password for VMs can be found in the associated runner manager instance in the [runners.autoscaler.vm_isolation.connector_config] section of /etc/gitlab-runner/config.toml.
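Putting the two pieces together, the SSH command for a VM can be derived from its `nesting list` entry. The sketch below assumes the VM is reachable over SSH at the localhost address and port shown by `nesting list`, which this document implies but does not state outright. `vm_ssh_command` is a hypothetical helper, and `VM_USER` stands in for the user ID from config.toml.

```shell
#!/bin/sh
# Read `nesting list` output on stdin and print the ssh command for one VM.
# Arguments: VM ID, SSH user. Exits non-zero if the VM ID is not listed.
vm_ssh_command() {
    vm_id="$1"
    vm_user="$2"
    awk -v id="$vm_id" -v user="$vm_user" '
        $1 == id {
            n = split($NF, addr, ":")   # addr[1] = host, addr[n] = port
            printf "ssh -p %s %s@%s\n", addr[n], user, addr[1]
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the host Mac (VM_USER is a placeholder):
#   nesting list | vm_ssh_command bb9wtbcq VM_USER
```

The helper only prints the command so it can be reviewed before running; the password prompt is answered with the value from the same config section.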
Possible Nesting Issues
- Service not starting
    # Restart the nesting server
    sudo launchctl unload /Library/LaunchDaemons/nesting.plist
    sudo launchctl load /Library/LaunchDaemons/nesting.plist
- Connection issues
    # Check network connectivity
    sudo tcpdump -i any port 22 | grep RUNNER_MANAGER_PRIVATE_IP4
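Filtering tcpdump output with grep only matches when addresses are printed numerically, and tcpdump resolves them to names unless `-n` is given. A possible variant, offered as a sketch rather than the established procedure, moves the IP match into the capture filter itself. The function names are hypothetical, and the packet limit of 20 is an arbitrary choice.

```shell
#!/bin/sh
# Build a capture filter for SSH traffic to or from one peer address.
ssh_capture_filter() {
    printf 'tcp port 22 and host %s' "$1"
}

# Capture up to 20 matching packets on any interface, with name
# resolution disabled (-n) so addresses print as IPs.
capture_ssh_traffic() {
    sudo tcpdump -n -i any -c 20 "$(ssh_capture_filter "$1")"
}

# Usage on the host Mac:
#   capture_ssh_traffic RUNNER_MANAGER_PRIVATE_IP4
```

If no packets appear while the runner manager is expected to be connecting, the problem is likely upstream of the host Mac (security groups, routing, or the runner manager itself).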