Troubleshooting HostedRunnersLoggingServiceUsageReplicationErrorSLOViolation
When the HostedRunnersLoggingServiceUsageReplicationErrorSLOViolation alert is triggered, it indicates that S3 replication has stopped. This issue is not related to the runner account and should be investigated from the tenant account.
Possible Causes
The primary reasons for replication failure are:
- Permission issues – The required IAM roles or policies may be misconfigured.
- Underlying network issues – Connectivity problems between AWS services could prevent replication.
Steps to Investigate
- Check the status of the last objects in the S3 bucket to determine when replication stopped.
- Verify the replication configuration in AWS S3 to identify potential permission or network issues.
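Both checks above can also be scripted. The following is a minimal sketch, assuming a boto3-style S3 client is passed in; the helper names `latest_replication_statuses` and `replication_summary` are hypothetical and not part of any existing tooling. It lists the newest objects with their replication status and summarizes the bucket's replication rules. It pages through the whole bucket, so it may be slow on very large buckets.

```python
from typing import Any, Dict, List


def latest_replication_statuses(s3: Any, bucket: str, limit: int = 20) -> List[Dict[str, Any]]:
    """Return key, timestamp and replication status for the newest objects."""
    objects: List[Dict[str, Any]] = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        objects.extend(page.get("Contents", []))

    # Newest first, so the point where replication stopped is easy to spot.
    newest = sorted(objects, key=lambda o: o["LastModified"], reverse=True)[:limit]

    report = []
    for obj in newest:
        head = s3.head_object(Bucket=bucket, Key=obj["Key"])
        report.append({
            "Key": obj["Key"],
            "LastModified": obj["LastModified"],
            # The field is absent when no status is recorded; shown here as NONE.
            "ReplicationStatus": head.get("ReplicationStatus", "NONE"),
        })
    return report


def replication_summary(s3: Any, bucket: str) -> Dict[str, Any]:
    """Return the IAM role and a short view of each replication rule."""
    config = s3.get_bucket_replication(Bucket=bucket)["ReplicationConfiguration"]
    rules = [
        {
            "ID": rule.get("ID"),
            "Status": rule.get("Status"),
            "Destination": rule.get("Destination", {}).get("Bucket"),
        }
        for rule in config.get("Rules", [])
    ]
    return {"Role": config.get("Role"), "Rules": rules}
```

With boto3 installed and tenant credentials loaded, the client would be `boto3.client("s3")` and the bucket would be the one named `{customer_name}-hosted-runner-usage`. A run of `FAILED` or `PENDING` statuses on the newest objects points at when replication stopped, and a rule whose `Status` is not `Enabled` or whose `Role` is unexpected points at a configuration problem.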
Resolution Steps
AWS does not automatically retry replication for pending objects once a failure occurs. You must manually replicate the objects by following these steps:
1. Break the glass to access the tenant infrastructure.
2. Navigate to the S3 bucket.
3. Go to Batch Operations and create a new job.
4. Under Manifest, select `Create manifest using S3 Replication configuration` to identify unreplicated objects.
5. The replication configuration source bucket should be in the same account. Choose the bucket whose name has the format `{customer_name}-hosted-runner-usage`.
6. Leave the filter as it is.
7. For the replication status, choose `failed`.
8. Select the Save Batch Operations manifest checkbox.
9. Set the location for the batch manifest to `{customer_name}-hosted-runner-usage-report` and leave the rest as it is.
10. Click Next.
11. For Operation type, choose `Replicate`.
12. Click Next.
13. For the completion report bucket, choose the same bucket selected in step 9, and set the scope to all tasks.
14. For IAM permission, open the search bar and filter by `{customer_name}-runner-s3-replication-role`.
15. Click Next and Create job.
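For reference, the console steps above correspond roughly to a single S3 Batch Operations `create_job` request. The sketch below is an illustration only, assuming a boto3-style `s3control` client, that the replication role sits at the default IAM path, and an arbitrary priority of 10; the helper names are hypothetical, and the parameters should be checked against the current S3 Control API reference before use.

```python
from typing import Any, Dict


def build_replication_job_request(account_id: str, customer_name: str) -> Dict[str, Any]:
    """Build create_job arguments that mirror the console choices."""
    source_bucket_arn = f"arn:aws:s3:::{customer_name}-hosted-runner-usage"
    report_bucket_arn = f"arn:aws:s3:::{customer_name}-hosted-runner-usage-report"
    # Assumption: the role has no custom IAM path.
    role_arn = (
        f"arn:aws:iam::{account_id}:role/{customer_name}-runner-s3-replication-role"
    )
    return {
        "AccountId": account_id,
        # Keeps the manual confirmation step before the job runs.
        "ConfirmationRequired": True,
        "Operation": {"S3ReplicateObject": {}},
        "Priority": 10,  # Assumption: any priority works for a one-off job.
        "RoleArn": role_arn,
        "Description": "Replicate objects whose replication failed",
        "ManifestGenerator": {
            "S3JobManifestGenerator": {
                "ExpectedBucketOwner": account_id,
                "SourceBucket": source_bucket_arn,
                # Equivalent of saving the Batch Operations manifest.
                "EnableManifestOutput": True,
                "ManifestOutputLocation": {
                    "ExpectedManifestBucketOwner": account_id,
                    "Bucket": report_bucket_arn,
                    "ManifestFormat": "S3InventoryReport_CSV_20211130",
                },
                # Only objects whose replication status is FAILED.
                "Filter": {"ObjectReplicationStatuses": ["FAILED"]},
            }
        },
        "Report": {
            "Enabled": True,
            "Bucket": report_bucket_arn,
            "Format": "Report_CSV_20180820",
            "ReportScope": "AllTasks",
        },
    }


def create_replication_job(s3control: Any, account_id: str, customer_name: str) -> str:
    """Submit the job and return its ID."""
    request = build_replication_job_request(account_id, customer_name)
    return s3control.create_job(**request)["JobId"]
```

Keeping the request builder separate from the API call makes it possible to print and review the payload before anything is submitted to the tenant account.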
It takes a few minutes to prepare the job. Wait until the job is ready and has the status `Awaiting your confirmation to run`. Click on it and run the job. Wait for the job to finish.
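The wait-and-confirm step can be sketched in code as well. This is a hedged illustration, assuming a boto3-style `s3control` client and assuming that the console status "Awaiting your confirmation to run" is reported by the API as `Suspended`; the function name `confirm_when_ready` and the polling defaults are hypothetical.

```python
import time
from typing import Any


def confirm_when_ready(
    s3control: Any,
    account_id: str,
    job_id: str,
    poll_seconds: float = 15.0,
    max_polls: int = 40,
) -> None:
    """Poll until the job awaits confirmation, then ask S3 to run it."""
    for _ in range(max_polls):
        job = s3control.describe_job(AccountId=account_id, JobId=job_id)["Job"]
        status = job["Status"]
        if status in ("New", "Preparing"):
            # The manifest is still being generated; check again later.
            time.sleep(poll_seconds)
            continue
        if status == "Suspended":
            # Assumed API equivalent of running the job from the console.
            s3control.update_job_status(
                AccountId=account_id, JobId=job_id, RequestedJobStatus="Ready"
            )
            return
        raise RuntimeError(f"Job {job_id} is in unexpected status {status}")
    raise TimeoutError(f"Job {job_id} was not ready after {max_polls} polls")
```

Any status other than the preparing or awaiting-confirmation states raises, so a job that failed during manifest generation is surfaced instead of being polled indefinitely.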
After it completes successfully:
- Check the report bucket and find the job report by job ID.
- Review the manifest to ensure all replications were successful.
- Check the job's failure rate at the end, which should be zero.
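The failure-rate check can be derived from the job's progress summary. The sketch below assumes a boto3-style `s3control` client whose `describe_job` response carries a `ProgressSummary` with task counts; the helper names are hypothetical. It complements the report review above and does not replace it.

```python
from typing import Any, Dict


def job_failure_rate(job: Dict[str, Any]) -> float:
    """Return failed tasks as a fraction of all tasks (0.0 for an empty job)."""
    progress = job.get("ProgressSummary", {})
    total = progress.get("TotalNumberOfTasks", 0)
    failed = progress.get("NumberOfTasksFailed", 0)
    return failed / total if total else 0.0


def verify_job(s3control: Any, account_id: str, job_id: str) -> Dict[str, Any]:
    """Fetch the job and summarize whether every replication task succeeded."""
    job = s3control.describe_job(AccountId=account_id, JobId=job_id)["Job"]
    rate = job_failure_rate(job)
    return {
        "Status": job["Status"],
        "FailureRate": rate,
        "Healthy": job["Status"] == "Complete" and rate == 0.0,
    }
```

A `Healthy` value of `False` means the job report should be inspected for the failed keys before the alert is considered resolved.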