Runway Platform Service
- Service Overview
- Alerts: https://alerts.gitlab.net/#/alerts?filter=%7Btype%3D%22runway%22%2C%20tier%3D%22inf%22%7D
- Label: gitlab-com/gl-infra/production~"Service::Runway"
Logging
Troubleshooting Pointers
- ../ai-gateway/code-suggestions.md
- Cloudflare: Managing Traffic
- Disaster Recovery Gameday Schedule
- Breakglass
- ../uncategorized/subnet-allocations.md
- Woodhouse-Slack Overview
- Restore/Backup Runway-managed Cloud SQL
- Cloud SQL Restore Pipeline Troubleshooting
- Privileged Access Management
Summary
Runway is an experimental PaaS for stage groups to deploy and operate services. Runway is currently built with GitLab CI/CD, GitLab Environments, and GCP Cloud Run.
Do not confuse the Runway platform with a service that is managed by Runway, e.g. AI Gateway.
Architecture
For a diagram, refer to the architecture blueprint.
Performance
Runway is a platform, so services determine deployment frequency. Performance primarily depends on the following factors:
- Deployment pipeline failures
- Deployment pipeline duration
Because releases are versioned in source projects, these SLIs can be lagging indicators: a problem may not surface until a service triggers its next deployment.
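As a rough illustration of the two factors above, the sketch below derives a failure ratio and a mean duration from a list of pipeline records. The `Pipeline` record and the `deployment_slis` helper are hypothetical names used only for this example; they do not describe how Runway actually computes its SLIs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Pipeline:
    """Hypothetical record of one deployment pipeline run."""
    succeeded: bool
    duration_seconds: float


def deployment_slis(pipelines: list[Pipeline]) -> dict[str, float]:
    """Summarize failure ratio and mean duration for a set of pipelines."""
    if not pipelines:
        raise ValueError("at least one pipeline is required")
    failures = sum(not p.succeeded for p in pipelines)
    return {
        "failure_ratio": failures / len(pipelines),
        "mean_duration_seconds": mean(p.duration_seconds for p in pipelines),
    }


# Illustrative data only: four runs, one of which failed.
runs = [
    Pipeline(True, 300.0),
    Pipeline(True, 420.0),
    Pipeline(False, 360.0),
    Pipeline(True, 600.0),
]
print(deployment_slis(runs))
```

Because the records only exist once a pipeline has run, a service that deploys rarely contributes few data points, which is one way to see why these indicators lag.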
Scalability
Runway is a platform, so services determine workload rightsizing. Scalability primarily depends on the following factors:
- Resources (CPU, Memory)
- Instances (Minimum, Maximum, Concurrency)
When investigating short-term saturation of a service deployed to Runway, you may need to scale on behalf of the service owner. For the long term, saturation resources are monitored through capacity planning.
Horizontal
By default, Runway will scale up instances to handle all incoming requests. When a service is not receiving any traffic, instances are scaled down to zero.
Minimum instances
The minimum number of instances of a service. To update, set the configuration in runway.yml of the source project.
Recommendation: Use this setting if you need to reduce cold start latency for a service.
Maximum instances
The maximum number of instances of a service. To update, set the configuration in runway.yml of the source project.
Recommendation: Use this setting if you need to limit the number of connections to a backing service, e.g. database.
Maximum instance concurrent requests
The maximum number of concurrent requests per instance of the service. To update, set the configuration in runway.yml of the source project. When tuning concurrency, consider increasing memory.
Recommendation: Use this setting if you need to either optimize cost efficiency, or limit concurrency of a service.
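To get a rough sense of how the three horizontal settings interact, the sketch below estimates how many instances a service would need under a steady load. It is a simplified model based on Little's law, not Cloud Run's actual autoscaling algorithm; the function name, parameters, and default values are hypothetical and are not Runway defaults.

```python
import math


def estimate_instances(
    requests_per_second: float,
    avg_latency_seconds: float,
    max_concurrency: int,
    min_instances: int = 0,
    max_instances: int = 100,
) -> int:
    """Estimate the instances needed to serve a steady load.

    Little's law: in-flight requests = arrival rate * average latency.
    Hypothetical helper; the defaults are illustrative only.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    in_flight = requests_per_second * avg_latency_seconds
    needed = math.ceil(in_flight / max_concurrency)
    # Clamp to the configured bounds; demand above max_instances is not served.
    return max(min_instances, min(needed, max_instances))


print(estimate_instances(200, 0.5, 80))                   # 100 in flight / 80 per instance -> 2
print(estimate_instances(0, 0.5, 80))                     # no traffic -> 0 (scaled to zero)
print(estimate_instances(0, 0.5, 80, min_instances=1))    # a minimum keeps 1 instance warm
```

Under this model, lowering the concurrency limit raises the instance count for the same load, while the maximum caps it regardless of demand, which is why that cap is useful for protecting a backing service.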
Vertical
By default, Runway provisions lightweight CPU and memory resource limits of 1000m and 512Mi, respectively. When a resource limit is exceeded, the instance is terminated.
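The default limits use Kubernetes-style quantity notation: the "m" suffix means thousandths of a CPU core, and "Mi" means mebibytes. The sketch below converts such strings into cores and bytes; the helper names are hypothetical, and it handles only the common suffixes rather than the full quantity grammar.

```python
# Binary and decimal suffixes used by Kubernetes-style memory quantities.
MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '1000m' or '2' to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores -> cores
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' to a number of bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: a plain number of bytes


print(parse_cpu("1000m"))     # 1.0 (one full core)
print(parse_memory("512Mi"))  # 536870912 bytes
```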
Memory
The memory limit of an instance. To update, set the configuration in runway.yml of the source project.
CPU
The CPU limit of an instance. To update, set the configuration in runway.yml of the source project.
CPU Boost
Provides additional CPU during instance startup. To update, set the configuration in runway.yml of the source project.
Recommendation: Use this setting if you need to reduce cold start latency for a service.
Capacity Planning
Runway provides capacity planning for saturation resources of a service. To view forecasts, refer to the Tamland page.
Availability
Runway is a platform that depends on GitLab.com and GCP, so deployments cannot occur when either of those components is unavailable.
Regions
Runway is a platform, so services determine region availability. Runway supports multi-region deployments across 40 GCP regions. The default region is us-east1. For more information, refer to the documentation.
Quotas
Runway is a platform, so services could be impacted by Cloud Run quota limits. To request a quota increase, refer to the GCP console.
Monitoring/Alerting
Runway is a platform, so services determine reliability. Cloud Run metrics are made available to services by scraping with the Stackdriver exporter.
When investigating issues with a service deployed to Runway, you may need to drill down on behalf of the service owner.
Troubleshooting
How do I rollback?
To roll back a deployment for a Runway service, you have two options:
- Revert the MR, or
- Re-run the previous deployment job (Example)
How do I promote to production?
By default, Runway automatically promotes to production after a delay of 10 minutes. To promote sooner, you can manually play the production Promote job.
How do I rotate secret?
Runway secrets are stored in Vault and integrated with Secret Manager. To rotate a secret, refer to the documentation.
Links to Infrastructure and Tooling
- Runway Deployments
- Runway Services
- Runway Artifacts
- Runway Application Load Balancers
- Runway Secrets (GSM)
- Runway Secrets (Vault)
- Runway Provisioner
- Runway Reconciler
- Runway CI Tasks
- Runway GCP Projects