CloudSQLDatabaseDown

This alert means that the Cloud SQL database for the environment is unavailable. Any services that use the affected database are likely to be degraded or unavailable as well.

If you receive this alert, you are expected to verify the Cloud SQL database status, look for a change or other cause of the outage, and open a support ticket with Google Cloud if warranted.

Consider using the Tech Stack or Metrics Catalog to determine which service owner may need to be involved.

This metric is exported from Google Stackdriver and reports a value of 0 for down and 1 for up. The alert pages on any database that reports as down for 5m or longer. This threshold was chosen to reflect an aggressive response to a database being down.
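
To make the threshold concrete, here is a minimal Python sketch of the "down for 5m or longer" condition. It is an illustration only, not how the alerting system evaluates the rule. The function name and the sample format (timestamp and value pairs, sorted oldest to newest) are assumptions made for this example.

```python
from datetime import datetime, timedelta


def down_for_at_least(samples, window=timedelta(minutes=5)):
    """Return True if the most recent run of non-1 samples spans at least `window`.

    Assumption: `samples` is a list of (timestamp, value) pairs sorted oldest
    to newest, where value is 1 for up and 0 for down.
    """
    if not samples:
        return False
    newest_ts, newest_value = samples[-1]
    if newest_value == 1:
        # The database currently reports as up, so nothing should page.
        return False
    # Walk backwards to find where the current run of "down" samples began.
    down_since = newest_ts
    for ts, value in reversed(samples):
        if value == 1:
            break
        down_since = ts
    return newest_ts - down_since >= window
```

A database that has reported 0 for only two or three minutes would not page under this check; one that has reported 0 continuously for five minutes would.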

Normally, the Cloud SQL databases in an environment should report as up.

Example view of the metric showing a failure. CloudSQLDatabaseDown Metric Source

This alert should fire rarely, since Cloud SQL is a managed Google Cloud service. It is more likely to notify the EoC because of a configuration change or maintenance work conducted by our own teams. For this reason, silencing the alert is possible, but the duration should be short and appropriate to the cause of the database being down.

Refer to the example view above to see what the metric looks like when a database is down.

At the time of writing, Cloud SQL is only in use in a few environments (ops and pre). While GitLab.com would not be directly affected by a Cloud SQL database being unavailable, some key dependent services could create a high-severity incident:

  • Packagecloud - Customer facing service for GitLab packages.
  • Sentry - Internal error tracking service.
  • Grafana - Internal monitoring and metrics service.

The pre environment also depends on several Cloud SQL Databases.

Here is the expression the alert uses in Grafana.

stackdriver_cloudsql_database_cloudsql_googleapis_com_database_up{env="ops"} != 1
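
Because the expression keeps only series whose value is not 1, every series it returns is a database currently reporting as down. The sketch below shows one way to run it as an instant query against the Prometheus HTTP API (`/api/v1/query`) and list the label sets that come back. The base URL is a placeholder, and the helper names are hypothetical. Which label identifies the database depends on the exporter configuration, so the sketch returns the full label set rather than guessing.

```python
import json
from urllib.parse import urlencode

# The alert expression from this runbook.
EXPR = 'stackdriver_cloudsql_database_cloudsql_googleapis_com_database_up{env="ops"} != 1'


def build_query_url(base_url, expr=EXPR):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def down_series(response_body):
    """Return the label set of every series in an instant-query response.

    With the expression above, each returned series is a database that is
    not reporting a value of 1.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [series["metric"] for series in payload["data"]["result"]]
```

To use it, fetch `build_query_url(...)` with any HTTP client that can authenticate to your Prometheus-compatible endpoint and pass the response body to `down_series`. An empty list means no database matched the expression.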

Since this metric represents a binary signal of up or down, there is little benefit in having dashboards to show this information.

Logs for Cloud SQL are best viewed in the ops or pre projects’ Logs Explorer in the Google Cloud web console.

  1. Try to identify from the alert expression which database is unavailable.
  2. Examine the service to verify if it is working or showing problems. This may help rule out false positive alerts.
  3. Log into the Google Cloud web console, find the Cloud SQL database, and look for health information.
  4. Check the Google Cloud Status page for related managed service outages.
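
As a command-line alternative to the web console check in step 3, the sketch below wraps `gcloud sql instances describe` to read an instance's reported state. It assumes the `gcloud` CLI is installed and authenticated. The instance and project names are placeholders you must supply, and the helper names are hypothetical. Treating `RUNNABLE` as the healthy state follows the Cloud SQL instance states; confirm any other state in the console.

```python
import subprocess


def describe_state_command(instance, project):
    """Build the gcloud command that prints only the instance's state field."""
    return [
        "gcloud", "sql", "instances", "describe", instance,
        f"--project={project}",
        "--format=value(state)",
    ]


def instance_state(instance, project):
    """Run gcloud and return the reported state, for example RUNNABLE."""
    result = subprocess.run(
        describe_state_command(instance, project),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def looks_healthy(state):
    """Assumption: only RUNNABLE is treated as healthy in this sketch."""
    return state == "RUNNABLE"
```

For example, `looks_healthy(instance_state("my-instance", "my-project"))` returns False for an instance in any state other than `RUNNABLE`, which is a cue to inspect it further.
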
  • N/A

This service is managed by Google Cloud. Outside of our own changes, any degradation in cloud services could contribute to a Cloud SQL database being unavailable.

If the outage is due to a Google Cloud issue, open a support ticket via the web console. If you need more synchronous help, ask in the #ext-google-cloud Slack channel.