
Rails SQL Apdex alerts

When an SQL Apdex alert fires, it is important to quickly assess the impact, rule out common causes such as abuse, and identify problematic queries. This runbook covers some of the topics that were discussed in the EOC Firedrill.

  • Be aware of which database node is the primary when querying logs
  • Check the dashboards and log links below to assess the root cause
  • See if there is a quick recovery; if not, page the IMOC, who will bring in the CMOC in case we need to make a status page update

To find the exact query for a query_id taken from Thanos, run the following on the Postgres node that handled the query:

select queryid, substr(query, 1, 5000) from pg_stat_statements where queryid = 'xxxxx';
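If you do not have a query_id yet, ranking pg_stat_statements by execution time on the same node can surface candidate queries to investigate. The sketch below is a suggestion rather than an established step of this runbook. It assumes PostgreSQL 13 or later, where the timing columns are named total_exec_time and mean_exec_time (older versions use total_time and mean_time). The LIMIT of 20 and the 200-character truncation are arbitrary choices.

```sql
-- Top 20 statements by cumulative execution time since pg_stat_statements was last reset.
-- Assumes PostgreSQL 13+ column names (total_exec_time, mean_exec_time).
SELECT queryid,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,  -- cast needed: round(double, int) does not exist
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       substr(query, 1, 200)              AS query_start
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```

These statistics are cumulative since the last reset, so a recent regression can be hidden behind queries that have accumulated time over a long period. Ordering by mean_exec_time instead can make a newly slow statement stand out. The queryid from either ranking can then be fed into the lookup query above.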

For any of the above queries, search for json.fingerprint in the field list on the left and click on it to see whether a particular fingerprint dominates the slow queries or timeouts. From this, you can get the full query (or the endpoint ID), which helps narrow down the source of the performance degradation.

For more detailed information about slow queries, see the runbook for collecting pg data.

Abuse is often the source of DB degradation. To check whether abuse might be happening, refer to the abuse runbook.