Container Registry database post-deployment migrations
Context
Until recently, the registry only supported regular database schema migrations. After completing the GitLab.com registry upgrade/migration (gitlab-org&5523), the database has grown large enough that simple changes (such as creating new indexes) take a long time to execute, so we introduced support for post-deployment migrations.
Regular schema migrations are applied automatically by the registry Helm chart using a migrations job, introduced in gitlab-org/charts/gitlab#2566. The mid- to long-term goal is to automate post-deployment migrations in a similar way (gitlab-com/gl-infra/delivery#3926).
In the meantime, we already need to ship post-deployment migrations, so we adopted a short-term solution: post-deployment migrations are skipped during deployments. After a version that includes new post-deployment migrations is deployed, a change request is raised to apply them manually from within a registry instance.
This document provides instructions for SREs to apply post-deployment migrations.
A private recording from the Delivery team walks through applying the migrations: https://www.youtube.com/watch?v=QFH11OE91Vw
Applying post-deployment migrations
This should be done from within a registry instance in K8s, using the built-in registry CLI. If needed, refer to the relevant CLI documentation here.
- Confirm that the registry version indicated in the change request matches the one (and only one) running in the target environment (dashboard).
- Connect to any cluster in the environment where the maintenance is occurring (runbook).
  Note that the regional clusters in `gprd` and `gstg` do not have any Registry Pods, so connect to any one of the three zonal clusters.
- Find the oldest container registry Pod (ignore Pods that have `-migrations-` in the name!) and access it using `kubectl`:

  ```shell
  # Sort registry Pods oldest first, skip migration job Pods, keep the first one.
  POD_NAME=$(kubectl get pods -n gitlab -l app=registry \
    --sort-by=.metadata.creationTimestamp -o name \
    | grep -v -- "-migrations-" | head -n 1)

  if [ -n "$POD_NAME" ]; then
    kubectl exec -n gitlab -it "$POD_NAME" -- /bin/bash
  else
    echo "Pod name \"$POD_NAME\" is invalid."
  fi
  ```

  Note: if you are running `kubectl exec` on a cluster in the `gprd` environment, notify SIRT via the `Slack -> SIRTBot -> Notify SecOps` button that you are exec-ing into the Pod, and include a link to the change request, as they will receive a SIRT alert about it.
- List pending migrations:

  ```shell
  SKIP_POST_DEPLOYMENT_MIGRATIONS=false registry database migrate status /etc/docker/registry/config.yml
  ```

  You should see something like this:

  ```text
  pre-deployment:
  +---------------------------------------------------------------+--------------------------------------+
  | MIGRATION                                                     | APPLIED                              |
  +---------------------------------------------------------------+--------------------------------------+
  | 20210503145024_create_top_level_namespaces_table              | 2022-11-29 14:12:58.477128 +0000 WET |
  | 20220803114849_update_gc_track_deleted_layers_trigger         | 2022-11-29 14:13:00.209522 +0000 WET |
  | ...                                                           | ...                                  |
  +---------------------------------------------------------------+--------------------------------------+
  post-deployment:
  +---------------------------------------------------------------+--------------------------------------+
  | MIGRATION                                                     | APPLIED                              |
  +---------------------------------------------------------------+--------------------------------------+
  | 20210503145024_post_add_layers_simplified_usage_index_batch_0 | 2022-11-29 14:12:58.477128 +0000 WET |
  | 20220803114849_post_add_layers_simplified_usage_index_batch_1 | 2022-11-29 14:13:00.209522 +0000 WET |
  | ...                                                           | ...                                  |
  | 20221123174403_post_add_layers_simplified_usage_index_batch_2 |                                      |
  +---------------------------------------------------------------+--------------------------------------+
  ```

  In this example, there is one pending post-deployment migration named `20221123174403_post_add_layers_simplified_usage_index_batch_2`. You know it is a pending post-deployment migration because it is in the `post-deployment` table section and its `APPLIED` column is empty.

  Note that we are explicitly setting the `SKIP_POST_DEPLOYMENT_MIGRATIONS` environment variable to `false` for these commands. If we don't, the registry CLI will ignore post-deployment migrations. This environment variable is set to `true` for our deployments (sample) to avoid having these migrations applied alongside regular schema migrations during upgrades.
- Confirm that there are no pending regular (pre-deployment) migrations in the list above.
- Confirm that the number and names of pending post-deployment migrations match those described in the change request.
- If any of the conditions above are not met, suspend execution. Contact the development DRI for the change request to evaluate the results and decide how to proceed. Do not continue until you receive explicit guidance from the DRI, and cancel the change if no response is received within the operational window.
- Proceed to apply post-deployment migrations:

  ```shell
  SKIP_POST_DEPLOYMENT_MIGRATIONS=false registry database migrate up /etc/docker/registry/config.yml
  ```

  You should see something like this:

  ```text
  post-deployment:
  20221123174403_post_add_layers_simplified_usage_index_batch_2
  OK: applied 0 pre-deployment migration(s), 1 post-deployment migration(s) and 0 background migration(s)
  ```
- Wait for the above to complete and confirm there are no pending migrations:

  ```shell
  SKIP_POST_DEPLOYMENT_MIGRATIONS=false registry database migrate status /etc/docker/registry/config.yml
  ```

  You should see something like this:

  ```text
  pre-deployment:
  +---------------------------------------------------------------+--------------------------------------+
  | MIGRATION                                                     | APPLIED                              |
  +---------------------------------------------------------------+--------------------------------------+
  | 20210503145024_create_top_level_namespaces_table              | 2022-11-29 14:12:58.477128 +0000 WET |
  | 20220803114849_update_gc_track_deleted_layers_trigger         | 2022-11-29 14:13:00.209522 +0000 WET |
  | ...                                                           | ...                                  |
  +---------------------------------------------------------------+--------------------------------------+
  post-deployment:
  +---------------------------------------------------------------+--------------------------------------+
  | MIGRATION                                                     | APPLIED                              |
  +---------------------------------------------------------------+--------------------------------------+
  | 20210503145024_post_add_layers_simplified_usage_index_batch_0 | 2022-11-29 14:12:58.477128 +0000 WET |
  | 20220803114849_post_add_layers_simplified_usage_index_batch_1 | 2022-11-29 14:13:00.209522 +0000 WET |
  | ...                                                           | ...                                  |
  | 20221123174403_post_add_layers_simplified_usage_index_batch_2 | 2022-12-14 12:31:57.42551 +0000 WET  |
  +---------------------------------------------------------------+--------------------------------------+
  ```

  Note that `APPLIED` for `20221123174403_post_add_layers_simplified_usage_index_batch_2` in the `post-deployment` table is no longer empty.
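The dashboard remains the reference for the first step (confirming the running registry version). As an optional command-line cross-check, here is a minimal sketch of a hypothetical `single_registry_image` helper. It is not part of the registry tooling. It reads container image references on stdin, one per line, prints the distinct ones, and succeeds only if exactly one distinct image is found. It assumes the image tag reflects the registry version.

```shell
# Hypothetical helper (not part of the registry tooling): read image
# references, one per line, on stdin; print the distinct ones and return
# success only if exactly one distinct image was found.
# The body runs in a subshell so the "images" variable does not leak.
single_registry_image() (
  # Drop blank lines, then de-duplicate.
  images=$(grep -v '^[[:space:]]*$' | sort -u)
  printf '%s\n' "$images"
  # Fail on empty input or on more than one distinct image.
  [ -n "$images" ] && [ "$(printf '%s\n' "$images" | grep -c .)" -eq 1 ]
)
```

The following invocation is an example only. It assumes the registry container is the first container in each Pod, and it skips `-migrations-` Pods as in the step above: `kubectl get pods -n gitlab -l app=registry -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}' | grep -v -- "-migrations-" | cut -f2 | single_registry_image`. Listing Pods this way does not require `kubectl exec`.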
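Some steps above require reading a wide `migrate status` table: confirming there are no pending regular migrations, checking that the pending post-deployment migrations match the change request, and confirming nothing is pending after `migrate up`. A small filter can reduce the risk of misreading it. The sketch below defines a hypothetical `pending_migrations` helper, which is not a registry CLI feature. It assumes the output has the layout shown above: a section label such as `pre-deployment:` or `post-deployment:`, followed by a `MIGRATION | APPLIED` table in which pending rows have an empty `APPLIED` cell.

```shell
# Hypothetical helper (not part of the registry CLI): print every migration
# whose APPLIED cell is empty, prefixed with its section label.
# Assumes the `migrate status` table layout shown in this runbook.
pending_migrations() {
  awk -F'[|]' '
    function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
    # Section labels such as "pre-deployment:" or "post-deployment:".
    /^[A-Za-z-]+:[ \t]*$/ { section = $0; sub(/:.*$/, "", section); next }
    # Table rows start with "|"; borders start with "+" and are ignored.
    /^[|]/ {
      name = trim($2); applied = trim($3)
      # Skip the header row and elided "..." rows.
      if (name == "MIGRATION" || name == "...") next
      if (applied == "") print section ": " name
    }
  '
}
```

To use it, save the status output to a file (here called `status.txt`, an illustrative name) and run `pending_migrations < status.txt`. Before applying, the result should contain only `post-deployment:` lines that match the change request. After applying, it should be empty. Running it inside the registry Pod assumes `awk` is available in the image.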
Monitoring
The migrations tool used by the registry (link) does not report when each individual migration has been applied, only when all pending migrations are done (or one fails). As a result, when applying multiple migrations, the registry CLI outputs the list of all migrations to apply and waits for all of them to be applied (or for one to fail) before providing additional feedback (success or failure).
Although the tool does not provide real-time feedback, if we are applying multiple long-running migrations and want to know the progress of each one, we can run the `SKIP_POST_DEPLOYMENT_MIGRATIONS=false registry database migrate status /etc/docker/registry/config.yml` CLI command on another registry Pod to see the list of migrations already applied.
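To follow progress from that second Pod without scanning the whole table, you can use a hypothetical `applied_post_deployment` filter. This is a minimal sketch that assumes the `migrate status` layout shown earlier in this runbook. It prints each post-deployment migration that already has an `APPLIED` timestamp, tab-separated from that timestamp.

```shell
# Hypothetical helper (not part of the registry CLI): print post-deployment
# migrations that already have an APPLIED timestamp, as "name<TAB>applied".
# Assumes the `migrate status` table layout shown in this runbook.
applied_post_deployment() {
  awk -F'[|]' '
    function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
    # Track the current section label, e.g. "post-deployment".
    /^[A-Za-z-]+:[ \t]*$/ { section = $0; sub(/:.*$/, "", section); next }
    # Only table rows inside the post-deployment section.
    section == "post-deployment" && /^[|]/ {
      name = trim($2); applied = trim($3)
      # Skip the header row and elided "..." rows.
      if (name == "MIGRATION" || name == "...") next
      if (applied != "") print name "\t" applied
    }
  '
}
```

To see how many post-deployment migrations have been applied so far, pipe the status command into `applied_post_deployment | wc -l`. In `gprd`, each `kubectl exec` raises a SIRT alert, so it is preferable to re-run the command from an already open shell rather than exec-ing repeatedly.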
Alternatively, we can look directly at the `post_deploy_schema_migrations` table (from which `database migrate status` reads post-deployment migrations) in the registry database with the following query:
```sql
SELECT * FROM post_deploy_schema_migrations ORDER BY applied_at DESC LIMIT 10;
```
The output will look like this:

```text
                              id                               |          applied_at
---------------------------------------------------------------+-------------------------------
 20221123174403_post_add_layers_simplified_usage_index_batch_2 | 2022-12-21 19:02:19.923828+00
 20221123174403_post_add_layers_simplified_usage_index_batch_1 | 2022-12-21 19:02:19.923828+00
 20221123174403_post_add_layers_simplified_usage_index_batch_0 | 2022-12-21 19:02:19.923828+00
 ...                                                           | ...
(10 rows)
```
As each post-deployment migration is applied, it is inserted into this table, with the current time set in `applied_at`. We can therefore check this query result to see how many post-deployment migrations have already been applied.
Follow this guide on how to connect to the registry database using `tsh`.