#121 Database schema migration runs in pod postStart event which is not ideal
Closed: Fixed, 2 years ago. Opened 2 years ago by dcallagh.

Currently in our test template and stage/prod YAML files for OpenShift, we run waiverdb db upgrade as a postStart lifecycle event for the web pods:

[...]
  template:
    metadata:
      labels:
        app: waiverdb
        service: web 
    spec:
      containers:
      - name: web 
[...]
        lifecycle:
          postStart:
            exec:
              command: [ /bin/sh, -i, -c, "waiverdb db upgrade" ]
[...]

This is what gives us an empty database with a populated schema in the test environment, and it is also intended to handle schema migrations for real deployments (although to my knowledge we have never tested that).

Doing it in the postStart event seems to have the following problems. We should either find a better strategy (what is the prior art for this?) or satisfy ourselves that the following are non-issues.

  • The OpenShift docs point out that the postStart hook and the container entrypoint are executed concurrently; there is no guarantee which happens first. It seems like a bad idea to let the web app start up while the schema migration is still in progress. Although maybe there is no harm in it?

  • I also thought this would be subject to a race between multiple pods in the same deployment, but I don't think that's true: OpenShift does not consider a pod "running" until the postStart hook has finished, and in a rolling deployment it does not start the second pod until the first one is running and healthy. We should confirm this experimentally, though.

  • What happens if the migration fails? See #115. In principle the schema migration for waiverdb 0.6 can fail if it hits a result_id which it doesn't know how to handle. In that case the schema will be left in a half-upgraded state (that's our problem), but will OpenShift notice the failure and also roll back the deployment correctly?

  • Speaking of rolling back, we never invoke any downgrade procedure. If a deployment includes a schema migration and the deployment fails, at some point we need to run the downgrade steps so we can re-attempt the deployment later. Or maybe in a cloud-native world the answer is that you simply never downgrade, in which case all upgrades must be idempotent?


Okay, well we now know how it behaves when there is a failure in postStart, because it's failing in the Jenkins tests right now. :-)

OpenShift simply kills the pod when postStart fails, and so it goes into a crash loop and the deployment ends up in an Error state.

In the pod event log we see:

 Killing container with id docker://web:FailedPostStartHook

but it seems there is no way to get the logs/output of the postStart script. There is also no way to launch a debug container, because OpenShift considers the container to have failed to start when postStart fails. So the postStart event is definitely not the right place for the migration.

@ralph found https://github.com/openshift/training/blob/master/14-Lifecycle-Pre-and-Post-Deployment-Hooks.md#add-a-database-migration-file which seems to suggest we should be using deployment hooks and not pod hooks for doing the database migration. That sounds much more plausible.

Odd. I tried the following patch, but open.paas hung trying to start the pre-hook. oc get pods showed InvalidImageName. Other docs claim that containerName should refer to something like spec.template.spec.containers[0].name, but that seems not to be the case.

diff --git a/openshift/waiverdb-test-template.yaml b/openshift/waiverdb-test-template.yaml
index c59080e..09020f8 100644
--- a/openshift/waiverdb-test-template.yaml
+++ b/openshift/waiverdb-test-template.yaml
@@ -167,6 +167,14 @@ objects:
       service: web
   spec:
     replicas: 2
+    strategy:
+      type: Rolling
+      rollingParams:
+        pre:
+          failurePolicy: abort
+          execNewPod:
+            containerName: web
+            command: [ /bin/sh, -i, -c, "waiverdb db upgrade" ]
     selector:
       environment: "test-${TEST_ID}"
       service: web
@@ -181,10 +189,6 @@ objects:
       image: "docker-registry.engineering.redhat.com/factory2/waiverdb:${WAIVERDB_APP_VERSION}"
       ports:
       - containerPort: 8080
-          lifecycle:
-            postStart:
-              exec:
-                command: [ /bin/sh, -i, -c, "waiverdb db upgrade" ]
       volumeMounts:
       - name: config-volume
         mountPath: /etc/waiverdb

I think the problem is that our container spec does not actually define any image -- we took that out because we use the ImageChange trigger instead. Defining both caused some weird behaviour where OpenShift re-deployed every time we ran oc apply.

So I guess when the execNewPod refers to the web container spec, it doesn't understand which image we want to use.

Actually, that's true for the prod configuration we have right now, but not for waiverdb-test-template.yaml. That is, the test template does have an image specified for the web container... so I am not sure why it wouldn't be able to pull it. :confused:

When I try the exact patch from your comment @ralph I see that the pre-deployment hook pod pulls its image successfully, and starts up, but then crashes:

IOError: [Errno 2] Unable to load configuration file (No such file or directory): '/etc/waiverdb/settings.py'

I added ; sleep 300 to the hook command to give me a chance to poke around in the OpenShift UI before it cleans up the pod.

Oh right, we have to also list the volumes we want to inherit from the web container definition.
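Concretely, execNewPod accepts a list of volume names to copy from the referenced container. Something like this should work (a sketch based on the diff above, with the config-volume name taken from the web container's volumeMounts):

```yaml
strategy:
  type: Rolling
  rollingParams:
    pre:
      failurePolicy: abort
      execNewPod:
        containerName: web
        command: [ /bin/sh, -i, -c, "waiverdb db upgrade" ]
        # names of volumes to inherit from the "web" container spec,
        # so the hook pod gets /etc/waiverdb/settings.py mounted too
        volumes:
        - config-volume
```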

The next problem is that the pre-deployment hook pod may (and probably always will) execute before the Postgres pod is up and ready for connections, causing the migration to fail like this:

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not connect to server: No route to host

The OpenShift answer to this kind of problem appears to be: make the depending service die or keep retrying until the service it depends on is up. That's okay for the web app (it will just keep failing its health check until Postgres is reachable, then start working), but it doesn't really help the pre-deployment hook pod.

Anyway, I have rigged up a special command which we can use to sleep until Postgres is connectable, then do the migration: PR#123
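Purely to illustrate the idea (the actual command lives in PR#123; the function name and the bare TCP check here are my own, not what the PR implements):

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds, or the timeout expires.

    Returns True once the port accepts connections, False on timeout.
    The real command would presumably attempt an actual Postgres
    connection rather than a bare TCP connect.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

The migration step then becomes "wait, then upgrade" instead of failing outright on the first connection attempt.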

However, this reveals the actual problem with our new migration; it fails like this:

...
INFO  [alembic.runtime.migration] Running upgrade 0a74cdab732a -> f2772c2c64a6, waive absence of result
Traceback (most recent call last):
...
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) relation "waiver" does not exist
LINE 2: FROM waiver
             ^
[SQL: 'SELECT waiver.id AS waiver_id, waiver.result_id AS waiver_result_id, waiver.subject AS waiver_subject, waiver.testcase AS waiver_testcase, waiver.username AS waiver_username, waiver.proxied_by AS waiver_proxied_by, waiver.product_version AS waiver_product_version, waiver.waived AS waiver_waived, waiver.comment AS waiver_comment, waiver.timestamp AS waiver_timestamp \nFROM waiver'] 

The problem above is that, by default on Postgres, Alembic runs all migrations inside one single big transaction. That means the waiver table doesn't even exist yet when we try to SELECT out of it.

There is a transaction_per_migration=True setting we can use, but that's not sufficient... with that in place, the migration instead seems to deadlock inside Postgres. I am guessing this is because we are trying to SELECT from a table which we have ALTER'ed in the same transaction?
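For reference, that setting goes in Alembic's env.py, in the context.configure() call (a sketch with the surrounding boilerplate omitted):

```python
with connectable.connect() as connection:
    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        # commit after each migration, instead of wrapping the whole
        # chain of migrations in one big transaction
        transaction_per_migration=True,
    )
    with context.begin_transaction():
        context.run_migrations()
```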

PostgreSQL DDL is transactional, so running the migrations in one transaction shouldn't be a problem.

Are you sure the database has been populated at this point? Is alembic_version set to the right value?

We could split the existing migration into two. One that adds/alters columns and a second that does the SELECT and INSERT (all with transaction_per_migration=True).
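The split could look roughly like this (illustrative only; revision ids and the exact column are made up, not the actual content of the migration):

```python
# revision file 1 -- DDL only: add/alter columns
def upgrade():
    op.add_column('waiver', sa.Column('subject', sa.Text()))

# revision file 2 (separate file, depends on revision 1) -- DML only:
# backfill the new column; with transaction_per_migration=True this
# runs in its own transaction, after the DDL above has committed
def upgrade():
    conn = op.get_bind()
    conn.execute(sa.text("UPDATE waiver SET subject = '...' WHERE subject IS NULL"))
```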

PR #124 is an attempt at splitting the migration.

edit: I think it will do the trick!

@mikeb I think the DDL is transactional, but not in the same way that normal SQL statements are... That is, the SELECT and the CREATE TABLE are not really participating in the same transaction. Same reason that having a SELECT in the same transaction as ALTER TABLE deadlocks. I am still learning about this though. All my experience is from MySQL land, where transactional DDL is an unattainable pipe dream :-P

I guess the "cloud native" way to do migrations like this is you first deploy a version of the app which adds the new columns, and runs some migration code (at runtime) to populate the new columns, but doesn't use the new columns yet... Then later, you deploy a new version which uses the new columns... And eventually, you deploy a new version which drops the old columns when they are no longer in use.

But Alembic isn't really designed with that kind of pattern in mind.

OK.. jenkins ran fine once in my rbean namespaced job...

.. but now it fails again. @gnaponie and I tried to duplicate it in open.paas, and we could, but it seems to be network/cluster issues there, not necessarily issues with our job.

I'm trying now to recreate the situation in @mikeb's new cluster, just to see what happens. The first problem is that it needs to trust the CA cert for the internal registry to pull our image in the first place.

OK - the new f2ocp cluster seems to work every time I try to oc apply our configs. I'll convert my rbean namespaced job next to see if jenkins can thrive with it.

... and, we ran into some CA cert issues with the f2ocp cluster. See the waiverdb-mikeb namespaced jenkins job for the latest run for now.

This should be all set now.

Metadata Update from @ralph:
- Issue status updated to: Closed (was: Open)

2 years ago

Metadata Update from @dcallagh:
- Issue assigned to dcallagh
- Issue close_status updated to: Fixed
- Issue set to the milestone: 0.7

2 years ago
