#129 Database migration for 0.6 creates waiver.subject column as TEXT type but it should be JSON
Closed: Fixed 6 years ago Opened 6 years ago by dcallagh.

The model definition in the Waiver class says that the subject column type should be sqlalchemy_utils.JSONType, which is a special which uses the native Postgres JSON type if available else falls back to TEXT.

However the corresponding migration which added that column created it as TEXT instead.

As a result, the migration succeeds and the application works but when waivers are inserted the JSON-encoded string value is saved to the column, and then it comes back out of the API as a string:

"subject": "{\"item\": \"python-gradunwarp-1.0.3-1.fc24\", \"type\": \"koji_build\"}"...

It should be:

"subject": {"item": "python-gradunwarp-1.0.3-1.fc24", "type": "koji_build"}...

The fix we want is essentially:

--- a/waiverdb/migrations/versions/f2772c2c64a6_waive_absence_of_result.py
+++ b/waiverdb/migrations/versions/f2772c2c64a6_waive_absence_of_result.py
@@ -10,2 +10,3 @@
 import sqlalchemy as sa
+from sqlalchemy_utils import JSONType

@@ -18,3 +19,3 @@
 def upgrade():
-    op.add_column('waiver', sa.Column('subject', sa.Text(), nullable=True, index=True))
+    op.add_column('waiver', sa.Column('subject', JSONType, nullable=True, index=True))
     op.add_column('waiver', sa.Column('testcase', sa.Text(), nullable=True, index=True))

but it is more complicated than that.

For one thing, any deployment which has already undergone the bad migration now has string values in that column which will need to be migrated again (hopefully using some built-in Postgres functions...)

Also, we can't create an index on that column because apparently the JSON type is not indexable. That may quickly become a performance issue for Waiverdb. There is a different, newer type called JSONB which is indexable and seems to be preferred in Postgres nowadays, but I am not sure if Fedora infra's Postgres is new enough to have it, and the sqlalchemy_utils.JSONType is not aware of it (it appears to always just use JSON and not JSONB).

Also, this is probably a good example of why we need tests which prove that the schema migrations are producing a schema which matches what the Python model objects say that the schema should be.

After causing a number of bugs like this in Beaker, a few years ago I implemented a quite thorough set of tests for this (including handling all of MySQL's quirks)...

https://git.beaker-project.org/cgit/beaker/tree/IntegrationTests/src/bkr/inttest/server/test_database_migration.py

Ohh it's worse than that too.

If the column is correctly created as JSON type, the application fails in other ways, suggesting that we have never actually tested it using Postgres' JSON type...

127.0.0.1 - - [22/Feb/2018 16:53:40] "GET /api/v1.0/waivers/ HTTP/1.1" 500 -
Error on request:
Traceback (most recent call last):
[...]
  File "/home/dcallagh/work/waiverdb/waiverdb/api_v1.py", line 150, in get
    return json_collection(query, args['page'], args['limit'])
  File "/home/dcallagh/work/waiverdb/waiverdb/utils.py", line 37, in json_collection
    p = query.paginate(page, limit)
  File "/usr/lib/python2.7/site-packages/flask_sqlalchemy/__init__.py", line 476, in paginate
    items = self.limit(per_page).offset((page - 1) * per_page).all()
[...]
ProgrammingError: (psycopg2.ProgrammingError) could not identify an equality operator for type json
LINE 4: FROM waiver GROUP BY waiver.subject, waiver.testcase) ORDER ...
                             ^
 [SQL: 'SELECT waiver.id AS waiver_id, waiver.result_id AS waiver_result_id, waiver.subject AS waiver_subject, waiver.testcase AS waiver_testcase, waiver.username AS waiver_username, waiver.proxied_by AS waiver_proxied_by, waiver.product_version AS waiver_product_version, waiver.waived AS waiver_waived, waiver.comment AS waiver_comment, waiver.timestamp AS waiver_timestamp \nFROM waiver \nWHERE waiver.id IN (SELECT max(waiver.id) AS max_1 \nFROM waiver GROUP BY waiver.subject, waiver.testcase) ORDER BY waiver.timestamp DESC, waiver.timestamp DESC \n LIMIT %(param_1)s OFFSET %(param_2)s'] [parameters: {'param_1': 10, 'param_2': 0}]

So apparently we should be using the native SQLAlchemy support for JSON columns, instead of sqlalchemy_utils.JSONType. I found this issue, which is about having sqlalchemy-utils support JSONB instead of JSON, and the end result is that sqlalchemy_utils.JSONType will just be deprecated and removed:

https://github.com/kvesteri/sqlalchemy-utils/issues/94#issuecomment-173176521

and indeed, the native SQLALchemy JSON support appears to work nicely and do everything we need... except it doesn't support SQLite. 🙄

My proposal would be:

  1. Make the tests and dev server run on Postgres, and stop using SQLite. SQLite is convenient but we keep missing bugs because we aren't ever testing on Postgres.
  2. Make the column be JSONB, so that we can have proper indexes and comparison support and everything magically works.
  3. Write a migration which handles moving from TEXT with embedded JSON values, to JSONB.

I can start on this tomorrow.

I agree. I would prefer to use always Postgres. So we will not have these problems anymore...

My proposal would be...

:+1: to this.

One of the hassles of local postgres for development is just... setting up postgres. sqlite requires no upfront work for new developers.

Since we have openshift templates on file, maybe we can combine those with some instructions for minishift to make an easy but complete process for new devs.

I don't think we even need to bother with Minishift or anything. I wrote some quick Postgres setup instructions into the README in PR#130. I think it will be enough.

See PR#131. I haven't tackled writing a test case for the migrations yet, although I really want to. For now, I tested the upgrade and downgrade by hand, to ensure it successfully converts the JSON-encoded string values.

This is fixed in 0.8, although the next problem is that we only have Postgres 9.2 in Fedora, which lacks the JSONB column type: #134.

Metadata Update from @dcallagh:
- Issue assigned to dcallagh
- Issue close_status updated to: Fixed
- Issue set to the milestone: 0.8
- Issue status updated to: Closed (was: Open)

6 years ago

Login to comment on this ticket.

Metadata