#9234 Bodhi gives unusual number of 500 errors today: NoSuchColumnError: "Could not locate column in row for column 'comments.id'"
Opened 3 months ago by churchyard. Modified 2 months ago

I've checked https://status.fedoraproject.org/ and there is nothing out of the ordinary for bodhi, so I assume this is not known.

Every single time I load almost any bodhi page I get:

500 Internal Server Error

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/pyramid_mako/__init__.py", line 148, in __call__
    result = template.render_unicode(**system)
  File "/usr/lib/python3.8/site-packages/mako/template.py", line 481, in render_unicode
    return runtime._render(
  File "/usr/lib/python3.8/site-packages/mako/runtime.py", line 878, in _render
    _render_context(
  File "/usr/lib/python3.8/site-packages/mako/runtime.py", line 920, in _render_context
    _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
  File "/usr/lib/python3.8/site-packages/mako/runtime.py", line 947, in _exec_template
    callable_(context, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/bodhi/server/templates/master.html", line 142, in render_body
    ${self.body()}
  File "/usr/local/lib/python3.8/site-packages/bodhi/server/templates/update.html", line 198, in render_body
    <!--<h4>Comments <span class="badge badge-secondary">${len(update.comments)}</span>
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/attributes.py", line 287, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/attributes.py", line 723, in get
    value = self.callable_(state, passive)
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/strategies.py", line 759, in _load_for_state
    return self._emit_lazyload(
  File "<string>", line 1, in <lambda>

  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/strategies.py", line 900, in _emit_lazyload
    q(session)
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/ext/baked.py", line 545, in all
    return list(self)
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/loading.py", line 101, in instances
    cursor.close()
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/loading.py", line 81, in instances
    rows = [proc(row) for row in fetch]
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/loading.py", line 81, in <listcomp>
    rows = [proc(row) for row in fetch]
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/loading.py", line 525, in _instance
    tuple([row[column] for column in pk_cols]),
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/loading.py", line 525, in <listcomp>
    tuple([row[column] for column in pk_cols]),
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/engine/result.py", line 681, in _key_fallback
    util.raise_(
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
NoSuchColumnError: "Could not locate column in row for column 'comments.id'"

A quick refresh makes it go away until I load a new page.

I also got the Could not locate column in row for column 'comments.id' message in the bottom right corner when I was:

  • creating an update (according to the UI, the update was not successful, but it was in fact created)

  • edition an update to switch builds (according to the UI, the edit was not successful, and it indeed wasn't, next submit made it successful but the update does not admit the build swap ever happened - https://bodhi.fedoraproject.org/updates/FEDORA-2020-982b2950db (I can only see in me email notification that the build NEVR got changed)


@cverna can you help on this and also let us know who to hand this to in future.

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: bodhi, groomed, medium-gain, medium-trouble, ops

3 months ago

I think I have fixed it for now with https://github.com/fedora-infra/bodhi/commit/8f7d7859330e1b370bbeb02e9c9f6f7a53f94dc3.

I am mostly away from keyboard this weekend but i ll keep monitoring this ticket, and look for a proper fix next week.

Ok happens again :( so going to try reverting back to bodhi 5.4.1

Ok so this is still happening in 5.4.1 so I reverted back to 5.5 we are getting the following traceback

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/pyramid/tweens.py", line 41, in excview_tween
    response = handler(request)
  File "/usr/lib/python3.8/site-packages/pyramid/router.py", line 147, in handle_request
    response = _call_view(
  File "/usr/lib/python3.8/site-packages/pyramid/view.py", line 667, in _call_view
    response = view_callable(context, request)
  File "/usr/lib/python3.8/site-packages/pyramid/config/views.py", line 169, in __call__
    return view(context, request)
  File "/usr/lib/python3.8/site-packages/pyramid/config/views.py", line 188, in attr_view
    return view(context, request)
  File "/usr/lib/python3.8/site-packages/pyramid/config/views.py", line 214, in predicate_wrapper
    return view(context, request)
  File "/usr/lib/python3.8/site-packages/pyramid/viewderivers.py", line 436, in rendered_view
    result = view(context, request)
  File "/usr/lib/python3.8/site-packages/pyramid/viewderivers.py", line 144, in _requestonly_view
    response = view(request)
  File "/usr/lib/python3.8/site-packages/cornice/service.py", line 502, in wrapper
    response = view_(request)
  File "/usr/local/lib/python3.8/site-packages/bodhi/server/services/releases.py", line 327, in query_releases_json
    total = db.execute(count_query).scalar()
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/scoping.py", line 163, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1291, in execute
    return self._connection_for_bind(bind, close_with_result=True).execute(
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
    return meth(self, multiparams, params)
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1127, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
    util.raise_(
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    self.dialect.do_execute(
  File "/usr/lib64/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) lost synchronization with server: got message type "e", length 1937009006

[SQL: SELECT count(DISTINCT releases.id) AS count_1 
FROM releases]
(Background on this error at: http://sqlalche.me/e/13/e3q8 )

I have also attached the pod logs
bodhi-web-58.log

Could it be caused by some dependency change? I see the infra repo was added to the container two days ago, maybe something was upgraded to a version that causes problems.

Every single time I load almost any bodhi page I get

I don't seem to be able to reproduce this, though I don't see a resolution here or in the upstream ticket.

What's the current status? Can someone still reproduce this?

~30 minutes ago over the API:

$ fedora-update-feedback 
Username: churchyard
FAS Password: 
Authenticating with bodhi ...
Querying dnf for installed packages ...
Querying bodhi for updates ...
 Updates (testing): [ =======================================                                                                                                                                                             ]  20% Error: "Remote bodhi instance returned an error message: BodhiError {\n    errors: [\n        {\n            \"location\": \"body\",\n            \"description\": \"\\\"Could not locate column in row for column \\\'comments.id\\\'\\\"\",\n            \"name\": \"NoSuchColumnError\",\n        },\n    ],\n    status: \"error\",\n}"

The web interface does not give the error that often (possibly not at all) now.

I think perhaps @cverna fixed this? Is anyone still seeing it?

Issue status updated to: Closed (was: Open)
Issue close_status updated to: Fixed

3 months ago

Issue status updated to: Open (was: Closed)

3 months ago

Still seeing this :-( - a couple of minutes ago [Aug 25 around 15:50 UTC]

curl 'https://bodhi.fedoraproject.org/updates/?rows_per_page=10&releases=F32F'

Gave me a response:

{"status": "error", "errors": [{"location": "body", "name": "NoSuchColumnError", "description": "\"Could not locate column in row for column 'testcases.id'\""}]}

Might be a different problem, previously I always got comments.id.

The underlying traceback that @cverna pasted: https://pagure.io/fedora-infrastructure/issue/9234#comment-671224 seems to indicate a problem with the connection from postgresql getting in a bad state - so I doubt the exact reported error matters.

From some quick google searching, it seems likely that the problem is either:
a) a concurrency problem - perhaps the two concurrent threads in the WSGI daemon process are sharing resources but not locking them [or same with Bodhi's composer threads]
b) the postgresql server unexpectedly closing the connections to Bodhi - because of query timeouts or something

Temporarily changing the WSGI configuration to a single thread per process might one place to start.

However, before that I scaled the web pod to 1 instance instead of 2 and that seems to be helping?

So, yeah, when there's just 1 web pod it seems to be happy?

So what should we do here? Does that help us isolate what the problem might be?

The error I got now:

500 Internal Server Error

"Could not locate column in row for column 'builds.type'"

Login to comment on this ticket.

Metadata
Boards 1
ops Status: In Review
Attachments 2
Attached 3 months ago View Comment
png
Attached 3 months ago View Comment