Sometimes, when pushing to dist-git, the script that sends the fedora-messaging payload (Note: I am assuming here) times out.
~/FedoraWork/fedora/kde-updates-work/plasma/libksysguard ~/FedoraWork/fedora/kde-updates-work/plasma [rawhide 45d84fc] 6.4.91 3 files changed, 10 insertions(+), 3 deletions(-) Énumération des objets: 9, fait. Décompte des objets: 100% (9/9), fait. Compression par delta en utilisant jusqu'à 24 fils d'exécution Compression des objets: 100% (5/5), fait. Écriture des objets: 100% (5/5), 746 octets | 746.00 Kio/s, fait. Total 5 (delta 2), réutilisés 0 (delta 0), réutilisés du paquet 0 (depuis 0) remote: Emitting a message to the fedora-messaging message bus. remote: * Publishing information for 1 commits remote: Traceback (most recent call last): remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2262, in _wrap_pool_connect remote: return fn() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 354, in connect remote: return _ConnectionFairy._checkout(self) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 751, in _checkout remote: fairy = _ConnectionRecord.checkout(pool) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 483, in checkout remote: rec = pool._do_get() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 138, in _do_get remote: self._dec_overflow() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__ remote: compat.reraise(exc_type, exc_value, exc_tb) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 129, in reraise remote: raise value remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 135, in _do_get remote: return self._create_connection() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 299, in _create_connection remote: return _ConnectionRecord(self) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 428, in __init__ remote: self.__connect(first_connect_check=True) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 630, in __connect remote: connection = pool._invoke_creator(self) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect remote: return dialect.connect(*cargs, **cparams) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 453, in connect remote: return self.dbapi.connect(*cargs, **cparams) remote: File "/usr/lib64/python3.6/site-packages/psycopg2/__init__.py", line 130, in connect remote: conn = _connect(dsn, connection_factory=connection_factory, **kwasync) remote: psycopg2.OperationalError: could not connect to server: Connection timed out remote: Is the server running on host "db01.rdu3.fedoraproject.org" (10.16.163.41) and accepting remote: TCP/IP connections on port 5432? remote: remote: remote: The above exception was the direct cause of the following exception: remote: remote: Traceback (most recent call last): remote: File "/usr/lib/python3.6/site-packages/pagure/hooks/files/post-receive", line 48, in <module> remote: run_hook_file(hooktype) remote: File "/usr/lib/python3.6/site-packages/pagure/hooks/__init__.py", line 551, in run_hook_file remote: session, repo, user=username, namespace=namespace remote: File "/usr/lib/python3.6/site-packages/pagure/lib/query.py", line 2874, in _get_project remote: return query.one() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3275, in one remote: ret = self.one_or_none() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3244, in one_or_none remote: ret = list(self) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3317, in __iter__ remote: return self._execute_and_instances(context) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3339, in _execute_and_instances remote: querycontext, self._connection_from_session, close_with_result=True remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3354, in _get_bind_args remote: mapper=self._bind_mapper(), clause=querycontext.statement, **kw remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3332, in _connection_from_session remote: conn = self.session.connection(**kw) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 1123, in connection remote: execution_options=execution_options, remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 1129, in _connection_for_bind remote: engine, execution_options remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 430, in _connection_for_bind remote: conn = bind._contextual_connect() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2226, in _contextual_connect remote: self._wrap_pool_connect(self.pool.connect, None), remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2266, in _wrap_pool_connect remote: e, dialect, self remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1536, in _handle_dbapi_exception_noconnection remote: util.raise_from_cause(sqlalchemy_exception, exc_info) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 383, in raise_from_cause remote: reraise(type(exception), exception, tb=exc_tb, cause=cause) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 128, in reraise remote: raise value.with_traceback(tb) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2262, in _wrap_pool_connect remote: return fn() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 354, in connect remote: return _ConnectionFairy._checkout(self) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 751, in _checkout remote: fairy = _ConnectionRecord.checkout(pool) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 483, in checkout remote: rec = pool._do_get() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 138, in _do_get remote: self._dec_overflow() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__ remote: compat.reraise(exc_type, exc_value, exc_tb) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 129, in reraise remote: raise value remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 135, in _do_get remote: return self._create_connection() remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 299, in _create_connection remote: return _ConnectionRecord(self) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 428, in __init__ remote: self.__connect(first_connect_check=True) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 630, in __connect remote: connection = pool._invoke_creator(self) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect remote: return dialect.connect(*cargs, **cparams) remote: File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 453, in connect remote: return self.dbapi.connect(*cargs, **cparams) remote: File "/usr/lib64/python3.6/site-packages/psycopg2/__init__.py", line 130, in connect remote: conn = _connect(dsn, connection_factory=connection_factory, **kwasync) remote: sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not connect to server: Connection timed out remote: Is the server running on host "db01.rdu3.fedoraproject.org" (10.16.163.41) and accepting remote: TCP/IP connections on port 5432? remote: remote: (Background on this error at: http://sqlalche.me/e/e3q8) To ssh://pkgs.fedoraproject.org/rpms/libksysguard 836d9d0..45d84fc rawhide -> rawhide
@abompard you have any ideas here? why does fedora-messaging even need to connect to the db? or is this something after the message?
db01 is getting some weird load spikes... but it's never hitting max connections (2000) so it should not ever be timing out.
Metadata Update from @james: - Issue tagged with: medium-gain, medium-trouble
Metadata Update from @james: - Issue assigned to abompard
Metadata Update from @james: - Issue priority set to: Waiting on Assignee (was: Needs Review)
No, this is before sending the message. Pagure gathers information about the project in pagure/hooks/__init__.py line 551 to build the data that will be sent with the message.
pagure/hooks/__init__.py
Yeah and it's probably unrelated to the connection pool timeout on the client side (that we changed when moving datacenters) because the connection pool has just been created by the hook on startup. Could this be a network thing ? (there was a MTU issue at some point in the new DC no?)
Ah ha.
Yep.
I wonder if this is related to the timeouts we are seeing from kojipkgs and pkgs01 from proxies... thats the same vlan crossing (prod vs build).
I had a similar issue in https://paste.sr.ht/~gotmax23/ae3f42c566407fb1420a3aa31c7d8165b50fbfcb, although in that case, the push seems to have been rejected completely. When I pushed it a second time, it worked.
I also got a different variation today. Git pushing would fail with the error that not all sources are uploaded, when they were. Trying again would work.
That's probably an issue with the fedpkg git push hook and not an infra problem.
fedpkg
So, unfortunately I don't see this error logged on the server side, so it's hard to try and match it up to anything on the db server at the same time. ;(
Perhaps we can enable more debugging logs for hooks?
Or if any of you see this again, can you record the exact time it happened at?
Sure, I'm about to do quite a bit of git pushing in the next days so I can let you know!
Hey @farchord , any news on this? Thanks! :-)
It hasn't happened in a while since you had those firewall changes done.
ok, lets go ahead and close this now... if anyone sees it feel free to re-open or file a new issue.
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
Log in to comment on this ticket.