Today I noticed that disk space on koji01.stg is getting low. When I started to investigate, I found that the httpd logs and kojira logs are several GB in size.
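The size check above can be reproduced with a short script. This is a minimal Python sketch, not what was actually run on koji01.stg. It walks a directory and lists the largest regular files. The default root `/var/log` is an assumed location for the httpd and kojira logs, and the helper name `largest_files` is hypothetical.

```python
import os
import stat
import sys


def largest_files(root: str, top: int = 10) -> list[tuple[int, str]]:
    """Return the `top` largest regular files under `root` as (size_bytes, path)."""
    sizes: list[tuple[int, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so symlinks are not followed and counted twice
                st = os.lstat(path)
            except OSError:
                # file vanished (e.g. during log rotation) or permission denied
                continue
            if stat.S_ISREG(st.st_mode):
                sizes.append((st.st_size, path))
    # largest first
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # /var/log is an assumed default; pass another directory as the first argument
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/log"
    for size, path in largest_files(root):
        print(f"{size / 1024**3:8.2f} GiB  {path}")
```

Using `os.lstat` rather than `os.stat` keeps symlinked files from being reported twice; unreadable or vanished files are skipped instead of aborting the walk.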
The httpd error_log contained this error over and over again:
2024-10-04 13:26:53,106 [WARNING] m=listRPMs u=None p=645502 r=10.3.166.74:52526 koji.xmlrpc: Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/kojihub/kojixmlrpc.py", line 273, in _wrap_handler
    response = handler(environ)
               ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/kojihub/kojixmlrpc.py", line 300, in handle_rpc
    return self._dispatch(method, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/kojihub/kojixmlrpc.py", line 324, in _dispatch
    self.check_session()
  File "/usr/lib/python3.12/site-packages/kojihub/kojixmlrpc.py", line 306, in check_session
    context.session = auth.Session()
                      ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/kojihub/auth.py", line 140, in __init__
    raise koji.AuthError('Invalid session or bad credentials')
koji.AuthError: Invalid session or bad credentials
I disabled both httpd and kojira on the machine because space was running low. I left the logs in place so anybody can investigate.
I checked the koji-hub.keytab, but that seems fine. I also looked at the changes in the ansible repository, but there isn't anything related to koji01.stg in the recent commits.
As it's staging, I don't think it's that urgent, but it would be good to solve it as soon as possible.
According to the logs, the issue started at [Fri Oct 04 02:33:21.386398 2024], so it has been filling the error log for a few hours already.
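To confirm when the error started and how often it repeats, a minimal Python sketch can scan the error log line by line. It assumes each traceback in error_log ends with a line containing `koji.AuthError: Invalid session or bad credentials`, and that httpd prefixes each line with a bracketed timestamp like the one quoted above. The `scan` helper is hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumption: httpd prefixes each error_log line with a bracketed timestamp,
# e.g. "[Fri Oct 04 02:33:21.386398 2024]"; capture whatever is inside it.
HTTPD_TS = re.compile(r"^\[([^\]]+)\]")

# Match the final exception line only, so the "raise koji.AuthError(...)"
# frame of the same traceback is not counted a second time.
NEEDLE = "koji.AuthError: Invalid session or bad credentials"


def scan(lines: Iterable[str], needle: str = NEEDLE) -> tuple[int, Optional[str]]:
    """Count lines containing `needle`.

    Returns (count, timestamp of the first matching line that has a
    bracketed timestamp prefix), or (count, None) if none has one.
    """
    count = 0
    first: Optional[str] = None
    for line in lines:
        if needle in line:
            count += 1
            if first is None:
                m = HTTPD_TS.match(line)
                if m:
                    first = m.group(1)
    return count, first
```

For example, `with open("/var/log/httpd/error_log", errors="replace") as fh: print(scan(fh))`, where the log path is an assumption about the host's layout. Iterating over the file object reads one line at a time, so memory use stays flat even on a multi-GB log.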
I found errors about the read-only filesystem in the kojira logs. Could this be the same issue the builders are experiencing?
EDIT: This seems to have been happening for a few days already, so it's probably unrelated to the issue.
The disk usage was mostly the kojira logs from before I did a prod->staging sync. The read-only errors there were 100% correct. The way we set up koji in staging is to have the prod koji volume mounted as a separate, read-only volume. It had gotten so far out of sync, though, that it was trying to remove old repos on the production volume. It definitely shouldn't be allowed to do that. ;)
I nuked the kojira log and restarted things.
I am not sure what caused that invalid session. It has the proxy IP, of course, and the user is "None", so it's not logged in. It's not happening now, as far as I can see. ;(
So, I think everything is back to normal. Sorry for not seeing the gigantic kojira log.
Metadata Update from @kevin:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)
Thanks for fixing that. The bigger problem was actually the httpd error_log, which was around 20 GB if I remember correctly.