odcs-backend01 has two nagios alerts; we should fix whatever is causing them:
An alert for 'Check for fedmsg-hub-3 proc'. If this service has moved to fedora-messaging, we should remove this check.
Disk_Space_/: It looks like it's unable to remove old composes, as they have a different uid. Perhaps a chown needs to be done?
CC members of sysadmin-odcs: @cverna @jkaluza @lsedlar @mizdebsk
@kevin, please remove the fedmsg-hub-3 process check; ODCS uses celery now. You can add a check for the "odcs-celery-backend" systemd service status instead.
@jkaluza The check named "Check for fedmsg-hub-3 proc" actually checks for the odcs-celery-backend process.
The worker logs are filling up with the following warnings:
[2019-12-18 14:54:30,363: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/work/x86_64/repo/repodata/3f4c84c31509db46efd766d692846ebe78e87d4bf751d99fc334cc3677c55b86-other.xml.gz: "PermissionError(13, 'Permission denied')"
[2019-12-18 14:54:30,363: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/work/x86_64/repo/repodata/fcdc749b2e2dae6a0568a7036a9bd0e0c72dea8ca2f43ff907b34bd73094e7a4-primary.xml.gz: "PermissionError(13, 'Permission denied')"
[2019-12-18 14:54:30,363: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/work/x86_64/repo/repodata/repomd.xml: "PermissionError(13, 'Permission denied')"
[2019-12-18 14:54:30,363: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/work/x86_64/repo/repodata: "OSError(39, 'Directory not empty')"
[2019-12-18 14:54:30,363: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/work/x86_64/repo: "OSError(39, 'Directory not empty')"
[2019-12-18 14:54:30,363: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/work/x86_64/repo_package_list/Temporary.x86_64.debuginfo.conf: "PermissionError(13, 'Permission denied')"
[2019-12-18 14:54:30,363: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/work/x86_64/repo_package_list/Temporary.x86_64.rpm.conf: "PermissionError(13, 'Permission denied')"
[2019-12-18 14:54:30,364: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/work/x86_64/repo_package_list: "OSError(39, 'Directory not empty')"
[2019-12-18 14:54:30,364: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/work/x86_64: "OSError(39, 'Directory not empty')"
[2019-12-18 14:54:30,364: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/work: "OSError(39, 'Directory not empty')"
[2019-12-18 14:54:30,364: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/COMPOSE_ID: "PermissionError(13, 'Permission denied')"
[2019-12-18 14:54:30,364: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0/STATUS: "PermissionError(13, 'Permission denied')"
[2019-12-18 14:54:30,364: WARNING/ForkPoolWorker-2] Cannot remove some files in /srv/odcs/odcs-292-1-20190509.n.0: "OSError(39, 'Directory not empty')"
It looks like the cleanup task does not have permission to delete these compose directories. Looking at /srv/odcs/, a lot of the directories are owned by a different uid (986):
/srv/odcs/
drwxr-xr-x. 5 986 dnsmasq 4096 Nov 2 18:58 odcs-653-1-20191102.n.0
drwxr-xr-x. 5 986 dnsmasq 4096 Nov 2 21:08 odcs-655-1-20191102.n.0
drwxr-xr-x. 5 986 dnsmasq 4096 Nov 3 12:06 odcs-658-1-20191103.n.0
drwxr-xr-x. 5 986 dnsmasq 4096 Nov 3 13:28 odcs-659-1-20191103.n.0
drwxr-xr-x. 5 986 dnsmasq 4096 Nov 5 12:36 odcs-660-1-20191105.n.0
drwxr-xr-x. 5 986 dnsmasq 4096 Nov 5 12:38 odcs-661-1-20191105.n.0
Running chown -R odcs:fedmsg odcs-* in /srv/odcs and restarting the celery service seems to have done the trick.
chown -R odcs:fedmsg odcs-*
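Before a recursive chown like the one above, it can help to see which compose directories actually have the wrong owner. The following is only a sketch: the function name list_foreign_composes and its two arguments are made up for illustration, and it assumes a find that supports -mindepth and -maxdepth (GNU find does).

```shell
# Hypothetical helper, not part of ODCS: print the odcs-* directories
# directly under a base directory that are NOT owned by the expected user.
#   $1  base directory (for example /srv/odcs)
#   $2  expected owner, as a user name or numeric uid (for example odcs)
list_foreign_composes() {
    base_dir=$1
    expected_owner=$2
    # Only look one level deep; the compose directories sit directly
    # under the base directory.
    find "$base_dir" -mindepth 1 -maxdepth 1 -type d -name 'odcs-*' \
        ! -user "$expected_owner"
}
```

On the backend this could be called as list_foreign_composes /srv/odcs odcs; an empty result would suggest the ownership is already consistent at the top level.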
Metadata Update from @cverna: - Issue assigned to cverna
I have replaced the old process check with a nagios service check:
https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=f92813ef70842ee4597ed2fcb992fb6bc3f5d182
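For readers who cannot open the commit, here is a rough idea of what a systemd-based service check can look like. This is an illustrative sketch and not the content of the commit above: the function name check_unit is hypothetical, and it only relies on systemctl is-active --quiet plus the usual nagios plugin convention of exit status 0 for OK and 2 for CRITICAL.

```shell
# Hypothetical nagios-style check, not the check deployed in the commit:
# report OK when a systemd unit is active, CRITICAL otherwise.
#   $1  unit name (for example odcs-celery-backend)
check_unit() {
    unit=$1
    # "systemctl is-active --quiet" exits 0 only when the unit is active.
    if systemctl is-active --quiet "$unit"; then
        echo "OK: $unit is active"
        return 0
    else
        echo "CRITICAL: $unit is not active"
        return 2
    fi
}
```

A call such as check_unit odcs-celery-backend would then print a single status line and return the matching exit status.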
Metadata Update from @cverna: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)