Creating this ticket myself to track the request, which was discussed earlier today in the #centos-ci IRC channel with CI tenants (thanks @evgeni!).
Some Duffy CI nodes were stuck in the provisioning state for a long time:
duffy=> select id,created_at, retired_at, state, pool from nodes where state='provisioning';
   id   |         created_at         | retired_at |    state     |              pool
--------+----------------------------+------------+--------------+--------------------------------
 376610 | 2024-06-06 08:15:15.801887 |            | provisioning | metal-ec2-c5n-centos-9s-x86_64
 376494 | 2024-06-05 16:40:46.14815  |            | provisioning | metal-ec2-c5n-centos-9s-x86_64
(2 rows)
Metadata Update from @arrfab: - Issue assigned to arrfab
Metadata Update from @arrfab: - Issue tagged with: centos-ci-infra, feature-request, high-gain, low-trouble
For the record, I just manually updated the DB to set these nodes to the "failed" state, and Duffy then tried again to provision enough nodes to match the fill level for that pool.
Idea to be implemented: a cron job that looks in the DB and puts such "stuck" nodes into the failed state, instead of waiting for CI tenants and/or monitoring (we also report usage to Zabbix, etc.) to inform us.
Idea:
UPDATE nodes SET state='failed' WHERE state='provisioning' AND created_at<now()-interval '20min';
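For illustration only, here is a minimal sketch of what such a periodic job could look like in Python. It is not the script that was actually deployed. It assumes a hypothetical helper name (fail_stuck_nodes) and uses an in-memory SQLite table as a stand-in for the Duffy PostgreSQL database, so the cutoff is computed in Python and compared as text instead of using now()-interval. The two node rows come from the query output above; the third row (id 1) is made up to show that a recently created node is left alone.

```python
import sqlite3
from datetime import datetime, timedelta

# Same 20-minute threshold as the UPDATE statement in the ticket.
STUCK_AFTER = timedelta(minutes=20)


def fail_stuck_nodes(conn, now, stuck_after=STUCK_AFTER):
    """Set long-running 'provisioning' nodes to 'failed' and return their ids.

    Hypothetical helper: SQLite stands in for PostgreSQL here, so timestamps
    are stored as text and compared lexicographically.
    """
    cutoff = (now - stuck_after).isoformat(sep=" ", timespec="microseconds")
    where = "state = 'provisioning' AND created_at < ?"
    cur = conn.cursor()
    # Collect the ids first so the job can log what it is about to change.
    cur.execute(f"SELECT id FROM nodes WHERE {where} ORDER BY id", (cutoff,))
    stuck_ids = [row[0] for row in cur.fetchall()]
    cur.execute(f"UPDATE nodes SET state = 'failed' WHERE {where}", (cutoff,))
    conn.commit()
    return stuck_ids


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, created_at TEXT, state TEXT, pool TEXT)"
    )
    pool = "metal-ec2-c5n-centos-9s-x86_64"
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?, ?)",
        [
            (376610, "2024-06-06 08:15:15.801887", "provisioning", pool),
            (376494, "2024-06-05 16:40:46.14815", "provisioning", pool),
            # Made-up row: created 5 minutes before "now", so not stuck yet.
            (1, "2024-06-06 08:55:00.000000", "provisioning", pool),
        ],
    )
    # Fixed "now" so the example is reproducible.
    print(fail_stuck_nodes(conn, now=datetime(2024, 6, 6, 9, 0)))
```

Returning the affected ids lets the job log (or report to monitoring) which nodes it flipped, so repeated provisioning failures in one pool stay visible instead of being silently cleaned up.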
Implemented today
Metadata Update from @arrfab: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)