#1424 [RFE] duffy/CI: ensure nodes stuck in provisioning state are marked failed and reprovisioned
Closed: Fixed 7 months ago by arrfab. Opened 7 months ago by arrfab.

Creating this ticket myself to track the request, which was discussed earlier today in the #centos-ci IRC channel with CI tenants (thanks @evgeni!)

Some Duffy CI nodes were stuck in the provisioning state for a long time:

duffy=> select id,created_at, retired_at, state, pool from nodes where state='provisioning';
   id   |         created_at         | retired_at |    state     |              pool              
--------+----------------------------+------------+--------------+--------------------------------
 376610 | 2024-06-06 08:15:15.801887 |            | provisioning | metal-ec2-c5n-centos-9s-x86_64
 376494 | 2024-06-05 16:40:46.14815  |            | provisioning | metal-ec2-c5n-centos-9s-x86_64
(2 rows)

Metadata Update from @arrfab:
- Issue assigned to arrfab

7 months ago

Metadata Update from @arrfab:
- Issue tagged with: centos-ci-infra, feature-request, high-gain, low-trouble

7 months ago

For the record, I just manually updated the DB to set these nodes to the "failed" state, and Duffy then tried again to provision enough nodes to match the fill level for that pool.

Idea to be implemented: a cron job that looks in the DB and puts such "stuck" nodes into the failed state, instead of waiting for CI tenants and/or monitoring (we also report usage to Zabbix, etc.) to inform us.

Proposed query:

UPDATE nodes SET state='failed' WHERE state='provisioning' AND created_at<now()-interval '20min';
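
A rough sketch of what the cron job's logic could look like, so that it also reports which node ids it flipped. This is not the deployed script: the function names (`fail_stuck_nodes`, `make_demo_db`) are hypothetical, the cutoff is computed in Python instead of with `now()-interval`, and an in-memory SQLite table stands in for the real PostgreSQL `nodes` table (only the `id`, `created_at`, `state` and `pool` columns seen above are assumed). Against PostgreSQL the placeholder style and connection setup would differ.

```python
import sqlite3
from datetime import datetime, timedelta

# Threshold taken from the query in this ticket (20 minutes).
STUCK_AFTER = timedelta(minutes=20)


def fail_stuck_nodes(conn, now, max_age=STUCK_AFTER):
    """Set nodes stuck in 'provisioning' for longer than max_age to 'failed'.

    Returns the sorted list of node ids that were updated, so the caller
    can log them or forward them to monitoring.
    """
    # Timestamps are stored as 'YYYY-MM-DD HH:MM:SS[.ffffff]' text in this
    # SQLite stand-in, so a string comparison orders them correctly.
    cutoff = (now - max_age).isoformat(sep=" ")
    cur = conn.cursor()
    cur.execute(
        "SELECT id FROM nodes "
        "WHERE state = 'provisioning' AND created_at < ? ORDER BY id",
        (cutoff,),
    )
    ids = [row[0] for row in cur.fetchall()]
    cur.execute(
        "UPDATE nodes SET state = 'failed' "
        "WHERE state = 'provisioning' AND created_at < ?",
        (cutoff,),
    )
    conn.commit()
    return ids


def make_demo_db():
    """Build an in-memory stand-in for the nodes table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, created_at TEXT, "
        "retired_at TEXT, state TEXT, pool TEXT)"
    )
    pool = "metal-ec2-c5n-centos-9s-x86_64"
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, NULL, 'provisioning', ?)",
        [
            # The two stuck nodes from the query output in this ticket.
            (376610, "2024-06-06 08:15:15.801887", pool),
            (376494, "2024-06-05 16:40:46.14815", pool),
            # Hypothetical node created recently; it must be left alone.
            (999999, "2024-06-06 08:35:00", pool),
        ],
    )
    conn.commit()
    return conn
```

Calling `fail_stuck_nodes(make_demo_db(), now=datetime(2024, 6, 6, 8, 40))` marks the two old nodes as failed and leaves the recent one in provisioning. Passing `now` explicitly keeps the function easy to test; a cron wrapper would pass the current time.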

Metadata Update from @arrfab:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

7 months ago
