Hey!
For the last couple of hours CentOS CI can't seem to provision any node through Duffy - all attempts die with an error and the pools are empty:
    $ duffy client list-sessions
    {
      "action": "get",
      "sessions": []
    }
    $ duffy client request-session pool=virt-ec2-t2-centos-8s-x86_64,quantity=1
    {
      "error": {
        "detail": "can't reserve nodes: quantity=1 pool='virt-ec2-t2-centos-8s-x86_64'"
      }
    }
    $ duffy client show-pool virt-ec2-t2-centos-8s-x86_64
    {
      "action": "get",
      "pool": {
        "name": "virt-ec2-t2-centos-8s-x86_64",
        "fill_level": 10,
        "levels": {
          "provisioning": 0,
          "ready": 0,
          "contextualizing": 1,
          "deployed": 0,
          "deprovisioning": 0
        }
      }
    }
    $ duffy client show-pool metal-ec2-c5n-centos-8s-x86_64
    {
      "action": "get",
      "pool": {
        "name": "metal-ec2-c5n-centos-8s-x86_64",
        "fill_level": 5,
        "levels": {
          "provisioning": 0,
          "ready": 0,
          "contextualizing": 0,
          "deployed": 0,
          "deprovisioning": 0
        }
      }
    }
virt-ec2-t2-centos-8s-x86_64 seems to be back in action (thanks, @arrfab!), but there still seems to be something wrong with the metal-ec2-c5n-centos-8s-x86_64 pool. It was working fine for a while, but now it again fails to provision any machine - the provisioning field now keeps flipping between '5' and '0', without anything being provisioned:
    $ duffy client show-pool metal-ec2-c5n-centos-8s-x86_64
    {
      "action": "get",
      "pool": {
        "name": "metal-ec2-c5n-centos-8s-x86_64",
        "fill_level": 5,
        "levels": {
          "provisioning": 0,
          "ready": 0,
          "contextualizing": 0,
          "deployed": 1,
          "deprovisioning": 0
        }
      }
    }
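For anyone who wants to watch for this state from a script, here is a minimal sketch that parses the JSON printed by `duffy client show-pool` and flags a pool with no ready nodes. The helper name `pool_summary` and the "starved" label are illustrative and not part of Duffy; the field names are taken from the output quoted in this ticket.

```python
import json


def pool_summary(raw: str) -> dict:
    """Summarize the JSON printed by `duffy client show-pool <pool>`.

    Hypothetical helper: only the field names come from the output above.
    """
    pool = json.loads(raw)["pool"]
    levels = pool["levels"]
    return {
        "name": pool["name"],
        "ready": levels["ready"],
        # Nodes still on their way to becoming ready.
        "in_flight": levels["provisioning"] + levels["contextualizing"],
        # "Starved" is this sketch's own label for a pool with nothing ready.
        "starved": levels["ready"] == 0,
    }


# Sample copied from the show-pool output quoted in this ticket.
SAMPLE = """
{ "action": "get", "pool": { "name": "metal-ec2-c5n-centos-8s-x86_64",
  "fill_level": 5, "levels": { "provisioning": 0, "ready": 0,
  "contextualizing": 0, "deployed": 1, "deprovisioning": 0 } } }
"""

if __name__ == "__main__":
    print(pool_summary(SAMPLE))
```

Polling this in a loop would show the flip between '5' and '0' as `in_flight` changing while `ready` stays at 0.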
Metadata Update from @arrfab: - Issue assigned to arrfab
Metadata Update from @arrfab: - Issue priority set to: 🔥 Urgent 🔥 (was: Needs Review) - Issue tagged with: centos-ci-infra, high-gain, medium-trouble
I had a quick look and we're actually limited by the number of available metal nodes in AWS (same problem as already discussed: the AWS API answers with "message": "We currently do not have sufficient c5n.metal capacity in the Availability Zone you requested"), so duffy keeps trying to provision some. For example, here was the last status:
    duffy client show-pool metal-ec2-c5n-centos-8s-x86_64
    {
      "action": "get",
      "pool": {
        "name": "metal-ec2-c5n-centos-8s-x86_64",
        "fill_level": 5,
        "levels": {
          "provisioning": 0,
          "ready": 0,
          "contextualizing": 0,
          "deployed": 6,
          "deprovisioning": 0
        }
      }
    }
So already more than what you saw.
Now the explanation of why it failed: RHEL 8.7 was released yesterday, so duffy was updated earlier today. The problem is that ansible-core was bumped from 2.12 (using python 3.8) to 2.13 (forcing python 3.9), and duffy itself could no longer import ansible (through ansible-runner), as the two were no longer on the same python version.
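A quick way to spot this kind of mismatch is to ask the interpreter a service runs under whether it can see a given module at all. The sketch below is only an illustration: `can_import` and `report` are hypothetical helpers, not part of duffy or ansible-runner, and it would have to be run with the same python that duffy uses.

```python
import importlib.util
import sys


def can_import(module_name: str) -> bool:
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


def report(module_name: str = "ansible") -> str:
    """Describe whether module_name is visible to the running python."""
    version = "{}.{}".format(*sys.version_info[:2])
    status = "importable" if can_import(module_name) else "NOT importable"
    return f"python {version}: {module_name} is {status}"


if __name__ == "__main__":
    # Run with the interpreter the service itself uses.
    print(report("ansible"))
```

If ansible-core is installed for python 3.9 only, running this under python 3.8 would be expected to report the module as not importable.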
We had that issue already on CentOS Stream 8, but the versionlock was different because the version initially pushed to Stream differs from the one that went to RHEL 8.7 GA. We put the correct ansible-core-2.12.2-4.el8_6 NVR in our ansible inventory to lock to that ansible version, and duffy-tasks was again able to call ansible to provision nodes.
Metadata Update from @arrfab: - Issue close_status updated to: Fixed with Explanation - Issue status updated to: Closed (was: Open)