#962 CentOS CI can't provision any node
Closed: Fixed with Explanation 19 days ago by arrfab. Opened 19 days ago by mrc0mmand.

Hey!

For the last couple of hours CentOS CI can't seem to provision any node through Duffy - all attempts die with an error and the pools are empty:

$ duffy client list-sessions
{
  "action": "get",
  "sessions": []
}
$ duffy client request-session pool=virt-ec2-t2-centos-8s-x86_64,quantity=1
{
  "error": {
    "detail": "can't reserve nodes: quantity=1 pool='virt-ec2-t2-centos-8s-x86_64'"
  }
}
$ duffy client show-pool virt-ec2-t2-centos-8s-x86_64
{
  "action": "get",
  "pool": {
    "name": "virt-ec2-t2-centos-8s-x86_64",
    "fill_level": 10,
    "levels": {
      "provisioning": 0,
      "ready": 0,
      "contextualizing": 1,
      "deployed": 0,
      "deprovisioning": 0
    }
  }
}
$ duffy client show-pool metal-ec2-c5n-centos-8s-x86_64
{
  "action": "get",
  "pool": {
    "name": "metal-ec2-c5n-centos-8s-x86_64",
    "fill_level": 5,
    "levels": {
      "provisioning": 0,
      "ready": 0,
      "contextualizing": 0,
      "deployed": 0,
      "deprovisioning": 0
    }
  }
}

virt-ec2-t2-centos-8s-x86_64 seems to be back in action (thanks, @arrfab!), but there still seems to be something wrong with the metal-ec2-c5n-centos-8s-x86_64 pool. It was working fine for a while, but now it again fails to provision any machine - the provisioning field now keeps flipping between '5' and '0', without anything being provisioned:

$ duffy client show-pool metal-ec2-c5n-centos-8s-x86_64
{
  "action": "get",
  "pool": {
    "name": "metal-ec2-c5n-centos-8s-x86_64",
    "fill_level": 5,
    "levels": {
      "provisioning": 0,
      "ready": 0,
      "contextualizing": 0,
      "deployed": 1,
      "deprovisioning": 0
    }
  }
}
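
To spot this state without reading the raw JSON each time, the output can be summarized with a few lines of Python. This is only a sketch: it assumes the JSON shape printed by `duffy client show-pool` above, and `summarize_pool` and the "starved" label are hypothetical names, not part of Duffy.

```python
import json

# Sample taken from the `duffy client show-pool` output quoted above.
SAMPLE = """
{
  "action": "get",
  "pool": {
    "name": "metal-ec2-c5n-centos-8s-x86_64",
    "fill_level": 5,
    "levels": {
      "provisioning": 0,
      "ready": 0,
      "contextualizing": 0,
      "deployed": 1,
      "deprovisioning": 0
    }
  }
}
"""


def summarize_pool(raw: str) -> dict:
    """Summarize the JSON printed by `duffy client show-pool <pool>`."""
    pool = json.loads(raw)["pool"]
    levels = pool["levels"]
    # Nodes that are on their way to becoming ready.
    in_flight = levels["provisioning"] + levels["contextualizing"]
    return {
        "name": pool["name"],
        "fill_level": pool["fill_level"],
        "ready": levels["ready"],
        "in_flight": in_flight,
        # Hypothetical label: nothing ready and nothing being prepared.
        "starved": levels["ready"] == 0 and in_flight == 0,
    }


summary = summarize_pool(SAMPLE)
print(summary)
```

Running the same function on output captured a few seconds apart would show the provisioning count flipping while `ready` stays at 0.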

Metadata Update from @arrfab:
- Issue assigned to arrfab

19 days ago

Metadata Update from @arrfab:
- Issue priority set to: 🔥 Urgent 🔥 (was: Needs Review)
- Issue tagged with: centos-ci-infra, high-gain, medium-trouble

19 days ago

I had a quick look, and we're actually limited by the number of metal nodes available in AWS. It's the same problem as already discussed: the AWS API answers with "We currently do not have sufficient c5n.metal capacity in the Availability Zone you requested", so duffy keeps trying to provision some. For example, here was the last status:

$ duffy client show-pool metal-ec2-c5n-centos-8s-x86_64
{
  "action": "get",
  "pool": {
    "name": "metal-ec2-c5n-centos-8s-x86_64",
    "fill_level": 5,
    "levels": {
      "provisioning": 0,
      "ready": 0,
      "contextualizing": 0,
      "deployed": 6,
      "deprovisioning": 0
    }
  }
}

So already more than what you saw.

Now for the explanation of why it failed: RHEL 8.7 was released yesterday, so duffy was updated earlier today. The problem is that ansible-core was bumped from 2.12 (using python 3.8) to 2.13 (which requires python 3.9), so duffy itself (through ansible-runner) could no longer import ansible, as it was no longer installed for the same python version.
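
A quick way to confirm this kind of mismatch is to ask a given interpreter whether it can see the ansible package at all. This is a minimal sketch, assuming it is run with the same python interpreter that duffy uses; `module_visible` is a hypothetical helper, not something shipped with duffy or ansible-runner.

```python
import importlib.util
import sys


def module_visible(name: str) -> bool:
    """Return True if this interpreter can locate the top-level module `name`."""
    # find_spec returns None when a top-level module is not on sys.path.
    return importlib.util.find_spec(name) is not None


# Report which interpreter is running and whether it can find ansible.
version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
print("python", version, "sees ansible:", module_visible("ansible"))
```

If this reports False under the interpreter duffy runs with, while the ansible-core package is installed, the package was most likely installed for a different python version.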

We already had that issue on CentOS Stream 8, but the versionlock was different, as the version initially pushed to Stream differs from the one that went to RHEL 8.7 GA.
We updated our ansible inventory with the correct ansible-core-2.12.2-4.el8_6 NVR to lock to that ansible version, and duffy-tasks was again able to call ansible to provision nodes.
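
To catch such a drift earlier, the installed package could be compared against the locked NVR. This is only a sketch under assumptions: it relies on the standard `rpm -q --qf` query, and `LOCKED_NVR`, `installed_nvr` and `lock_drifted` are hypothetical names that do not come from our inventory or from duffy.

```python
import subprocess

# NVR quoted in the comment above; the variable name is hypothetical.
LOCKED_NVR = "ansible-core-2.12.2-4.el8_6"


def installed_nvr(package: str) -> str:
    """Return name-version-release of an installed RPM, as reported by rpm.

    Raises subprocess.CalledProcessError if the package is not installed.
    """
    result = subprocess.run(
        ["rpm", "-q", "--qf", "%{NAME}-%{VERSION}-%{RELEASE}", package],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def lock_drifted(installed: str, locked: str = LOCKED_NVR) -> bool:
    """Return True if the installed NVR differs from the locked one."""
    return installed != locked
```

On the duffy host, `lock_drifted(installed_nvr("ansible-core"))` would then return True as soon as an update replaces the locked build.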

Metadata Update from @arrfab:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

19 days ago

