#962 CentOS CI can't provision any node
Closed: Fixed with Explanation 2 years ago by arrfab. Opened 2 years ago by mrc0mmand.

Hey!

For the last couple of hours, CentOS CI hasn't been able to provision any nodes through Duffy: all attempts fail with an error and the pools are empty:

$ duffy client list-sessions
{
  "action": "get",
  "sessions": []
}
$ duffy client request-session pool=virt-ec2-t2-centos-8s-x86_64,quantity=1
{
  "error": {
    "detail": "can't reserve nodes: quantity=1 pool='virt-ec2-t2-centos-8s-x86_64'"
  }
}
$ duffy client show-pool virt-ec2-t2-centos-8s-x86_64
{
  "action": "get",
  "pool": {
    "name": "virt-ec2-t2-centos-8s-x86_64",
    "fill_level": 10,
    "levels": {
      "provisioning": 0,
      "ready": 0,
      "contextualizing": 1,
      "deployed": 0,
      "deprovisioning": 0
    }
  }
}
$ duffy client show-pool metal-ec2-c5n-centos-8s-x86_64
{
  "action": "get",
  "pool": {
    "name": "metal-ec2-c5n-centos-8s-x86_64",
    "fill_level": 5,
    "levels": {
      "provisioning": 0,
      "ready": 0,
      "contextualizing": 0,
      "deployed": 0,
      "deprovisioning": 0
    }
  }
}

virt-ec2-t2-centos-8s-x86_64 seems to be back in action (thanks, @arrfab!), but something still seems to be wrong with the metal-ec2-c5n-centos-8s-x86_64 pool. It worked fine for a while, but now it again fails to provision any machine: the provisioning field keeps flipping between '5' and '0' without anything actually being provisioned:

$ duffy client show-pool metal-ec2-c5n-centos-8s-x86_64
{
  "action": "get",
  "pool": {
    "name": "metal-ec2-c5n-centos-8s-x86_64",
    "fill_level": 5,
    "levels": {
      "provisioning": 0,
      "ready": 0,
      "contextualizing": 0,
      "deployed": 1,
      "deprovisioning": 0
    }
  }
}
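To spot this kind of flapping without eyeballing the JSON by hand, the `show-pool` output can be summarized programmatically. A minimal Python sketch, assuming only the output shape shown above; `summarize_pool`, `ready_shortfall` and `in_flight` are hypothetical names, and treating `fill_level` minus `ready` as the shortfall is a simplification (Duffy's own accounting may also count nodes that are still being provisioned):

```python
import json

# Output of `duffy client show-pool metal-ec2-c5n-centos-8s-x86_64` as pasted above.
SAMPLE = """{
  "action": "get",
  "pool": {
    "name": "metal-ec2-c5n-centos-8s-x86_64",
    "fill_level": 5,
    "levels": {"provisioning": 0, "ready": 0, "contextualizing": 0,
               "deployed": 1, "deprovisioning": 0}
  }
}"""


def summarize_pool(show_pool_output: str) -> dict:
    """Summarize the JSON printed by `duffy client show-pool <name>`.

    Assumes the shape seen in this ticket:
    {"action": "get", "pool": {"name": ..., "fill_level": N, "levels": {...}}}
    """
    pool = json.loads(show_pool_output)["pool"]
    levels = pool["levels"]
    ready = levels.get("ready", 0)
    return {
        "name": pool["name"],
        "ready": ready,
        # Simplified: how many ready nodes are missing to reach fill_level.
        "ready_shortfall": max(pool["fill_level"] - ready, 0),
        # Nodes currently being brought up (provisioning or contextualizing).
        "in_flight": levels.get("provisioning", 0) + levels.get("contextualizing", 0),
    }


if __name__ == "__main__":
    print(summarize_pool(SAMPLE))
```

Feeding it successive `show-pool` snapshots (for example from `sys.stdin` in a loop) makes it easy to see `in_flight` bouncing between 5 and 0 while `ready` stays at 0.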

Metadata Update from @arrfab:
- Issue assigned to arrfab

2 years ago

Metadata Update from @arrfab:
- Issue priority set to: 🔥 Urgent 🔥 (was: Needs Review)
- Issue tagged with: centos-ci-infra, high-gain, medium-trouble

2 years ago

I had a quick look, and we're actually limited by the number of available metal nodes in AWS (same problem as discussed before: the AWS API answers with "We currently do not have sufficient c5n.metal capacity in the Availability Zone you requested"), so Duffy keeps trying to provision some. For example, here was the last status:

$ duffy client show-pool metal-ec2-c5n-centos-8s-x86_64
{
  "action": "get",
  "pool": {
    "name": "metal-ec2-c5n-centos-8s-x86_64",
    "fill_level": 5,
    "levels": {
      "provisioning": 0,
      "ready": 0,
      "contextualizing": 0,
      "deployed": 6,
      "deprovisioning": 0
    }
  }
}

So that's already more than what you saw.

Now for the explanation of why it failed: RHEL 8.7 was released yesterday, so Duffy was updated earlier today. The problem is that ansible-core was bumped from 2.12 (using Python 3.8) to 2.13 (requiring Python 3.9), so Duffy itself (through ansible-runner) couldn't import ansible anymore, because it was no longer installed for the same Python version.
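The failure mode is that a package installed for one Python (here python3.9) is invisible to a process running under another (python3.8). A minimal diagnostic sketch, not what Duffy or ansible-runner does internally: run it with the same interpreter the service uses and it reports whether `ansible` and `ansible_runner` can be imported. The helper names `module_visible` and `report` are hypothetical:

```python
import importlib.util
import sys


def module_visible(name: str) -> bool:
    """Return True if top-level module `name` is importable by *this* interpreter.

    Packages installed into another Python's site-packages (e.g. python3.9)
    are not found when running under python3.8, and vice versa.
    """
    return importlib.util.find_spec(name) is not None


def report(names=("ansible", "ansible_runner")) -> str:
    """Describe the running interpreter and whether each module is importable."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    lines = [f"interpreter: Python {version} ({sys.executable})"]
    for name in names:
        status = "importable" if module_visible(name) else "NOT importable"
        lines.append(f"{name}: {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

Running it once with `python3.8` and once with `python3.9` would show which interpreter actually sees the upgraded ansible-core.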

We had already hit this issue on CentOS Stream 8, but the versionlock there was different, as the version initially pushed to Stream differs from the one that went into RHEL 8.7 GA.
We put the correct ansible-core-2.12.2-4.el8_6 NVR into our Ansible inventory to lock to that version, and duffy-tasks was again able to call Ansible to provision nodes.
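The thread doesn't show how the lock is expressed in the inventory, but the "NVR" itself is RPM's name-version-release string. As an illustration only, a small Python sketch that splits an NVR and extracts its major.minor series, which is what changed between 2.12 and 2.13 here. It assumes no epoch prefix (`name-epoch:version-release`), and `parse_nvr` and `version_series` are hypothetical helpers:

```python
from typing import NamedTuple


class NVR(NamedTuple):
    name: str
    version: str
    release: str


def parse_nvr(nvr: str) -> NVR:
    """Split an RPM name-version-release string (no epoch assumed).

    The name may contain dashes (e.g. "ansible-core"), but version and
    release may not, so splitting on the last two dashes is enough.
    """
    name, version, release = nvr.rsplit("-", 2)
    return NVR(name, version, release)


def version_series(nvr: str) -> str:
    """Return the major.minor series, e.g. '2.12' for ansible-core-2.12.2-4.el8_6."""
    major, minor, *_ = parse_nvr(nvr).version.split(".")
    return f"{major}.{minor}"


if __name__ == "__main__":
    locked = "ansible-core-2.12.2-4.el8_6"
    print(parse_nvr(locked))
    print(version_series(locked))
```

A check like `version_series(installed) == "2.12"` could flag an unexpected bump before it breaks provisioning, but that is a suggestion, not something described in this ticket.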

Metadata Update from @arrfab:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

2 years ago


Metadata
Boards 1
CentOS CI Infra Status: Backlog