#8131 openqa playbook failing on rabbit/queue, reason not clear
Closed: Fixed 4 years ago by kevin. Opened 4 years ago by adamwill.

Hi folks! The openqa playbook in ansible is failing on RabbitMQ queue tasks, but I'm not sure why. It fails like this:

PLAY [configure fedora-messaging queues on openQA servers] ***********************************************************

TASK [rabbit/queue : Validate parameters] ****************************************************************************
Thursday 22 August 2019  21:23:40 +0000 (0:00:01.283)       0:06:22.927 ******* 
ok: [openqa01.qa.fedoraproject.org] => {
    "changed": false, 
    "msg": "All assertions passed"
}
ok: [openqa-stg01.qa.fedoraproject.org] => {
    "changed": false, 
    "msg": "All assertions passed"
}

TASK [rabbit/queue : Create the openqa user in RabbitMQ] *************************************************************
Thursday 22 August 2019  21:23:40 +0000 (0:00:00.104)       0:06:23.031 ******* 
ok: [openqa-stg01.qa.fedoraproject.org -> rabbitmq01.phx2.fedoraproject.org]
ok: [openqa01.qa.fedoraproject.org -> rabbitmq01.phx2.fedoraproject.org]

TASK [rabbit/queue : Create the openqa_scheduler queue in RabbitMQ] **************************************************
Thursday 22 August 2019  21:23:44 +0000 (0:00:03.556)       0:06:26.588 ******* 
fatal: [openqa01.qa.fedoraproject.org -> rabbitmq01.phx2.fedoraproject.org]: FAILED! => {"changed": false, "details": "{\"error\":\"not_authorised\",\"reason\":\"User not authorised to access virtual host\"}", "msg": "Invalid response from RESTAPI when trying to check if queue exists"}
fatal: [openqa-stg01.qa.fedoraproject.org -> rabbitmq01.phx2.fedoraproject.org]: FAILED! => {"changed": false, "details": "{\"error\":\"not_authorised\",\"reason\":\"User not authorised to access virtual host\"}", "msg": "Invalid response from RESTAPI when trying to check if queue exists"}

The relevant playbook section is this:

- name: configure fedora-messaging queues on openQA servers
  hosts: openqa:openqa_stg
  user: root
  gather_facts: True

  vars_files:
   - /srv/web/infra/ansible/vars/global.yml
   - "/srv/private/ansible/vars.yml"
   - /srv/web/infra/ansible/vars/{{ ansible_distribution }}.yml

  roles:
  - role: rabbit/queue
    username: "openqa"
    queue_name: "openqa{{ openqa_env_suffix }}_scheduler"
    routing_keys:
        - "org.fedoraproject.prod.pungi.compose.status.change"
        - "org.fedoraproject.prod.bodhi.update.request.testing"
        - "org.fedoraproject.prod.bodhi.update.edit"
    write_queues:
        - "ci"
        - "openqa"
    vars:
      # yes, even the staging scheduler listens to production, it
      # has to or else it wouldn't schedule any jobs
      env: "production"
      env_suffix: ""
    tags: ['rabbit']

I cannot figure out why it's failing :( and I'm pretty sure I haven't touched it since the last time it worked. @jcline, can you help at all? Thanks.


Additionally, bodhi-backend is also failing the same way, making me think there's a core change that broke something.

@abompard any ideas?

Note, I'm trying to switch openQA production to fedora-messaging but can't do it until this is fixed...

ok, this should be fixed now.

As near as I can tell, the monitoring work added some perms for the nagios user on / (the default vhost), which somehow left admin without any privs on that vhost (maybe it changed the defaults?).

Giving admin full perms on / fixed things up.
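
For anyone hitting the same "User not authorised to access virtual host" error, here is a rough sketch of how the fix could be pinned down in Ansible so it doesn't silently regress. This is not the change that was actually applied. The task itself is hypothetical, the "administrator" tag on admin is an assumption, and it uses the stock rabbitmq_user module rather than anything in the rabbit/queue role:

```yaml
# Hypothetical task, not taken from the infra repo: keep the admin user's
# permissions on the default vhost ("/") explicit.
- name: Ensure admin has full permissions on the default vhost
  rabbitmq_user:
    user: admin
    vhost: "/"
    configure_priv: ".*"
    read_priv: ".*"
    write_priv: ".*"
    # Assumption: admin carries the "administrator" tag. It is listed here
    # so the module does not treat a missing tag list as "remove all tags".
    tags: administrator
    state: present
  # Same broker host the failing tasks were delegated to in the log above.
  delegate_to: rabbitmq01.phx2.fedoraproject.org
```

Running `rabbitmqctl list_permissions -p /` on the broker before and after should show whether admin actually has an entry for that vhost.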

:bouquet:

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago

Thanks kevin, that was really not obvious
