When I committed https://pagure.io/fedora-infra/ansible/c/810326a4413ddcb037b0661b01267ca0e8eb69d7?branch=master the change wasn't propagated to batcave.
I had to create another commit, which synced the repo immediately: https://pagure.io/fedora-infra/ansible/c/22e6aebc845aa54033fe36e83f1d83d1a226473b?branch=master
Hm this is "amusing":
on batcave01:
$ journalctl -lru mirror_pagure_ansible --since=2020-05-12 |grep /srv/web/infra/ansible
May 12 04:44:53 batcave01.phx2.fedoraproject.org
May 12 04:19:58 batcave01.phx2.fedoraproject.org
May 12 04:04:01 batcave01.phx2.fedoraproject.org
May 12 00:00:36 batcave01.phx2.fedoraproject.org
on batcave13:
# journalctl -lru mirror_pagure_ansible --since=2020-05-12 |grep /srv/web/infra/ansible
May 12 04:44:49 batcave13.rdu2.fedoraproject.org
May 12 04:32:31 batcave13.rdu2.fedoraproject.org
May 12 04:19:54 batcave13.rdu2.fedoraproject.org
May 12 04:14:14 batcave13.rdu2.fedoraproject.org
May 12 04:03:59 batcave13.rdu2.fedoraproject.org
May 12 00:00:35 batcave13.rdu2.fedoraproject.org
So batcave01 "missed" 2 messages that batcave13 saw. I'm wondering if there is something up with the queue that batcave01 uses since it's the one that was used when setting things up.
Some more debugging info:
on rabbitmq01:
# rabbitmqctl list_consumers -p /pubsub |grep mirror
mirror_pagure_ansible_13 <rabbit@rabbitmq02.phx2.fedoraproject.org.2.28264.3077> 6c08520f-8634-4154-9871-55aa1b228054 true 0 []
mirror_pagure_ansible <rabbit@rabbitmq03.phx2.fedoraproject.org.2.27068.5779> c29fbadb-8ee9-498b-84da-d92de20785ef true 0 []
mirror_pagure_ansible <rabbit@rabbitmq03.phx2.fedoraproject.org.2.22890.6304> 139c705a-fe5c-4004-96fb-137485f98d77 true 0 []
So it looks like we have two consumers on the queue that batcave01 uses, in which case messages are delivered to the consumers in a round-robin fashion, so each consumer only sees some of them. This explains the behaviour we're seeing, but I'm confused as to how it got this way :(
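To illustrate why two consumers on one queue lead to missed messages, here is a minimal sketch of round-robin dispatch. It is a simplified model, not RabbitMQ code: the `dispatch` function, the consumer labels and the message names are all hypothetical, and a real broker's delivery order also depends on when each consumer connected and on prefetch settings.

```python
from itertools import cycle


def dispatch(messages, consumers):
    """Hand each message to the next consumer in turn (round robin).

    Simplified model of a single queue shared by several consumers:
    every message is delivered to exactly one of them.
    """
    received = {name: [] for name in consumers}
    turn = cycle(consumers)
    for message in messages:
        received[next(turn)].append(message)
    return received


# Hypothetical message names standing in for pushes to the ansible repo.
pushes = ["push-1", "push-2", "push-3", "push-4"]

# One queue shared by two consumers: each one misses half of the pushes.
print(dispatch(pushes, ["consumer-a", "consumer-b"]))
# {'consumer-a': ['push-1', 'push-3'], 'consumer-b': ['push-2', 'push-4']}

# One queue per consumer: the single consumer sees every push.
print(dispatch(pushes, ["consumer-a"]))
# {'consumer-a': ['push-1', 'push-2', 'push-3', 'push-4']}
```

In this model, a host only gets every message when it is the sole consumer of its queue.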
@abompard and I have scheduled some time to dive a little more into this, as currently I don't understand what's going on.
Metadata Update from @pingou: - Issue assigned to pingou - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: rabbitmq
ahhhh. I bet I know. :)
I stood up a batcave01.iad2.fedoraproject.org. I wonder if this one connected to the same queue as batcave01.phx2?
Ahahah, https://pagure.io/fedora-infra/ansible/blob/master/f/roles/mirror_pagure_ansible/templates/mirror_pagure_ansible.cfg#_37 bingo!
\ó/
So I've adjusted ansible so that the 3 batcaves have 3 different queues, and this is the current outcome:
# rabbitmqctl list_consumers -p /pubsub |grep mirror
mirror_pagure_ansible_13 <rabbit@rabbitmq02.phx2.fedoraproject.org.2.28264.3077> 6c08520f-8634-4154-9871-55aa1b228054 true 0 []
mirror_pagure_ansible <rabbit@rabbitmq03.phx2.fedoraproject.org.2.7017.6347> bc047838-138d-41e1-87d7-9fa0ccd2981f true 0 []
mirror_pagure_ansible_iad2 <rabbit@rabbitmq02.phx2.fedoraproject.org.2.5191.3578> 67db4678-8e91-4bdb-81f2-6cce227135f1 true 0 []
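A check like this could also be scripted. The sketch below counts consumers per queue from the `rabbitmqctl list_consumers` output pasted in this ticket and reports any queue with more than one. It assumes the first whitespace-separated column is the queue name, as it appears to be above; the function names are made up for illustration.

```python
from collections import Counter


def consumers_per_queue(output):
    """Count consumers per queue, assuming the queue name is the first column."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


def shared_queues(output):
    """Return the names of queues that have more than one consumer."""
    return sorted(q for q, n in consumers_per_queue(output).items() if n > 1)


# Output copied from this ticket, before and after the ansible change.
BEFORE = """\
mirror_pagure_ansible_13 <rabbit@rabbitmq02.phx2.fedoraproject.org.2.28264.3077> 6c08520f-8634-4154-9871-55aa1b228054 true 0 []
mirror_pagure_ansible <rabbit@rabbitmq03.phx2.fedoraproject.org.2.27068.5779> c29fbadb-8ee9-498b-84da-d92de20785ef true 0 []
mirror_pagure_ansible <rabbit@rabbitmq03.phx2.fedoraproject.org.2.22890.6304> 139c705a-fe5c-4004-96fb-137485f98d77 true 0 []
"""

AFTER = """\
mirror_pagure_ansible_13 <rabbit@rabbitmq02.phx2.fedoraproject.org.2.28264.3077> 6c08520f-8634-4154-9871-55aa1b228054 true 0 []
mirror_pagure_ansible <rabbit@rabbitmq03.phx2.fedoraproject.org.2.7017.6347> bc047838-138d-41e1-87d7-9fa0ccd2981f true 0 []
mirror_pagure_ansible_iad2 <rabbit@rabbitmq02.phx2.fedoraproject.org.2.5191.3578> 67db4678-8e91-4bdb-81f2-6cce227135f1 true 0 []
"""

print(shared_queues(BEFORE))  # ['mirror_pagure_ansible']
print(shared_queues(AFTER))   # []
```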
So for me, this should be fixed.
From what I can see, the last two commits pushed to the ansible repo made it onto all three machines.
Let's close this and re-open if needed.
Thanks for raising this @praiskup !
Metadata Update from @pingou: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)