#8977 Greenwave fails to access rabbitmq in iad2
Closed: Fixed 3 years ago by cverna. Opened 3 years ago by pingou.

When trying to deploy greenwave in openshift in iad2, it looks like the fedora-messaging consumer/pod is failing to start.

The logs show the error:

[fedora_messaging.cli ERROR] The TCP connection appears to have started, but the TLS or AMQP handshake with the broker failed; check your connection and authentication parameters and ensure your user has permission to access the vhost

Could this be a port issue?


Note: I expect this will also affect bodhi when we get to it, but fixing it here should fix it for bodhi as well.

@abompard Can you take a look at it? Thanks.

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: groomed, high-gain, medium-trouble

3 years ago

This is an issue also for ODCS (ticket closed as duplicate: https://pagure.io/fedora-infrastructure/issue/8978).

At least copr deployment scripts seem to be affected:

 __________________________________________________                            
< TASK [rabbit/user : Create the user in RabbitMQ] >                           
 --------------------------------------------------                            
       \   ,__,                                                                
        \  (oo)____                                                            
           (__)    )\                                                          
              ||--|| *                                                         

Saturday 06 June 2020  09:58:06 +0000 (0:00:00.070)       0:06:52.890 *********
fatal: [copr-be-dev.aws.fedoraproject.org]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"rabbitmq01.stg.aws.fedoraproject.org\". Make sure this host can be reached over ssh: ssh: Could not resolve hostname rabbitmq01.stg.aws.fedoraproject.org: Name or service not known\r\n", "unreachable": true}

OK, @praiskup's problem here is slightly different:

The defaults file roles/rabbit/queue/defaults/main.yml was changed to use an environment variable to pick the rabbitmq host per datacenter. Your system is in the aws datacenter, so it is going to break. I will try to figure out a fix.

So, rabbitmq.fedoraproject.org in iad2 is resolving to:

rabbitmq.fedoraproject.org has address 209.132.181.15
rabbitmq.fedoraproject.org has address 209.132.181.16

It should be connecting to the phx2 (currently active) cluster. I am not sure why it's not; that's the part that needs more investigation.
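To repeat this check from any host, a minimal sketch along these lines may help. It only uses the Python standard library; the helper names `resolve_ipv4` and `compare` are made up for illustration, and the "expected" set is whatever addresses you believe the active cluster has (the two above are simply the ones reported in this ticket, not a confirmed phx2 list). Port 5671 is assumed to be the AMQP-over-TLS port in use here.

```python
import socket

# Addresses reported in this ticket for rabbitmq.fedoraproject.org as seen from iad2.
# Whether these are the "right" ones is exactly what is under investigation.
REPORTED = {"209.132.181.15", "209.132.181.16"}


def resolve_ipv4(host, port=5671, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses that `host` resolves to."""
    infos = resolver(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def compare(resolved, expected):
    """Split resolved addresses into expected, unexpected, and missing ones."""
    resolved, expected = set(resolved), set(expected)
    return {
        "expected": sorted(resolved & expected),
        "unexpected": sorted(resolved - expected),
        "missing": sorted(expected - resolved),
    }
```

Running `compare(resolve_ipv4("rabbitmq.fedoraproject.org"), REPORTED)` on an iad2 host and on a phx2 host and comparing the two results should show whether the datacenters see different answers for the same name.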

The second issue is a change I made to make the iad2 playbooks work back when we were initially installing/testing things. We can likely revert that to phx2 now and then, when we move the rabbitmq cluster to iad2 on Monday, switch it to iad2. Or we could just leave it and switch it on Monday.

Can everyone please try this again? I think it's working (I see greenwave connected at least).

@kevin, ODCS can connect to rabbitmq now, but it times out on "Authenticating with server using x509 (certfile: /etc/odcs/odcs-rabbitmq.crt, keyfile: /etc/odcs/odcs-rabbitmq.key)".

I'm also not sure how sane the current configuration is. Based on your comment, it seems iad2 services are connected to the same rabbitmq instance as phx2 services, which probably mixes them together.

Right now there are 2 independent clusters: one in iad2 and one in phx2.

All instances everywhere should resolve rabbitmq.fedoraproject.org to the phx2 one.

However, we are moving that one later today and then there will be only one cluster... in iad2.

It seems this is still/again broken for ODCS:

Connection workflow failed: AMQPConnectionWorkflowFailed: 2 exceptions in all; last exception - AMQPConnectorSocketConnectError: timeout("TCP connection attempt timed out: 'rabbitmq.fedoraproject.org'/(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('10.3.163.77', 5671))",); first exception - AMQPConnectorSocketConnectError: timeout("TCP connection attempt timed out: 'rabbitmq.fedoraproject.org'/(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('10.3.163.76', 5671))",)
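Note that this failure is at an earlier stage than the one in the original report: there the TCP connection started and the TLS or AMQP handshake failed, while here the TCP connection itself times out. A rough way to tell the cases apart from the affected host is a plain TCP probe against each broker address. This is only a sketch using the standard library; `probe_tcp` and `probe_all` are hypothetical helper names, and it says nothing about TLS or AMQP authentication.

```python
import socket


def probe_tcp(ip, port=5671, timeout=5.0):
    """Attempt a plain TCP connection and classify the outcome.

    Returns "open", "timeout", "refused", or "error: <reason>".
    """
    try:
        # create_connection performs only the TCP handshake, no TLS or AMQP.
        with socket.create_connection((ip, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


def probe_all(ips, port=5671, timeout=5.0):
    """Probe several addresses and return a mapping of ip -> outcome."""
    return {ip: probe_tcp(ip, port, timeout) for ip in ips}
```

For the log above that would be `probe_all(["10.3.163.76", "10.3.163.77"])`. A "timeout" result would be consistent with the traceback and tends to suggest routing or firewalling rather than credentials, though that is an inference and not something this probe can confirm.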

The greenwave consumer is now up and running, so I am closing this ticket. Another ticket was created to track the other hosts that have trouble connecting to rabbitmq: https://pagure.io/fedora-infrastructure/issue/9003

Metadata Update from @cverna:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago
