#11640 Can not connect to rdu-cc lab over ipv4 from some locations
Closed: Fixed 6 months ago by praiskup. Opened 6 months ago by praiskup.

I'm unable to ssh -vvv -4 copr@vmhost-x86-copr01.rdu-cc.fedoraproject.org from
any of our machines hosted in the us-east-1 location.

Tested from these instances (subnet subnet-0995f6a466849f4c3):

i-0ba958e718cb91fa3
i-062172568528899ab
i-0007e54a85c0e56b1

Log output:

[resalloc@copr-be ~][PROD]$ ssh -vvv -4 copr@vmhost-x86-copr01.rdu-cc.fedoraproject.org
OpenSSH_8.8p1, OpenSSL 3.0.9 30 May 2023
debug1: Reading configuration data /var/lib/resallocserver/.ssh/config
debug1: /var/lib/resallocserver/.ssh/config line 1: Applying options for *
debug1: /var/lib/resallocserver/.ssh/config line 9: Applying options for vmhost-x86-copr01.rdu-cc.fedoraproject.org
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: /etc/ssh/ssh_config line 55: Including file /etc/ssh/ssh_config.d/50-redhat.conf depth 0
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug2: checking match for 'final all' host vmhost-x86-copr01.rdu-cc.fedoraproject.org originally vmhost-x86-copr01.rdu-cc.fedoraproject.org
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 3: not matched 'final'
debug2: match not found
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-]
debug3: kex names ok: [curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512]
debug1: configuration requests final Match pass
debug1: re-parsing configuration
debug1: Reading configuration data /var/lib/resallocserver/.ssh/config
debug1: /var/lib/resallocserver/.ssh/config line 1: Applying options for *
debug2: add_identity_file: ignoring duplicate key ~/.ssh/id_rsa
debug1: /var/lib/resallocserver/.ssh/config line 9: Applying options for vmhost-x86-copr01.rdu-cc.fedoraproject.org
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: /etc/ssh/ssh_config line 55: Including file /etc/ssh/ssh_config.d/50-redhat.conf depth 0
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug2: checking match for 'final all' host vmhost-x86-copr01.rdu-cc.fedoraproject.org originally vmhost-x86-copr01.rdu-cc.fedoraproject.org
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 3: matched 'final'
debug2: match found
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-]
debug3: kex names ok: [curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512]
debug2: resolving "vmhost-x86-copr01.rdu-cc.fedoraproject.org" port 22
debug3: resolve_host: lookup vmhost-x86-copr01.rdu-cc.fedoraproject.org:22
debug3: ssh_connect_direct: entering
debug1: Connecting to vmhost-x86-copr01.rdu-cc.fedoraproject.org [8.43.85.57] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x48
debug2: fd 3 setting O_NONBLOCK
debug1: connect to address 8.43.85.57 port 22: Connection timed out
ssh: connect to host vmhost-x86-copr01.rdu-cc.fedoraproject.org port 22: Connection timed out

From my personal laptop (Brno, custom net provider) it seems to work fine.

When I connect over IPv6 (ssh -vvv -6 ...), it just works.

I can reproduce this with all our rdu-cc machines. Isn't the RDU lab filtering
something on IPv4?


Fedora Copr cannot start machines in the RDU lab because of this, which means that the build throughput is very bad.

Workaround:
1. cron job every 2 minutes: for i in vmhost-x86-copr01.rdu-cc.fedoraproject.org vmhost-x86-copr02.rdu-cc.fedoraproject.org vmhost-x86-copr03.rdu-cc.fedoraproject.org vmhost-x86-copr04.rdu-cc.fedoraproject.org vmhost-p08-copr01.rdu-cc.fedoraproject.org vmhost-p08-copr02.rdu-cc.fedoraproject.org vmhost-p09-copr01.rdu-cc.fedoraproject.org; do ssh -6 copr@$i true 2>/dev/null; done
2. ssh config that keeps the controlling channel open

This way, even though -4 is requested, ssh doesn't open a new controlling channel ...
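
For readers who want to reproduce the second half of the workaround, here is a minimal sketch of what such an ssh config could look like. It only prints a stanza per host; ControlMaster, ControlPath and ControlPersist are standard ssh_config options, but the socket path and the 10m persistence value are illustrative assumptions, not the settings actually used on copr-be.

```shell
#!/bin/sh
# Sketch only: print an ssh_config stanza that keeps a multiplexed master
# connection open for each host given as an argument.
# The ControlPath location and the ControlPersist value are assumptions.
render_mux_config() {
    for host in "$@"; do
        printf 'Host %s\n' "$host"
        # Reuse an existing master connection, or become the master.
        printf '    ControlMaster auto\n'
        # One socket per user/host/port (%%r, %%h, %%p are ssh_config tokens).
        printf '    ControlPath ~/.ssh/mux-%%r@%%h:%%p\n'
        # Keep the master alive in the background after the client exits.
        printf '    ControlPersist 10m\n'
    done
}
```

Calling render_mux_config with the hypervisor names and appending the output to the ssh config would, under these assumptions, let the periodic ssh -6 from the cron job create the master connection, which later ssh -4 invocations then reuse through the control socket.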

Can you ssh from your AWS systems to people01.fedoraproject.org or pagure02.fedoraproject.org? Those are both in the same colocation. If you can't, I would then look at trying to connect to ports 443 and 80 on those systems.

My first guess is that there may be an outbound firewall rule for your 'zone' which has come into play. I do not know of any filtering beyond that which would be AWS specific (though it would help to know what IP address your systems are coming out from to help debug this).

Metadata Update from @phsmoura:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: aws, low-gain, low-trouble, ops

6 months ago

No, people01 fails with a slightly different error:

debug2: fd 3 setting O_NONBLOCK
debug1: connect to address 152.19.134.196 port 22: No route to host
ssh: connect to host people01.fedoraproject.org port 22: No route to host

people02 works.

I have now tested ICMP more thoroughly, and it shows the same problems.

  • ping -6 to hypervisors works
  • ping -4 to hypervisors doesn't work
  • ping -6 and -4 to people02 works
  • neither ping -4 nor ping -6 works to people01

And I noticed now that this actually isn't only about ICMP.
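
The ping matrix above can be collected in one pass with a small loop. This is a sketch under assumptions: it relies on the -4, -6, -c and -W options of the Linux iputils ping, and the PROBE variable is a hypothetical hook added here so the loop can be dry-run with a stub instead of real ICMP.

```shell
#!/bin/sh
# Probe command is overridable (hypothetical hook); defaults to ping.
: "${PROBE:=ping}"

# probe_matrix HOST...
# Print "HOST ipv4 ok|fail" and "HOST ipv6 ok|fail" for every host.
probe_matrix() {
    for host in "$@"; do
        for family in 4 6; do
            # One echo request, two second timeout, output discarded.
            if "$PROBE" "-$family" -c 1 -W 2 "$host" >/dev/null 2>&1; then
                echo "$host ipv$family ok"
            else
                echo "$host ipv$family fail"
            fi
        done
    done
}
```

For example, probe_matrix vmhost-x86-copr01.rdu-cc.fedoraproject.org people02.fedoraproject.org prints one line per host and address family, which makes an IPv4-only failure easy to spot.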

Metadata Update from @praiskup:
- Issue untagged with: aws, low-gain, low-trouble, ops
- Issue priority set to: Needs Review (was: Waiting on Assignee)

6 months ago

Oops, I forgot that it is people02, AND that it is not in the same datacentre. people01 was removed a long time ago, but its DNS entry is still there.

$ host people02.fedoraproject.org
people02.fedoraproject.org has address 152.19.134.199
people02.fedoraproject.org has IPv6 address 2600:2701:4000:5211:dead:beef:a7:9474
$ host pagure02.fedoraproject.org
pagure02.fedoraproject.org has address 8.43.85.76
pagure02.fedoraproject.org has IPv6 address 2620:52:3:1:dead:beef:cafe:fed8
$ host download-cc-rdu01.fedoraproject.org
download-cc-rdu01.fedoraproject.org has address 8.43.85.72
download-cc-rdu01.fedoraproject.org has IPv6 address 2620:52:3:1:dead:beef:cafe:fed1

The two hosts in the community cage which should be reachable on ports 443 and 22 are download-cc-rdu01 and pagure02.fedoraproject.org. If you are able to reach them, then it is something with the specific COPR systems. If you can't reach them but can reach people02, then it may be due to some sort of firewall change which happened during the switch upgrades last week.

Can you try the following from the AWS hosts to see what ip address they are presenting?

curl -4 https://ipv4.icanhazip.com

That way, if that network has been blocked for a reason, Red Hat IT can figure it out.

If you can get to the SSH on 152.19.134.199 but not to the SSH on 8.43.85.76 then I am going to say that it is probably a firewall issue on the community cage part. If you can get to the

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: aws, medium-gain, medium-trouble, ops, rdu-cc

6 months ago

If you can get to the SSH on 152.19.134.199 but not to the SSH on 8.43.85.76 then I am going to say that it is probably a firewall issue on the community cage part.

This is exactly what is happening now.
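
The comparison described in the quoted comment can be sketched as a small script. It is only an illustration: tcp_open uses bash's /dev/tcp redirection together with the coreutils timeout command, TCP_PROBE is a hypothetical hook for substituting a stub, and only the "outside ok, cage fail" verdict comes from the discussion above; every other combination is reported as inconclusive.

```shell
#!/bin/bash
# Overridable probe (hypothetical hook); defaults to tcp_open below.
: "${TCP_PROBE:=tcp_open}"

# tcp_open ADDR PORT: succeed if a TCP connection opens within 5 seconds.
# Passing an IPv4 literal as ADDR forces the check onto IPv4.
tcp_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# compare_ssh OUTSIDE_ADDR CAGE_ADDR
# Probe port 22 on a host outside the cage and on one inside it.
compare_ssh() {
    if "$TCP_PROBE" "$1" 22; then outside=ok; else outside=fail; fi
    if "$TCP_PROBE" "$2" 22; then cage=ok; else cage=fail; fi
    case "$outside/$cage" in
        ok/fail) echo "outside ok, cage fail: points at the community cage side" ;;
        *)       echo "outside $outside, cage $cage: inconclusive" ;;
    esac
}
```

With the addresses from this ticket, the call would be compare_ssh 152.19.134.199 8.43.85.76 (people02 outside the cage, pagure02 inside it).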

Metadata Update from @praiskup:
- Issue untagged with: aws, medium-gain, medium-trouble, ops, rdu-cc
- Issue priority set to: Needs Review (was: Waiting on Assignee)

6 months ago
[resalloc@copr-be-dev ~][STG]$ curl -4 https://ipv4.icanhazip.com
18.208.10.131
[root@copr-be ~][PROD]# curl -4 https://ipv4.icanhazip.com
52.44.175.77

Update from IRC channel earlier today:

12:51:47   nirik | seems to be ipv4 down, but ipv6 up? 

It looks like some sort of network problem with the entire cage.

Yes. It's affecting the entire site. ;(

I filed an internal ticket... hopefully they can fix it.

From the IT ticket, it appears that a fiber cut is affecting an ISP in North Carolina. They are trying to fix it right now. I updated https://status.fedoraproject.org/

Outage Started
Nov 23 2023, 20:00 UTC 

Wasn't it rather the 22nd?

Yes, sorry, I did not pay attention to details... I fixed the metadata (according to the info from IT) and closed the event. The issue should be resolved by now (at least Copr seems to work fine now).

Metadata Update from @praiskup:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

6 months ago
