#10543 Add IP v6 verification for fedorapeople
Closed: Fixed with Explanation a year ago by kevin. Opened 2 years ago by misc.

Describe what you would like us to do:


We found out today that fedorapeople.org's IPv6 had not been correct for a few weeks. It wasn't noticed because there is a working IPv4, and most OSes fall back to that.

But nagios seems to only verify IP v4 connectivity for fedorapeople.org (and maybe others). It should also, when applicable and if doable, check IP v6.

When do you need this to be done by? (YYYY/MM/DD)


No deadline, just log it as an easyfix for an apprentice.

(Not sure if that's really an easyfix, because nagios is a bit complex.)


Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-gain, low-trouble, ops

2 years ago

It seems we aren't monitoring any ipv6 addresses. ;( I know we used to, but seems it got dropped somewhere on the way. ;(

It can only be done from noc02, and it needed extra attention due to network issues. So it was removed with the idea that we would get a better replacement later.


@kevin @smooge what kind of check do you need to set up? Do you have an IPv4 example?

I no longer work in Fedora so I have no updates.

Is this something that could be done in Python? Or does it have to be done in Nagios?

well, nagios can call python scripts... they just need to return the right things. Look at any of the other python plugins we have in ansible... :)
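
For anyone new to Nagios plugins, "return the right things" refers to the usual plugin convention: print a single status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal sketch of that convention in Python. The names `run_plugin` and `main`, and the `IPV6` label, are made up for illustration and are not taken from the existing plugins in the ansible repo.

```python
import sys

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def run_plugin(name, check):
    """Call check(), which returns (code, detail), and build the status line.

    Any exception or unexpected code is reported as UNKNOWN, so Nagios
    never sees a traceback instead of a status line.
    """
    try:
        code, detail = check()
    except Exception as exc:  # deliberately broad: a plugin must not crash
        code, detail = UNKNOWN, f"check raised {exc!r}"
    if code not in LABELS:
        code, detail = UNKNOWN, f"unexpected status code {code!r}"
    return code, f"{name} {LABELS[code]} - {detail}"


def main(check, name="IPV6"):
    """Print the status line and return the exit code for sys.exit()."""
    code, line = run_plugin(name, check)
    print(line)
    return code
```

A real plugin would end with something like `sys.exit(main(my_check))`, where `my_check` is whatever function performs the actual test.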

@kevin, could you assign this issue to me?
I have a question: does checking connectivity mean verifying an SSH connection over IPv6, or just the DNS record for fedorapeople.org?

@seddik Sure.

We have 2 nagios servers: noc01 and noc02. noc02 is also nagios-external.

noc01 has no ipv6 address, so it can't check them. However, 02 does.

So, we want to add some check to 02 that does a ping test for the ipv6 ip for fedorapeople.org (and probably other servers that have ipv6 addresses).

You can take a look in our ansible repo under roles/nagios/server to see how it generates the nagios config from the templates and ansible variables.
Unfortunately it's pretty complex, but between that and looking at the actual config on noc01/noc02 you might see an easy way to add some ipv6 ping checks.

Metadata Update from @kevin:
- Issue assigned to seddik

2 years ago

I run some GitHub Actions workflows that pull RPMs from my https://fedorapeople.org/~ktdreyer/ space. Sadly the "package" Ansible module (on CentOS 8 Stream) does not gracefully fall back to IPv4. Ansible was displaying errors like:

fatal: [instance]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'bz1827758': Yum repo downloading error: Downloading error(s): repodata/e63a4bfb49908c05ed76b6f3bfba45a5a4eb8956f1bd980edbb45b52152f5f17-primary.xml.gz - Cannot download, all mirrors were already tried without success", "rc": 1, "results": []}

It's interesting that it got far enough to discover the unique filename of my primary.xml.gz, so it must have successfully loaded repomd.xml, and then failed on the subsequent HTTP request for primary.xml.gz. This is what first led me to suspect IPv4/IPv6 connectivity.

I added ip_resolve: 4 to my Ansible tasks that configure fedorapeople.org repos, and now my GitHub Actions pass.

I was wrong. ip_resolve: 4 was a red herring, and I have since taken it out of my playbooks. I ran my GitHub Action many times, and I still saw odd errors about metadata download failures (e.g. for filelists.xml).

After removing ip_resolve: 4, I added retries in Ansible, specifically retries: 3 and delay: 10, and the problem has not resurfaced yet.

@kevin sorry for the delay.
I developed a new script to check the IPv6 ping result, so just get back to me if you agree with this output message.

PING OK - Packet loss=0%, RTT=13.682 ms

Part of the code is available to read and test here: https://paste.centos.org/view/d87dc183
FYI : the hostname will be passed as an argument
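
Since the pasted script is not reproduced in this ticket, here is a rough sketch of how a status line like the one above could be produced. It is not the original script. It assumes the summary format printed by iputils `ping` (a "% packet loss" line and an "rtt min/avg/max/mdev = ..." line), and the function names and loss thresholds are invented for illustration.

```python
import re
import subprocess

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed iputils summary lines, e.g.:
#   3 packets transmitted, 3 received, 0% packet loss, time 2003ms
#   rtt min/avg/max/mdev = 13.100/13.682/14.200/0.400 ms
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")
RTT_RE = re.compile(r"=\s*[\d.]+/([\d.]+)/")  # captures the average RTT


def parse_ping_summary(output):
    """Return (packet_loss_percent, avg_rtt_ms); either may be None."""
    loss = LOSS_RE.search(output)
    rtt = RTT_RE.search(output)
    return (float(loss.group(1)) if loss else None,
            float(rtt.group(1)) if rtt else None)


def status_line(loss, rtt, warn_loss=20.0, crit_loss=60.0):
    """Map parsed values to (exit_code, message); thresholds are hypothetical."""
    if loss is None:
        return UNKNOWN, "PING UNKNOWN - could not parse ping output"
    if loss >= crit_loss or rtt is None:
        code, label = CRITICAL, "CRITICAL"
    elif loss >= warn_loss:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    detail = f"Packet loss={loss:g}%"
    if rtt is not None:
        detail += f", RTT={rtt:.3f} ms"
    return code, f"PING {label} - {detail}"


def check_ping6(host, count=3):
    """Ping host over IPv6 only and return (exit_code, message)."""
    proc = subprocess.run(["ping", "-6", "-c", str(count), host],
                          capture_output=True, text=True, check=False)
    return status_line(*parse_ping_summary(proc.stdout))
```

The hostname would be passed to `check_ping6` from the command-line argument, and the returned code would be used as the process exit status.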

Your paste seems to have expired. ;( I think they only stick around for a day or two. ;(

In any case, we don't need a new script... nagios has a check command called 'check-host-alive6' for checking a host via ipv6 (there is also a 'check-host-alive4' for ipv4).

I think you can look at:

roles/nagios_server/templates/nagios/hosts/ibiblio-hosts.cfg.j2

which is the template ansible uses to set up the ibiblio hosts (of which people02 is one).

You can see how it's expanded by looking on noc02.fedoraproject.org in /etc/nagios/hosts/ibiblio-hosts.cfg

I think what we need to do is duplicate the above template file and replace where it has eth0_ipv4_ip with eth0_ipv6_ip and add a line with 'check_command check-host-alive6'.
We may also need to add to where it has {{ host }} something like "{{ host }}-ipv6" so it doesn't conflict with the others.
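
To make the idea concrete, here is a small Python sketch of the host definition such a duplicated template might expand to. The real contents of ibiblio-hosts.cfg.j2 are not shown in this ticket, so the layout, the `use` template name, the example hostname and the example address are all assumptions. Only `eth0_ipv6_ip`, the "-ipv6" suffix and `check-host-alive6` come from the comment above.

```python
def render_ipv6_host(host, hostvars, use_template="defaulttemplate"):
    """Return a Nagios host definition that checks `host` over IPv6.

    Returns an empty string when the host has no eth0_ipv6_ip variable,
    so hosts without IPv6 produce no definition at all.
    `use_template` is a hypothetical Nagios host template name.
    """
    address = hostvars.get("eth0_ipv6_ip")
    if not address:
        return ""
    return "\n".join([
        "define host {",
        # The suffix keeps this entry from clashing with the IPv4 host entry.
        f"    host_name       {host}-ipv6",
        f"    use             {use_template}",
        f"    address         {address}",
        "    check_command   check-host-alive6",
        "}",
        "",
    ])


# Example with made-up values (2001:db8::/32 is the documentation prefix).
print(render_ipv6_host("people02.example.org", {"eth0_ipv6_ip": "2001:db8::10"}))
```

In the actual Jinja template, the same effect would come from looping over the hosts and skipping those without an `eth0_ipv6_ip` variable.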

@Kevin Fenzi OK, I see. I will get back to you with the right config.


Metadata Update from @kevin:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

a year ago


Metadata
Boards 1
ops Status: Backlog