#11172 issues fetching info from fasjson.fedoraproject.org
Closed: Fixed with Explanation 2 years ago by kevin. Opened 2 years ago by arrfab.

In the last few days, we have had multiple issues fetching the needed users/groups from fasjson (to reflect them in cbs.centos.org koji / ACLs).
It's flapping: sometimes it works and sometimes it doesn't (I can provide specific times from our fasjson sync logs if that helps).
Wondering if that's related to the newly implemented GitHub action that queries for all Fedora users in FAS (see https://github.com/t0xic0der/fuas/actions).
If fasjson is hammering just one IPA server (and causing ns-slapd to go crazy on that specific IPA backend server), all subsequent requests would fail (probably due to a timeout at the fasjson level, but is that something to investigate in the fasjson logs?)


Metadata Update from @zlopez:
- Issue tagged with: Needs investigation

2 years ago

Metadata Update from @phsmoura:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

2 years ago

It seems like an IPA server issue, not fasjson. ;(

For example:

[Sun Mar 12 15:24:19.915902 2023] [wsgi:error] [pid 9190:tid 9193] [remote 10.128.2.1:34872] ldap.LOCAL_ERROR: {'result': -2, 'desc': 'Local error', 'ctrls': [], 'info': "SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Cannot contact any KDC for realm 'FEDORAPROJECT.ORG')"}
10.128.2.1 - SMTP/mail.centos.org@FEDORAPROJECT.ORG [12/Mar/2023:15:24:01 +0000] "GET /v1/groups/ HTTP/1.1" 500 290 "-" "python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-1160.81.1.el7.x86_64"

i.e., fasjson cannot contact the KDC...

I can't see much on the ipa servers, everything is running, kdc proxy is up and processing seemingly normally...

I did restart httpd on them in case kdc-proxy was somehow messed up without logging anything. ;(
If that doesn't help, after freeze I can update/reboot them and see if that gets things back to working...

I've since done a few more things:

  • Set up the sweeper script to clean out old ccache entries (ipa01 had something like 200k of them).

  • I have just updated and rebooted ipa01/02/03.
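For anyone curious what a ccache sweeper can look like, here is a minimal sketch. It is not the actual script that was deployed: the function name, the directory argument, and the age threshold are all assumptions for illustration. It simply removes plain files whose modification time is older than a cutoff.

```python
import time
from pathlib import Path


def sweep_old_ccaches(directory, max_age_seconds, now=None, dry_run=False):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Hypothetical helper: returns the list of paths that were removed
    (or, with dry_run=True, that would have been removed).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    removed = []
    for entry in Path(directory).iterdir():
        # Only touch plain files; skip sub-directories and symlinks.
        if entry.is_symlink() or not entry.is_file():
            continue
        try:
            if entry.stat().st_mtime < cutoff:
                if not dry_run:
                    entry.unlink()
                removed.append(entry)
        except FileNotFoundError:
            # The file vanished between listing and stat/unlink; ignore it.
            continue
    return removed
```

Running it with dry_run=True first shows what would be deleted before anything is actually removed.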

Please check and see if there are any issues after this comment. ;)

It still continues, to the point where I'm even considering stopping the Zabbix/monitoring notifications for it :/

I just dug around for a while and still can't find any real log that's showing what's going on. fasjson just can't contact the KDC, but the KDC logs don't really show any issues. ;(

Can you pinpoint the first of these alerts? Perhaps we can figure out what might have changed around that exact time?
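One way to pinpoint the first failure, as a rough sketch: scan the httpd access log for the earliest 5xx response. This assumes the lines look like the access-log line quoted earlier in this ticket; the function name and regex are made up for illustration and would need adjusting for a different log format.

```python
import re
from datetime import datetime

# Matches the timestamp, the quoted request, and the status code of an
# Apache-style access-log line, e.g.
#   ... [12/Mar/2023:15:24:01 +0000] "GET /v1/groups/ HTTP/1.1" 500 290 ...
ACCESS_RE = re.compile(r'\[([^\]]+)\] "[^"]*" (\d{3}) ')


def first_server_error(lines):
    """Return the datetime of the earliest 5xx response, or None if there is none."""
    earliest = None
    for line in lines:
        match = ACCESS_RE.search(line)
        if not match:
            continue  # e.g. wsgi error-log lines, which carry no status code
        stamp, status = match.groups()
        if not status.startswith("5"):
            continue
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        if earliest is None or when < earliest:
            earliest = when
    return earliest
```

Passing an open log file (any iterable of lines) to first_server_error gives a timestamp to compare against whatever else changed around then.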

Failing that we may need to ask IPA folks for help.

Oh, side note: fasjson is hard-coding ipa01 as its KDC... another thing we could try is just dropping that and letting it use any KDC in the cluster....

Ha. So, I hit the same problem on another host (cannot contact kdc) and dug around and noticed sssd said that it couldn't resolve ipa01.iad2.fedoraproject.org...
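A quick way to check whether name resolution is flapping from a given host is to resolve the name repeatedly and count the failures. The sketch below is only an illustration: the function name and defaults are assumptions, and the resolver and sleep arguments are injectable so it can be exercised without touching the network.

```python
import socket
import time


def count_resolution_failures(hostname, attempts=10, delay=1.0,
                              resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `hostname` `attempts` times and return how many lookups failed."""
    failures = 0
    for i in range(attempts):
        try:
            resolver(hostname, None)
        except socket.gaierror:
            # Lookup failed (NXDOMAIN, timeout, temporary failure, ...).
            failures += 1
        if i < attempts - 1:
            sleep(delay)  # pause between attempts, but not after the last one
    return failures
```

For example, count_resolution_failures("ipa01.iad2.fedoraproject.org", attempts=30) returning a non-zero value would point at intermittent DNS trouble rather than at the KDC itself.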

So I looked and there's some .cn IP that's hitting our DNS servers pretty hard. We have rate limiting, but perhaps it's enough to disrupt it.

I blocked them in iptables. So, I guess let's see if that's related?

So, looks like this came down to a script that was hitting fasjson pretty hard and causing timeouts.

We have worked with the script owner and are going to set up a better way to get that information.
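Independent of whatever the better way ends up being, clients that poll fasjson can be gentler on it by backing off when a request fails instead of retrying immediately. A minimal sketch of exponential backoff with jitter follows; the helper name and defaults are hypothetical, and `fetch` stands for any callable that performs the request and raises on failure.

```python
import random
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch() until it succeeds, backing off exponentially between tries.

    Re-raises the last exception once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 50% random
            # jitter so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay / 2))
```

The jitter is there so that several sync jobs failing at the same moment do not all come back at the same moment too.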

So, I think we can close this now. Please re-open if you see it again or if there's anything further to do on our side.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

2 years ago

