#9990 sssd fails on ipsilon servers, causing authentication issues.
Opened 3 months ago by q5sys. Modified 3 days ago

Describe what you would like us to do:


The error message says to reach out to you all... so here I am following instructions.

When do you need this to be done by? (YYYY/MM/DD)


Preferably before the election cycle ends, as I'd like to vote.

400error.png


Update:
Others are reporting the same thing, and are having success by first logging into other Fedora sites with their FAS accounts, after which they are able to log in here. I did the same and was able to log in. So I'm not sure what the issue is, but I wanted to report back my findings.

I think it's not that going to another site helps, but that it's transitory and people should just retry...

But investigating.

I got the same behavior. Clicking back and logging in again let me in. I also hit this on the OpenShift console, so I don't think it's specific to the Elections app. I saw it on another application this morning, but I don't remember which one it was.

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: high-gain, high-trouble, ops

3 months ago

Please re-try now. I have made a change that I hope will keep it working.

If folks are still seeing issues, please let me know your account name and what time exactly it was?

I'm currently experiencing this issue while trying to log in to the OpenShift Web Console.
username: mattia
time: 2021-05-29 15:16 UTC

I can successfully log in to accounts.fedoraproject.org.

I have the same issue with the elections app.

username: obudai
time: 2021-05-31 9:37 UTC

Most puzzling.

[Sat May 29 15:16:21.724960 2021]... authentication failed for user mattia: Authentication failure

[Mon May 31 09:35:56.396373 2021] ... authentication failed for user obudai: Authentication failure ...

So from the ipsilon point of view it seems like it's the wrong password. :(

@abompard or @puiterwijk any ideas here?

It's apparently only happening on ipsilon02; when I end up on ipsilon01 I can log in fine.

I get this in the journal:

May 31 17:50:57 ipsilon02.iad2.fedoraproject.org httpd[701471]: pam_sss(ipsilon:auth): authentication failure; logname= uid=48 euid=48 tty= ruser= rhost=192.168.1.195 user=abompard
May 31 17:50:57 ipsilon02.iad2.fedoraproject.org httpd[701471]: pam_sss(ipsilon:auth): received for user abompard: 4 (System error)

Restarting sssd seems to have fixed it.
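
For anyone trying to tell this failure apart from a genuinely wrong password, here is a minimal sketch that parses the pam_sss result line quoted above. The regex is modeled only on that one journal line, and the function names are made up for illustration; other sssd versions may format the message differently.

```python
import re

# Modeled on the journal line quoted above, e.g.
# "pam_sss(ipsilon:auth): received for user abompard: 4 (System error)".
# Assumption: other pam_sss result lines follow the same layout.
PAM_RESULT = re.compile(
    r"pam_sss\((?P<service>[^:]+):auth\): received for user "
    r"(?P<user>\S+): (?P<code>\d+) \((?P<reason>[^)]*)\)"
)


def parse_pam_result(line):
    """Return (service, user, code, reason), or None if the line does not match."""
    m = PAM_RESULT.search(line)
    if m is None:
        return None
    return m["service"], m["user"], int(m["code"]), m["reason"]


def is_system_error(line):
    """True when pam_sss reported "System error" rather than a credential problem."""
    parsed = parse_pam_result(line)
    return parsed is not None and parsed[3] == "System error"
```

A line flagged by is_system_error() would suggest the sssd backend is the problem, not the user's password.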

ok. Sounds like that was it. I do wonder why sssd went into that state though. ;(

If anyone still has any problems, please re-open or file a new issue. Thanks!

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 months ago

Can't reopen, but I just saw this again with Elections and another app (either commblog or wiki, I forget now)

Yeah, it looks like 02 is hitting those errors again. ;(

01 has almost no cases of it...

@abompard any idea of the underlying cause here?
I'm running the playbook over 02 to make sure there's nothing just out of sync config-wise.

Metadata Update from @kevin:
- Issue status updated to: Open (was: Closed)

2 months ago

In the journal I see quite a few entries like:

kernel: sssd_be[1511570]: segfault at 8 ip 00007f0d148d1e24 sp 00007ffffa182258 error 4 in libdbus-1.so.3.19.13[7f0d148ac000+30000]
systemd-coredump[1513354]: [🡕] Process 1513356 (sssd_be) of user 0 dumped core.
Stack trace of thread 1511570:
[...]

And yesterday there were lines like sssd_be[1508324]: LDAP connection error: unknown error.

I don't know where that could come from but I'll open a ticket with the sssd folks. And I'll restart sssd in the meantime.
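
In case it helps when gathering data for that ticket, here is a rough sketch for tallying the kernel segfault lines by process and library. The pattern is based only on the single kernel line quoted above, and tally_segfaults is a hypothetical helper name, not an existing tool.

```python
import re
from collections import Counter

# Based on the kernel line quoted above:
# "sssd_be[1511570]: segfault at 8 ip ... sp ... error 4 in libdbus-1.so.3.19.13[...]"
SEGFAULT = re.compile(
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: segfault at [0-9a-f]+ "
    r"ip [0-9a-f]+ sp [0-9a-f]+ error \d+ in (?P<lib>[^\[\s]+)\["
)


def tally_segfaults(lines):
    """Count segfault lines, keyed by (process name, library the fault was in)."""
    counts = Counter()
    for line in lines:
        m = SEGFAULT.search(line)
        if m:
            counts[(m["proc"], m["lib"])] += 1
    return counts
```

If every crash lands in the same library, that would be worth mentioning in the upstream report.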

Metadata Update from @ryanlerch:
- Issue priority set to: Waiting on External (was: Waiting on Assignee)

2 months ago

I just had this issue when trying to log in to ask.fedoraproject.org. Even resetting my password didn't help. I was able to log in here (first time) and now it seems to work again on ask.fedoraproject.org as well, which agrees with @q5sys's observation.

I confirm, exactly this error message was shown 3 times today.

I came across this login issue today. The outcome differed depending on which Fedora service I started from.

Via discussion.fedoraproject.org, clicking the "Log In" button gave me the 400 error repeatedly, no matter the number of attempts. I did not have a linked account set up.

Whereas via the Pagure fedora-infrastructure forum, it took 3 attempts before my credentials were accepted.

I just had the same issue as @paulgb: ask.fedoraproject.org wouldn't let me log in (HTTP 400). But I could log in to pagure.io, after which ask.fedoraproject.org was fine.

I'm seeing HTTP 400 on the wiki right now. With Badges, I'm getting a different behavior: Logging in to badges keeps redirecting me back to the login page (with a different ipsilon_transaction_id each time). If I go back to badges, I'm still not logged in.

On CommBlog, I'm getting the same loop behavior that I see with Badges

I got two different reports of login problems this morning, and discovered that I get an error which looks like the initial image when I try to log in to ask, but

 OpenID request was cancelled

when trying to log into pagure.io in a private window.

I'm also seeing the error "400 - Bad Request" ("User not authenticated at continue") and "OpenID request was cancelled" when trying to log in to ask.fedoraproject.org to report an issue with the sound card on my Acer laptop.
Some of my failed kernel tests are not showing up in my email notifications for today.
It seems to go away after resetting the password several times over, but this has happened before.
I think you guys might want to consider printing carbon copies and doing offline backups for sensitive tickets.

@duffy is reporting "OpenID request was cancelled" errors logging into Pagure right now.

Possibly this is a different issue, but it all feels connected.

@abompard any news here?

Perhaps we can at least add a nagios check so we know when this starts and can restart it? Or just restart sssd every hour or 15min or something?
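
A very rough sketch of what such a check could look like, assuming it is handed recent journal lines as text. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL); the thresholds and the check_sssd name are made up for illustration, and the pattern is taken from the pam_sss line quoted earlier in this ticket.

```python
import re

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Same "System error" result line that showed up in the ipsilon02 journal.
SYSTEM_ERROR = re.compile(
    r"pam_sss\([^)]*\): received for user \S+: \d+ \(System error\)"
)


def check_sssd(journal_lines, warn=1, crit=5):
    """Return (exit_code, message) from the number of pam_sss system errors.

    warn and crit are hypothetical thresholds; tune them to taste.
    """
    hits = sum(1 for line in journal_lines if SYSTEM_ERROR.search(line))
    if hits >= crit:
        return CRITICAL, f"CRITICAL - {hits} pam_sss system errors"
    if hits >= warn:
        return WARNING, f"WARNING - {hits} pam_sss system errors"
    return OK, "OK - no pam_sss system errors"
```

The lines could come from something like journalctl --since "15 minutes ago" on each ipsilon host; a wrapper would print the message and exit with the returned code.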

Would moving the ipsilon hosts over to rhel7 or rhel8 work around this?

I have added a note to fedorastatus about this issue. If you are reading this coming from that, please:

  1. Make sure you can log in OK to https://accounts.fedoraproject.org with the same username/password. This issue does NOT affect direct logins to the accounts page, so if you can't log in there, something else is wrong, not this issue.

  2. If you can log in to https://accounts.fedoraproject.org OK but not anywhere else, please report it in the #fedora-admin channel on irc.libera.chat or mail admin@fedoraproject.org, and mention that you may be seeing this issue. We are working to track it down and trying to watch for when it happens so we can restart sssd before it affects anyone, but we may miss that during off hours.

Sorry for the trouble. ;(
