Currently, our two pagure instances, https://pagure.io and https://src.fedoraproject.org use two ways of authentication:
However, the Pagure software itself supports OpenID Connect (OIDC) for both the web login and the API; it is just not enabled on our instances (pagure.io and src.fp.org). The main reason for this is that ipsilon requires changes for the API token side to be usable (I believe the interface for granting and revoking OIDC API tokens is not fully implemented).
Anyway, long story long, the proposal here is to enable OIDC on our two Pagure instances for web login only and leave the current pagure-implemented API token system in place, until the changes can be implemented in ipsilon.
This was discussed previously in this ticket: https://pagure.io/fedora-infrastructure/issue/7377
There is also a mini-initiative to port all the remaining applications away from OpenID to OIDC, so completion of this proposal would cross pagure off the list there. https://pagure.io/fedora-infrastructure/issue/10241
cc: @pingou
Metadata Update from @humaton: - Issue tagged with: high-gain, medium-trouble, ops
Metadata Update from @mohanboddu: - Issue untagged with: high-gain - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: medium-gain
Metadata Update from @zlopez: - Issue tagged with: pagure
So, I guess this is still stalled because we need ipsilon to support variable scopes?
Metadata Update from @zlopez: - Issue tagged with: blocked
[backlog refinement] This is still blocked on ipsilon support for variable scopes.
Metadata Update from @zlopez: - Issue assigned to zlopez
Metadata Update from @zlopez: - Issue untagged with: blocked
I tried to enable OIDC for pagure in staging. Here is the PR, plus a few adjustments I needed to make: PR1 and PR2.
I was able to log in, but how can I check that it's really used?
EDIT: It seems that it's still using OpenID instead of OpenIDC
So the issue was in the variable environment used by pagure, it's different from every other staging app, so the configuration was not used at all.
I created a PR to fix that, to reflect the changes I did locally on the staging machine.
Currently I'm hitting this error, but otherwise I'm being correctly redirected to id.stg.fedoraproject.org and back to stg.pagure.io:
ssl.SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:897)
and according to https://stackoverflow.com/questions/44316292/ssl-sslerror-tlsv1-alert-protocol-version this means that the OpenSSL library in use is an old version. I checked and there isn't a newer version on RHEL 8, where the staging instance of pagure is currently deployed.
How do we want to continue? Do we want to update the machine to RHEL 9?
NOTE: I reverted the local changes on the machine, so the login works again.
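For anyone wanting to check this on the host, here is a minimal sketch that prints which OpenSSL build the Python interpreter is linked against and which TLS versions it reports. The function name `tls_support_summary` is made up for illustration; the `ssl.HAS_TLSv1_2` and `ssl.HAS_TLSv1_3` flags only exist on Python 3.7+, so on the Python 3.6 mentioned in this ticket they would show up as `None`.

```python
import ssl


def tls_support_summary():
    """Return the linked OpenSSL version and the TLS support flags Python exposes."""
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        # These flags were added in Python 3.7; fall back to None on older versions.
        "has_tlsv1_2": getattr(ssl, "HAS_TLSv1_2", None),
        "has_tlsv1_3": getattr(ssl, "HAS_TLSv1_3", None),
    }


if __name__ == "__main__":
    for key, value in tls_support_summary().items():
        print(key, "=", value)
```

This only reports what the local interpreter supports; it does not tell us what the remote side is willing to negotiate.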
So, here's the background...
"staging" means staging hosts in iad2.
pagure-stg01 is NOT in iad2, and cannot talk to any staging stuff directly. So, that's why it's not really in staging. It's only staging for pagure deployments/testing, not part of our staging env otherwise.
That said, I would think OIDC should work against staging. I am not sure what's going on with that message. ;( Perhaps @abompard could look?
The staging ipsilon is running on Fedora 38, so this could be down to a difference in crypto policies between RHEL 8 (which is running Python 3.6) and Fedora 38. Updating to RHEL 9 could help, but I have never done that in Fedora Infra, so I would be glad for guidance.
First, we need to get all pagure ready in epel9. It's not currently branched and I don't know how many packages are missing and what needs doing to get it running there.
Once that's done, we would need to save off the database and content, reinstall with rhel9, sync that back and reload it.
So, not at all anything easy.
I wouldn't think this would be a rhel8 vs rhel9 thing, but I suppose it could be. We globally disable tlsv1 tho:
SSLProtocol +all -SSLv3 -TLSv1 -TLSv1.1
Could it help to just use newer python on RHEL8 to have a newer ssl python library?
As for moving Pagure to epel9, this could be something that @dherrera or @carlgeorge could help with.
We could instead try Fedora 38; pagure is already packaged for it. Is it worth a try?
It seems that somebody is working on getting pagure to EPEL9.
@ryanlerch was able to test OIDC in Pagure on Fedora 38 in tiny-stage and confirmed that it works. So I'm setting this back to blocked until pagure is packaged for EPEL 9.
Metadata Update from @zlopez: - Assignee reset
Metadata Update from @zlopez: - Issue marked as blocking: #11297
After getting help from @abompard we were able to enable OIDC on staging pagure. I will try to enable it on production on Monday; I don't want to break it during the weekend.
Enabling OIDC on production pagure started throwing the following errors:
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/3o7r)
Unable to connect to WSGI daemon process 'pagure' on '/etc/httpd/run/wsgi.2206235.0.1.sock' after multiple attempts as listener backlog limit was exceeded or the socket does not exist.
After some investigation I found out that this was caused by limiting the number of processes spawned by httpd to 1. We needed that limit because OIDC by default just uses an in-memory dictionary, and with multiple processes the response from ipsilon was not always received by the pagure process that sent the request.
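To illustrate the multi-process problem, here is a small simulation. It does not use the real flask-oidc API: `Worker`, `start_login`, `handle_callback` and `simulate` are hypothetical names, and each `Worker` object stands in for one WSGI process holding its own dictionary.

```python
class Worker:
    """Stand-in for one WSGI process with its own in-memory state dictionary."""

    def __init__(self, name):
        self.name = name
        self.pending_states = {}

    def start_login(self, state):
        # Remember the state value sent to the identity provider.
        self.pending_states[state] = "awaiting callback"

    def handle_callback(self, state):
        # The callback only succeeds if this worker remembers the state.
        return self.pending_states.pop(state, None) is not None


def simulate(worker_count):
    """Start a login on the first worker and deliver the callback to the last one."""
    workers = [Worker("w%d" % i) for i in range(worker_count)]
    workers[0].start_login("abc123")
    return workers[-1].handle_callback("abc123")


if __name__ == "__main__":
    print("1 process: ", simulate(1))
    print("2 processes:", simulate(2))
```

With a single worker the callback always finds its state. With two, the callback can land on a worker that never saw the login request, which is why the state would need to live in a store shared by all processes.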
We discussed this with @abompard and these are the options we currently have:
We already tried the first option, which doesn't work on production (even after adding more threads). We can either investigate it more and make it work with one process, or try the other options; the best option is 3), as it solves more than one issue.
For now I will disable OIDC on production, so this doesn't happen again.
I created a ticket in pagure for the migration https://pagure.io/pagure/issue/5401
Since we now have a new version of flask-oidc that is based on authlib, I investigated migrating Pagure to that. Here's what I found: EL8 does not have authlib packaged. We could package it, but the version of Flask on EL8 is ancient (<1.0) and authlib won't work with it. As a result, the newer Flask-OIDC won't work with Pagure on EL8, and neither will pure Authlib.
I think we have two options:
- Upgrade to EL9, where everything is packaged.
- Run Pagure in a virtualenv instead of running it with the RPMs. It could still be a virtualenv based on the system's Python 3.6, but at that point we should consider basing it off another packaged version that is actually supported upstream, such as python3.11. To make sure builds are reproducible, we should lock the versions of the dependencies in the virtualenv. But then we have an update problem. I can investigate the best way to do that for an app that is not packaged with poetry, if that's the route we want to take.
I would very much prefer to use rpms/go to RHEL9. virtualenv would work, but we aren't really as savvy at keeping them updated/knowing when something needs fixing/updating.
Given the work to move to a new git forge, is this worth keeping open?
So, given that we are looking to move to another forge and are currently blocked on this, let's close this for now.
I am asking ab to ask keycloak folks about openid support. If we can get that, we can move to that sooner rather than later; if not, we can move after we move git forges or if a rhel9 pagure becomes available.
Metadata Update from @kevin: - Issue close_status updated to: Will Not/Can Not fix - Issue status updated to: Closed (was: Open)
I talked to the Keycloak folks and they have never been asked in the past to implement OpenID 1.0 or 2.0. None of their customers asked for that, so they have an OAuth 2.0/OIDC implementation only. Since the whole idea is to make it a stop-gap solution, there is no reason for them to work on it.
I think, realistically, if you are unable to upgrade to RHEL9, packaging needed updates to flask and authlib in a custom repo in COPR (for example), would unblock you.
Yeah, so, I guess let's reopen this.
Can we identify exactly what packages are too old in rhel8? Once we have that list we could look and see if just doing compat packages would work or is too daunting. At the same time we could look at what's left for rhel9 pagure support.
Sounds reasonable?
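As a starting point for that list, a rough sketch like the following could be run on the host to report which versions of the relevant Python distributions are installed. The helper names `installed_version` and `version_report` are made up for illustration, and the distribution names queried are an assumption based on the packages mentioned in this ticket. It falls back to `pkg_resources` because `importlib.metadata` is not available on Python 3.6.

```python
def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python < 3.8 (e.g. 3.6 on EL8): fall back to setuptools' pkg_resources.
        import pkg_resources

        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_report(dist_names):
    """Map each distribution name to its installed version (None if absent)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Distribution names assumed from this ticket; adjust as needed.
    for name, found in version_report(["Flask", "Authlib", "flask-oidc"]).items():
        print(name, "->", found or "not installed")
```

This only covers the Python side; comparing against what each package actually requires would still have to be done by hand.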
Metadata Update from @kevin: - Issue status updated to: Open (was: Closed)
OK, I think we got it working with the authlib-based rewrite of flask-oidc. The staging version is currently running with it, using packages built in the infra repo (python3-authlib and python3-flask-oidc).
I think we can schedule the migration of the prod version when it's practical (after freeze maybe?).
Wow. Awesome!
Yes, definitely after freeze, but can do it then...
Checking in on this work. Any updates to share?
Yes! Pagure.io is now using OIDC. We delayed the switch of dist-git to OIDC because it had a weird setup with mod_auth_oidc, but I've switched staging to OIDC and it's apparently working fine, so I think we're ready to switch prod to OIDC as well. We just need to set up a planned outage.
Metadata Update from @phsmoura: - Issue untagged with: blocked
When would be good to do that? next week sometime? ;)
Yes! Tue or Wed in my morning would work I think, at that hour people in the US are still asleep. I'll set something up
I guess this is now done? ;)
Feel free to reopen if there's more to do.
Metadata Update from @kevin: - Issue close_status updated to: Fixed with Explanation - Issue status updated to: Closed (was: Open)