#10372 [proposal] Move Pagure (pagure.io and src.fp.o) to use OIDC for web login
Opened 2 years ago by ryanlerch. Modified 5 months ago

Currently, our two pagure instances, https://pagure.io and https://src.fedoraproject.org use two ways of authentication:

  1. OpenID via Ipsilon for logging into the pagure web interface, and
  2. API tokens, which are Pagure's own implementation.

However, the Pagure software itself supports OpenID Connect (OIDC) for both web login and the API; it is just not enabled on our instances (pagure.io and src.fedoraproject.org). The main reason is that ipsilon requires changes before the API token side is usable (I believe the interface for granting and revoking OIDC API tokens is not fully implemented).

Anyway, long story long, the proposal here is to enable OIDC on our two Pagure instances for web login only and leave the current pagure-implemented API token system in place, until the changes can be implemented in ipsilon.

This was discussed previously in this ticket:
https://pagure.io/fedora-infrastructure/issue/7377

There is also a mini-initiative to port all the remaining applications away from OpenID to OIDC, so completion of this proposal would cross pagure off the list there.
https://pagure.io/fedora-infrastructure/issue/10241


Metadata Update from @humaton:
- Issue tagged with: high-gain, medium-trouble, ops

2 years ago

Metadata Update from @mohanboddu:
- Issue untagged with: high-gain
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain

2 years ago

Metadata Update from @zlopez:
- Issue tagged with: pagure

2 years ago

So, I guess this is still stalled because we need ipsilon to support variable scopes?

Metadata Update from @zlopez:
- Issue tagged with: blocked

2 years ago

[backlog refinement]
This is still blocked on ipsilon support for variable scopes.

Metadata Update from @zlopez:
- Issue assigned to zlopez

a year ago

Metadata Update from @zlopez:
- Issue untagged with: blocked

10 months ago

I tried to enable OIDC for pagure in staging. Here is the PR and a few adjustments I needed to make: PR1 and PR2.

I was able to log in, but how can I check that it's really used?

EDIT: It seems that it's still using OpenID instead of OIDC.
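
One rough way to check which protocol a login is really using is to look at the redirect URL the browser is sent to. A minimal sketch, assuming the usual parameter conventions: OpenID 2.0 requests carry `openid.`-prefixed parameters, while an OIDC authorization request carries `response_type`, `client_id` and a `scope` that includes `openid`. The function name and the URLs are hypothetical and are not taken from pagure or ipsilon.

```python
from urllib.parse import parse_qs, urlsplit


def classify_login_redirect(url: str) -> str:
    """Guess the login protocol from the query string of a redirect URL."""
    params = parse_qs(urlsplit(url).query)
    # OpenID 2.0 requests use "openid."-prefixed parameters (openid.mode, ...).
    if any(name.startswith("openid.") for name in params):
        return "openid2"
    # An OIDC authorization request has response_type and client_id, plus a
    # space-separated scope list that contains the value "openid".
    scopes = " ".join(params.get("scope", [])).split()
    if "response_type" in params and "client_id" in params and "openid" in scopes:
        return "oidc"
    return "unknown"


# Hypothetical URLs, for illustration only.
OLD_STYLE = "https://idp.example.org/openid/?openid.mode=checkid_setup"
NEW_STYLE = (
    "https://idp.example.org/authorize?response_type=code"
    "&client_id=pagure&scope=openid+profile"
    "&redirect_uri=https%3A%2F%2Fpagure.example.org%2Fcallback"
)
```

Copying the redirect URL from the browser's network tab during login and passing it to `classify_login_redirect` should be enough to tell the two apart.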

So the issue was in the variable environment used by pagure. It's different from that of every other staging app, so the configuration was not used at all.

I created a PR to fix that, reflecting the changes I made locally on the staging machine.

Currently I'm hitting this error, but otherwise I'm being correctly redirected to id.stg.fedoraproject.org and back to stg.pagure.io:

ssl.SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:897)

According to https://stackoverflow.com/questions/44316292/ssl-sslerror-tlsv1-alert-protocol-version this means that the OpenSSL library is too old. I checked, and there isn't a newer version on RHEL 8, where the staging instance of pagure is currently deployed.
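
To see what the local Python and OpenSSL can actually offer, something like the following could be run on the machine. This is only a diagnostic sketch, not a fix: `tls_report` and `negotiated_version` are hypothetical helper names, and `getattr` is used because some of these attributes were only added in Python 3.7, while the host runs Python 3.6.

```python
import socket
import ssl


def tls_report() -> dict:
    """Summarize the TLS capabilities of this Python build."""
    ctx = ssl.create_default_context()
    return {
        "openssl": ssl.OPENSSL_VERSION,
        # These attributes exist on Python 3.7+; None means "cannot tell".
        "has_tls1_2": getattr(ssl, "HAS_TLSv1_2", None),
        "has_tls1_3": getattr(ssl, "HAS_TLSv1_3", None),
        "minimum_version": str(getattr(ctx, "minimum_version", "unknown")),
    }


def negotiated_version(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect to host:port and return the negotiated TLS version string."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


if __name__ == "__main__":
    for key, value in tls_report().items():
        print(f"{key}: {value}")
```

Calling `negotiated_version("id.stg.fedoraproject.org")` from the pagure host would show whether a plain Python client can complete a handshake with the identity provider at all.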

How do we want to continue? Do we want to update the machine to RHEL 9?

NOTE: I reverted the local changes on the machine, so the login works again.

So, here's the background...

"staging" means staging hosts in iad2.

pagure-stg01 is NOT in iad2 and cannot talk to any staging stuff directly. So that's why it's not really in staging.
It's only staging for pagure deployments/testing, not part of our staging env otherwise.

That said, I would think OIDC should work against staging. I am not sure what's going on with that message. ;(
Perhaps @abompard could look?

The staging ipsilon is running on Fedora 38, so this could be a difference in crypto policies between RHEL 8 (which is running Python 3.6) and Fedora 38. Updating to RHEL 9 could help, but I have never done that in Fedora Infra, so I would be glad for guidance.

First, we need to get all of pagure ready in epel9. It's not currently branched, and I don't know how many packages are missing or what needs doing to get it running there.

Once that's done, we would need to save off the database and content, reinstall with RHEL 9, sync that back, and reload it.

So, not at all anything easy.

I wouldn't think this would be a RHEL 8 vs RHEL 9 thing, but I suppose it could be. We globally disable TLSv1, though:

SSLProtocol +all -SSLv3 -TLSv1 -TLSv1.1
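
To spell out what that directive leaves enabled, here is a small sketch that evaluates the `+`/`-` tokens in order. What `all` expands to depends on the OpenSSL build behind mod_ssl, so the `ALL_PROTOCOLS` tuple is an assumption, and `enabled_protocols` is a hypothetical helper, not something Apache provides.

```python
# Assumption: the set that "all" expands to on a reasonably recent OpenSSL.
ALL_PROTOCOLS = ("SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2", "TLSv1.3")


def enabled_protocols(directive_args: str, all_protocols=ALL_PROTOCOLS) -> list:
    """Evaluate SSLProtocol-style arguments left to right."""
    enabled = set()
    for token in directive_args.split():
        # Simplification: a token without a sign is treated like "+token".
        sign, name = (token[0], token[1:]) if token[0] in "+-" else ("+", token)
        names = all_protocols if name.lower() == "all" else (name,)
        if sign == "+":
            enabled.update(names)
        else:
            enabled.difference_update(names)
    # Report in protocol order rather than set order.
    return [proto for proto in all_protocols if proto in enabled]


REMAINING = enabled_protocols("+all -SSLv3 -TLSv1 -TLSv1.1")
```

Under that assumption only TLSv1.2 and TLSv1.3 remain, so a client that offers nothing newer than TLSv1.1 would be rejected with a protocol version alert. Whether that is what happens here is not confirmed.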

Could it help to just use a newer Python on RHEL 8 to get a newer Python ssl library?

As for moving Pagure to epel9, this could be something that @dherrera or @carlgeorge could help with.

We could instead try Fedora 38, since pagure is already packaged for it. Is it worth a try?

It seems that somebody is working on getting pagure to EPEL9.

@ryanlerch was able to test OIDC in Pagure on Fedora 38 in tiny-stage and confirmed that it works. So I'm setting this back to blocked until pagure is packaged for EPEL 9.

Metadata Update from @zlopez:
- Assignee reset

10 months ago

Metadata Update from @zlopez:
- Issue tagged with: blocked

10 months ago

Metadata Update from @zlopez:
- Issue marked as blocking: #11297

9 months ago

After getting help from @abompard, we were able to enable OIDC on staging pagure. I will try to enable it on production on Monday; I don't want to break it during the weekend.

Metadata Update from @zlopez:
- Issue untagged with: blocked

9 months ago

After OIDC was enabled, production pagure started throwing the following errors:

sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/3o7r)

Unable to connect to WSGI daemon process 'pagure' on '/etc/httpd/run/wsgi.2206235.0.1.sock' after multiple attempts as listener backlog limit was exceeded or the socket does not exist.

After some investigation I found out that this was caused by limiting the number of processes spawned by httpd to 1. We needed that limit because the OIDC integration by default just uses a dictionary, and with multiple processes the response from ipsilon was not always received by the pagure process that sent the request.
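
The problem can be modelled in a few lines. This is a deliberately simplified sketch and not flask-oidc's actual code: each `Worker` stands in for one httpd process with its own dictionary, and all class and method names are made up for illustration.

```python
import secrets


class Worker:
    """Stand-in for one WSGI process holding login state in a dict."""

    def __init__(self, store=None):
        # Each process gets its own dict unless a shared store is passed in.
        self.store = {} if store is None else store

    def start_login(self) -> str:
        """Remember a random state value and hand it to the identity provider."""
        state = secrets.token_urlsafe(16)
        self.store[state] = "pending"
        return state

    def handle_callback(self, state: str) -> bool:
        """Accept the callback only if this worker knows the state value."""
        return self.store.pop(state, None) is not None


# Separate dicts: the callback lands on a process that never saw the state.
first, second = Worker(), Worker()
separate_ok = second.handle_callback(first.start_login())

# Shared store: any process can complete the login.
shared = {}
third, fourth = Worker(shared), Worker(shared)
shared_ok = fourth.handle_callback(third.start_login())
```

With separate dictionaries the callback is rejected. With a shared store it succeeds, which is what a single process or an external store such as memcached provides.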

We discussed this with @abompard and these are the options we currently have:

  1. the easiest: configure apache to only spawn 1 process (with multiple threads)
  2. write a dict-like wrapper for pymemcache and setup a memcached server
  3. the longest: migrate Pagure from flask-oidc to authlib
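
For option 2, a minimal sketch of what such a wrapper might look like. It assumes the client offers `get(key)`, `set(key, value, expire=...)` and `delete(key)`, as pymemcache's base `Client` does. The class name, the key prefix, the expiry and the exact set of dict methods the OIDC code needs are all assumptions. `InMemoryClient` is only a stand-in so the sketch runs without a memcached server.

```python
import json


class MemcachedDict:
    """Dict-like view over a memcached-style client (hypothetical helper)."""

    def __init__(self, client, prefix="oidc:", expire=3600):
        self._client = client
        self._prefix = prefix
        self._expire = expire  # seconds; stale login state should not pile up

    def _key(self, key):
        # memcached keys must be short and must not contain whitespace.
        return self._prefix + str(key)

    def __getitem__(self, key):
        raw = self._client.get(self._key(key))
        if raw is None:
            raise KeyError(key)
        return json.loads(raw)

    def __setitem__(self, key, value):
        payload = json.dumps(value).encode("utf-8")
        self._client.set(self._key(key), payload, expire=self._expire)

    def __delitem__(self, key):
        self[key]  # raises KeyError if the key is missing, like a dict
        self._client.delete(self._key(key))

    def __contains__(self, key):
        return self._client.get(self._key(key)) is not None

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default


class InMemoryClient:
    """Stand-in client for running this sketch without a server."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, expire=0):
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)
        return True


# With a real server this would be, for example:
#   from pymemcache.client.base import Client
#   store = MemcachedDict(Client(("localhost", 11211)))
store = MemcachedDict(InMemoryClient())
store["state-123"] = {"next": "/dashboard"}
```

Iteration and `len()` are left out on purpose, since memcached cannot list its keys. If the OIDC code relies on them, this approach would need rethinking.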

We already tried the first option, which doesn't work on production (even after adding more threads). We can either investigate it further and make it work with one process, or try the other options. The best option is 3), as it solves more than one issue.

For now I will disable OIDC on production, so this doesn't happen again.

I created a ticket in pagure for the migration https://pagure.io/pagure/issue/5401

Since we now have a new version of flask-oidc that is based on authlib, I investigated migrating Pagure to that. Here's what I found:
EL8 does not have authlib packaged. We could package it, but the version of Flask on EL8 is ancient (<1.0) and authlib won't work with it.
As a result, the newer Flask-OIDC won't work with Pagure on EL8, and neither will pure Authlib.

I think we have two options:
- Upgrade to EL9, where everything is packaged.
- Run Pagure in a virtualenv instead of running it with the RPMs. It could still be a virtualenv based on the system's Python 3.6, but at that point we should consider basing it off another packaged version that is actually supported upstream, such as python3.11.
To make sure builds are reproducible, we should lock the versions of the dependencies in the virtualenv. But then we have an update problem. I can investigate the best way to do that for an app that is not packaged with poetry, if that's the route we want to take.
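
As a rough idea of what locking could look like without poetry, the sketch below writes out exact versions of whatever is installed in the current environment, much like `pip freeze` does. It assumes Python 3.8 or newer for `importlib.metadata`, so it would only apply to a virtualenv based on a newer Python such as python3.11. The function names and the versions in the example are made up.

```python
from importlib import metadata


def pinned_requirements(pairs) -> list:
    """Format (name, version) pairs as sorted 'name==version' lines."""
    # Key on the lowercased name so duplicates collapse and sorting is stable.
    unique = {name.lower(): (name, version) for name, version in pairs}
    return [f"{name}=={version}" for _, (name, version) in sorted(unique.items())]


def installed_pairs():
    """Yield (name, version) for every distribution in this environment."""
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken metadata entries
            yield name, dist.version


# Hypothetical versions, for illustration only.
EXAMPLE = pinned_requirements([("Flask", "2.0.0"), ("authlib", "1.2.0")])

if __name__ == "__main__":
    print("\n".join(pinned_requirements(installed_pairs())))
```

The output could be committed as a requirements file and installed with `pip install -r`. That gives reproducible installs, but it does not solve the update problem mentioned above.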

I would very much prefer to use rpms/go to RHEL9. virtualenv would work, but we aren't really as savvy at keeping them updated/knowing when something needs fixing/updating.

Metadata Update from @zlopez:
- Issue tagged with: blocked

5 months ago


Metadata
Boards 1
ops Status: Backlog