#9152 staging noggin deployment planning
Closed: Fixed a month ago by kevin. Opened 9 months ago by kevin.

Greetings.

We are now ready to build up our staging env again and I figured I would file a ticket to coordinate noggin deployment along with all the other things we need for it.

Some questions:

  • Does noggin work/will it work in openshift? If so, I can do an openshift deployment first; if not, we can just do it in a vm.

  • unfortunately (or perhaps fortunately), we didn't save the old staging ipa server, so I did a new deployment from scratch in a vm. (ipa01.stg.iad2.fedoraproject.org).
    Does noggin need anything from the ipa server configuration wise? The playbook is currently failing on:
    ipa: ERROR: Host 'id.stg.fedoraproject.org' does not have corresponding DNS A/AAAA record, but it does... not sure what's going on there.

  • I'm assuming noggin needs ipa and ipsilon and a proxy, any other services?

Things we need to figure out:

  • Should we just start from 0 for now? (ie, have admins make accounts, etc) or do we want to try and migrate data from prod?

  • we need to figure out ssh access/replacement for fasClient

  • we need to figure out sudo access/replacement for pam_url

cc @abompard @pingou @puiterwijk


Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: groomed, medium-gain, medium-trouble

9 months ago

Does noggin work/will it work in openshift? If so, I can do an openshift deployment first; if not, we can just do it in a vm.

Yes, The two webapps will run in OpenShift. I have the yaml files I used for the CommuniShift and RH instance deployment so I can reuse them for staging. If you make me a role folder in Ansible I'll put them there. I'll also need a couple secrets obviously.

unfortunately (or perhaps fortunately), we didn't save the old staging ipa server, so I did a new deployment from scratch in a vm. (ipa01.stg.iad2.fedoraproject.org). Does noggin need anything from the ipa server configuration wise?

I'll need to have the freeipa-fas plugin installed, but IPA should be installable without it, and we can add it later.

The playbook is currently failing on: ipa: ERROR: Host 'id.stg.fedoraproject.org' does not have corresponding DNS A/AAAA record, but it does... not sure what's going on there.

Hmm, not sure either.

I'm assuming noggin needs ipa and ipsilon and a proxy, any other services?

I'll need to connect to the RabbitMQ servers for Fedora Messaging, but I think that's all.

Should we just start from 0 for now? (ie, have admins make accounts, etc) or do we want to try and migrate data from prod?

We can try the migration script.

we need to figure out ssh access/replacement for fasClient
we need to figure out sudo access/replacement for pam_url

That should just be running ipa-client-install, I believe.

Metadata Update from @abompard:
- Issue untagged with: groomed, medium-gain, medium-trouble
- Issue priority set to: Needs Review (was: Waiting on Assignee)

9 months ago

Metadata Update from @abompard:
- Issue tagged with: groomed, medium-gain, medium-trouble

9 months ago

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

9 months ago

Note that I also have the Noggin stack packaged as RPMs and it can be deployed to a container or a VM with it. This also includes freeipa-fas plugin. Without the plugin installed and FreeIPA reconfigured with it, Noggin will break pretty badly.

Oh, and here's the COPR where I've built all this: https://copr.fedorainfracloud.org/coprs/ngompa/fedora-aaa/

I'm basically waiting on @abompard's approval before upstreaming these into Fedora itself.

ok, we now have a staging openshift cluster up and running. It doesn't have any web or remote access yet however (that needs some firewall rules setup for staging proxies).

That said, I think we can start working on deploying noggin there anytime.

I assume ipsilon needs some config adjustment to talk only to IPA and not fas? We should do that too and deploy it in openshift.

Also, does IPA need any changes?

@abompard since we have no auth currently or web interface, how about I just put your ssh key for root on os-control01.stg.iad2.fedoraproject.org ? You can login there as root and ssh to os-master01.stg.iad2.fedoraproject.org (or any of the cluster) to debug things. Is that acceptable? Or do you just need to deploy via playbook and don't need any more access? Anything else you need? Once we have noggin up and ipsilon, we can look at sorting out our ssh / local admin accounts plans.

Yeah I'll also need access to the FreeIPA server to deploy the extension (or rather: be allowed to run the playbook that will do it, probably the same as the freeipa one)

Then I can start on ipsilon & noggin. I have the openshift yaml files for noggin but I haven't written playbooks, I can start with that once I have access. I haven't used root on openshift yet, I hope I won't break things...

Status update:

  • Noggin and FASJSON are deployed in staging openshift, but I can't check that they actually work because there's currently no way to get to the web UIs (if I understand correctly the proxies aren't set up)

  • Ipsilon is not deployed, and I'll work on that today. It needs to be configured to pull information from IPA/LDAP and not FAS, so there are configuration changes to make to the current playbook/role. I'd welcome help from someone who knows Ipsilon well to figure out those changes and how to deploy them, because we can't use the nice --ipa installation switch with containers.

Ipsilon does have a plugin to get info from LDAP, but I don't think we can use it as-is, because the FAS info plugin seems to do more stuff, especially around the AWS roles. (Be careful @nphilipp or @ryanlerch: the infofas plugin that is actually deployed is a modified version that lives in ansible/roles/ipsilon/files/infofas.py.)

I'm not sure how much of an IPA client setup Ipsilon will need to switch to IPA-based authentication and information, but we can't run ipa-client-install in a container, so we'll have to figure something out. For FASJSON I'm using a lightweight system that gets the IPA CA cert and a service keytab, but I'm not sure it'll be sufficient for Ipsilon. Maybe the IPA folks ( @cheimes ?) can shed some light on that, as they've probably had container-based IPA clients in the past already.

Finally, the IPA server in staging seems to be up and running fine with the FAS plugin installed. There are almost no users at the moment, so we can start testing the import script. In any case the server is there, and machines where fasClient used to run can be enrolled to enable SSH access.

That's all for me today! I'll be off most of next week but @nphilipp and @ryanlerch are around (hop, right under the bus, you're welcome).

Ipsilon is not deployed, and I'll work on that today. It needs to be configured to pull information from IPA/LDAP and not FAS, so there are configuration changes to make to the current playbook/role. I'd welcome help from someone who knows Ipsilon well to figure out those changes and how to deploy them, because we can't use the nice --ipa installation switch with containers.

I believe @hellcp has a working Ipsilon configuration with FreeIPA that he could share to template for the container deployment.

That'd be very nice, thanks.

@puiterwijk is more qualified to talk about IPA integration for Ipsilon. As far as I remember Ipsilon uses SSSD, mod_auth_gssapi, and mod_lookup_identity. The lookup identity module depends on SSSD's info pipe and D-Bus. At a minimum you have to run authselect select sssd, configure SSSD, enable the info pipe in the SSSD config, and have a minimal IPA installation with default.conf, ipa.crt, /etc/krb5.keytab, and a keytab for HTTPd.

We met today to move this deployment forward.

I have created a db-fas01.stg.iad2.fedoraproject.org that has the prod fas db loaded in it.

You should now be able to deploy a fas in staging openshift and migrate from it to noggin.

Should we keep this open? Or close until we see more to do? The sssd part of things still needs to be sorted tho.

Thanks Kevin!

I would like to keep this open for now in case the team run into any issues deploying so we have a reference of what was asked.

Happy to open a separate issue for sssd if that would help too.

freeipa-fas is now in EPEL8 and Fedora. I'm progressing through shipping the rest of the Noggin stack in Fedora.

python-noggin-messages is now submitted for Fedora 33: https://bodhi.fedoraproject.org/updates/FEDORA-2020-ff21d2d01a

I cannot build it for EPEL8 due to lack of Poetry in EPEL 8.

Filed https://bugzilla.redhat.com/show_bug.cgi?id=1898395 but poetry seems orphaned now perhaps in favor of poetry-core.

That's odd, I'm pretty sure @thrnciar maintains it, which is confirmed by https://src.fedoraproject.org/rpms/poetry

See on the side where all epel and fedora bugs are overridden to 'orphan' ? and indeed the bug I filed above is assigned to orphan. :(

When the package was unorphaned, we didn't notice that it had bugzilla overrides set to orphan. I've reset them to defaults now.

Apps in staging that don't seem to be working:

  • mirrormanager
  • kerneltest
  • koschei
  • mailman
  • COPR

Apps in staging that don't seem to be working:

  • mirrormanager

https://admin.stg.fedoraproject.org/mirrormanager/ seems to work? I can even login ok... :)
Or was there another part that wasn't working?

  • kerneltest

This seems to be a schema thing... prod is running the old version in a vm, and I just copied prod->stg db, so I think we need to load whatever schema this uses, but I am not sure where it is. ;(

  • koschei

Wasn't deployed at all. I did so and fixed some things and now it's up and I can login. ;)

  • mailman

I never deployed a stg mailman because I figured we would just do so once we had the new versions packaged up. Is that still reasonable to wait, or do you need it now?

  • COPR

stg copr looks like it might be pointed to our stg proxies from back when we were trying to deploy frontends in stg. I think we just need to update this in dns. @praiskup does that sound right?

Great stuff. Keep em coming. :clock830:

stg copr looks like it might be pointed to our stg proxies from back when we
were trying to deploy frontends in stg. I think we just need to update this in
dns. @praiskup does that sound right?

I'm not sure what proxies you mean? We have "devel" and "production" only, and
the "staging" was kind of a different thing when it was still running
(different playbook for frontend, attempt to move copr frontend to a better
shape, aim to move to openshift).

So speaking of staging, it is not running at all now and in the next few weeks
we will have no time to make it working again unfortunately.

https://admin.stg.fedoraproject.org/mirrormanager/ seems to work? I can even login ok... :)
Or was there another part that wasn't working?

Oh cool, it wasn't working when I tried, a database was missing.

  • kerneltest

This seems to be a schema thing... prod is running the old version in a vm, and I just copied prod->stg db, so I think we need to load whatever schema this uses, but I am not sure where it is. ;(

I'll investigate this.

  • koschei

Wasn't deployed at all. I did so and fixed some things and now it's up and I can login. ;)

Very cool, thanks!

  • mailman

I never deployed a stg mailman because I figured we would just do so once we had the new versions packaged up. Is that still reasonable to wait, or do you need it now?

Nah we can totally wait and leave it for last.

  • COPR

speaking of staging, it is not running at all now and in the next few weeks
we will have no time to make it working again unfortunately.

Understood. @praiskup could the "devel" instance be pointed at the staging ipsilon instance for a short time so we/you can test auth? It's id.stg.fedoraproject.org and since you're using OpenID, it should require no configuration on our part (I think).

So, next steps here I think are:

  • wait for the sync of fas->noggin data to test with.

  • get sssd / ssh/sudo rules setup in our ansible so people can ssh in and sudo via sssd.

  • wider testing of both then move on to prod

Or am I missing things?

@praiskup could the "devel" instance be pointed at the staging ipsilon instance

Yes, tracking in https://pagure.io/copr/copr/issue/1614

Metadata Update from @smooge:
- Issue tagged with: dev, ops

4 months ago

So, just to update with my understanding of deployment:

  • The fas->noggin sync is still being polished (or is it done now?)

** Did we still need to figure out what fas users/groups we do NOT want to sync?

** Did the issue of no groups on some users get solved?

  • sssd / ssh / sudo:

** I merged the PR from @nphilipp and applied it in staging, but it still doesn't work right. The groups are not as it expects or something. Can we schedule time to work on this?

** sudo I think just needs some more ipa rules being added for each host

NOTE: This work is BLOCKING the mbs 3.0 deployment, because they can't get to their staging instance to upgrade it. ;( I'd really appreciate moving it forward soon at least to the point where they are unblocked.

  • We should figure out prod plans. I think we may want to start with a fresh ipa cluster (but then everyone would have to change their password... which we can do, but there are always flames from that). We need to figure out timing based on the fedora 34 cycle. We also need to figure out how to cleanly remove fasClient and users from an install, or plan on re-installing everything.

Let me know if I can help out/meet on the ssh/sudo and scheduling questions. :)

  • The fas->noggin sync is still being polished (or is it done now?)

We put some improvements and bug fixes into fas2ipa recently and as far as I know, we don't have any more open issues on that front.

** Did we still need to figure out what fas users/groups we do NOT want to sync?

IIRC, we decided to transfer all groups now and clean out later.

** Did the issue of no groups on some users get solved?

I'm not sure about the details but it looks like users that were created before fas2ipa was run for the first time didn't get their groups—Ryan Lerch’s and my user have no groups at all, yours only has sysadmin-main. @abompard, your user seems to have a full set of groups, did it only get transferred with the fas2ipa run?

I have a suspicion that fas2ipa only updates user/group relations on existing users if it detects changes on the user's data. I'll look into it today.

  • sssd / ssh / sudo:

** I merged the PR from @nphilipp and applied it in staging, but it still doesn't work right. The groups are not as it expects or something. Can we schedule time to work on this?

Aside from the group membership issue, we should verify that the HBAC rules on IPA in staging are configured properly. As I wrote the PR I'm happy to assist there—I'm free anytime in my afternoon until 6pm CET (5pm UTC, 12pm EST, 9am PST) today or later in the night if needs be.

** sudo I think just needs some more ipa rules being added for each host

That's what I work on now when not debugging fas2ipa. :wink: I hope to have something to show at the end of the day.

I have a suspicion that fas2ipa only updates user/group relations on existing users if it detects changes on the user's data. I'll look into it today.

This doesn't seem to be the case, running fas2ipa for my user (re)creates group memberships for it, regardless of if it did or didn't exist before—I removed some group memberships from the existing user and it restored them without issue.

@abompard @kevin @mobrien How about running the fas2ipa script in staging again, first only for users with known problems, then for everybody? As the whole set of users takes very long to process, kicking it off before the weekend seems like a good idea to me.

@abompard @kevin @mobrien How about running the fas2ipa script in staging again, first only for users with known problems, then for everybody? As the whole set of users takes very long to process, kicking it off before the weekend seems like a good idea to me.

+1

Aside from the group membership issue, we should verify that the HBAC rules on IPA in staging are configured properly. As I wrote the PR I'm happy to assist there—I'm free anytime in my afternoon until 6pm CET (5pm UTC, 12pm EST, 9am PST) today or later in the night if needs be.

Yes, I think the hbac rules are not right. ;)
I am trying to login to ipsilon01.stg...

A test shows:

ipa hbactest --user=kevin --host=ipsilon01.stg.iad2.fedoraproject.org --service=sshd


Access granted: False

Not matched rules: allow_systemd-user
Not matched rules: group/sysadmin-main
Not matched rules: ipsilon
Not matched rules: shell-access/host/ipsilon01.stg.iad2.fedoraproject.org

The group/sysadmin-main rule has sysadmin-main as a group, which I am in, but it's not matching?

Also, I am a bit confused because we have 'ssh' 'sshd' and 'shell-access'. Perhaps we could just stick to one of them? :)

I also had to make some changes to sshd_config but that's pretty minor.

Happy to work on it, but it's already after your time... if you want we can try and find an early time next week and I can get up early and we can work on it?

Not matched rules: group/sysadmin-main

This one should match. Let's review this together on Monday.

Happy to work on it, but it's already after your time... if you want we can try and find an early time next week and I can get up early and we can work on it?

Thanks for the consideration! But it hopefully doesn't take that long, so let's meet after my dinner? Say around 7:30pm CET, 10:30am PST?

Sure! added a calendar invite... we can meet in #fedora-noc and go to video if it looks like it's better for the debugging. Thanks!

OK, I possibly found the issue: an HBAC rule needs to apply on a set of hosts and none was defined for group/sysadmin-main. So I've set it to apply on "all" hosts:

$ ipa hbacrule-mod group/sysadmin-main --hostcat all

And now auth with kevin seems to be allowed:

$ ipa hbactest --user=kevin --host=ipsilon01.stg.iad2.fedoraproject.org --service=sshd
--------------------
Access granted: True
--------------------
  Matched rules: group/sysadmin-main
  Not matched rules: allow_systemd-user
  Not matched rules: ipsilon
  Not matched rules: shell-access/host/ipsilon01.stg.iad2.fedoraproject.org

Does this seem right to you?

I've pushed the change in b60912 to avoid having manual changes in IPA, and because I'm pretty sure that's what we want, but I can revert it.
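To illustrate why a rule with the right group but no hosts did not match, here is a deliberately simplified toy model of HBAC matching. It is not FreeIPA's implementation: the class and field names are made up for illustration, and the assumption that the rule covers every service is mine, not something stated in this ticket.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHbacRule:
    """Toy stand-in for an HBAC rule; not FreeIPA's data model."""
    name: str
    groups: set = field(default_factory=set)
    hosts: set = field(default_factory=set)
    host_category_all: bool = False
    services: set = field(default_factory=set)
    service_category_all: bool = False

    def matches(self, user_groups, host, service):
        # Every dimension (who, where, which service) has to match.
        who = bool(self.groups & set(user_groups))
        where = self.host_category_all or host in self.hosts
        what = self.service_category_all or service in self.services
        return who and where and what


HOST = "ipsilon01.stg.iad2.fedoraproject.org"

# Assumed state before the fix: group set, but no hosts and no host category.
before = ToyHbacRule("group/sysadmin-main", groups={"sysadmin-main"},
                     service_category_all=True)
# After `--hostcat all`: the rule applies on every host.
after = ToyHbacRule("group/sysadmin-main", groups={"sysadmin-main"},
                    host_category_all=True, service_category_all=True)

granted_before = before.matches({"sysadmin-main"}, HOST, "sshd")
granted_after = after.matches({"sysadmin-main"}, HOST, "sshd")
print(granted_before, granted_after)
```

In this model the group membership alone is never enough: with an empty host set and no "all hosts" category, the "where" check fails for every host.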

That does seem to get it working, but it doesn't make a homedir. I think that's an sssd option we need to set tho. :)

This boils down to running authselect select sssd with-mkhomedir with-sudo. Is this something we want in ansible as a manual playbook for the machines that got registered with IPA before we used ipa-client-install ... --mkhomedir ...?

I think we can just run that manually when we are ready to switch it on on all hosts, no need to run it every playbook run. ;)

So, the ssh rules are still not quite happy:

TASK [ipa/client : Create missing shell access user groups] *******************************************************************
Tuesday 19 January 2021  21:51:20 +0000 (0:00:03.614)       0:02:01.193 *******                                               
Tuesday 19 January 2021  21:51:20 +0000 (0:00:03.614)       0:02:01.192 *******                                               
ok: [mbs-frontend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=sysadmin-main)                   
ok: [mbs-backend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=sysadmin-main)                    
ok: [mbs-frontend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=s)                               
ok: [mbs-backend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=s)                                
ok: [mbs-frontend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=y)                               
ok: [mbs-backend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=y)                                
ok: [mbs-frontend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=s)                               
ok: [mbs-backend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=s)                                
ok: [mbs-frontend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=a)                               
ok: [mbs-backend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=a)                                 
ok: [mbs-frontend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=d)                                
ok: [mbs-backend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=d)                                 
ok: [mbs-frontend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=m)                               
ok: [mbs-frontend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=i)                               
ok: [mbs-backend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=m)                                
ok: [mbs-frontend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=n)                               
ok: [mbs-backend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] => (item=i)                                
failed: [mbs-frontend01.stg.iad2.fedoraproject.org -> ipa01.stg.iad2.fedoraproject.org] (item=-) => {"ansible_loop_var": "item"
, "changed": false, "item": "-", "msg": "group_add: -: invalid 'group_name': may only include letters, numbers, _, -, . and $"}

Something there isn't passing the right format to that loop. ;)
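The single-character items in that output (s, y, s, a, d, m, i, n, -) spell out "sysadmin-", which would be consistent with a bare string being iterated where a list of group names was expected. That is only a reading of the log, not a confirmed diagnosis. A minimal Python sketch of the effect, with a hypothetical helper (as_group_list) that wraps a lone string instead of splitting it:

```python
def as_group_list(value):
    """Return group names as a list, wrapping a bare string instead of splitting it."""
    if isinstance(value, str):
        return [value]
    return list(value)


# A list of group names iterates one name at a time.
groups_as_list = ["sysadmin-main", "sysadmin-mbs"]
# A bare string iterates one character at a time, like the items in the log.
groups_as_string = "sysadmin-main"

chars = [item for item in groups_as_string]
print(chars)
print(as_group_list(groups_as_string))
print(as_group_list(groups_as_list))
```

The group names here are only sample values; the point is that normalizing the variable to a list before looping avoids the per-character items.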

stg.pagure.io is unhappy because we aren't passing email address, which we need to get sssd to pass to ipsilon.

In ansible we have:

- name: configure SSSd to forward additional attributes (1/2)
  replace:
    path: /etc/sssd/sssd.conf
    regexp: ^ldap_user_extra_attrs = [\w,\s]+$
    replace: ldap_user_extra_attrs = mail, street, locality, st, postalCode, telephoneNumber, givenname, sn, fasTimeZone, fasLocale, fasIRCNick, fasGPGKeyId, fasCreationTime, fasStatusNote, fasRHBZEmail, fasGitHubUsername, fasGitLabUsername, fasWebsiteURL, fasIsPrivate, ipaSshPubKey
  tags:
  - ipsilon
  - config
  notify:
  - restart sssd

- name: configure SSSd to forward additional attributes (2/2)
  replace:
    path: /etc/sssd/sssd.conf
    regexp: ^user_attributes = [\w,\s+]+$
    replace: user_attributes = +mail, +street, +locality, +st, +postalCode, +telephoneNumber, +givenname, +sn, +fasTimeZone, +fasLocale, +fasIRCNick, +fasGPGKeyId, +fasCreationTime, +fasStatusNote, +fasRHBZEmail, +fasGitHubUsername, +fasGitLabUsername, +fasWebsiteURL, +fasIsPrivate, +ipaSshPubKey
  tags:
  - ipsilon
  - config
  notify:
  - restart sssd

But that no longer matches and I am not sure what the section should be. ;(
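For what it's worth, the replace module only rewrites lines that its regexp finds; if the option is missing from the file, the task changes nothing and the attributes never get configured. A small Python check of the first pattern against two made-up sssd.conf snippets (the file contents are hypothetical, not taken from our hosts):

```python
import re

# The pattern from the first task above. Ansible's replace module compiles
# its regexp with re.MULTILINE, so ^ and $ anchor at line boundaries.
PATTERN = re.compile(r"^ldap_user_extra_attrs = [\w,\s]+$", re.MULTILINE)

# Hypothetical sssd.conf contents, for illustration only.
conf_with_option = "[domain/example.test]\nldap_user_extra_attrs = mail, sn\n"
conf_without_option = "[domain/example.test]\nid_provider = ipa\n"


def option_present(conf_text):
    """Return True if the replace task would find a line to rewrite."""
    return PATTERN.search(conf_text) is not None


for label, text in [("with option", conf_with_option),
                    ("without option", conf_without_option)]:
    print(label, option_present(text))
```

This does not explain why the generated sssd.conf changed, only why the tasks silently stop having any effect once the line they look for is gone.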

So, can we ditch those and make an sssd.conf template instead? That can include the 'allow ipsilon' to talk to sssd-ifp too.

We may also want to move this to a base/ task if we are going to be enabling it on all machines.

Uh that's weird I don't know why sssd.conf changed so much. I used a replace module to adapt better to future changes in config file but I guess that's not such a good idea. I'll make a template.

However I don't think it should be deployed on all machines, only Ipsilon needs to get these additional attributes.

Thanks for the sudo / sssd / ssh work! :)

The sudo rules don't seem 100% yet... for example, breilly is in sysadmin-mbs, which is in ipa_client_sudo_groups on mbs-backend01.stg, but when they sudo they get:

" user NOT authorized on host"

Three things I would really like to sort before we go to production:

  • disallow password login via ssh. Currently you can have your client not send any keys and it will ask for password and accept it for access. We don't want that.
  • 2fa for sudo. This is what we have now, and would like to keep. I guess that means anyone in a sysadmin group has to have a 2fa enrolled?
  • some more widespread testing in stg would be cool. If we could re-run the import to make sure everyone's groups are good, we could ask infrastructure list? Or even devel list? to try things out?

After that I know we want to head to prod, but we should come up with a deployment plan here as just having no auth for a while isn't going to make people happy.
So, I guess we are planning on keeping our current ipa servers/cluster? There's some groups we need to fix up... for example 'sysadmin-main' has a bunch of old folks in it. :)
If not we need to deploy the new cluster.
We need to deploy a new pair of ipsilon vm's in prod.
We need to deploy noggin in openshift. Can we do this on some different url at first and switch it when we switch over? Having it all ready would be nice.
The sync script has to run... can we run it once and then just run it on users that change so we don't have to have a long downtime while the script runs?
(ie, timestamp X, run it, finishes, check all messages since X, re-run on just those users, etc).
Hopefully we can enroll all the hosts then, remove fasClient on them, chown user homedirs to the new ipa uids. Or perhaps just delete them and recreate them? We could tell everyone to make sure and save anything they needed.

Then, we need an outage announced. In the window we take down fas container, bring up noggin on the real url. Take down ipsilon pods, point proxies to vm's. Point proxies to new ipa cluster (if we are doing that), then fix anything that breaks. ;)

Does that all sound reasonable? The only big question to me is the ipa cluster, I know we need to keep the current one to keep passwords, but we could just ask everyone to refresh their password (and boy will they be mad, but we can survive it).

I guess centos could switch anytime after we have noggin and ipsilon up and users migrated.

  • some more widespread testing in stg would be cool. If we could re-run the import to make sure everyone's groups are good, we could ask infrastructure list? Or even devel list? to try things out?

I agree that we need to give this more exposure before moving this to
production. Sending a "call for feedback" on devel-announce sounds appropriate
here considering the impact that this change will have (in theory none, in
practice everyone interacts with it).

We need to deploy noggin in openshift. Can we do this on some different url at first and switch it when we switch over? Having it all ready would be nice.

We originally talked about keeping FAS available as read-only for a few months.
Giving time to people having scripts interacting with FAS to port them over to
noggin.
If we do that, we'll have to use different URLs.

@kevin This should force public key auth.

https://pagure.io/fedora-infra/ansible/pull-request/377

Gotten from here

https://askubuntu.com/questions/1087609/freeipa-ssh-key-authentication-only
and also referenced here
https://serverfault.com/questions/783082/how-to-use-the-ssh-server-with-pam-but-disallow-password-auth

Well, perhaps, but then why does our "PasswordAuthentication no" not do it?

Side note: now that we have stg mostly working and synced, perhaps we should ask devel/infra lists to look? try making accounts, try logging into apps? That might give us some good feedback before prod?

Thanks for writing all that Kevin.

Indeed we need to come up with a deployment plan.

  • 2fa for sudo. This is what we have now, and would like to keep. I guess that means anyone in a sysadmin group has to have a 2fa enrolled?

Yes. The migration script does not migrate those tokens, since they are not in FAS, and I'm not even sure we can do that.
I haven't found how to force 2FA for sudo, though. @cheimes do you know if that's possible ?

  • some more widespread testing in stg would be cool. If we could re-run the import to make sure everyone's groups are good, we could ask infrastructure list? Or even devel list? to try things out?

Yes. I'll re-run the import script.

About that, we also need to come up with a plan for conflicting accounts between Fedora and CentOS. Right now they are skipped, but we need to handle them. Sometimes it's just the same person using different email addresses, but sometimes it's likely to be two actually different persons (for logins such as alan). What should we do about those? It's apparently possible to rename a user in IPA, but do you think it'll have larger consequences? IIRC Openshift rules are set on usernames, so that would be one. Email aliases too, no? Also, how do we decide who gets to keep their username?

After that I know we want to head to prod, but we should come up with a deployment plan here as just having no auth for a while isn't going to make people happy.

Yeah. I feel like there are still quite a few loose ends.

So, I guess we are planning on keeping our current ipa servers/cluster?

Yeah I think so.

There's some groups we need to fix up... for example 'sysadmin-main' has a bunch of old folks in it. :)

Oh, good point, I did not know that.

We need to deploy noggin in openshift. Can we do this on some different url at first and switch it when we switch over? Having it all ready would be nice.

As Pingou said we'll have to use a different URL if we want to keep FAS read-only for a while, for compatibility.
Also, I haven't looked at FAS to check if it's easy to actually make it read-only, that's one more thing we need to do.

The sync script has to run... can we run it once and then just run it on users that change so we don't have to have a long downtime while the script runs?
(ie, timestamp X, run it, finishes, check all messages since X, re-run on just those users, etc).

Sounds good, we need to write that.
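A rough sketch of that "timestamp X, run, re-run on what changed" loop, assuming the sync of a single user is safe to repeat. All names here (sync_until_quiet, changed_since, sync_user) are placeholders and not part of fas2ipa; changed_since stands in for whatever tells us which users changed, for example the messages mentioned above.

```python
def sync_until_quiet(all_users, changed_since, sync_user, now, max_rounds=10):
    """Full sync once, then re-sync only users that changed during the last round."""
    mark = now()                      # timestamp X
    for user in all_users:            # the long initial run
        sync_user(user)
    rounds = 1
    while rounds < max_rounds:
        next_mark = now()             # taken before the query so no change is missed
        pending = changed_since(mark)
        if not pending:
            break
        for user in pending:
            sync_user(user)
        mark = next_mark
        rounds += 1
    return rounds


# Tiny simulation with a fake clock and a fake change log.
clock = iter(range(0, 1000, 10))
changes = [(5, "alice"), (15, "bob")]     # (time, username)
synced = []

rounds = sync_until_quiet(
    all_users=["alice", "bob", "carol"],
    changed_since=lambda mark: sorted({u for t, u in changes if t >= mark}),
    sync_user=synced.append,
    now=lambda: next(clock),
)
print(rounds, synced)
```

Each catch-up round is much shorter than the initial run, so the final round inside the outage window only has to cover the handful of users that changed most recently.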

Hopefully we can enroll all the hosts then, remove fasClient on them, chown user homedirs to the new ipa uids.

I noticed that we used to default homedirs to be in /home/fedora/<username>. Do we want to keep that? If yes we need to tweak the defaults a bit in Noggin but it's doable.

To keep track of all that we've just started a document for the migration plan: https://hackmd.io/7eMQ-Mj_RVq9DrIqiaGySQ

Thanks for writing all that Kevin.

Indeed we need to come up with a deployment plan.

  • 2fa for sudo. This is what we have now, and would like to keep. I guess that means anyone in a sysadmin group has to have a 2fa enrolled?

Yes. The migration script does not migrate those tokens, since they are not in FAS, and I'm not even sure we can do that.

Yeah, they are in the same database, but not really very integrated. yubikey support is part of fas, but ipa doesn't do that so no need to migrate those.

I haven't found how to force 2FA for sudo, though. @cheimes do you know if that's possible ?

  • some more widespread testing in stg would be cool. If we could re-run the import to make sure everyone's groups are good, we could ask infrastructure list? Or even devel list? to try things out?

Yes. I'll re-run the import script.

Thanks!

About that, we also need to come up with a plan for conflicting accounts between Fedora and CentOS. Right now they are skipped, but we need to handle them. Sometimes it's just the same person using different email addresses, but sometimes it's likely to be two actually different persons (for logins such as alan). What should we do about those? It's apparently possible to rename a user in IPA, but do you think it'll have larger consequences? IIRC Openshift rules are set on usernames, so that would be one. Email aliases too, no? Also, how do we decide who gets to keep their username?

Great question. Perhaps we could get a list of these and see how much it happens? I guess we could look and see which is more active, they keep the username and the less active one we rename to 'centos$foo' or 'fedora$foo' ?
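Getting that list could look roughly like the sketch below, assuming we can export a username to email mapping from each side. The function name and the sample data are invented for illustration; a matching email only suggests the same person, and everything else still needs a human to look at it.

```python
def find_conflicts(fedora_users, centos_users):
    """Return (username, verdict) for every username present in both mappings.

    Each argument maps username -> email address.
    """
    report = []
    for username in sorted(fedora_users.keys() & centos_users.keys()):
        same_email = fedora_users[username].lower() == centos_users[username].lower()
        verdict = "same email" if same_email else "needs review"
        report.append((username, verdict))
    return report


# Made-up sample data.
fedora = {"alan": "alan@example.org", "bea": "bea@example.org", "cy": "cy@example.org"}
centos = {"alan": "alan@example.com", "bea": "Bea@Example.org", "dee": "dee@example.com"}

conflicts = find_conflicts(fedora, centos)
print(conflicts)
```

The size of the "needs review" part of that list would tell us whether renaming the less active account is something we can do by hand.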

After that I know we want to head to prod, but we should come up with a deployment plan here as just having no auth for a while isn't going to make people happy.

Yeah. I feel like there are still quite a few loose ends.

Always are with this sort of thing, but we will get there! :)

So, I guess we are planning on keeping our current ipa servers/cluster?

Yeah I think so.

There's some groups we need to fix up... for example 'sysadmin-main' has a bunch of old folks in it. :)

Oh, good point, I did not know that.

I can tweak it. We may want to also audit our sysadmin groups and drop people who aren't active anymore before we do the migration.

We need to deploy noggin in openshift. Can we do this on some different url at first and switch it when we switch over? Having it all ready would be nice.

As Pingou said we'll have to use a different URL if we want to keep FAS read-only for a while, for compatibility.
Also, I haven't looked at FAS to check if it's easy to actually make it read-only, that's one more thing we need to do.

Yeah, perhaps we could use the same host (admin.fedoraproject.org) but /newaccounts and /oldaccounts ?

The sync script has to run... can we run it once and then just run it on users that change so we don't have to have a long downtime while the script runs?
(ie, timestamp X, run it, finishes, check all messages since X, re-run on just those users, etc).

Sounds good, we need to write that.

ok.
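
For reference, the "run once, then catch up on changed users" idea above could be sketched roughly as follows. This is only an illustration, not the actual import script: `changed_since` and `sync_user` are hypothetical callables (for example, `changed_since` could be built from message-bus events, and `sync_user` from the existing per-user import logic), and `max_passes` is an assumed safety limit.

```python
from datetime import datetime, timezone


def incremental_sync(all_users, changed_since, sync_user, now=None, max_passes=5):
    """Run a full import, then re-sync only users that changed meanwhile.

    all_users: iterable of usernames for the initial full pass.
    changed_since: hypothetical callable(timestamp) -> set of usernames
        modified since that timestamp.
    sync_user: hypothetical callable(username) that imports one user.
    Returns the number of catch-up passes that were needed.
    """
    now = now or (lambda: datetime.now(timezone.utc))

    checkpoint = now()  # "timestamp X" from the discussion
    for username in all_users:  # the long-running full pass
        sync_user(username)

    passes = 0
    while passes < max_passes:
        # Record the time before querying, so changes made during this
        # catch-up pass are picked up by the next one.
        next_checkpoint = now()
        changed = changed_since(checkpoint)
        if not changed:
            break
        for username in sorted(changed):
            sync_user(username)
        checkpoint = next_checkpoint
        passes += 1
    return passes
```

Each catch-up pass should be much shorter than the full run, so the final pass (done with FAS read-only) would only need a short downtime window.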

Hopefully we can enroll all the hosts then, remove fasClient on them, chown user homedirs to the new ipa uids.

Note that we can have more downtime for this since it's just admins affected for the most part. (except src.fedoraproject.org commits I guess)

I noticed that we used to default homedirs to be in /home/fedora/<username>. Do we want to keep that? If yes we need to tweak the defaults a bit in Noggin but it's doable.

I think we did that to separate fasClient users from 'normal' users. I'd be just fine moving back to /home/<username>.
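
If we do move back to /home/<username>, the per-host work (move the directory, then chown it to the new IPA uid) could be planned with something like the sketch below. It only builds a plan and touches nothing on disk. The function name and the `ipa_uids` mapping are assumptions for illustration; the real uids would have to come from IPA.

```python
import posixpath


def plan_homedir_moves(usernames, ipa_uids, old_base="/home/fedora", new_base="/home"):
    """Return a list of (old_path, new_path, new_uid) tuples.

    usernames: users that currently have a homedir under old_base.
    ipa_uids: hypothetical mapping of username -> uid assigned by IPA.
    Users missing from ipa_uids are skipped so they can be reviewed by hand.
    """
    plan = []
    for username in sorted(usernames):
        uid = ipa_uids.get(username)
        if uid is None:
            continue  # no IPA account found: leave the homedir alone
        old_path = posixpath.join(old_base, username)
        new_path = posixpath.join(new_base, username)
        plan.append((old_path, new_path, uid))
    return plan
```

Keeping the plan separate from the actual move and chown makes it possible to review the list for a host before anything changes.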

To keep track of all that we've just started a document for the migration plan: https://hackmd.io/7eMQ-Mj_RVq9DrIqiaGySQ

Great!

About that, we also need to come up with a plan for conflicting accounts between Fedora and CentOS. Right now they are skipped, but we need to handle them. Sometimes it's just the same person using different email addresses, but sometimes it's likely to be two actually different persons (for logins such as alan). What should we do about those? It's apparently possible to rename a user in IPA, but do you think it'll have larger consequences? IIRC Openshift rules are set on usernames, so that would be one. Email aliases too, no? Also, how do we decide who gets to keep their username?

Great question. Perhaps we could get a list of these and see how much it happens? I guess we could look and see which is more active, they keep the username and the less active one we rename to 'centos$foo' or 'fedora$foo' ?

From the email Fabian sent on centos-devel, there are about 123 accounts that
conflict between Fedora and CentOS. Hopefully some of these accounts belong to
the same person using two different email addresses. To this end, Fabian asked
people who have accounts in both FAS and ACO to make sure that the email
matches on both systems.
There will, however, be a number of usernames that exist in both systems for
two different people. According to Fabian's email, Fedora takes precedence
(which solves the point Aurélien raised about openshift on the Fedora infra).
We should check with Fabian whether anything is linked to the account
name/username on the CentOS side. If not, we can either reach out to the folks
and ask them to re-register with a different username, or automatically rename
their account to something like centos_$username.
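
To make the above concrete, here is a rough sketch of how the conflicting accounts could be sorted into "same person" and "needs a rename". It assumes we can get a username -> email mapping out of each system, which is not something the import script necessarily exposes today. The function name and the input format are made up for illustration, and matching on email alone is a simplification.

```python
def plan_conflicts(fedora_accounts, centos_accounts, prefix="centos_"):
    """Classify usernames that exist in both account systems.

    fedora_accounts / centos_accounts: hypothetical dicts of username -> email.
    Returns {"merge": [...], "rename": {old_name: new_name}}.
    """
    plan = {"merge": [], "rename": {}}
    conflicting = fedora_accounts.keys() & centos_accounts.keys()
    for username in sorted(conflicting):
        fedora_email = fedora_accounts[username].strip().lower()
        centos_email = centos_accounts[username].strip().lower()
        if fedora_email == centos_email:
            # Same email on both sides: treat as the same person.
            plan["merge"].append(username)
        else:
            # Fedora takes precedence, so the CentOS account gets renamed.
            plan["rename"][username] = prefix + username
    return plan
```

The "rename" list is also the list of people we would want to contact first, in case they prefer to pick a new username themselves.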

Yep. Agreed on all that @pingou. :)

Also, thinking about it, it might be nice if we could get rid of packager accounts on src.fedoraproject.org as part of this. Currently they all have real accounts on the machine (via fasClient). If we could make them all just use pagure for ssh auth (as well as https auth) that would:

a) increase security by not having them all with real accounts.

b) avoid the 'special case' of packagers being the only ones who can use ssh push/pull.

We are all past staging now. :)

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

a month ago
