#9152 staging noggin deployment planning
Opened 6 months ago by kevin. Modified 18 hours ago

Greetings.

We are now ready to build up our staging env again and I figured I would file a ticket to coordinate noggin deployment along with all the other things we need for it.

Some questions:

  • Does noggin work/will it work in openshift? If so, I can do an openshift deployment first; if not, we can just do it in a vm.

  • unfortunately (or perhaps fortunately), we didn't save the old staging ipa server, so I did a new deployment from scratch in a vm. (ipa01.stg.iad2.fedoraproject.org).
    Does noggin need anything from the ipa server configuration-wise? The playbook is currently failing on:
    ipa: ERROR: Host 'id.stg.fedoraproject.org' does not have corresponding DNS A/AAAA record
    ...but it does have one. Not sure what's going on there.

  • I'm assuming noggin needs ipa and ipsilon and a proxy, any other services?

Things we need to figure out:

  • Should we just start from 0 for now? (ie, have admins make accounts, etc) or do we want to try and migrate data from prod?

  • we need to figure out ssh access/replacement for fasClient

  • we need to figure out sudo access/replacement for pam_url

cc @abompard @pingou @puiterwijk


Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: groomed, medium-gain, medium-trouble

6 months ago

Does noggin work/will it work in openshift? If so, I can do an openshift deployment first; if not, we can just do it in a vm.

Yes, the two webapps will run in OpenShift. I have the yaml files I used for the CommuniShift and RH instance deployment so I can reuse them for staging. If you make me a role folder in Ansible I'll put them there. I'll also need a couple of secrets, obviously.

unfortunately (or perhaps fortunately), we didn't save the old staging ipa server, so I did a new deployment from scratch in a vm. (ipa01.stg.iad2.fedoraproject.org). Does noggin need anything from the ipa server configuration-wise?

I will need to have the freeipa-fas plugin installed, but IPA should be installable without it, and we can add it later.

The playbook is currently failing on: ipa: ERROR: Host 'id.stg.fedoraproject.org' does not have corresponding DNS A/AAAA record. But it does have one... not sure what's going on there.

Hmm, not sure either.
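
One way to narrow down the DNS error quoted above is to check which A/AAAA addresses the resolver on the IPA server itself returns for the host, since that resolver may see a different view than a workstation does. The following is only a minimal sketch: the helper name `resolve_addresses` is made up for illustration, and `socket.getaddrinfo` also consults local sources such as /etc/hosts, so it is an approximation of what the IPA check does, not a reproduction of it.

```python
import socket


def resolve_addresses(hostname: str) -> dict:
    """Return the IPv4 ("A") and IPv6 ("AAAA") addresses the local resolver finds."""
    families = {"A": socket.AF_INET, "AAAA": socket.AF_INET6}
    found = {}
    for record, family in families.items():
        try:
            infos = socket.getaddrinfo(hostname, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            # The resolver knows no address of this family for the name.
            infos = []
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string.
        found[record] = sorted({info[4][0] for info in infos})
    return found
```

Running `resolve_addresses("id.stg.fedoraproject.org")` on the IPA server and on another staging host, then comparing the two results, would show whether the record is simply not visible from where the playbook runs.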

I'm assuming noggin needs ipa and ipsilon and a proxy, any other services?

I'll need to connect to the RabbitMQ servers for Fedora Messaging, but I think that's all.

Should we just start from 0 for now? (ie, have admins make accounts, etc) or do we want to try and migrate data from prod?

We can try the migration script.

we need to figure out ssh access/replacement for fasClient
we need to figure out sudo access/replacement for pam_url

That should just be running ipa-client-install, I believe.

Metadata Update from @abompard:
- Issue untagged with: groomed, medium-gain, medium-trouble
- Issue priority set to: Needs Review (was: Waiting on Assignee)

6 months ago

Metadata Update from @abompard:
- Issue tagged with: groomed, medium-gain, medium-trouble

6 months ago

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

6 months ago

Note that I also have the Noggin stack packaged as RPMs, so it can be deployed to a container or a VM that way. This also includes the freeipa-fas plugin. Without the plugin installed and FreeIPA reconfigured with it, Noggin will break pretty badly.

Oh, and here's the COPR where I've built all this: https://copr.fedorainfracloud.org/coprs/ngompa/fedora-aaa/

I'm basically waiting on @abompard's approval before upstreaming these into Fedora itself.

ok, we now have a staging openshift cluster up and running. It doesn't have any web or remote access yet, however (that needs some firewall rules set up for the staging proxies).

That said, I think we can start working on deploying noggin there anytime.

I assume ipsilon needs some config adjustment to talk only to IPA and not fas? We should do that too and deploy it in openshift.

Also, does IPA need any changes?

@abompard since we have no auth currently or web interface, how about I just put your ssh key for root on os-control01.stg.iad2.fedoraproject.org ? You can login there as root and ssh to os-master01.stg.iad2.fedoraproject.org (or any of the cluster) to debug things. Is that acceptable? Or do you just need to deploy via playbook and don't need any more access? Anything else you need? Once we have noggin up and ipsilon, we can look at sorting out our ssh / local admin accounts plans.

Yeah I'll also need access to the FreeIPA server to deploy the extension (or rather: be allowed to run the playbook that will do it, probably the same as the freeipa one)

Then I can start on ipsilon & noggin. I have the openshift yaml files for noggin but I haven't written playbooks, I can start with that once I have access. I haven't used root on openshift yet, I hope I won't break things...

Status update:

  • Noggin and FASJSON are deployed in staging openshift, but I can't check that they actually work because there's currently no way to get to the web UIs (if I understand correctly the proxies aren't set up)

  • Ipsilon is not deployed, and I'll work on that today. It needs to be configured to pull information from IPA/LDAP and not FAS, so there are configuration changes to do to the current playbook/role. I'd welcome help from someone who knows Ipsilon well to figure out those changes and how to deploy them, because we can't use the nice --ipa installation switch with containers.

Ipsilon does have a plugin to get info from LDAP, but I don't think we can use it as-is, because the FAS info plugin seems to do more stuff, especially around the AWS roles. (Be careful @nphilipp or @ryanlerch: the infofas plugin that is actually deployed is a modified version that lives in ansible/roles/ipsilon/files/infofas.py.)

I'm not sure how much of an IPA client setup Ipsilon will need to switch to IPA-based authentication and information, but we can't run ipa-client-install in a container, so we'll have to figure something out. For FASJSON I'm using a lightweight system that gets the IPA CA cert and a service keytab, but I'm not sure it'll be sufficient for Ipsilon. Maybe the IPA folks (@cheimes?) can shed some light on that, as they've probably had container-based IPA clients in the past already.

Finally, the IPA server in staging seems to be up and running fine with the FAS plugin installed. There are almost no users at the moment, so we can start testing the import script, but it's there and machines where fasClient used to run can be enrolled to enable SSH access.

That's all for me today! I'll be off most of next week but @nphilipp and @ryanlerch are around (hop, right under the bus, you're welcome).

Ipsilon is not deployed, and I'll work on that today. It needs to be configured to pull information from IPA/LDAP and not FAS, so there are configuration changes to do to the current playbook/role. I'd welcome help from someone who knows Ipsilon well to figure out those changes and how to deploy them, because we can't use the nice --ipa installation switch with containers.

I believe @hellcp has a working Ipsilon configuration with FreeIPA that he could share to template for the container deployment.

That'd be very nice, thanks.

@puiterwijk is more qualified to talk about IPA integration for Ipsilon. As far as I remember, Ipsilon uses SSSD, mod_auth_gssapi, and mod_lookup_identity. The lookup identity module depends on SSSD's info pipe and D-Bus. At a minimum you have to select the sssd profile with authselect (authselect select sssd), configure SSSD, enable the info pipe in the SSSD config, and have a minimal IPA installation with default.conf, ipa.crt, /etc/krb5.keytab, and a keytab for HTTPd.
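
To make the "enable the info pipe" step above a bit more concrete, here is a rough sketch that renders a minimal sssd.conf with the `ifp` responder enabled. It is an illustration only: the domain and server names are placeholders, the attribute list is an assumption about what a web app might want to look up, and the real file for the Ipsilon container would need whatever else the deployment requires.

```python
import configparser
import io


def build_sssd_conf(domain: str, ipa_server: str) -> str:
    """Render a minimal sssd.conf with the InfoPipe (ifp) responder enabled."""
    cfg = configparser.ConfigParser()
    # "ifp" in the services list is what turns the info pipe on.
    cfg["sssd"] = {"services": "nss, pam, ifp", "domains": domain}
    cfg[f"domain/{domain}"] = {
        "id_provider": "ipa",
        "ipa_domain": domain,
        "ipa_server": ipa_server,
        # Extra LDAP attributes to fetch; this list is an assumption.
        "ldap_user_extra_attrs": "mail, givenname, sn",
    }
    cfg["ifp"] = {
        # Accounts allowed to query the info pipe over D-Bus (assumed: the web server user).
        "allowed_uids": "apache, root",
        "user_attributes": "+mail, +givenname, +sn",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Placeholder values, not the real staging names.
print(build_sssd_conf("example.test", "ipa.example.test"))
```

Generating the file from a small function like this (or from an Ansible template with the same content) keeps the container image free of host-specific values.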

We met today to move this deployment forward.

I have created a db-fas01.stg.iad2.fedoraproject.org that has the prod fas db loaded in it.

You should now be able to deploy a fas in staging openshift and migrate from it to noggin.

Should we keep this open? Or close until we see more to do? The sssd part of things still needs to be sorted tho.

Thanks Kevin!

I would like to keep this open for now in case the team run into any issues deploying so we have a reference of what was asked.

Happy to open a separate issue for sssd if that would help too.

freeipa-fas is now in EPEL8 and Fedora. I'm progressing through shipping the rest of the Noggin stack in Fedora.

python-noggin-messages is now submitted for Fedora 33: https://bodhi.fedoraproject.org/updates/FEDORA-2020-ff21d2d01a

I cannot build it for EPEL8 due to lack of Poetry in EPEL 8.

Filed https://bugzilla.redhat.com/show_bug.cgi?id=1898395 but poetry seems to be orphaned now, perhaps in favor of poetry-core.

That's odd, I'm pretty sure @thrnciar maintains it, which is confirmed by https://src.fedoraproject.org/rpms/poetry

See on the side where all epel and fedora bugs are overridden to 'orphan'? And indeed the bug I filed above is assigned to orphan. :(

When the package was unorphaned, we didn't notice it had bugzilla overrides set to orphan. I've reset them to the defaults now.

Apps in staging that don't seem to be working:

  • mirrormanager
  • kerneltest
  • koschei
  • mailman
  • COPR

Apps in staging that don't seem to be working:

  • mirrormanager

https://admin.stg.fedoraproject.org/mirrormanager/ seems to work? I can even login ok... :)
Or was there another part that wasn't working?

  • kerneltest

This seems to be a schema thing... prod is running the old version in a vm, and I just copied prod->stg db, so I think we need to load whatever schema this uses, but I am not sure where it is. ;(

  • koschei

Wasn't deployed at all. I did so and fixed some things and now it's up and I can login. ;)

  • mailman

I never deployed a stg mailman because I figured we would just do so once we had the new versions packaged up. Is that still reasonable to wait, or do you need it now?

  • COPR

stg copr looks like it might be pointed to our stg proxies from back when we were trying to deploy frontends in stg. I think we just need to update this in dns. @praiskup does that sound right?

Great stuff. Keep em coming. :clock830:

stg copr looks like it might be pointed to our stg proxies from back when we
were trying to deploy frontends in stg. I think we just need to update this in
dns. @praiskup does that sound right?

I'm not sure what proxies you mean? We have "devel" and "production" only, and the "staging" was kind of a different thing when it was still running (different playbook for frontend, attempt to move copr frontend to a better shape, aim to move to openshift).

So speaking of staging, it is not running at all now and in the next few weeks we will have no time to get it working again, unfortunately.

https://admin.stg.fedoraproject.org/mirrormanager/ seems to work? I can even login ok... :)
Or was there another part that wasn't working?

Oh cool, it wasn't working when I tried, a database was missing.

  • kerneltest

This seems to be a schema thing... prod is running the old version in a vm, and I just copied prod->stg db, so I think we need to load whatever schema this uses, but I am not sure where it is. ;(

I'll investigate this.

  • koschei

Wasn't deployed at all. I did so and fixed some things and now it's up and I can login. ;)

Very cool, thanks!

  • mailman

I never deployed a stg mailman because I figured we would just do so once we had the new versions packaged up. Is that still reasonable to wait, or do you need it now?

Nah we can totally wait and leave it for last.

  • COPR

speaking of staging, it is not running at all now and in the next few weeks we will have no time to get it working again, unfortunately.

Understood. @praiskup could the "devel" instance be pointed at the staging ipsilon instance for a short time so we/you can test auth? It's id.stg.fedoraproject.org and since you're using OpenID, it should require no configuration on our part (I think).

So, next steps here I think are:

  • wait for the sync of fas->noggin data to test with.

  • get sssd / ssh / sudo rules set up in our ansible so people can ssh in and sudo via sssd.

  • wider testing of both then move on to prod

Or am I missing things?

@praiskup could the "devel" instance be pointed at the staging ipsilon instance

Yes, tracking in https://pagure.io/copr/copr/issue/1614

Metadata Update from @smooge:
- Issue tagged with: dev, ops

a month ago

So, just to update with my understanding of deployment:

  • The fas->noggin sync is still being polished (or is it done now?)

** Did we still need to figure out what fas users/groups we do NOT want to sync?

** Did the issue of no groups on some users get solved?

  • sssd / ssh / sudo:

** I merged the PR from @nphilipp and applied it in staging, but it still doesn't work right. The groups are not as it expects or something. Can we schedule time to work on this?

** sudo I think just needs some more ipa rules being added for each host

NOTE: This work is BLOCKING the mbs 3.0 deployment, because they can't get to their staging instance to upgrade it. ;( I'd really appreciate moving it forward soon at least to the point where they are unblocked.

  • We should figure out prod plans. I think we may want to start with a fresh ipa cluster (but then everyone would have to change their password... which we can do, but there are always flames from that). We need to figure out timing based on the fedora 34 cycle. We also need to figure out how to cleanly remove fasClient and users from an install, or plan on re-installing everything.

Let me know if I can help out/meet on the ssh/sudo and scheduling questions. :)

  • The fas->noggin sync is still being polished (or is it done now?)

We put some improvements and bug fixes into fas2ipa recently and as far as I know, we don't have any more open issues on that front.

** Did we still need to figure out what fas users/groups we do NOT want to sync?

IIRC, we decided to transfer all groups now and clean out later.

** Did the issue of no groups on some users get solved?

I'm not sure about the details but it looks like users that were created before fas2ipa was run for the first time didn't get their groups—Ryan Lerch’s and my user have no groups at all, yours only has sysadmin-main. @abompard, your user seems to have a full set of groups, did it only get transferred with the fas2ipa run?

I have a suspicion that fas2ipa only updates user/group relations on existing users if it detects changes on the user's data. I'll look into it today.

  • sssd / ssh / sudo:

** I merged the PR from @nphilipp and applied it in staging, but it still doesn't work right. The groups are not as it expects or something. Can we schedule time to work on this?

Aside from the group membership issue, we should verify that the HBAC rules on IPA in staging are configured properly. As I wrote the PR I'm happy to assist there—I'm free anytime in my afternoon until 6pm CET (5pm UTC, 12pm EST, 9am PST) today or later in the night if needs be.

** sudo I think just needs some more ipa rules being added for each host

That's what I work on now when not debugging fas2ipa. :wink: I hope to have something to show at the end of the day.
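
As a rough illustration of what "more ipa rules for each host" could look like, the sketch below builds the `ipa` CLI invocations for one per-host sudo rule. It is not the PR's actual approach: the `sudo/<host>` rule name is a made-up convention, and granting all commands with `--cmdcat=all` is just a placeholder policy.

```python
import shlex


def sudo_rule_commands(host: str, group: str) -> list:
    """Build the ipa commands that create one sudo rule for one host and group."""
    rule = f"sudo/{host}"  # hypothetical naming scheme, not taken from the ticket
    return [
        # Create the rule; "all" commands is a placeholder policy.
        ["ipa", "sudorule-add", rule, "--cmdcat=all"],
        # Allow members of the given IPA group to use the rule.
        ["ipa", "sudorule-add-user", rule, f"--groups={group}"],
        # Restrict the rule to the given host.
        ["ipa", "sudorule-add-host", rule, f"--hosts={host}"],
    ]


for command in sudo_rule_commands("ipsilon01.stg.iad2.fedoraproject.org", "sysadmin-main"):
    # Print instead of executing, so the output can be reviewed first.
    print(shlex.join(command))
```

Printing the commands rather than running them makes it easy to review the resulting rules, or to translate them into Ansible tasks, before anything touches the staging IPA server.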

I have a suspicion that fas2ipa only updates user/group relations on existing users if it detects changes on the user's data. I'll look into it today.

This doesn't seem to be the case: running fas2ipa for my user (re)creates group memberships for it, regardless of whether it existed before. I removed some group memberships from the existing user and it restored them without issue.

@abompard @kevin @mobrien How about running the fas2ipa script in staging again, first only for users with known problems, then for everybody? As the whole set of users takes very long to process, kicking it off before the weekend seems like a good idea to me.
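
The "known problems first, then everybody" ordering proposed above can be sketched as a plain comparison of group memberships. This is not fas2ipa code and uses none of its API: the two dictionaries stand in for whatever a FAS export and an IPA query would return, and the user and group names in the example are invented.

```python
def missing_memberships(fas_groups: dict, ipa_groups: dict) -> dict:
    """Map each user to the groups present in FAS but absent in IPA."""
    missing = {}
    for user, expected in fas_groups.items():
        gap = set(expected) - set(ipa_groups.get(user, ()))
        if gap:
            missing[user] = gap
    return missing


def run_order(fas_groups: dict, ipa_groups: dict) -> list:
    """List users with missing groups first, then everybody else."""
    problem = sorted(missing_memberships(fas_groups, ipa_groups))
    rest = sorted(user for user in fas_groups if user not in problem)
    return problem + rest


# Invented sample data, for illustration only.
fas = {"alice": {"group-a"}, "bob": {"group-a", "group-b"}, "dave": {"group-b"}}
ipa = {"alice": {"group-a"}, "bob": {"group-a"}}
print(run_order(fas, ipa))
```

With the sample data, `bob` and `dave` come first because each is missing a group in IPA, and `alice` follows because her memberships already match.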

@abompard @kevin @mobrien How about running the fas2ipa script in staging again, first only for users with known problems, then for everybody? As the whole set of users takes very long to process, kicking it off before the weekend seems like a good idea to me.

+1

Aside from the group membership issue, we should verify that the HBAC rules on IPA in staging are configured properly. As I wrote the PR I'm happy to assist there—I'm free anytime in my afternoon until 6pm CET (5pm UTC, 12pm EST, 9am PST) today or later in the night if needs be.

Yes, I think the hbac rules are not right. ;)
I am trying to login to ipsilon01.stg...

A test shows:

ipa hbactest --user=kevin --host=ipsilon01.stg.iad2.fedoraproject.org --service=sshd


Access granted: False

Not matched rules: allow_systemd-user
Not matched rules: group/sysadmin-main
Not matched rules: ipsilon
Not matched rules: shell-access/host/ipsilon01.stg.iad2.fedoraproject.org
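
When comparing several users or hosts, it can help to turn output like the above into data. The following is a small sketch that parses text in the shape pasted here; the function name `parse_hbactest` is made up, and it assumes every relevant line has the form "label: value" as in this ticket.

```python
def parse_hbactest(output: str) -> dict:
    """Parse 'ipa hbactest' text into the verdict and the lists of rules."""
    result = {"granted": None, "matched": [], "not_matched": []}
    for raw in output.splitlines():
        # Split on the first colon only; lines without one are ignored.
        label, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        label, value = label.strip().lower(), value.strip()
        if label == "access granted":
            result["granted"] = value.lower() == "true"
        elif label == "matched rules":
            result["matched"].append(value)
        elif label == "not matched rules":
            result["not_matched"].append(value)
    return result
```

Feeding it the output above yields `granted` as `False` and four entries under `not_matched`, which makes it easy to spot which rule was expected to match but did not.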

The group/sysadmin-main rule has sysadmin-main as a group, which I am in, but it's not matching?

Also, I am a bit confused because we have 'ssh', 'sshd' and 'shell-access'. Perhaps we could just stick to one of them? :)

I also had to make some changes to sshd_config but that's pretty minor.

Happy to work on it, but it's already after your time... if you want we can try and find an early time next week and I can get up early and we can work on it?

Not matched rules: group/sysadmin-main

This one should match. Let's review this together on Monday.

Happy to work on it, but it's already after your time... if you want we can try and find an early time next week and I can get up early and we can work on it?

Thanks for the consideration! But it hopefully doesn't take that long, so let's meet after my dinner? Say around 7:30pm CET, 10:30am PST?

Metadata
Boards 2
dev Status: Backlog
ops Status: Backlog