#6043 Request for resources: WaiverDB
Closed: Fixed 6 years ago Opened 6 years ago by mjia.

In addition to https://pagure.io/fedora-infrastructure/issue/6009, we'd like to deploy staging and production versions of WaiverDB. Initially, we do not expect to have significant CPU or memory requirements, as it's just a small Flask application (only a JSON API, no web UI).

We'll need a standard postgres database (one on the shared db host will be fine).


@dcallagh, are we doing this for the F26 development cycle?

The real target is F27, but we have lots to do for F27. If we can get this set up before the F26 cycle is over, that will buy us more time to work on all the other pieces.

Just talked with @maxamillion and @kevin about this...

They are standing up an openshift deployment environment in staging right now and expect that they'll have a production environment "soon".

Since I know we're targeting waiverdb deployment to openshift in another part of the world, it makes sense to do the same here in Fedora if we can.

@dcallagh, @mjia, is it ok with you to put this RFR on pause until that cluster is ready? waiverdb would get to be the guinea-pig app.

🎉 Neat!

Yes I think that sounds like a great idea. My experience with OpenShift 3.4 internally so far has been really promising.

We will let you know when we have our deployment ready for the first app... hopefully it won't be too long.

@ralph I assume you are sponsoring this. Or I suppose I could.

Metadata Update from @kevin:
- Issue tagged with: request-for-resources

6 years ago

@kevin, I'm an openshift newbie. Can we tag-team it?

Me too! But sure, we can...

FYI this Waiverdb PR has some fixes to get the app running in OpenShift, plus some templates for generating a complete Waiverdb deployment:

https://pagure.io/waiverdb/pull-request/46

@dcallagh I was looking at setting this up in our stg openshift. The yaml file has:

          image: "docker-registry.engineering.redhat.com/factory2/waiverdb:${WAIVERDB_APP_VERSION}"

but we can't access that host from the Fedora net. Could you see about going through the process here: https://fedoraproject.org/wiki/Container:Review_Process to make the image official?

Yes! I will get that started.

The OpenShift templates we have in git right now are just being used by our Jenkins, which is setting up test environments on an internal OpenShift instance, which is why it's using that internal registry.

But the real release versions of the containers should be built properly in Fedora infra.

So I have had this ready for quite a few days now... I had to craft a good help.md but the Dockerfile was trivial to write. I don't see any way to test the container build using fedpkg container-build since even scratch builds assume you have a dist-git repo to point at. But I wanted to at least see docker build run successfully before I file the review bug.

Earlier this week when I tried, we realised mjia forgot to push the waiverdb F25 update to stable (he had pushed EPEL7 since that is what we had been targeting until now). And now today I tried the build again, but Fedora infra is having a huge outage right now so it still fails.

So on Monday I'll try it again, hopefully I can see a working docker build and then I'll file the review. :-)

Scratch that, I managed to get an answer out of Mirrormanager so I have successfully tested building the container.

Review request filed: https://bugzilla.redhat.com/show_bug.cgi?id=1464329

The waiverdb container is approved and built now. You can pull from: candidate-registry.fedoraproject.org/f25/waiverdb:latest

@kevin, @puiterwijk: is there a ticket kicking around somewhere that I can use to track the status of os.fedoraproject.org?

FYI, from discussion in #fedora-apps the next step here is that we need an openshift template for waiverdb.

Also, @puiterwijk the code at https://pagure.io/waiverdb is ready for security audit.

Metadata Update from @ralph:
- Issue tagged with: security

6 years ago

@dcallagh or @mjia, when you have time can you run oc export -o yaml ... on the build and deployment configs from upshift and paste the output here (scrubbing for any details, of course)?

The idea here is that we're going to commit them to ansible/roles/waiverdb/files/openshift/... or something like that, and then use an ansible playbook to oc apply ... them to os.stg.fedoraproject.org.

(really, since waiverdb made it through OSBS, we don't need the buildconfig.. but it might be nice to have it just to see.)
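For the scrubbing step, here is a rough sketch of what could be stripped before pasting or committing an exported object. It assumes the export has already been loaded into a Python dict (for example from JSON output). The `scrub` helper and the list of fields in `RUNTIME_METADATA` are my own illustration, not something from the waiverdb repo or Fedora's Ansible. It does not look inside container environment variables, so any passwords set there would still need removing by hand.

```python
import copy

# Fields the API server fills in at runtime; not useful in a committed config.
# This particular list is an assumption for illustration.
RUNTIME_METADATA = (
    "creationTimestamp",
    "generation",
    "resourceVersion",
    "selfLink",
    "uid",
)


def scrub(obj):
    """Return a cleaned copy of an exported object; the input is left untouched."""
    clean = copy.deepcopy(obj)

    # Drop the observed state, keep only the desired state.
    clean.pop("status", None)

    metadata = clean.get("metadata", {})
    for key in RUNTIME_METADATA:
        metadata.pop(key, None)

    # Keep the key names of a Secret so the shape is visible, but not the values.
    if clean.get("kind") == "Secret":
        clean["data"] = {name: "REDACTED" for name in clean.get("data", {})}

    # A List wraps its members in "items", a Template in "objects".
    for key in ("items", "objects"):
        if key in clean:
            clean[key] = [scrub(item) for item in clean[key]]

    return clean
```

Running every exported object through something like this first makes the resulting files easier to diff and to keep next to the playbook.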

Hmm yeah I don't have any buildconfig at all, since internally I was deploying it from the container images that our Jenkins build was spitting out. Jenkins outside of OpenShift that is... not using OpenShift's integrated build pipeline stuff.

Personally I am still on the fence about letting OpenShift build the images for us (its S2I build strategy is a very flawed design IMHO) so I think that building an "official" Fedora layered image through OSBS and deploying that is the best approach anyway.

We have a YAML basically ready to go: there is a test environment template committed to waiverdb's source code (in the PR mentioned above) and I have a "production-like" one which is basically the same but with some names tweaked. I can paste that here (I guess with names tweaked further to suit Fedora) although it might be too long for a Pagure comment...

@dcallagh Do note that Pagure tickets allow file uploads :).

Oh yeah, file attachment works. :-)

Here is a YAML for a Fedora staging instance. I wrote this blind, in the sense that this exact configuration hasn't been tested anywhere, but it should mostly work. Any problems with it and we can iterate.

Note the comment at the top about pre-creating a secret. I guess you would either not commit the OpenShift secrets to git at all, or keep them in separate YAML definitions in the normal Ansible secrets place.

waiverdb-stg-fedora.yaml

A few things I'm not sure about...

Is waiverdb.stg.fedoraproject.org the right hostname, and will the staging OpenShift have a wildcard cert that matches that?

Is using registry.access.redhat.com/rhscl/postgresql-95-rhel7:latest for the Postgres image okay? I think it doesn't require anything special to pull that image but it will refuse to execute if you aren't on RHEL with a valid subscription. Are the OpenShift nodes running RHEL? We could switch to a community Postgres image if preferred.

The image stream is defined as pulling from candidate-registry.fedoraproject.org/f25/waiverdb:latest (not registry.fedoraproject.org) because I have no idea how I am supposed to get my newly imported waiverdb image pushed to the stable registry. :-)

> Is waiverdb.stg.fedoraproject.org the right hostname, and will the staging OpenShift have a wildcard cert that matches that?

The OpenShift URL will be waiverdb-waiverdb.app.os.stg.fedoraproject.org probably, but we will proxy this through our reverse proxies, giving us waiverdb.stg.fedoraproject.org.

> Is using registry.access.redhat.com/rhscl/postgresql-95-rhel7:latest for the Postgres image okay? I think it doesn't require anything special to pull that image but it will refuse to execute if you aren't on RHEL with a valid subscription. Are the OpenShift nodes running RHEL? We could switch to a community Postgres image if preferred.

It is okay to use this image.

> The image stream is defined as pulling from candidate-registry.fedoraproject.org/f25/waiverdb:latest (not registry.fedoraproject.org) because I have no idea how I am supposed to get my newly imported waiverdb image pushed to the stable registry. :-)

That's fine.
The intention is that we will have Bodhi control the promotion at some point, but currently you just ping @maxamillion, I think.

> The OpenShift URL will be waiverdb-waiverdb.app.os.stg.fedoraproject.org probably, but we will proxy this through our reverse proxies, giving us waiverdb.stg.fedoraproject.org.

Okay, this means the SSL termination is done by the Varnish reverse proxy, not inside OpenShift? In that case the OpenShift Router definition will need tweaking, I think...

We will have two layers of SSL encryption: client sends to proxy, proxy terminates, and then proxy sends to openshift routes, openshift terminates.
The current router definition should be fine, but we'll find out when we try to add it to the proxies.

We might just want to drop the hostname: property on the Router and let it use the default generated hostname like waiverdb-waiverdb.app.blahblah.

Well, in the checked in one we would want it to have waiverdb-waiverdb.app.os.stg.fedoraproject.org, so that it's consistent in using that one :).
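To make the hostname discussion concrete, here is a small sketch of the naming pattern and of a route definition that pins it. It assumes the generated hostname follows the route-name, project, router-subdomain pattern that waiverdb-waiverdb.app.os.stg.fedoraproject.org suggests, that the route uses edge TLS termination, and that the property in question is `spec.host` in the Route API. The helper names and the service name are hypothetical, and this is not the contents of the attached waiverdb-stg-fedora.yaml.

```python
def default_route_host(route_name, project, router_subdomain):
    """Hostname the router would generate if the route sets no explicit host."""
    return f"{route_name}-{project}.{router_subdomain}"


def build_route(route_name, project, router_subdomain, service_name, pin_host=True):
    """Return a Route definition as a plain dict, ready to dump as YAML or JSON."""
    spec = {
        "to": {"kind": "Service", "name": service_name},
        # Assumption: the OpenShift router terminates TLS itself (edge).
        "tls": {"termination": "edge"},
    }
    if pin_host:
        # Write the generated name out explicitly so the checked-in file
        # and the reverse proxy configuration agree on one hostname.
        spec["host"] = default_route_host(route_name, project, router_subdomain)
    return {
        "apiVersion": "v1",
        "kind": "Route",
        "metadata": {"name": route_name},
        "spec": spec,
    }


# Hypothetical values matching the names used in this ticket.
route = build_route("waiverdb", "waiverdb", "app.os.stg.fedoraproject.org", "waiverdb-web")
```

With `pin_host=False` the host is left out and the router picks the name, which is the other option discussed above.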

I have filed two documentation issues, but other than that waiverdb should be fine, and is cleared for production.

Metadata Update from @puiterwijk:
- Issue untagged with: security

6 years ago

So, this is in production now... however, one last thing to figure out: monitoring.

I know openshift will spawn a new pod if one stops responding, but do we also want to have nagios monitor to make sure we have valid data / alert us when the instance isn't able to spin up for some reason?

Good question! OpenShift already does its own monitoring/supervision of the individual pods, so I don't think we need to replicate that inside Nagios. Maybe Nagios can just monitor the overall availability of the web endpoint though.
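As a starting point for that kind of endpoint check, here is a minimal sketch of a Nagios-style plugin. It uses only the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and assumes that "valid data" simply means an HTTP 200 response whose body parses as JSON. The function names are made up for illustration, and the URL to poll is left as a parameter since the right path to monitor hasn't been decided in this ticket.

```python
import json
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch(url, timeout=10):
    """Return (HTTP status, body bytes) for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # Error responses still carry a status code worth reporting.
        return err.code, err.read()


def check_endpoint(url, fetcher=fetch):
    """Map the endpoint's behaviour to a Nagios (exit code, message) pair."""
    try:
        status, body = fetcher(url)
    except OSError as err:
        # Covers connection refused, DNS failure and timeouts.
        return CRITICAL, f"CRITICAL: {url} unreachable: {err}"
    if status != 200:
        return CRITICAL, f"CRITICAL: {url} returned HTTP {status}"
    try:
        json.loads(body)
    except ValueError:
        return WARNING, f"WARNING: {url} returned 200 but the body is not JSON"
    return OK, f"OK: {url} returned valid JSON"


def main(argv):
    """Entry point for a wrapper script: print the status line, return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN: expected exactly one argument, the URL to check")
        return UNKNOWN
    code, message = check_endpoint(argv[1])
    print(message)
    return code
```

A wrapper would call `sys.exit(main(sys.argv))`. The `fetcher` argument is there so the logic can be exercised without touching the network.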

I filed a separate ticket for the waiverdb monitoring: #6423 since I'm not familiar with Nagios and will need a little help in setting it up properly.

For the record, @puiterwijk has asked us to fill out the details here a bit more formally... so I'm going to use his template from infra-docs#64

@ralph: in this case, the audit was tracked in this same ticket. So just a link to this ticket would be fine.

I think we can now close this as we are in production?

If there is anything further to do, please feel free to re-open.

🌚

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

6 years ago
