#153 design, deploy and document Fedora OpenShift Playground (FOSP)

Created 2 years ago by goern
Modified a year ago

This ticket is to coordinate the activities to design, deploy and document the Fedora OpenShift Playground (FOSP).

Technical work is conducted by puiterwijk and misc, help is offered by scollier and goern

jzb to create a wiki page describing the SLA of FOSP

Also:
https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedoraproject.org/thread/TVTJ6PFKTY47ATW6DWJK3CRG254SNY7X/

We can discuss this on the cloud-wg list or on the infra list. I picked the infra list because I expected it to be more suitable, but since this discussion is about the requirements and the outcome, maybe a ticket would have been better.

Per discussion in today's meeting, Josh Berkus (jberkus) has agreed to take point on this as single point of contact.

Assigning ticket to him. We should revisit before Flock, but probably not right after Summit.

This is currently in a holding pattern around hosting.

Still waiting on availability of CNCF hosting.

Misc and I are meeting next week to flesh out the architecture a bit more and maybe even kick off an install to kick the tires on.

When? I'd like to join.

Replying to [comment:8 jberkus]:

When? I'd like to join.

Next Friday. I added you.

So, after the meeting, we ended up with:

1 repo (https://github.com/fedora-cloud/fosp-playbooks)

2 action items:
- sort out DNS, i.e. get a domain name and a wildcard record pointing to the server we allocated
- discuss the authentication we want. So far, we wanted to use OpenID Connect; see if that's doable with Ipsilon as deployed by Fedora

The TODO list is to create a playbook that deploys the bastion host, i.e. installs ansible and openshift-ansible, then runs the deployment based on the hosts file from the git repo linked earlier.

We selected 3 masters, 2 application nodes and 2 infra nodes for now in the Fedora cloud.
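For reference, a minimal sketch of how that 3/2/2 layout could be written out as an INI-style inventory. This is not the real hosts file from fosp-playbooks: the hostnames, the domain and the `role=` host variable are placeholders made up for illustration, and only the node counts come from the comment above.

```python
# Sketch only: hostnames, the domain and the "role" variable are placeholders,
# not values taken from the real hosts file in fosp-playbooks.
LAYOUT = {"master": 3, "app": 2, "infra": 2}
DOMAIN = "fosp.example.org"  # hypothetical domain


def build_hosts(layout, domain):
    """Return {role: [fqdn, ...]} with numbered hostnames per role."""
    return {
        role: [f"{role}{i:02d}.{domain}" for i in range(1, count + 1)]
        for role, count in layout.items()
    }


def render_inventory(hosts):
    """Render an INI-style inventory: masters in one group, the rest as nodes."""
    lines = ["[masters]"]
    lines += hosts["master"]
    lines += ["", "[nodes]"]
    for role in ("app", "infra"):
        # "role=" is an illustrative host variable, not an openshift-ansible one.
        lines += [f"{host} role={role}" for host in hosts[role]]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_inventory(build_hosts(LAYOUT, DOMAIN)))
```

Keeping the counts in one dictionary means a change to the layout (say, a third application node) is a one-line edit rather than a hand-edited host list.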

Periodic update time. The DNS-related issues have been fixed (with dnsmasq and hosts), and now we have bumped into https://github.com/openshift/openshift-ansible/issues/2294

I am a bit unsure on the way to bypass it, between "patching openshift-ansible", and "carrying a fixed version of seboolean".

Why fork? I have no issues carrying the patched seboolean module in openshift-ansible until we can require a version of ansible that has the fix.

Next work session is Sept. 23

Moved work session to Oct 3rd.

Moved the working session out a couple of weeks due to the seboolean issue. That gives time for the patch to get rolled in and for misc to test those packages.

So my last comment got lost due to the move. But in short, the installer works (I even tested it on Python 3), and so I pushed the last bits to GitHub.

Now, we just need to configure haproxy to expose the masters and the rest on the web, and that should be it. I may try to do that tomorrow, or during the OpenStack Summit.

So, summit took more of my time, and so did vacations, and the CentOS interlock meeting as well. But I did manage to get OpenShift exposed on the web, then found that the default auth is "all is valid". I am currently redeploying the cluster with a regular user (instead of admin/admin).

So, I have added my own auth there. For now, I suspect I have been a bit too optimistic with haproxy, and the roundrobin setup is not working as I hoped (I am just forwarding port 8443 to the 3 master nodes, so maybe I should just set up an SSL certificate on the bastion and be done with it).

As Patrick asked on IRC: by "my own auth", I mean that I have used the htpasswd provider for now. I plan to switch to the Fedora auth system once I have at least a clue about what I am doing regarding auth, and have made sure the system works. There is also a bit of refactoring to do, because for now there is no role at all.

So, I have added my own auth there. For now, I suspect I have been a bit too optimistic with haproxy, and the roundrobin setup is not working as I hoped (I am just forwarding port 8443 to the 3 master nodes, so maybe I should just set up an SSL certificate on the bastion and be done with it).

FWIW, we have been using "balance" as the method, not roundrobin. Things seem to be working on our end. What did you end up doing here?

@walters yep, origin.

https://github.com/fedora-cloud/fosp-playbooks/blob/master/hosts#L42

@scollier I just see Firefox asking me to accept the certificate every time (even when I accepted it 5 seconds ago), but I didn't dig into why (it was kind of working before I cleaned up the config file and refactored it). My intuition was that Firefox was seeing a different certificate on each request, due to the load balancing happening at the TCP level rather than the HTTP level.

I will try "balance".

So I was a bit tired, and replaced "roundrobin" with "balance", then spent 5 minutes pondering why it said "balance balance" in the config file, and how I had managed to botch a search/replace....

So yeah, replacing roundrobin with source makes it work better, so the next step is "plugging into Fedora auth". And then deploying something.
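To illustrate why source would help, assuming the guess above is right and each master presents its own certificate (not confirmed in this ticket): with TCP-level round robin, consecutive connections from one browser land on different masters, while hashing the client address pins a given client to one master. The sketch below is a toy model of the two strategies, not how haproxy implements them; the backend names and the client address are made up.

```python
import hashlib
from itertools import cycle

MASTERS = ["master01", "master02", "master03"]  # placeholder backend names


def round_robin(backends):
    """Return a picker that ignores the client and rotates through backends."""
    rotation = cycle(backends)
    return lambda client_ip: next(rotation)


def source_hash(backends):
    """Return a picker that maps a client address to a fixed backend."""
    def pick(client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        return backends[int.from_bytes(digest[:4], "big") % len(backends)]
    return pick


if __name__ == "__main__":
    client = "203.0.113.7"  # documentation-range address, made up
    rr, src = round_robin(MASTERS), source_hash(MASTERS)
    print("roundrobin:", [rr(client) for _ in range(4)])
    print("source:    ", [src(client) for _ in range(4)])
```

With round robin the same client sees a different master on each connection; with the source hash it always sees the same one, so the browser would only ever be shown a single certificate.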

We also need a domain name. IIRC we had one, no?

Not since the last update. I may have time to look at that next week, but an end-of-FY deadline appeared on Monday and has taken my time so far.

Sent an email regarding hardware to the list. Didn't have time to do anything regarding deployment.

The current status is still "playbooks are here, but we need to connect that to FAS".

Hardware has been at the data center waiting for the RAL2 to RAL3 migration, which completed on 15 March. (We had to wait until we were in the new space for sufficient power and a new rack to install into.)

I'm opening the ticket to get it racked & work out the networking details. For the project information side, I've included Michael Scherer and Marc Dequenes (OSAS) and Josh Berkus (Fedora Cloud SIG). If you need someone else involved, let me know; note that this racking ticket is internal-only to Red Hat, so if we need to have e.g. the networking discussion out here, let's do so, and carry the info back over to that ticket.

  • Karsten

I'd like to have some ownership of this as well, mainly because I'd like to have administrative access. Not to abuse, but to have :).

I can add your ssh keys so you can connect; just tell me which one I should use, and I will add and deploy it. (And ping me on IRC if I forget once I am back in Europe.)

status: still waiting on hardware, but we may be able to find hosting for pa.io in openshift online before we get FOSP

removing meeting tag until we get hardware

a year ago

Metadata Update from @dustymabe:
- Issue untagged with: meeting

So, on the hardware side: we got it and plugged it in ~1 month ago, then we found that some of the blades had a problem that we suspected to be "hardware". Due to a mix of urgent stuff and PTO (PTO being urgent to use before the end of May), I wasn't able to fix it. Duck did investigate, but didn't find much either. We are now trying to move them to another chassis (CentOS) before doing an RMA.

As those blades were supposed to be "utility nodes" (i.e. ssh, dhcp, etc.), this is somewhat blocking the deployment, but we will try to find another temporary layout so we can install Fedora on the other blades (which seem to start fine).
