The "qa.fedoraproject.org" network/vlan is currently:
10.5.124.128/25 and 10.5.131.0/24.
We have a pretty diverse mix of machines in this network, and we would like to split them out into separate networks so they can't interfere with each other and so we know what types of machines are in which networks.
10.5.124.128/25 - reserve this for QA machines/instances that need isolation from everything else, the 'client' machines for taskotron, beaker, openqa. Let's call it "qa-clients"
10.5.131.0/24 - reserve this for the normal qa machines. (keep it called "qa")
10.5.128.0/24 - reserve this for secondary arch machines. ppc64, s390 and aarch64. (call it "secondary").
10.5.129.0/24 - reserve this for 'community' stuff. Call it 'community'. This would have retrace, kernel-qa, and anything that didn't fit in the rest.
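As a sanity check, the proposed split above can be verified for overlaps with Python's stdlib `ipaddress` module (just a verification sketch; the names are the labels from the plan above):

```python
import ipaddress

# Proposed QA network split (from the plan above).
plan = {
    "qa-clients": "10.5.124.128/25",
    "qa":         "10.5.131.0/24",
    "secondary":  "10.5.128.0/24",
    "community":  "10.5.129.0/24",
}

nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}

# No two proposed networks should overlap.
names = list(nets)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        assert not nets[a].overlaps(nets[b]), f"{a} overlaps {b}"

for name, net in nets.items():
    # num_addresses includes network + broadcast, hence the -2.
    print(f"{name:>10}: {net} ({net.num_addresses - 2} usable hosts)")
```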
virthost-comm machines will need interfaces on the various networks they house guests for. Other instances will need re-IPing, and then any needed RHIT firewall rules added. We should try to map out what those might be in advance if we can.
I may regret saying this in the future but a whole /24 for "normal" qa stuff seems like incredible overkill to me. I can't see us using more than 50 ips in the next couple years and I'm having trouble thinking of enough things we might do that would take up the whole /24.
Using the /25 for isolated qa stuff is less overkill. I estimate that we'll end up using 75-ish IPs (not including adding any ppc/s390/aarch64 etc. to beaker if that's a direction we end up going in) if my current plans go as expected which leaves us plenty of room for growth if needed.
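For a quick back-of-the-envelope check on that estimate (75 is just the rough number from the comment above, not a real count), a /25 leaves 126 usable addresses:

```python
import ipaddress

net = ipaddress.ip_network("10.5.124.128/25")
usable = net.num_addresses - 2  # minus network + broadcast addresses
planned = 75  # rough estimate from the comment above

print(f"{planned}/{usable} addresses ({planned / usable:.0%} utilization)")
# → 75/126 addresses (60% utilization)
```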
A few other thoughts/questions:
== Firewalls ==
One thing that I was wondering about earlier today was if we could make the firewall rules required for qa stuff a bit easier by not isolating the "normal" qa network as much from the rest of infra. That way, most of the firewall rules we have could be removed instead of migrated.
The biggest risk I see here is that '''if''' a bad actor were able to compromise something in the more isolated 10.5.124 network, the most obvious vectors past that would be through the various control machines (taskotron, openqa, beaker). If those machines aren't somewhat isolated, it would be easier for said bad actor to get somewhere more critical than our automation systems.
I'm not sure how realistic of a risk this is, though. Part of it depends on how much we want to trust the people who have access to automation clients and I'd like to make it relatively easy to get access to those in some form or another.
== Interfaces and Gateways ==
Instead of having a bunch of firewall rules for communication between the two qa-ish networks, I was thinking about putting multiple interfaces on the control machines - one on the restricted network and the other on the less restricted network.
I'm also wondering if it would be easier/better to have a "gateway" machine on the restricted qa network to restrict what various clients have access to instead of setting up a bunch of firewall rules with RHIT. My understanding of what the various clients will need to access is:
||= System =||= Taskotron =||= Beaker =||= OpenQA =||
|| repos (batcave01 currently) || x || x || x ||
|| alt (hosting TC/RC) || ? || x || x ||
|| distgit || x || || ||
|| koji || x || || ||
|| bodhi || x || || ||
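One way to keep that matrix actionable would be to hold it as data next to whatever generates the gateway or firewall configuration. A hypothetical sketch (the names mirror the table above; the '?' for taskotron/alt is left out until it's answered):

```python
# Access matrix from the table above: service -> systems that need to reach it.
ACCESS = {
    "repos (batcave01)": {"taskotron", "beaker", "openqa"},
    "alt (TC/RC)":       {"beaker", "openqa"},  # taskotron is still a '?'
    "distgit":           {"taskotron"},
    "koji":              {"taskotron"},
    "bodhi":             {"taskotron"},
}

def needed_by(system):
    """Return the set of services a given automation system needs to reach."""
    return {svc for svc, clients in ACCESS.items() if system in clients}

print(sorted(needed_by("taskotron")))
# → ['bodhi', 'distgit', 'koji', 'repos (batcave01)']
```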
So, after a bunch of discussion today, new plan:
10.5.124.0/25 and 10.5.131.0/24 - stay the way they are now. Same vlan, etc. This will allow us to keep hosts on those nets working while we move things we want to move off.
We are going to set up an ansible variable for all qa machines that need more isolation: taskotron, beaker, openqa. Then all hosts that are in the qa network but not in that set will get iptables rules to drop all traffic from those hosts. This will provide us with some isolation from them.
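A rough sketch of what the generated rules might look like (the host IPs and list name here are made up for illustration; in practice the list would come from the ansible variable mentioned above):

```python
# Hypothetical: addresses of the isolated qa hosts (taskotron, beaker, openqa),
# as they might come out of an ansible variable.
isolated_qa_hosts = ["10.5.131.10", "10.5.131.11", "10.5.131.12"]

def drop_rules(sources):
    """Render iptables rules dropping all traffic from the given sources."""
    return [f"-A INPUT -s {src}/32 -j DROP" for src in sources]

for rule in drop_rules(isolated_qa_hosts):
    print(rule)
# → -A INPUT -s 10.5.131.10/32 -j DROP  (and so on for each host)
```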
Next we are going to move the ppc/s390/arm/aarch64 machines off to the 10.5.128.0/24 network. It will be its own new vlan.
Finally we will move anything that's 'community' left in the qa network... this would be at least: cosmos01/02, retrace01/02, kernel01/02. These will go to a new 10.5.129.0/25 network on its own vlan.
At the end there should only be qa items left in the qa network.
ok, another change:
10.5.129.0/25 - 'community' stuff.
10.5.129.128/25 - reserved for new HP Moonshot guests, vms, hosts, etc.
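That split falls out directly from halving the /24, which `ipaddress` can confirm (just a verification sketch):

```python
import ipaddress

community = ipaddress.ip_network("10.5.129.0/24")
# Halving the /24 yields the two /25s from the updated plan.
lower, upper = community.subnets(prefixlen_diff=1)

print(lower)  # → 10.5.129.0/25   ('community' stuff)
print(upper)  # → 10.5.129.128/25 (reserved for new guests/vms/hosts)
```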
An update for this from a secondary PoV. We've mostly moved off the QA network with just a handful of hosts left:
sigul should be gone this week; the remainder all have associated tickets, with s390 likely the one to take the longest here.
@kevin where is this in terms of progress and as an apprentice can I help?
So, we still have to move the arm hub/db, and the 'misc' stuff...
arm-koji01, db-arm-koji01, cosmos01/02, kernel01/02, retrace01/02.
The arm ones I can do as soon as their virthost is back online (it's currently down due to hardware issues).
The retraces will move after they get things setup in staging with the new deployment method.
I'm kinda leaning toward leaving the cosmos and kernel boxes where they are.
So, after I move the arm ones this can get closed. I'm not sure there's much to be done here from an apprentice standpoint.
The arm ones here are going to go away when f25 goes end of life anyhow, so I think I will just close this now.
Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)