Issue #9569: please allocate 40 IP addresses to copr hypervisor - fedora-infrastructure

Closed: Fixed 3 years ago by smooge. Opened 3 years ago by praiskup.

Hypervisor hostname: vmhost-x86-copr01.rdu-cc.fedoraproject.org

We should be OK to run ~32 copr builders on that machine (and a few more
smaller VMs for the devel copr instance), but due to the nature of copr
(copr-backend controls the builders) we need to have access to the
builders from the copr-backend machine (currently hosted in AWS).

We don't need DNS records for the builders, nor DHCP. Even though
being able to use DHCP would be awesome.

praiskup commented 3 years ago

> Even though being able to use DHCP would be awesome.

We already had static IP allocation on copr builders historically, when
virthost-aarch64-os01.fedorainfracloud.org and virthost-aarch64-os02.fedorainfracloud.org were up and running (old lab).

smooge commented 3 years ago

We don't run the IP allocation space for the OSPO cage where this system is. I also do not think they have 40 free IP addresses in this space. You will need to contact @misc's team for this.

We also do not have DHCP on that network, as the GNOME group has an open DHCP server which conflicts with any other server.

The only way I could see this working is that the vmhost allocates its own private network and talks to the copr systems that way.

praiskup commented 3 years ago

Hmpf. That will non-trivially complicate the configuration; at least I don't think
it is wise to allocate a VPN for each vmhost that might be connected to copr in
the future :-/

Could we use some pre-existing VPN in Fedora infrastructure? (which would
mean that builders would have some IP range there, and copr-backend too)

kevin commented 3 years ago

> Hmpf. That will non-trivially complicate the configuration; at least I don't think
> it is wise to allocate a VPN for each vmhost that might be connected to copr in
> the future :-/
>
> Could we use some pre-existing VPN in Fedora infrastructure? (which would
> mean that builders would have some IP range there, and copr-backend too)

I don't think we want to mix builders doing untrusted 3rd party builds with our database and application backend servers. ;(

We could of course set up another one for copr, but then it's just like you were doing it directly, not much advantage in having it central?

Metadata Update from @mohanboddu:
- Issue tagged with: medium-gain, medium-trouble, ops

3 years ago

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

3 years ago

praiskup commented 3 years ago

I guess that ipv6 wouldn't make any difference?

> I don't think we want to mix builders doing untrusted 3rd party builds with our database and application backend servers. ;(

understood

> We could of course set up another one for copr, but then it's just like you were doing it directly, not much advantage in having it central?

Yes, our team would take care of this...

Edited 3 years ago by praiskup

smooge commented 3 years ago

I talked with misc and the site has plenty of ipv6 space. If the systems can be ipv6 only then you can have space.

praiskup commented 3 years ago

We can not disable the ipv4 stack entirely - but I suppose it can be IPv4 behind NAT,
and accessible via ipv6 from the outside?

I'll try to chat with our team... the thing is that we don't have ipv6 on copr-backend yet, which would require us to configure ipv6 networking in the us-east-1c AWS availability zone.

praiskup commented 3 years ago

Ok, IPv6 sounds better to us. It should be easier to configure, and there
won't be a networking bottleneck (no need to route all traffic through a single
box, either in AWS or in our lab).

So:

- first we'll need to configure copr-backend to have some ipv6, which requires
  us to configure us-east-1c - can that be done please? Without this we can not
  assign any IPv6 to the host in the AWS UI
- then we need to allocate some IPv6 range that we'll assign manually later on

Should I split this ticket to two?

mobrien commented 3 years ago

@praiskup I have added an ipv6 cidr block to vpc-0afefac8bae905972 which is where your instances appear to be running.

You will likely need some on-host editing to get it to work. I know that normally Fedora cloud images and AWS ipv6 don't work together out of the box. I had an issue with this before and had to set static IPs on the hosts.

praiskup commented 3 years ago

Thanks! I tried to start new VM in subnet-0995f6a466849f4c3, and it says:

Subnet does not contain any IPv6 CIDR block ranges

Which is weird. Perhaps I'm doing something wrong? I thought that - when the subnet is configured - I'll be able to assign ipv6 to an existing instance (== no need to re-spawn the copr backend machine from scratch?).

> You will likely need some on-host editing to get it to work.

We should be OK to test the process on the devel machine, so overall it should be risk-free.

praiskup commented 3 years ago

The network vpc-0afefac8bae905972 is newly listed as "vpc-0afefac8bae905972 | copr", which IMO isn't entirely correct as other teams probably use this network as well (I didn't check, but historically we used this one because it was the only one not assigned to a particular interest group).

mobrien commented 3 years ago

> The network vpc-0afefac8bae905972 is newly listed as "vpc-0afefac8bae905972 | copr", which IMO isn't entirely correct as other teams probably use this network as well (I didn't check, but historically we used this one because it was the only one not assigned to a particular interest group).

Sorry, my bad. I can unassign that

mobrien commented 3 years ago

> Thanks! I tried to start new VM in subnet-0995f6a466849f4c3, and it says:
>
> Subnet does not contain any IPv6 CIDR block ranges
>
> Which is weird. Perhaps I'm doing something wrong? I thought that - when the subnet is configured - I'll be able to assign ipv6 to an existing instance (== no need to re-spawn the copr backend machine from scratch?).

This should work now, I had the VPC configured but not all the subnets
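
For readers hitting the same "Subnet does not contain any IPv6 CIDR block ranges" error: attaching an IPv6 block to the VPC is not enough, each subnet needs its own slice of it. A minimal sketch of that carving with Python's `ipaddress` module follows. The `2001:db8:1234:5600::/56` prefix is a documentation placeholder (the real block is not given in this ticket), and the /56 VPC block with /64 subnets is the usual AWS layout, assumed here rather than confirmed by the thread.

```python
import ipaddress

# Placeholder VPC block (documentation prefix); the real block is not in this ticket.
vpc_block = ipaddress.ip_network("2001:db8:1234:5600::/56")

# Every subnet gets its own /64 taken from the VPC block.
subnet_blocks = list(vpc_block.subnets(new_prefix=64))

print(len(subnet_blocks))  # 256 possible /64 subnets in one /56
print(subnet_blocks[0])    # 2001:db8:1234:5600::/64
print(subnet_blocks[1])    # 2001:db8:1234:5601::/64
```

Until a subnet has one of these /64 blocks associated with it, instances in that subnet cannot be given an IPv6 address, which matches the behaviour described above.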

praiskup commented 3 years ago

OK, I was able to assign IPv6 to the existing VM and to a new one. I don't seem to get routed to the external world (ping6 ipv6.google.com doesn't work) but I suppose I have to fix something on the VMs, not in the cloud. Thank you! Lemme experiment with that the day after tomorrow.

Can we please allocate the range of ipv6 addresses to the vmhost I mentioned above?

mobrien commented 3 years ago

There is an ipv6 entry in the route table so ping6 should work. May be a host issue.

praiskup commented 3 years ago

Ok, the new box was routing just fine from the beginning. The old box (current copr-backend-devel machine) had to be restarted through AWS (so aws knew it was restarted) and then it started routing. Those two boxes I talk about can not communicate with each other over ipv6.

Could we please allocate the ipv6 range for the vmhost-x86-copr01.rdu-cc.fedoraproject.org?

kevin commented 3 years ago

@smooge / @misc what's the process for this? Pick a /64 and put it in... a doc?

misc commented 3 years ago

So I may be wrong, but we usually use EUI-64 to assign IPs (i.e., the address is derived from the MAC; but as that's not how we documented it on https://osci.io/infra_procedures/host_network_setup/, maybe I am wrong).
Each server has its own /64 range. AFAIK, only osci.io servers use IPv6 in the cage for the moment, so that's still a bit rough.

@duck has the details. And while we can't add a DHCP server, as pointed out by smooge, we can surely add a DHCPv6 one or start to announce an IPv6 prefix on vlan 190. Or start to get a shared DHCPv4/v6 server and remove the one from GNOME?

Now, the biggest problem I see is that we have IPv6 on the public vlan (vlan 190), but there is no NAT for IPv4 on that vlan, and that's a requirement, it seems?
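
For reference, the MAC-derived (modified EUI-64) scheme mentioned in this comment can be sketched as follows. This is only an illustration: the MAC address is made up, and `2620:52:3:1::/64` is inferred from the Fedora addresses quoted elsewhere in this ticket, not taken from any allocation document.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a MAC address using modified EUI-64."""
    network = ipaddress.ip_network(prefix)
    if network.prefixlen != 64:
        raise ValueError("modified EUI-64 needs a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    # Insert ff:fe between the two halves of the MAC to get 64 bits.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return network[int.from_bytes(interface_id, "big")]


# Hypothetical MAC, for illustration only.
print(eui64_address("2620:52:3:1::/64", "52:54:00:12:34:56"))
# 2620:52:3:1:5054:ff:fe12:3456
```

Because the interface identifier is a pure function of the MAC, any machine that claims the same MAC ends up with the same address.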

smooge commented 3 years ago

@misc when we discussed ipv6 layout a couple of years ago with various tenants, EUI-64 was going to be problematic for a reason similar to why we weren't allowing everyone to be on the same mgmt network: any system can say what its MAC address is and so could take over for someone else (or, depending on the 'randomly' created MAC address, do so by accident). So it was decided that ipv6 ranges would be given out by the OSAS team for clients to use so that they didn't have to worry about this. I got a set of IPs from jbrooks to use:

download-cc-rdu01 IN    AAAA  2620:52:3:1:dead:beef:cafe:fed1
pagure01        IN    AAAA  2620:52:3:1:dead:beef:cafe:fed5
pagure02        IN    AAAA  2620:52:3:1:dead:beef:cafe:fed8
pagure-stg01    IN    AAAA  2620:52:3:1:dead:beef:cafe:fed3
proxy03         IN    AAAA  2620:52:3:1:dead:beef:cafe:fed6
proxy14         IN    AAAA  2620:52:3:1:dead:beef:cafe:fed7

At some point that got forgotten and a different method was being used. I don't think the IPs we use will collide with some system's MAC-derived address, but I am not sure. If we are using the self-assigned IP addresses then we need to redo the Fedora ones.

Does what I say sound familiar?

misc commented 3 years ago

yeah, looking at the internal list, I see that's not how we gave IPv6 to GNOME, so the osci.io docs (derived from existing practice) are maybe misleading.

I guess it might be time to call @jasonbrooks on this ticket so he can assign something like he did for Fedora.

duck commented 3 years ago

For the OSCI, and the tenants we manage, we are using the automatically generated suffix. It is no longer a simple MAC-based value, but uses a method defined in RFC 7217. Since I'm not going to reimplement how it works for our process, the doc simply says to use the suffix that is generated automatically for the link-local address. This method ensures we have no conflict whatever the MAC address is.

So yes, at some later point assigning ranges was discussed, but we already had many systems deployed with IPv6 and I never changed our process. Standardizing would be best, but that means work and disruption of service. I guess changing on our side could be combined with a reboot campaign. We would need to adapt our Ansible rules to set up the right IP in the network_setup role too.
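
As a rough illustration of the RFC 7217 idea (a stable interface identifier computed from the prefix, the interface, and a host secret instead of the MAC), here is a minimal sketch. It is not compatible with any real implementation: the choice of SHA-256, the byte encoding of the inputs, and the function and parameter names are all assumptions made for this example, and the reserved-identifier check and duplicate address detection retry described in the RFC are left out.

```python
import hashlib
import ipaddress


def stable_privacy_address(prefix: str, iface: str, secret: bytes,
                           network_id: bytes = b"", dad_counter: int = 0):
    """Derive a stable address in the spirit of RFC 7217 (illustrative only)."""
    network = ipaddress.ip_network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch assumes a /64 prefix")
    # RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key)
    digest = hashlib.sha256(
        network.network_address.packed
        + iface.encode()
        + network_id
        + dad_counter.to_bytes(1, "big")
        + secret
    ).digest()
    # Use the least significant 64 bits of the hash as the interface identifier.
    interface_id = int.from_bytes(digest[-8:], "big")
    return network[interface_id]


# Hypothetical interface name and secret.
print(stable_privacy_address("2620:52:3:1::/64", "eth0", b"per-host-secret"))
```

The result stays the same across reboots for a given prefix, interface, and secret, but changes if the host moves to another prefix, and it does not reveal or depend on the MAC address.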

praiskup commented 3 years ago

> but there is no NAT for IPv4 on that vlan, and that's a requirement, it seems?

We need the IPv4 for ephemeral VMs started on the vmhost... Initially I thought that
libvirt would do the NAT translation there for the VMs, and that we'd be able to assign
public ipv6 to those VMs somehow using some script (I don't know if there's some
mechanism in libvirt's dhcp6 actually, or if we could use dhcp6 from the network
and dhcp4 from host).

Of course if we had NAT on the network, it would be much easier because the
VMs in libvirt would use bridged networking and we wouldn't have to take "manual"
care of networking at all.

Sorry, we don't have a concrete plan. So depending on what is possible, we'll have
to adapt.

duck commented 3 years ago

To get NAT we would need to move the host to a private VLAN. That's possible, but I do not have the full picture to see if that would be appropriate.

Misc?

kevin commented 3 years ago

Do you even need ipv4 addresses? Can the VMs just be ipv6 only?

I would think that would work... (but I haven't tried it)

praiskup commented 3 years ago

People are used to building their packages against custom external repositories that are usually ipv4 only :-/

kevin commented 3 years ago

Hum, yeah. :(

How about 2 interfaces on the VMs... one ipv4 only on a NAT libvirt network using 192.168 or whatever, and the second one ipv6 only using a bridge... ?
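
A sketch of what the two-interface idea could look like in a libvirt domain definition, generated from Python. The network name `default` (libvirt's stock NAT network) and the bridge name `br0` are assumptions; nothing in this ticket says which names the vmhost uses. The XML only attaches the NICs, so making one IPv4-only and the other IPv6-only would still have to be done in the guest configuration.

```python
import xml.etree.ElementTree as ET


def interface_xml(kind: str, source: str) -> str:
    """Return a libvirt <interface> element of type 'network' or 'bridge'."""
    if kind not in ("network", "bridge"):
        raise ValueError("kind must be 'network' or 'bridge'")
    iface = ET.Element("interface", type=kind)
    # The source attribute is named after the interface type.
    ET.SubElement(iface, "source", {kind: source})
    ET.SubElement(iface, "model", type="virtio")
    return ET.tostring(iface, encoding="unicode")


ipv4_nic = interface_xml("network", "default")  # NAT network for outbound IPv4
ipv6_nic = interface_xml("bridge", "br0")       # hypothetical bridge on the public vlan

print(ipv4_nic)
print(ipv6_nic)
```

Both snippets would go inside the `<devices>` section of each builder's domain XML.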

praiskup commented 3 years ago

That could work, good idea. We wouldn't have to care about manual IP assignment at all.

misc commented 3 years ago

If that's a private vlan, you can set up your own dhcp.

Another solution is to get a nat64 gateway (tayga is packaged).
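
To illustrate what a NAT64 gateway does for IPv6-only builders: IPv4 destinations are represented as IPv6 addresses by embedding the 32-bit IPv4 address in the low bits of a /96 prefix. The sketch below uses the well-known prefix `64:ff9b::/96` from RFC 6052 as an assumption; an actual gateway may be configured with a different prefix, and a DNS64 resolver is normally needed so that clients get these synthesized addresses.

```python
import ipaddress

# RFC 6052 well-known prefix; a real deployment may use another /96.
NAT64_PREFIX = ipaddress.ip_network("64:ff9b::/96")


def to_nat64(ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the NAT64 prefix."""
    return NAT64_PREFIX[int(ipaddress.IPv4Address(ipv4))]


def from_nat64(ipv6: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 address from a NAT64-mapped IPv6 address."""
    address = ipaddress.IPv6Address(ipv6)
    if address not in NAT64_PREFIX:
        raise ValueError("address is not inside the NAT64 prefix")
    return ipaddress.IPv4Address(int(address) & 0xFFFFFFFF)


print(to_nat64("192.0.2.33"))           # 64:ff9b::c000:221
print(from_nat64("64:ff9b::c000:221"))  # 192.0.2.33
```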

praiskup commented 3 years ago

I thought there was already some DHCP server able to do this for us.
We didn't expect the lab to be so limited, so it needs to be rethought next
time :(

I think we'll prefer to configure libvirt for doing IPv4 NAT and somehow
assign IPv6. I hope it will be possible. Should be easier than
dedicating one VM for doing DHCP (or OpenVPN as proposed previously).
But thanks for the idea, we'll fall back to it when needed.

So is there some IPv6 range that we could start experimenting with?

praiskup commented 3 years ago

> So is there some IPv6 range that we could start experimenting with?

I mean - I think we need to get ipv6 "prefix" assigned?

kevin commented 3 years ago

@misc @smooge @jasonbrooks Can one of you allocate a /56 or whatever for this?

smooge commented 3 years ago

I can not.

misc commented 3 years ago

We are still discussing how we want to assign them.

misc commented 3 years ago

Ok so since copr would be in the fedora range (from the network point of view), and Fedora is already using 2620:52:3:1:dead:beef:cafe:0/112, I think it would be simpler to just use that range for copr as well (if I am not wrong, that's a range of about 65000 IPs, so there is surely enough for 40 there).

I am still trying to figure a scheme to split the range that wouldn't create headaches for others, but at least, this ticket shouldn't be blocked on that.
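
The size estimate is easy to check with Python's `ipaddress` module, writing the range with its last group spelled out as `2620:52:3:1:dead:beef:cafe:0/112`. The sample addresses are the ones quoted earlier in this ticket for the existing Fedora hosts.

```python
import ipaddress

fedora_range = ipaddress.ip_network("2620:52:3:1:dead:beef:cafe:0/112")

# A /112 leaves 16 host bits.
print(fedora_range.num_addresses)  # 65536

# Existing Fedora hosts already live inside this range.
for suffix in ("fed1", "fed5", "fed8"):
    address = ipaddress.ip_address(f"2620:52:3:1:dead:beef:cafe:{suffix}")
    print(address, address in fedora_range)
```

So 40 builder addresses use well under 0.1% of the range.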

smooge commented 3 years ago

I have reassigned various public ips which would have been used for cloud.

copr-builder-01      IN  AAAA  2620:52:3:1:dead:beef:cafe:c101
copr-builder-01      IN  A     8.43.85.40
copr-builder-02      IN  AAAA  2620:52:3:1:dead:beef:cafe:c102
copr-builder-02      IN  A     8.43.85.41
copr-builder-03      IN  AAAA  2620:52:3:1:dead:beef:cafe:c103
copr-builder-03      IN  A     8.43.85.42
copr-builder-04      IN  AAAA  2620:52:3:1:dead:beef:cafe:c104
copr-builder-04      IN  A     8.43.85.43
copr-builder-05      IN  AAAA  2620:52:3:1:dead:beef:cafe:c105
copr-builder-05      IN  A     8.43.85.44
copr-builder-06      IN  AAAA  2620:52:3:1:dead:beef:cafe:c106
copr-builder-06      IN  A     8.43.85.45
copr-builder-07      IN  AAAA  2620:52:3:1:dead:beef:cafe:c107
copr-builder-07      IN  A     8.43.85.46
copr-builder-08      IN  AAAA  2620:52:3:1:dead:beef:cafe:c108
copr-builder-08      IN  A     8.43.85.47
copr-builder-09      IN  AAAA  2620:52:3:1:dead:beef:cafe:c109
copr-builder-09      IN  A     8.43.85.48
copr-builder-10      IN  AAAA  2620:52:3:1:dead:beef:cafe:c110
copr-builder-10      IN  A     8.43.85.51
vmhost-x86-copr01    IN  A     8.43.85.57
vmhost-x86-copr01    IN  AAAA  2620:52:3:1:dead:beef:cafe:c001
vmhost-x86-copr02    IN  A     8.43.85.58
vmhost-x86-copr02    IN  AAAA  2620:52:3:1:dead:beef:cafe:c002
vmhost-x86-copr03    IN  A     8.43.85.59
vmhost-x86-copr03    IN  AAAA  2620:52:3:1:dead:beef:cafe:c003
vmhost-x86-copr04    IN  A     8.43.85.60
vmhost-x86-copr04    IN  AAAA  2620:52:3:1:dead:beef:cafe:c004

For the other 30 IPv6 addresses you can use :c111 to :c140. The vmhost-x86-copr boxes should be ready for you to use also.
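
A small sketch of how the remaining builder addresses could be enumerated. It assumes the suffixes continue the decimal-looking numbering of the records above (`c101` for builder 1 up to `c140` for builder 40), so ":c111 to :c140" yields exactly 30 addresses; read as a plain hexadecimal range it would contain more. The helper name `builder_address` is made up for this example.

```python
import ipaddress

PREFIX = "2620:52:3:1:dead:beef:cafe"


def builder_address(number: int) -> ipaddress.IPv6Address:
    """Address of builder N (1..40), following the c1NN suffix pattern."""
    if not 1 <= number <= 40:
        raise ValueError("only builders 1 to 40 have an address")
    return ipaddress.ip_address(f"{PREFIX}:c1{number:02d}")


# Builders 11..40 are the ones without DNS records.
extra_addresses = [builder_address(n) for n in range(11, 41)]

print(len(extra_addresses))  # 30
print(extra_addresses[0])    # 2620:52:3:1:dead:beef:cafe:c111
print(extra_addresses[-1])   # 2620:52:3:1:dead:beef:cafe:c140
```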

Metadata Update from @smooge:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago

praiskup commented 3 years ago

Nice, thank you! Btw., do we have to use the copr-builder-XX hostnames?
The VMs for copr builders will be removed and started from scratch all
the time... (and 10 records wouldn't be enough). Can we drop those records
ourselves?

smooge commented 3 years ago

These are just the names I put in the rdu-cc.fedoraproject.org dns file in the Fedora Infrastructure DNS files. Change those to what you need them to be, but I only have 10 ipv4 addresses I can give you.

smooge commented 3 years ago

Those ips and such will change in about 4 or 5 months because Red Hat will be moving the ipv4 and ipv6

Metadata

- Assignee: None
- Tags: medium-gain, medium-trouble, ops
- Blocking: None
- Depending on: None
- Priority: Waiting on Assignee
- Boards: ops (Status: Done)
