#11715 Move from iptables to nftables (was firewalld)
Opened a year ago by kevin. Modified a month ago

This has been on my backlog a long while and @phsmoura wanted to work on it.

This will be somewhat long, so let's go!

Background: Currently we use iptables on all our hosts. We do this because it's simple to manage for the most part and it was well supported across Fedora and RHEL.
However, iptables is really disappearing now. It's been replaced with nftables, but with somewhat of a compatibility mode allowing legacy iptables to keep working for now.
Additionally, firewalld provides us with some advantages over iptables: ability to not care about what the backend is, ability to specify zones, better dynamic replacement, etc.

The current setup works via templates in roles/base/templates/iptables/. A few groups of things have different templates, but there aren't too many. Additionally, in these templates we fill in some values from host and group vars. Finally, we also insert some blocking rules from the ansible-private repo for external hosts.

There is an ansible firewalld module ( https://docs.ansible.com/ansible/latest/collections/ansible/posix/firewalld_module.html ) but it might be too heavy for us.
We may need to look at how we can make custom firewalld config and keep using a template system.
This will require some investigation. ;)

Some requirements:

  • production hosts should block all access from staging hosts, except those in the 'staging friendly' ansible group
  • hosts/groups need to be able to specify things that are not in the base setup.
  • we need some way to inject the rules from the ansible-private script. I can provide more info about that out of band.
  • There's very likely some old cruft in current iptables we can drop
  • We will need to change our kickstarts to keep firewalld installed and active when we do new installs after this lands.
  • we definitely want to be careful and land this in staging first and make sure everything is working, or do things slowly and check each.

There may be more I am not thinking of. :)
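
To make the first requirement concrete, here is a minimal Python sketch of how the "production blocks staging, except staging friendly" rule lines could be derived from inventory data. The function name, its arguments and the addresses are hypothetical; the real logic lives in the Ansible templates and may look quite different. It assumes IPv4 addresses only.

```python
from ipaddress import ip_address


def staging_rules(staging_ips, friendly_ips, env):
    """Render nftables rule lines for one host (illustrative only).

    Production hosts drop traffic from staging addresses unless the
    address belongs to the 'staging friendly' group. Hosts in any
    other environment get no extra rules.
    """
    if env != "production":
        return []
    friendly = {ip_address(ip) for ip in friendly_ips}
    rules = []
    # Sort so the rendered output is stable between runs (IPv4 only).
    for ip in sorted({ip_address(ip) for ip in staging_ips}):
        if ip in friendly:
            continue
        rules.append(f"ip saddr {ip} counter drop")
    return rules


if __name__ == "__main__":
    for rule in staging_rules(["10.0.0.6", "10.0.0.5"], ["10.0.0.6"], "production"):
        print(rule)
```

Keeping the decision in one small function like this makes it easy to check the exception list separately from the template text.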


Metadata Update from @zlopez:
- Issue assigned to james (was: phsmoura)

6 months ago

@phsmoura are you and @james working together on this one?

@smilner yes, that's the plan now that I'm back from PTO

Both Pedro and I have looked at this, and while it seems possible to migrate to firewalld, it seems like it'll be much easier to migrate to nftables. Upstream nftables has a migration document:

https://wiki.nftables.org/wiki-nftables/index.php/Moving_from_iptables_to_nftables

...which seems to work fine. I also asked internally on the firewall slack channel and they didn't have a problem moving to nftables. So unless anyone objects the rough plan at the moment is:

  • Duplicate/migrate roles/base/templates/iptables to roles/base/templates/nftables
  • Have a boolean in ansible which controls which of iptables/nftables to use (e.g. in roles/base/tasks/main.yml).
  • Migrate the internal private ansible data.
  • Start turning the boolean on for staging hosts.
  • Start turning the boolean on for production hosts.

Comments/objections/etc. welcome.
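
As a rough illustration of the boolean in the plan above, the following Python sketch resolves which backend a host would use. The variable name `use_nftables` and the function are hypothetical, and the precedence (host vars over group vars over a default) only loosely mirrors how Ansible resolves variables.

```python
def firewall_backend(host_vars, group_vars, default=False):
    """Pick 'nftables' or 'iptables' for a host (illustrative only).

    Host vars win over group vars, which win over the default.
    'use_nftables' is a made-up variable name for this sketch.
    """
    for scope in (host_vars, group_vars):
        if "use_nftables" in scope:
            return "nftables" if scope["use_nftables"] else "iptables"
    return "nftables" if default else "iptables"


if __name__ == "__main__":
    # A staging group opted in, with one host explicitly held back.
    print(firewall_backend({}, {"use_nftables": True}))
    print(firewall_backend({"use_nftables": False}, {"use_nftables": True}))
```

With a default of off, turning the boolean on for the staging group first and production later matches the order of the last two bullets.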

This sounds reasonable to me!

Yeah, sounds like a reasonable plan indeed. Thanks for investigating things!

Update...

tl;dr Everything is progressing, but slower than hoped. Will hopefully sync up again with Pedro next week (he's mostly been dealing with Fedora planet issues).

First bullet point is mostly done, and the converter doesn't like ansible templates :).
However there are a few iptables things outside of that tree, more than I previously thought.

Ansible things required a bunch of (re)learning, esp. about tags/facts/booleans/etc.
Learning/dealing with ansible public/private bits, and privileges to go with it.
On the upside, might be able to help other issues/people in the future and/or write some docs.

Also looked at what we can do to test things are good (mostly a bunch of nmap documentation and testing it).
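
One way to frame the before/after testing is as a diff of reachable ports. The sketch below is an assumption about how such a check could look, not what was actually run: it uses a plain TCP connect probe over IPv4 rather than nmap, so it cannot tell a dropped packet from a closed port, and the function names are made up.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the set of TCP ports on host that accept a connection."""
    found = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            if sock.connect_ex((host, port)) == 0:
                found.add(port)
    return found


def diff_ports(before, after):
    """Report ports that opened or closed across the migration."""
    return {
        "opened": sorted(after - before),
        "closed": sorted(before - after),
    }


if __name__ == "__main__":
    # Hypothetical results captured before and after switching a host.
    print(diff_ports({22, 80, 443}, {22, 443}))
```

An empty diff from a staging and a non-staging vantage point would be reasonable evidence that the migrated rules behave like the old ones.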

Great. Thanks for the update!

If you all want to schedule a meeting (matrix or video) I can explain how the non-template parts work currently and you can see if you can come up with a better way to do it. ;)

Update early (because not at work tomorrow)...

First bullet point should be done and ready for first review next week. Copy/migrated custom_rules to nft_custom_rules, which should be fine. Maybe do some cleanup after the migration, but not before so it'll be easier to review/see how things migrated and that they should do the same thing.

Migrated ipset to nftables filters/sets ... I think it will work, but it's def. the part that needs the most review.

Okay, I pushed a commit which does just the nft bits to my branch (no PR, for obvious reasons):

https://pagure.io/fork/james/fedora-infra/ansible/commits/nftables?identifier=nftables

Current diff:

https://pagure.io/fork/james/fedora-infra/ansible/c/19072eb8f0aaa5e0de6935ff661fb6d63444b5ac?branch=nftables

...probably need to look at the current iptables in another window, but I tried to just add comments so it's easy to look from one to the other.

The osbuildapi bits are worth a 2nd/3rd look, because that works very differently in nftables to iptables. I also moved the flush and the first loop to the end of the script, to reduce the reload window, but it should still be easy to compare.

From a short look this looks pretty reasonable.

We need to sort out how to add the blocklist stuff for external hosts (that's currently a script in ansible-private; I can get you a copy or just explain how it works out of band).

Once that's sorted, I guess the next step might be to start with a few staging hosts and look at them before / after?
Also, I suppose we need some way to 'convert' a host. Whether that's a manual playbook, or we just have iptables get removed, set things up and let a reboot switch it, I am not sure.

Phenomenal bit of work here @james!

Ok, so the next bit is converting https://github.com/mkorthof/ipset-country

And I have "a port", not fully tested but pretty close and need some feedback.

And, to get that feedback we'll need to explain a big difference:

In iptables/ipset there's a single scope, so you can create a set of ips and access that set from any rule in any chain you want.

In nftables you create a table scope, and create a set within that scope ... and/or create a chain and a rule within that chain. But nothing in table scope X can access things in table scope Y.

...so atm. for the port I've created a table ipcountry, which contains the sets and has an INPUT chain with a higher priority than the default ip filter chain. It matches IPs against the ip sets and rejects/drops those packets if they are in one ... if there's no match then the packet is accepted and moves on to the normal filter chain.

The advantage of doing it this way is that everything is kept within the ipcountry table scope. The downside is that this is a separate chain, so if we want to allow existing connections then we need a "rule ip filter INPUT ct state related,established counter accept" in the ipcountry chain. But the same is true of any other allow/deny shortcuts.
Also atm. the chain runs before the normal chain, it's a simple change to run it after but it's still the same problem AIUI.
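
For readers who have not used nftables scoping, here is a hedged Python sketch that renders a ruleset shaped like the description above. The set name `blocked4`, the priority value and the CIDRs are assumptions for illustration and are not taken from the actual port. Two general nftables facts it relies on: base chains with a lower priority number run earlier (the usual filter priority is 0), and an accept in one base chain only ends that chain, so the packet still reaches later chains on the same hook, while a drop is final.

```python
def render_ipcountry_table(blocked_cidrs, priority=-10):
    """Render a self-contained nftables table (illustrative only).

    The set and the chain that uses it live in the same table, since
    a rule cannot reference a set defined in a different table.
    """
    lines = [
        "table ip ipcountry {",
        "    set blocked4 {",
        "        type ipv4_addr",
        "        flags interval",
    ]
    if blocked_cidrs:
        # Leave the elements line out entirely for an empty set.
        lines.append(f"        elements = {{ {', '.join(blocked_cidrs)} }}")
    lines += [
        "    }",
        "    chain INPUT {",
        f"        type filter hook input priority {priority}; policy accept;",
        # Shortcut so existing connections skip the set lookup.
        "        ct state established,related counter accept",
        "        ip saddr @blocked4 counter drop",
        "    }",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_ipcountry_table(["192.0.2.0/24", "198.51.100.0/24"]))
```

Running the chain after the normal filter chain instead would only mean passing a positive priority; the shortcut rule is still needed inside this table either way.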

I think this is fine, but before posting patches for review I thought I'd make sure.

I think this approach is just fine.

We really only want/need those in a INPUT context.

Okay, pushed the ipset-country port to my fork: https://github.com/james-antill/ipset-country

Tested it on Fedora 40 and el9 (ipv4 only).

Should also fall back to iptables if "nft" command isn't available (so we should be able to deploy the cmd everywhere immediately).
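
The fallback can be pictured with a short Python sketch. This is not the port's actual code (the function name is made up); it only shows the general idea of preferring nft when the binary is on PATH and using iptables otherwise.

```python
import shutil


def pick_firewall_tool(which=shutil.which):
    """Prefer nft when it is on PATH, otherwise fall back to iptables.

    'which' is injectable so the choice can be tested without
    touching the real system.
    """
    if which("nft"):
        return "nft"
    if which("iptables"):
        return "iptables"
    raise RuntimeError("neither nft nor iptables found on PATH")


if __name__ == "__main__":
    try:
        print(pick_firewall_tool())
    except RuntimeError as err:
        print(err)
```

Note that with this kind of check the presence of the binary is the only signal, so any host that has nft installed would take the nftables path.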

Also updated the Ansible PR for syntax things on osbuild set stuff.

We can probably try deploying this in stg now (please review). Possible things we might want to do:

Have some clever way to switch machines back and forth between iptables/nftables:
pro: Easier to deploy
con: A bunch of ansible playbook changes, and will be useless within seconds of the final machine using nftables.

Look at the old stuff in the ansible templates and remove some "temp" things from 10 years ago.

Cleanup the ansible templates to use include file?

Awesome! I noticed COUNTRY is set twice in these lines. Is the second line meant to override the first, is there some concating happening, or is it a typo?

Just a simple override, the first line is the upstream default.

> Okay, pushed the ipset-country port to my fork: https://github.com/james-antill/ipset-country
>
> Tested it on Fedora 40 and el9 (ipv4 only).

Cool. Looks pretty reasonable from a quick glance.

> Should also fall back to iptables if "nft" command isn't available (so we should be able to deploy the cmd everywhere immediately).

Seems that lots of our hosts have nft installed. Should we mass uninstall it and then just re-add to staging or whatever we want to test?

> Also updated the Ansible PR for syntax things on osbuild set stuff.

I can't seem to find the PR again... link?

> We can probably try deploying this in stg now (please review). Possible things we might want to do:
>
> Have some clever way to switch machines back and forth between iptables/nftables:
> pro: Easier to deploy
> con: A bunch of ansible playbook changes, and will be useless within seconds of the final machine using nftables.

Yeah, as long as we can roll back I don't think we need any super switcher.

> Look at the old stuff in the ansible templates and remove some "temp" things from 10 years ago.

Yeah, we should do that. :)
Give me a list or note them and we can look...

> Cleanup the ansible templates to use include file?

Yeah, could make it more clear I guess...

Ahh, I never opened a PR for the ansible bits because I knew we shouldn't merge it. It's in my fork and I linked to it in a comment...

https://pagure.io/fedora-infrastructure/issue/11715#comment-936866

...I'll rebase it Thursday against current, because the lint stuff probably broke it.

It's up to you if you want to remove nft from the machines not using it, or want to not install the new ipset-country on them.

Well, currently we just install the one on all of them... I suppose we could install just the new one on staging?

At this point I might say let's wait until Jan...
