#9852 DNS Mini-initiative
Closed: Fixed 7 months ago by kevin. Opened 2 years ago by smooge.

Describe what you would like us to do:

The DNS system needs some restructuring because of limitations we are finding in the versions currently in use.

  1. The version of BIND in EL7 and EL8 does not allow a large number of TCP connections, but Fedora's versions of various tools try TCP by default and use UDP only on request. This means that dig and other commands are not working. The recommendation is to move the DNS servers to Fedora 33 and keep upgrading them as new releases come out.
  2. The default TTL for the main fedoraproject.org zone is short, which means resolvers re-query more often and our servers see more TCP connections per unit of time. The various DNS maintainers say we should look at raising $TTL and setting short TTLs only on the records we expect to be short-lived (wildcard and NS records).
  3. The DNS git repo is 4.4 GB in size for 60 MB of files. We need to look at cleaning up the history, as most of those 4.4 GB are just DNSSEC-signed records whose history we probably do not need to keep.
    [External recommendation: set up a hidden master DNS server in IAD2 that holds the keys, and use zone transfers to send the zones to the DNS servers that need them.]
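The hidden-master recommendation can be sketched from the point of view of one public server. This is a minimal illustration, not Fedora's configuration: it assumes BIND on the public servers, a hidden master at the hypothetical documentation address 192.0.2.53, and a hypothetical zone file path. The hidden master would hold the keys and sign; the public servers only transfer the already-signed zone. The master side (allow-transfer, also-notify) is not shown.

```shell
#!/bin/sh
# Print a named.conf stanza that makes a public server a secondary for
# zone $1, transferring it from the hidden master at address $2.
# The file path "slaves/<zone>.signed" is a hypothetical example.
secondary_zone_stanza() {
    # 'type slave' and 'masters' are understood by the BIND versions
    # shipped in EL7, EL8 and EL9; newer BIND releases also accept
    # 'secondary' and 'primaries'.
    cat <<EOF
zone "$1" {
    type slave;
    masters { $2; };
    file "slaves/$1.signed";
};
EOF
}

# Example: stanza for the main zone, pointing at the hypothetical master.
secondary_zone_stanza fedoraproject.org 192.0.2.53
```

Generating the stanza from a small function keeps every public server's config identical, which fits the existing habit of templating DNS files rather than editing them by hand.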

When do you need this to be done by? (YYYY/MM/DD)

To be clear, TCP will become less of an issue once https://pagure.io/fedora-infrastructure/issue/9422 is done. The new algorithm has even smaller signatures than the old one.

The problem is getting there, because in the transition state each record carries two signatures, which makes replies much bigger and requires TCP.
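A back-of-the-envelope sketch of why the transition hurts. The issue does not name the algorithms, so this assumes the old one is RSA with a 2048-bit key (256-byte signatures) and the new one is ECDSA P-256 or Ed25519 (64-byte signatures). The 900-byte base answer and the ~50-byte per-RRSIG overhead are made-up round numbers; 1232 bytes is the EDNS buffer size recommended by DNS Flag Day 2020.

```shell
#!/bin/sh
# Rough estimate of whether a signed answer still fits in one UDP datagram.
RRSIG_OVERHEAD=50   # assumed: RR header + fixed RRSIG fields + signer name
UDP_LIMIT=1232      # DNS Flag Day 2020 recommended EDNS buffer size

# answer_size BASE SIG_LEN... -> BASE plus one RRSIG per signature length
answer_size() {
    total=$1
    shift
    for sig in "$@"; do
        total=$((total + sig + RRSIG_OVERHEAD))
    done
    echo "$total"
}

# transport SIZE -> "udp" if it fits under UDP_LIMIT, otherwise "tcp"
transport() {
    if [ "$1" -le "$UDP_LIMIT" ]; then echo udp; else echo tcp; fi
}

# Before, during and after the rollover, for a hypothetical 900-byte answer.
for sigs in "256" "256 64" "64"; do
    # $sigs is left unquoted on purpose so it splits into separate arguments
    size=$(answer_size 900 $sigs)
    echo "sigs=[$sigs] size=$size -> $(transport "$size")"
done
```

With these illustrative numbers, the RSA signature alone fits (1206 bytes), both signatures together overflow the limit (1320 bytes, so the client retries over TCP), and the new signature alone fits comfortably (1014 bytes).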

As for TTLs, https://00f.net/2019/11/03/stop-using-low-dns-ttls/ sums up the problem. Obviously it's always a compromise.

Maybe even some of the potentially dynamic records could do with longer TTLs. For example, the NS records you mentioned are used exclusively by DNS resolvers, and resolvers are well equipped to handle unresponsive servers and fall back as needed.
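The compromise can be put in numbers with a simplified model: assume a record that clients ask for continuously and a resolver that simply honours the TTL (no prefetching, no early eviction). Each resolver then goes back to the authoritative servers about 86400 / TTL times per day for that record.

```shell
#!/bin/sh
# Simplified model: a continuously queried record is re-fetched by each
# caching resolver roughly once per TTL, i.e. 86400 / TTL times a day.
refetches_per_day() {
    echo $((86400 / $1))
}

for ttl in 60 300 3600 86400; do
    echo "TTL ${ttl}s -> about $(refetches_per_day "$ttl") upstream queries/day per resolver"
done
```

The other side of the trade-off is that a change can take up to one TTL to be seen by every resolver, so the lower query rate of a long TTL is paid for in slower updates.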

This is not to say stale data should be kept indefinitely; I'm just trying to find a middle ground. I would have to know more about your load balancing/proxying to provide a more useful recommendation. Ping me if you are interested.

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-gain, medium-trouble, mini-initiative, ops

2 years ago

It seems pointless to me to keep signed zone files in history. Store the unsigned zone data together with all the keys used, but not the ever-changing signatures themselves. ldns-read-zone -s can be used to drop DNSSEC data from the zone. Inline signing could be used on the server to create the signatures; just make sure it has enough random entropy. Or keep a deployment script that creates the signed zone from the unsigned zone and the keys.
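To show what "unsigned data only" means for the repository, here is a deliberately simple awk filter. It is a swapped-in stand-in, not ldns: in practice ldns-read-zone -s would do this job. The sketch assumes one record per line in "owner TTL class type rdata" form (the layout ldns-read-zone prints); multi-line records in parentheses, as some signers emit, are not handled. The function name strip_dnssec is hypothetical.

```shell
#!/bin/sh
# Drop the DNSSEC records a signer regenerates anyway (RRSIG, NSEC,
# NSEC3, NSEC3PARAM), keeping everything else, including DNSKEY.
# Assumes the record type is the 4th whitespace-separated field.
strip_dnssec() {
    awk '$4 !~ /^(RRSIG|NSEC|NSEC3|NSEC3PARAM)$/'
}

# Example input: one plain record followed by two DNSSEC records.
printf '%s\n' \
    'example.org. 3600 IN A 192.0.2.1' \
    'example.org. 3600 IN RRSIG A 13 2 3600 20240101000000 20231201000000 12345 example.org. AAAA' \
    'example.org. 3600 IN NSEC www.example.org. A RRSIG NSEC' |
    strip_dnssec
```

Only the A record survives. The signed zone would then be rebuilt at deploy time (for example with BIND's dnssec-signzone or inline signing), so git only ever sees changes a human actually made.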

The current DNS workflow is as follows:

An admin makes changes to the files in the DNS repository. They then run commands like the following shell function:

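dnscommit () {
    local args=$1
    cd ~/dns || return
    # Commit the hand-edited zone files first...
    git commit -a -m "${args}"
    # ...then rebase, rebuild and re-sign into built/, and push both commits.
    git pull --rebase && ./do-domains && git add built && git commit -a -m "Signed DNS" && git push
}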

The do-domains script does some checking, then renders the jinja2 templates for the various files, and then generates all the signatures (though not correctly with 2 sets of keys). Private keys are stored on batcave in a restricted-access location.

Once named-checkzone says it's OK, the commits happen, and git triggers then do some further work.

Each of the Fedora DNS servers regularly pulls the git repository from batcave to get an updated tree. If the pull brought in changes, named reloads to pick them up. A forced push can be triggered from batcave, which makes every client run the git update right away. Changing any of this will need either parallel tooling or a refactoring of the current system.
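The pull-then-reload step on each server could look roughly like this. It is a sketch under assumptions, not the actual Fedora script: the checkout path is passed in, the checkout already tracks its upstream branch on batcave, and RELOAD_CMD is a hypothetical override that defaults to rndc reload.

```shell
#!/bin/sh
# Pull the DNS checkout in $1 and reload named only if HEAD moved.
# RELOAD_CMD is a hypothetical knob; it defaults to "rndc reload".
RELOAD_CMD=${RELOAD_CMD:-rndc reload}

sync_and_reload() {
    repo=$1
    old=$(git -C "$repo" rev-parse HEAD) || return 1
    # --ff-only: never create merge commits on a DNS server
    git -C "$repo" pull -q --ff-only >/dev/null || return 1
    new=$(git -C "$repo" rev-parse HEAD) || return 1
    if [ "$old" != "$new" ]; then
        # RELOAD_CMD is unquoted on purpose: command plus arguments.
        $RELOAD_CMD
    fi
}
```

Comparing HEAD before and after the pull avoids reloading named on every cron run; using --ff-only makes a server refuse, rather than merge, if its checkout has somehow diverged from batcave.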

Metadata Update from @mizdebsk:
- Issue tagged with: dns

a year ago

ok, so we have the new signatures now.

Do we still want to move away from EL8 BIND?

We do need to clean up the git repo.

If we move away from EL8 BIND, what do we move to instead? Fedora's BIND? Something else?

As a side note, moving away from the current two-commit system would close a potential attack I noticed today.
If someone makes a change, commits it (without pushing), runs do-domains, and then keeps the signed DNS files and never pushes them, that person can use them to mount a DNS MITM attack.

Now, MITM is rare and not trivial, DNSSEC is not really verified by default (AFAIK), HTTPS already protects most web applications, and only trusted people have access to the DNS repo, so I do not think there is a big risk.

However, ssh can rely on DNS for host key verification (https://weberblog.net/sshfp-authenticate-ssh-fingerprints-via-dnssec/), and AFAIK Let's Encrypt relies on DNS lookups when it validates a certificate request. Avoiding MITM is kind of why DNSSEC exists, so the current workflow is not optimal.

Wanted to stoke the flames on this again and see where it landed, and whether there was an outcome.

So, I think we can close this now.

We have moved at least one of the nameservers to RHEL 9... and others will move soon.

We need low TTLs because we use them to add/remove proxies from our round-robin DNS setup.

It's annoying to clean out the git history from time to time, but it's handy to distribute things with git. If someone would like to design a new setup for DNS, they can make a proof of concept and open a new ticket to sell it to us. ;)

If I missed anything here, please feel free to reopen.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

7 months ago


Boards 2
ops Status: Backlog
mini-initative Status: Backlog