#2476 systemd-resolved by default should be deferred to a future release
Closed: Rejected 3 years ago by churchyard. Opened 3 years ago by simo.

As the recent heated discussion on fedora-devel showed, there is still a long way to go for systemd-resolved to be a compliant resolver that can be used by default by applications on a Linux system.

Multiple reports of issues have surfaced, on desktop and even more importantly in servers.

The switch to systemd-resolved breaks important use cases for servers by not returning standards-compliant answers to clients using the DNS protocol.

Compliance with standards is a very basic requirement for any default system-wide resolver. systemd-resolved does not meet this requirement yet, so it should not be forced as the default resolver on Fedora until this basic requirement is met.


As the person who started this discussion on the fedora list, I agree with Simo.

The current method actively breaks servers.
The current method breaks RFC standards.
The current method decreases DNS security.
The current method has unknown and unexpected side effects, due to returning non-DNS entries such as /etc/hosts or LLMNR via port 53

The software and its authors seem unaware of the standardization efforts happening right now related to these types of DNS problems (and upcoming problems related to DNS-over-TLS and DNS-over-HTTPS) at the IETF across multiple vendors. It would be better to synchronize these efforts, as interoperability and compatibility are key here.

The discussion on fedora-devel is heated, but it's not uncovering anything particularly new. If anything, it's showing that we need better docs and better explanations. Some specific use cases are not covered, in particular the case of "dnssec aware server". But I think it's totally fair to assume that this case is relevant to 0.01% of users out there (after all, you have to run the latest Fedora, on a server, in an environment where DNSSEC actually works). The issue can be worked-around by symlinking /etc/resolv.conf differently. This is a simple work-around and good enough until https://bugzilla.redhat.com/show_bug.cgi?id=1879028 is resolved.

The current method actively breaks servers.
The current method breaks RFC standards.

It "breaks" some niche cases, while fixing many other things. Workarounds exist for the negatively impacted cases. The RFCs disagree in some areas, or simply don't apply to the mixed case of local and remote resolution, and resolved in general is trying to do stuff that makes sense.

The current method has unknown and unexpected side effects, due to returning non-DNS entries such as /etc/hosts or LLMNR via port 53

It's not trying to implement a generic DNS server. It only provides a stub listener on localhost, i.e. only accessible to local clients, so that programs which don't use nss get similar resolution. This is probably better than the alternative of having inconsistent resolution in different programs. But even that can be turned off (either by disabling the stub listener, or by making /etc/resolv.conf point to remote servers, or by disabling LLMNR and mDNS in resolved).
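To see which of these setups a machine is actually in, one can check where /etc/resolv.conf points. The following is a minimal sketch, assuming the file locations described in the systemd-resolved(8) man page; the mode labels ("stub", "static-stub", "uplink", "foreign") and the function names are made up for this illustration.

```python
import os

# Targets documented in systemd-resolved(8). The labels on the right are
# hypothetical names used only in this sketch.
KNOWN_TARGETS = {
    # Lists only the local stub listener; clients talk to resolved.
    "/run/systemd/resolve/stub-resolv.conf": "stub",
    # Static file shipped with systemd, also pointing at the stub listener.
    "/usr/lib/systemd/resolv.conf": "static-stub",
    # Lists the upstream servers directly; clients bypass the stub listener.
    "/run/systemd/resolve/resolv.conf": "uplink",
}


def classify_resolv_conf(resolved_target):
    """Map the real path behind /etc/resolv.conf to one of the labels above."""
    return KNOWN_TARGETS.get(resolved_target, "foreign")


def current_mode(path="/etc/resolv.conf"):
    """Follow symlinks and classify the file that is really in use."""
    return classify_resolv_conf(os.path.realpath(path))


if __name__ == "__main__":
    print(current_mode())
```

A result of "foreign" means the file is a regular file or is managed by something else, which is the case when an administrator has edited /etc/resolv.conf by hand.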

Metadata Update from @ngompa:
- Issue tagged with: meeting

3 years ago

Multiple reports of issues have surfaced, on desktop

I haven't seen any strong arguments against using systemd-resolved on desktop. The status quo in F32 is really extremely broken in common cases, as discussed at length on the mailing list, and it would be very disappointing to go backwards here.

and even more importantly in servers.

The case against resolved on servers is stronger. The problem here seems to be that systemd-resolved is really totally broken with respect to DNSSEC (or, if we want to be charitable about it, certainly not working as standards require and certain applications expect). That doesn't necessarily mean that systemd-resolved is an inappropriate default for servers, but server administrators that need working DNSSEC are going to need to manually configure /etc/resolv.conf. I kinda think that's not a very big ask, though, especially since we are talking about server administrators who have to configure a lot of things, and since this is likely only relevant to a small minority of applications.

That said, systemd-resolved doesn't really solve any major problems on servers like it does on desktops, since servers don't need split DNS and usually have a simple static configuration. systemd-resolved does introduce a shared systemwide DNS cache, which is nice. But I care much less about what we do for servers than for desktop, where the situation is really very seriously broken without systemd-resolved.

The current method actively breaks servers.

But this probably only affects a small minority of servers, yes? And it is very easy to avoid by simply editing /etc/resolv.conf to your liking.

The current method breaks RFC standards.

Other than DNSSEC (acknowledged above) and returning /etc/hosts, LLMNR, and mDNS (which is desirable)? What else?

The current method decreases DNS security.

Does it really? How so? If this is about DNSSEC, then I do not care, because it seems DNSSEC is only useful for DNS servers and certain other special cases. It has no impact on 99.9% of users. Right?

In fact, it seems clear that the opposite is true. I'm currently working on a change proposal to enable systemwide DNS over TLS in Fedora 34. Although this will only provide opportunistic encryption that will not protect against active attackers, and relies on support from the configured DNS server, it will at least be a nice privacy improvement to protect against passive network attackers. We cannot do this with nss-dns because the glibc maintainers understandably do not want nss-dns linking to a TLS library. So this change helps us improve real-world DNS security.

In contrast, I'm not sure what benefit DNSSEC is really providing to clients, since it looks like we'll never be able to enforce it any decade soon. I would go so far as to say that DNSSEC for clients is not real-world security, it's a total failure. Now, I have no doubt that it's important for DNS servers, but DNS servers will never run systemd-resolved, so that is not a relevant consideration for Fedora. I wonder if you're thinking about this from the point of view of a DNS server operator? That seems to be where your complaints are coming from, right? But it really doesn't make any sense. Use a real DNS server like BIND or unbound for that purpose.

The current method has unknown and unexpected side effects, due to returning non-DNS entries such as /etc/hosts or LLMNR via port 53

Who cares? This is designed to make things work. Why wouldn't you want this?

The current method actively breaks servers.

But this probably only affects a small minority of servers, yes? And it is very easy to avoid by simply editing /etc/resolv.conf to your liking.

The problem here is that the upgrade breaks server applications silently,
because it replaces a working DNS configuration with a crippled one (for some use cases, not in general).
The outcome is that a cursory look will seem OK, only to later find out from individual application logs that things are not working as expected.
And unless you know why, it may take quite a while to figure out.

The current method breaks RFC standards.

Other than DNSSEC (acknowledged above) and returning /etc/hosts, LLMNR, and mDNS (which is desirable)?

It is potentially desirable for a desktop, but not for a server in general.
Is it possible to have systemd-resolved use a configuration generator that configures it differently, by default, based on Fedora flavor (Workstation vs Server)?

What else?

One other thing pointed out was resolving tld. names. There is indeed a config option for this one, but again you need to know you have to trigger it, or spend time debugging an application to figure out why it is not working when it previously was.

There may be more; it is unclear, as there evidently has not been much testing on the server side.

The current method decreases DNS security.

Does it really?
[long rant about whether DNSSEC or not is good elided]
Yes, it really does: there are various standards that influence, for example, mail servers, and other records that are used not just by DNS servers themselves but by applications, and that have no other equivalent. DoT and DoH are nice things for clients, but do squat for some DNS attacks.

Who cares? This is designed to make things work. Why wouldn't you want this?

People care to have consistent data, and not random stuff from the local network that can be injected with zero oversight, so, for servers at least, by default, port 53 should just be a cache resolver for DNS only and not an aggregator of random sources.

Note that I really want a caching resolver locally both on workstations and servers, and do not care to choose an implementation, as long as that implementation can work properly for the different use cases, and Workstation vs Server use cases are quite different here.

I would even be OK if you transitioned workstations to use systemd-resolved, but delayed servers until systemd-resolved gets fixed for the use cases that are important to servers.

It will mean different defaults between wks and srv, but that should be ok because both configurations must be fully supported as working in Fedora and actually gives us more assurance that we won't regress until the caching resolver has all kinks sorted out.

The problem here is that the upgrade breaks server applications silently,
because it replaces a working DNS configuration with a crippled one (for some use cases, not in general).
The outcome is that a cursory look will seem OK, only to later find out from individual application logs that things are not working as expected.
And unless you know why, it may take quite a while to figure out.

Hm, that is true. Maybe it would be better to not upgrade Fedora Server to systemd-resolved, and use it only for new installs, to avoid potential breakage. (We still want to upgrade desktop variants, though, so this would only make sense for Server.)

On the other hand, we've already had enough bugs with the upgrade scripts that introducing more complexity one week before final freeze doesn't sound very fun....

It is potentially desirable for a desktop, but not for a server in general.
Is it possible to have systemd-resolved use a configuration generator that configures it differently, by default, based on Fedora flavor (Workstation vs Server)?

Not for F33 timeframe, at least not nicely. Currently the only way to change this behavior is to (a) change systemd build flags, but we share the same build between all Fedora variants, or (b) edit /etc/systemd/resolved.conf to contain non-default values to disable LLMNR and mDNS. So we would need an RPM scriptlet to edit that config file only for Fedora Server. Possible, but not nice.

Is any of this likely to cause any practical issues? I don't think so? If you have hosts in /etc/hosts, they're there to be used, right? And LLMNR and mDNS are hardly likely to affect a server environment. We wound up disabling mDNS resolution, so in the end only /etc/hosts and LLMNR will be used (although mDNS will still be resolved by nss-mdns before systemd-resolved sees anything). There's another config option in /etc/systemd/resolved.conf to disable reading /etc/hosts. So it really is configurable however we want, it's just a question of defaults....
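As a rough illustration of what "non-default values" could look like, here is a minimal sketch that renders a resolved.conf drop-in. It assumes the `LLMNR=`, `MulticastDNS=` and `ReadEtcHosts=` options documented in resolved.conf(5); the helper name and the idea of shipping this as a Server-only drop-in are assumptions for the example, not something the thread settled on.

```python
def render_resolved_dropin(settings):
    """Render a [Resolve] section from a mapping of option name to value."""
    lines = ["[Resolve]"]
    lines += ["{}={}".format(key, value) for key, value in settings.items()]
    return "\n".join(lines) + "\n"


# Hypothetical "server" profile: answer from DNS only, with no link-local
# name resolution and no /etc/hosts entries served by resolved.
SERVER_SETTINGS = {
    "LLMNR": "no",
    "MulticastDNS": "no",
    "ReadEtcHosts": "no",
}

if __name__ == "__main__":
    print(render_resolved_dropin(SERVER_SETTINGS), end="")
```

The output would go into a file under /etc/systemd/resolved.conf.d/ (the file name is up to the packager), followed by a restart of systemd-resolved. Whether a scriptlet should write such a file is exactly the "possible, but not nice" question raised above.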

One other thing pointed out was resolving tld. names. There is indeed a config option for this one, but again you need to know you have to trigger it, or spend time debugging an application to figure out why it is not working when it previously was.

There may be more; it is unclear, as there evidently has not been much testing on the server side.

Also true, I understand this will break many Kubernetes setups until you change the config.

People care to have consistent data, and not random stuff from the local network that can be injected with zero oversight, so, for servers at least, by default, port 53 should just be a cache resolver for DNS only and not an aggregator of random sources.

I'm OK with using different defaults for servers to disable LLMNR. It would either mean editing the config file with a scriptlet, or changing the build flag to disable LLMNR support for Workstation users too. Ditto for /etc/hosts.

I'm also OK with disabling systemd-resolved entirely on Fedora Server. I'm not convinced we really need to, but I'll admit the disadvantages on Server seem to outweigh the only advantage (caching).

I'm also OK with disabling systemd-resolved entirely on Fedora Server. I'm not convinced we really need to, but I'll admit the disadvantages on Server seem to outweigh the only advantage (caching).

I would take this option, let it brew and pull that in for F34 with the notable bugs fixed, it will be a better experience for users of the server edition.

I am not convinced that we need to disable this for Fedora Server. Furthermore, I am unsure that such a split configuration is even reasonably workable. Unlike a lot of things, DNS has never really received any attention to modernize how it is configured on Linux systems. Consequently, it still operates more or less the same as it did in the 80s for configuration, and lots of things rely on those assumptions in broken ways.

I would like @zbyszek or another systemd developer to quantify the effort it would take to resolve the issues identified with resolved, because it's going to be a pain to maintain a split setup and I don't think it's worth it, based on the discussion so far.

I would also note that Ubuntu has been using this since Ubuntu 16.10 for cloud, server, and desktop use-cases. Clearly, it actually manages to work quite well in practice for them, or they would have reverted it in Ubuntu 18.04 or Ubuntu 20.04. Since they haven't, I can only assume that it works much better than what people say on the list right now.

Servers do not care about DNS caching. If they did, they would have kept running nscd for the last 20 years. Servers do not have a need for different DNS streams on different interfaces. That is, systemd-resolved brings no value to servers.

The current default configuration breaks servers that deploy DNSSEC validation.

Either don't enable systemd-resolved on servers, or enable it in a way that does not break servers (e.g., configured to not drop DNS queries with the DO bit set).
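For readers unfamiliar with the DO bit: it is a flag in the EDNS0 OPT pseudo-record (RFC 3225, RFC 6891) that a DNSSEC-aware client sets to ask for signature records in the answer. The sketch below only builds such a query in memory to show where the bit lives; it does not send anything, and the function names, the query ID and the 1232-byte UDP size are arbitrary choices for the example.

```python
import struct

DO_FLAG = 0x8000  # "DNSSEC OK" bit in the EDNS0 flags field (RFC 3225)
OPT_TYPE = 41     # EDNS0 OPT pseudo-record type (RFC 6891)


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype=1, query_id=0x1234, dnssec_ok=True, udp_size=1232):
    """Build a one-question DNS query with an EDNS0 OPT record attached."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # Question: name, QTYPE (1 = A by default), QCLASS 1 (IN).
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    edns_flags = DO_FLAG if dnssec_ok else 0
    # OPT RR: root name, TYPE=41, CLASS carries the UDP payload size, and the
    # TTL field is split into ext-rcode, version and flags. RDLENGTH is 0.
    opt = b"\x00" + struct.pack("!HHBBHH", OPT_TYPE, udp_size, 0, 0, edns_flags, 0)
    return header + question + opt


def has_do_bit(packet):
    """Check the DO bit, assuming the packet layout produced by build_query."""
    (flags,) = struct.unpack("!H", packet[-4:-2])
    return bool(flags & DO_FLAG)
```

Sending the bytes from `build_query("fedoraproject.org")` over UDP to the stub listener and to an upstream resolver, then comparing the replies, would be one way to reproduce the difference described above.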

All other issues, such as breaking desktop DNS and VPN client configurations, are things that can be discussed and potentially addressed or not, after the Fedora 33 release.

Servers do not care about DNS caching. If they did, they would have kept running nscd for the last 20 years. Servers do not have a need for different DNS streams on different interfaces. That is, systemd-resolved brings no value to servers.

Maybe not to your servers. They're useful for mine since I serve and access different things on different networks with different names.

Servers do not care about DNS caching. If they did, they would have kept running nscd for the last 20 years. Servers do not have a need for different DNS streams on different interfaces. That is, systemd-resolved brings no value to servers.

Maybe not to your servers. They're useful for mine since I serve and access different things on different networks with different names.

We definitely cannot make absolute statements about all configurations, but it is true that servers in general are on stable connection with simpler needs than desktop in this area.

That said, if up to Fedora 32 they worked fine, delaying this change for servers by one release while the worst issues affecting them are handled should not be a problem.

OTOH forcing the switch causes problems, which to me is a regression, and I am not too happy to make bold changes when there is little need and there are known regressions.

https://meetbot.fedoraproject.org/fedora-meeting-2/2020-09-30/fesco.2020-09-30-14.00.html

  • AGREED: Add #1879028 as FE, close 2476. (+5, 1, -0) (mhroncok,
    14:49:09)
  • ACTION: zbyszek to mark 1879028 as a fesco accepted FE (mhroncok,
    14:49:29)
  • ACTION: zbyszek to write a Fedora Magazine article (mhroncok,
    14:49:50)
  • ACTION: zbyszek to make sure this gets into common bugs if it is not
    fixed (mhroncok, 14:50:11)

Metadata Update from @churchyard:
- Issue close_status updated to: Rejected
- Issue status updated to: Closed (was: Open)

3 years ago

For the record, I am deeply disappointed that FESCO has agreed with ramming this through despite the security implications. And I am further disappointed that this is continuing in F34 with adding more systemd-resolved features instead of insisting that the fallout from this raised issue is properly addressed first before adding even more new DNS-related features.

We are not talking about some optional package issue here. DNS is a core standards compliant protocol for any OS that is currently being violated resulting in security issues with no clear plans to be addressed even for the next fedora release. In fact, the only focus for f34 seems to be adding more features and ignoring this raised issue. FESCO should insist that the f33 raised issues in this item are addressed before allowing more features to be pushed into F34.

This issue is affecting the viability for me to use fedora on servers.

For the record, openssh support for SSHFP broke on fedora 33 as well.

https://bugzilla.redhat.com/show_bug.cgi?id=1886343

"On a side note, we have a test that worked fine with Fedora 32 and fails with Fedora 33 for some reason, but looks like related to local dns resolver rather than openssh"

Related: https://github.com/systemd/systemd/issues/12317

"Currently, systemd's resolved is completely blocking my DNSSEC deployment in a corporate network with our own domain. The only option I had is to install an alternative, e.g. dnsmasq."

Metadata Update from @zbyszek:
- Issue untagged with: meeting

3 years ago
