#2088 F31 Change – “dnf --best” as default behavior
Closed: Rejected 2 months ago by churchyard. Opened 4 months ago by jmracek.

I would like to ask for an exception to the Changes policy for the F30 Change proposal: https://fedoraproject.org/wiki/Changes/DNF_Default_Best.

The change should not only improve security but also make issues in a release much easier to discover. The original request came from the Modularity team, as it is important for modular integrity.


I asked Jaroslav to take this directly to FESCo since we're so far into the schedule that going through the normal process would mean it doesn't even get to FESCo until after the code complete (testable) deadline.

I have concerns about changing the behavior of a key system component this late in the process, even though it seems to be a pretty minor change. I've also asked the QA team to weigh in on this ticket.

Reading this change proposal I'm +1 conditionally on providing a clear Contingency mechanism and Contingency deadline. Aka let's put it in F30 now but have a documented process of revoking it.

This being late also means that there was no discussion or announcement on devel. We should make sure that happens so we can revert this if we see some valid concerns there.

@adamwill What are your thoughts on this?

+1 to the change. However, the option name --nobest sounds to me badly chosen. The yum --skip-broken was a lot more descriptive.

See mail to test@ list. I agree with @till , I think I'm fine with the behaviour, but would prefer --skip-broken both as it's a more descriptive name and it's the name people know from yum. (Or we could have both names, of course).

Questions/thoughts I have here:

  • Are we risking a situation where in an attempt to make things more secure, users simply don't upgrade at all and miss all security updates? dnf dependency errors can be essentially impossible for a non-expert user to figure out, and can be very specific to a particular system. Resolving this type of error may also require switching from the GNOME Software UI to the command line.

  • Are we expecting the breakages that are uncovered here to mostly be our breakages - problems in the Fedora repositories - or problems with 3rd party repositories and locally installed packages? Breakages that are our own fault will hopefully be reported and fixed quickly, but we don't have much control over the other types of breakage.

  • How is this going to interact with PackageKit and its usage of libdnf? Do we need to change something there so we have consistent behavior? It would be confusing if we were in a situation where PackageKit does one thing and 'dnf' another.

  • We should check carefully that the options being used for the test transaction and the actual after-reboot transaction are the same when doing 'dnf system-upgrade' and upgrading via the UI in GNOME Software.

IIRC yum used to print an informational message about what was going on, and mention --skip-broken, when there were dependency issues. Perhaps dnf could do the same (if it doesn't already).

From the Change page:

> The purpose of the --nobest switch (as a shorthand for --setopt=best=0) is to make it easy for the user to override the default setting when needed, and it will also be suggested in the DNF output when a dependency error occurs.

So dnf doesn't do this yet (at least in the version in F30), but it will.

> How is this going to interact with PackageKit and its usage of libdnf? Do we need to change something there so we have consistent behavior? It would be confusing if we were in a situation where PackageKit does one thing and 'dnf' another.

From my quick reading of the diff, this change changes the dnf frontend to use --best as default and doesn't change the libdnf solver default behaviour. This means that it doesn't change how the solver behaves for PackageKit (and gnome-software that uses PackageKit), leaving it using the old behaviour there.

> +1 to the change. However, the option name --nobest sounds to me badly chosen. The yum --skip-broken was a lot more descriptive.

The option --skip-broken has different behavior from --nobest. If you use --skip-broken, the packages with broken dependencies are removed from the transaction. If you use --nobest, you don't skip any packages, but some of them are installed in a lower version (the transaction is not limited to the best candidates only).
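To make the distinction concrete, here is a toy model in Python. It is a hypothetical sketch, not DNF's or libsolv's actual algorithm: `best=True` restricts each request to its newest candidate, `best=False` lets an older version stand in (the --nobest behavior described above), and `strict=False` stands in for --skip-broken by dropping an unsatisfiable request instead of failing. A candidate counts as installable only if every requirement appears in a given set of available capabilities; the names `Candidate` and `resolve` are invented for illustration.

```python
from dataclasses import dataclass

# Toy model for illustration only; not DNF's or libsolv's algorithm.


@dataclass(frozen=True)
class Candidate:
    """One available version of a package (hypothetical type)."""
    name: str
    version: int
    requires: frozenset = frozenset()


def installable(cand: Candidate, provided: set) -> bool:
    # Installable only if every requirement is satisfied by something available.
    return cand.requires <= provided


def resolve(requested, repo, provided, best=True, strict=True):
    """Return (chosen versions, skipped names) or raise if a request fails."""
    chosen, skipped = {}, []
    for name in requested:
        # Newest version first.
        cands = sorted(repo[name], key=lambda c: c.version, reverse=True)
        # best=True: only the newest candidate may be used (like --best).
        # best=False: older versions are acceptable fallbacks (like --nobest).
        pool = cands[:1] if best else cands
        pick = next((c for c in pool if installable(c, provided)), None)
        if pick is not None:
            chosen[name] = pick.version
        elif strict:
            raise RuntimeError(f"no acceptable candidate for {name}")
        else:
            # strict=False: drop the request from the transaction (like --skip-broken).
            skipped.append(name)
    return chosen, skipped
```

With `repo = {"app": [Candidate("app", 1), Candidate("app", 2, frozenset({"libnew"}))]}` and nothing providing `libnew`, `best=True` raises, `best=False` picks version 1, and `best=True, strict=False` leaves `app` out of the transaction entirely.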

To be honest, as a Rawhide user I would appreciate it if best stayed at 0, because otherwise I won't ever upgrade my system (because some packages always have some broken deps) ;)

> See mail to test@ list. I agree with @till , I think I'm fine with the behaviour, but would prefer --skip-broken both as it's a more descriptive name and it's the name people know from yum. (Or we could have both names, of course).

We avoided using --skip-broken to set the best option to False, because that option is already used to set the strict option to False.

> To be honest, as a Rawhide user I would appreciate it if best stayed at 0, because otherwise I won't ever upgrade my system (because some packages always have some broken deps) ;)

I think Rawhide is getting more stable, so the impact should be lower than it used to be. In any case, anyone can open /etc/dnf/dnf.conf and change best=true to best=false. Then the old behavior will be 100% restored.
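A minimal Python sketch of how the effective `best` value follows from a dnf.conf-style file. It uses the standard library `configparser` and an inline sample rather than the real /etc/dnf/dnf.conf, so it is not DNF's own config loader, and DNF may accept slightly different boolean spellings. The `default` parameter is an assumption standing in for the built-in default (True under this proposal, False before it).

```python
import configparser

# Sketch using configparser, not DNF's own config loader.


def effective_best(conf_text: str, default: bool = True) -> bool:
    """Return the `best` option from the [main] section of dnf.conf-style text.

    `default` models the built-in default: True under this proposal,
    False under the previous behavior.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # getboolean accepts 1/0, true/false, yes/no, on/off (case-insensitive).
    return parser.getboolean("main", "best", fallback=default)


# Hypothetical sample: an admin opting back into the old behavior.
SAMPLE = """\
[main]
gpgcheck=1
best=False
"""

print(effective_best(SAMPLE))                  # False: the explicit setting wins
print(effective_best("[main]\ngpgcheck=1\n"))  # True: falls back to the default
```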

Metadata Update from @psabata:
- Issue tagged with: meeting

4 months ago

I prefer to describe behavior using a libsolv testcase (because you can easily play with it).

repo system 0 empty
repo rawhide 0 testtags <inline>
#>=Pkg: puppet 5 1 noarch
repo custom 0 testtags <inline>
#>=Pkg: app 1 1 noarch
#>=Req: puppet
#>=Pkg: app 2 1 noarch
#>=Req: puppet >= 6
#>=Pkg: puppet 6 1 noarch
#>=Req: something-non-existing-yet

system x86_64 rpm system

poolflags implicitobsoleteusescolors
solverflags allowvendorchange keepexplicitobsoletes bestobeypolicy keeporphans yumobsoletes

job install name app
result transaction,problems <inline>
#>install app-1-1.noarch@custom
#>install puppet-5-1.noarch@rawhide

nextjob

job install name app [forcebest]
result transaction,problems <inline>
#>problem 1d63b840 info nothing provides something-non-existing-yet needed by puppet-6-1.noarch
#>problem 1d63b840 solution bf4b808e deljob install name app [forcebest]

Previously, we could install an older version of app for the user when the newest one was broken for some reason; with the new behavior, we error out instead. I think people like it when the dependency resolver actually finds a solution for them (even if that means installing an older version).

Or is it DNF's general approach to be over-strict, so that users end up resolving dependencies themselves? See https://bugzilla.redhat.com/show_bug.cgi?id=1677746 for example.
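The testcase above can be mirrored in a few lines of Python to show why the two jobs differ. This is a toy depth-first resolver written for illustration, not libsolv: versions are plain integers, each requirement is a (name, minimum version) pair, and `force_best` only restricts the top-level job to its newest candidate, roughly as `[forcebest]` does in the testcase. It ignores conflicts between sibling dependencies and does not guard against dependency cycles.

```python
# Toy resolver for illustration; not libsolv.
# Repository from the testcase: (name, version) -> list of (required name, minimum version).
REPO = {
    ("puppet", 5): [],                                   # rawhide
    ("app", 1): [("puppet", 0)],                         # Req: puppet
    ("app", 2): [("puppet", 6)],                         # Req: puppet >= 6
    ("puppet", 6): [("something-non-existing-yet", 0)],  # broken dependency
}


def solve(name, min_version=0, force_best=False):
    """Return a list of (name, version) to install, or None if unsatisfiable."""
    versions = sorted(
        (v for (n, v) in REPO if n == name and v >= min_version), reverse=True
    )
    if force_best:
        versions = versions[:1]  # only the newest candidate is acceptable
    for v in versions:
        plan = [(name, v)]
        for dep, dep_min in REPO[(name, v)]:
            sub = solve(dep, dep_min)
            if sub is None:
                break  # this version's dependencies cannot be met; try an older one
            plan += sub
        else:
            return plan  # every dependency was satisfied
    return None


print(solve("app"))                   # [('app', 1), ('puppet', 5)]
print(solve("app", force_best=True))  # None
```

The first result matches the testcase transaction (install app-1-1 and puppet-5-1); the second corresponds to the reported problem.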

From yesterday's meeting:

  * There are too many things to consider. We will discuss this in the
    ticket for another week.  (contyk, 17:02:34)

This was discussed in today's FESCo meeting:
ACTION: bowlofeggs will file an RFE for better error messages from dnf
REJECTED: Waiting for one more week (+2, 0, -3)
AGREED: Feature is postponed to F31, without accepting or
rejecting. We need more information from the Change owners about
specific cases that this solves (+6, 0, 0)

Metadata Update from @zbyszek:
- Issue untagged with: meeting

4 months ago

Metadata Update from @zbyszek:
- Issue tagged with: next release

3 months ago

The option --best solves the following issues, given these available packages:
foo-1-1
foo-2-1 - includes a security fix but has a broken dependency

"dnf install foo"
Originally - it installs foo-1-1, RC=0
New behavior - it fails, RC=1

"dnf upgrade foo" # with foo-1-1 installed
Originally - the operation ends with "Nothing to do", RC=0
New behavior - it fails because the best candidate is not installable, RC=1

When originally discussing this, I had been confused about the above behavior and thought that it behaved in the reverse. (That --best meant "install the best one it can manage", where it actually means "fail if the latest version can't be installed"). I'm fully in favor of this change, as it's better for users to always get proper updates. It will also provide feedback quickly in the event that things become broken.

+1 from me too. The current default (equivalent of --skip-broken) feels strictly like a regression vs yum, as it allows security updates to be missed silently if there are broken dependencies.

That behaviour may be fine for rawhide where we may want to try to update as much as possible in the presence of frequent broken dependencies.

But when using mainline Fedora, I really want to be up-to-date and to know when something is broken that leaves me with older updates. --best is definitely my preference here.
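One way to notice updates that are held back silently under the current default is to compare installed versions with the newest versions the repositories offer. The sketch below is hypothetical: versions are simplified to integer tuples compared lexicographically, which is not how RPM compares version strings, and the input dictionaries stand in for data one would gather from rpm and the repository metadata.

```python
# Hypothetical check; versions simplified to tuples, not RPM version comparison.


def held_back(installed, available):
    """Names whose installed version is older than the newest available one.

    installed: name -> version tuple currently on the system
    available: name -> list of version tuples offered by the repositories
    """
    return sorted(
        name
        for name, version in installed.items()
        if available.get(name) and max(available[name]) > version
    )


# The foo example from this ticket: foo-1-1 installed, foo-2-1 available but
# not installable, so the old default upgrades nothing and still returns RC=0.
installed = {"foo": (1, 1), "bar": (3, 0)}
available = {"foo": [(1, 1), (2, 1)], "bar": [(3, 0)]}
print(held_back(installed, available))  # ['foo']
```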

Hmm, right now, on my laptop with F30, --best fails, as does --best --allowerasing. (And after looking at the report, even though I am what could be considered a "power user", I have no clue what the underlying issue is [*]). My only recourse will be to jump back to --skip-broken.

As a middle ground, maybe dnf could default to --best only for security updates? We have this knowledge included in all bodhi updates, and we could make use of it here. If the update is marked as security, add it to the "best" set, and make that version mandatory. For other packages, do "best effort" like now.

[*]

$ sudo dnf upgrade --best --allowerasing
Error: 
 Problem: cannot install the best update candidate for package gnome-software-snap-3.32.0-5.fc30.x86_64
  - problem with installed package gnome-software-snap-3.32.0-5.fc30.x86_64
  - package gnome-software-snap-3.32.0-5.fc30.x86_64 requires snapd-login-service, but none of the providers can be installed
  - package snapd-2.37.4-2.fc30.x86_64 requires snap-confine(x86-64) = 2.37.4-2.fc30, but none of the providers can be installed
  - cannot install both snap-confine-2.38-1.fc30.x86_64 and snap-confine-2.37.4-2.fc30.x86_64
  - cannot install both snap-confine-2.37.4-2.fc30.x86_64 and snap-confine-2.38-1.fc30.x86_64
  - problem with installed package snap-confine-2.37.4-2.fc30.x86_64
  - cannot install the best update candidate for package snap-confine-2.37.4-2.fc30.x86_64

I think the problem is with virtual Provides, but from the text here, I cannot figure out what was providing that Provides, and why it stopped being provided. Maybe if the output included the name of the actual rpm, i.e. rpm -q --whatprovides snapd-login-service?
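The requirement lines in a report like the one above follow a regular pattern, so the unmet capabilities can be pulled out mechanically and then looked up, for example with `rpm -q --whatprovides` or `dnf repoquery --whatprovides`. The regular expression below is an assumption based only on the lines quoted in this ticket; DNF's message wording may differ between versions.

```python
import re

# Pattern assumed from the output quoted in this ticket, e.g.:
#   - package A requires B, but none of the providers can be installed
REQUIRES = re.compile(
    r"package (\S+) requires (.+?), but none of the providers can be installed"
)


def unmet_requirements(report: str):
    """Return (package, requirement) pairs from a DNF problem report."""
    return REQUIRES.findall(report)


report = """\
 Problem: cannot install the best update candidate for package gnome-software-snap-3.32.0-5.fc30.x86_64
  - package gnome-software-snap-3.32.0-5.fc30.x86_64 requires snapd-login-service, but none of the providers can be installed
  - package snapd-2.37.4-2.fc30.x86_64 requires snap-confine(x86-64) = 2.37.4-2.fc30, but none of the providers can be installed
"""

for pkg, req in unmet_requirements(report):
    print(f"{pkg} needs {req!r}")
```

Each extracted requirement (here `snapd-login-service`) is the virtual Provides one would then query to find which package used to supply it.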

Well, that seems fairly clear to me, but maybe I'm just experienced at reading it :)

gnome-software-snap required snapd-login-service. Now the next line after that is telling you what dnf identified as the best available provider of that:

- package snapd-2.37.4-2.fc30.x86_64 requires snap-confine(x86-64) = 2.37.4-2.fc30, but none of the providers can be installed

The following lines make it fairly clear that that's an older version of snapd, but we are trying to install a newer version of snap-confine and that requires a newer version of snapd. Without --best the newer snap-confine would simply be left out to make things happy; with --best we get the error because dnf refuses to leave out an available update to make a dependency problem go away.

So either snapd should still be providing 'snapd-login-service' or gnome-software should no longer be requiring it. It seems that @ngompa did the former for F29 and earlier and the latter for F30+ earlier today, so this should be fixed with the next compose.

I can see that maybe it takes a bit of experience to read these errors, but I'm not sure it's possible, or at least easy, to provide friendlier ones that won't just be wrong sometimes...

edit: never mind

I'm more in @zbyszek's category: that error message is not obvious to me. Could I track it down and figure it out? Sure, but it would take me 5-10 minutes.

And what are we expecting the user to do? I think the hope is that they will a) file a bug report and b) add --nobest. But most likely a) their machine will not be updated today b) they will get a bad impression of Fedora.

Barring emergency action, the turnaround time on getting a fixed update out is ~24 hours at best, and likely more, so any repository breakage like this will result in a lot of people seeing errors and getting a bad impression of Fedora, just to get one bug report about something we could have detected server-side before pushing anything.

(If this was the only type of problem, I'd be super negative on this proposal - but problems could also involve 3rd-party repositories or local packages, so I'm simply negative.)

If we add --best to the dnf client, then we'll also, as pointed out earlier, have a disjunction between GNOME Software and the command line. I cannot imagine in any circumstance adding a:

Problem with updating packages, do you want to repeat the transaction allowing older
versions of packages

dialog to GNOME Software.

Let's leave this specific issue aside. I think https://bodhi.fedoraproject.org/updates/FEDORA-2019-1a613fbede solves it.

The wider points are that we do get those kinds of problems, even within our core set of packages (this is updates-testing, so breakage is somewhat expected, but still), and figuring out even which package is at fault will be hard for most users. And in most cases proceeding with the transaction without problematic packages is more appropriate than failing. So again:

> maybe dnf could default to --best only for security updates?
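The middle ground suggested here can be sketched as a policy function. Everything in it is hypothetical: the `Update` record, its fields, and the idea that the solver reports the newest installable version per package are assumptions for illustration, and the `security` flag stands in for the update type recorded in Bodhi. Security updates must reach their newest version or the transaction fails; other packages take whatever the solver can install.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical policy; Update and its fields are invented for illustration.


@dataclass
class Update:
    name: str
    newest: str                        # newest version in the repositories
    newest_installable: Optional[str]  # newest version whose dependencies resolve
    security: bool                     # e.g. marked as a security update in Bodhi


def plan(updates: List[Update]) -> Tuple[List[str], List[str]]:
    """Enforce 'best' only for security updates; best effort for the rest."""
    to_install, errors = [], []
    for u in updates:
        if u.security and u.newest_installable != u.newest:
            # Security fixes are mandatory: refuse to settle for an older build.
            errors.append(f"{u.name}-{u.newest} is a security update but cannot be installed")
        elif u.newest_installable is not None:
            # Everything else: best effort, as with best=False.
            to_install.append(f"{u.name}-{u.newest_installable}")
    # Any mandatory failure aborts the whole transaction.
    return ([], errors) if errors else (to_install, errors)
```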

We have a long agenda already for today's FESCo meeting and this conversation is ongoing in the ticket. I'm going to skip including it in today's meeting and move it to next week.

@zbyszek There is an argument to be made, though, that we only get bugs like this in our packages currently exactly because --best is not the default. If it were the default, these kinds of issues would be much more likely to be caught in updates-testing, which would just be the system working as normal - as things stand we do get updates sent to updates-testing which are not installable in some common case, and this is usually caught by testers and the update is rejected.

But that seems like the wrong reason to do this change:
1. testers already see the existing output from dnf and can report on any such failures they see. If we are not getting enough feedback from testers on this, we can ask them to report such cases more actively, and/or to tweak their local configuration to have best=true.
2. we should have this tested automatically. Opt-in gating went through FESCo today, and "can-I-be-installed" was one of the basic tests suggested as a bare minimum. This is exactly the kind of check which is easy to automate, and we shouldn't be burning tester time on this.

(And as was already said, the common cause of dep issues are external repos, and our QA can't do anything here.)
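A bare-minimum "can-I-be-installed" check is essentially a repository closure test: every requirement of every package must be provided by some package in the same set. The sketch below assumes unversioned capability names and a hypothetical dictionary layout; real checks also need version ranges, architectures, conflicts and obsoletes (the `dnf repoclosure` command from dnf-plugins-core performs a more complete check of this kind). Like any check limited to one package set, it cannot see third-party repositories or locally installed packages.

```python
# Hypothetical closure check over a self-contained package set.


def repoclosure(packages):
    """Return {package: unmet requirements} for the given package set.

    packages: name -> {"provides": [...], "requires": [...]}
    Each package implicitly provides its own name.
    """
    provided = set()
    for name, meta in packages.items():
        provided.add(name)
        provided.update(meta.get("provides", ()))
    broken = {}
    for name, meta in packages.items():
        missing = set(meta.get("requires", ())) - provided
        if missing:
            broken[name] = missing
    return broken


# Hypothetical package set loosely modeled on the snapd example in this ticket.
packages = {
    "gnome-software-snap": {"requires": ["snapd-login-service"]},
    "snapd": {"requires": ["snap-confine"]},  # no longer provides snapd-login-service
    "snap-confine": {},
}
print(repoclosure(packages))  # {'gnome-software-snap': {'snapd-login-service'}}
```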

  1. The output of dnf when it skips an update for dep reasons doesn't actually make it very clear that that is happening. It's very easy to miss. Also, when updating with gnome-software, you really aren't informed of this at all.

  2. Nothing is ever quite as simple as it first appears: an issue can appear only if a particular other package is installed, for instance. A test that checks 'can this update be installed on top of a minimal install?' is useful, but will not catch all problems.

My point in my previous comment was not "this is a reason to turn on --best by default", those reasons were given in the proposal. It was "here are reasons why 'but if we turn on --best by default people will run into problems more often, and the fact that problems currently exist proves it!' is not necessarily entirely true".

We will discuss this issue during Friday's FESCo meeting at 15:00 UTC in #fedora-meeting-1 on
irc.freenode.net.

Metadata Update from @bowlofeggs:
- Issue tagged with: meeting

2 months ago

Correction: the meeting will be in #fedora-meeting.

@jmracek we discussed this during today's meeting, and we decided that we would like to hear your responses to the comments that have been made here over the past few weeks. You can read the log of our meeting here, starting at 15:03:51:

https://meetbot.fedoraproject.org/fedora-meeting/2019-04-26/fesco.2019-04-26-15.00.log.html

Thanks, I will try to answer as many of the questions as possible.

Jaroslav


> Reading this change proposal I'm +1 conditionally on providing a clear Contingency mechanism and Contingency deadline. Aka let's put it in F30 now but have a documented process of revoking it.
> This being late also means that there was no discussion or announcement on devel. We should make sure that happens so we can revert this if we see some valid concerns there.

Proposal was moved to Fedora 31.

> +1 to the change. However, the option name --nobest sounds to me badly chosen. The yum --skip-broken was a lot more descriptive.

We cannot reuse the --skip-broken option because it has different functionality: --skip-broken allows you to make a transaction even if one of the arguments cannot be satisfied. Under the hood, --skip-broken sets the strict configuration option to false.

The --best option sets the best configuration option to true.
The --nobest option sets the best configuration option to false.
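A minimal sketch of the mapping described above, with command-line flags overriding values that came from the configuration file. The function name and the dictionary representation are assumptions for illustration and this is not DNF's argument parser; only the flag-to-option pairs come from this thread (--best and --nobest set `best`, --skip-broken sets `strict` to false, and --nobest is described as shorthand for --setopt=best=0).

```python
# Hypothetical sketch; not DNF's argument parser.


def apply_cli_flags(conf, argv):
    """Return a copy of `conf` with the flags discussed here applied in order."""
    conf = dict(conf)
    for arg in argv:
        if arg == "--best":
            conf["best"] = True
        elif arg == "--nobest":
            conf["best"] = False
        elif arg == "--skip-broken":
            conf["strict"] = False  # a different option: skip, don't fall back to older versions
        elif arg.startswith("--setopt="):
            key, _, value = arg[len("--setopt="):].partition("=")
            if key in ("best", "strict"):  # only the boolean options modeled here
                conf[key] = value.strip().lower() in ("1", "true", "yes", "on")
    return conf


defaults = {"best": True, "strict": True}  # defaults under this proposal
print(apply_cli_flags(defaults, ["--nobest"]))         # {'best': False, 'strict': True}
print(apply_cli_flags(defaults, ["--skip-broken"]))    # {'best': True, 'strict': False}
print(apply_cli_flags(defaults, ["--setopt=best=0"]))  # {'best': False, 'strict': True}
```

Keeping the two flags separate means each one reverts exactly one option, which is the argument made here against overloading --skip-broken.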

> See mail to test@ list. I agree with @till , I think I'm fine with the behaviour, but would prefer --skip-broken both as it's a more descriptive name and it's the name people know from yum. (Or we could have both names, of course).

I do not support this idea, because using a single switch (--skip-broken) to revert multiple configuration options would be even more confusing (see above). Additionally, I added user-friendly hints that suggest using the --nobest option to resolve issues.

> I prefer to describe behavior using a libsolv testcase (because you can easily play with it).
> repo system 0 empty
> repo rawhide 0 testtags <inline>
> #>=Pkg: puppet 5 1 noarch
> repo custom 0 testtags <inline>
> #>=Pkg: app 1 1 noarch
> #>=Req: puppet
> #>=Pkg: app 2 1 noarch
> #>=Req: puppet >= 6
> #>=Pkg: puppet 6 1 noarch
> #>=Req: something-non-existing-yet
> 
> system x86_64 rpm system
> 
> poolflags implicitobsoleteusescolors
> solverflags allowvendorchange keepexplicitobsoletes bestobeypolicy keeporphans yumobsoletes
> 
> job install name app
> result transaction,problems <inline>
> #>install app-1-1.noarch@custom
> #>install puppet-5-1.noarch@rawhide
> 
> nextjob
> 
> job install name app [forcebest]
> result transaction,problems <inline>
> #>problem 1d63b840 info nothing provides something-non-existing-yet needed by puppet-6-1.noarch
> #>problem 1d63b840 solution bf4b808e deljob install name app [forcebest]
>
> Previously, we could install an older version of app for the user when the newest one was broken for some reason; with the new behavior, we error out instead. I think people like it when the dependency resolver actually finds a solution for them (even if that means installing an older version).
> Or is it DNF's general approach to be over-strict, so that users end up resolving dependencies themselves? See https://bugzilla.redhat.com/show_bug.cgi?id=1677746 for example.

True, but now is the time to stop shipping broken software. Additionally, this is only a default, which can easily be overridden with best=false in /etc/dnf/dnf.conf or with --nobest on the command line. An additional benefit is that issues with a broken package set will be reported much earlier, so they can be fixed.

> Hmm, right now, on my laptop with F30, --best fails, as does --best --allowerasing. (And after looking at the report, even though I am what could be considered a "power user", I have no clue what the underlying issue is [*]). My only recourse will be to jump back to --skip-broken.
> As a middle ground, maybe dnf could default to --best only for security updates? We have this knowledge included in all bodhi updates, and we could make use of it here. If the update is marked as security, add it to the "best" set, and make that version mandatory. For other packages, do "best effort" like now.
> [*]
> $ sudo dnf upgrade --best --allowerasing
> Error:
>  Problem: cannot install the best update candidate for package gnome-software-snap-3.32.0-5.fc30.x86_64
>   - problem with installed package gnome-software-snap-3.32.0-5.fc30.x86_64
>   - package gnome-software-snap-3.32.0-5.fc30.x86_64 requires snapd-login-service, but none of the providers can be installed
>   - package snapd-2.37.4-2.fc30.x86_64 requires snap-confine(x86-64) = 2.37.4-2.fc30, but none of the providers can be installed
>   - cannot install both snap-confine-2.38-1.fc30.x86_64 and snap-confine-2.37.4-2.fc30.x86_64
>   - cannot install both snap-confine-2.37.4-2.fc30.x86_64 and snap-confine-2.38-1.fc30.x86_64
>   - problem with installed package snap-confine-2.37.4-2.fc30.x86_64
>   - cannot install the best update candidate for package snap-confine-2.37.4-2.fc30.x86_64
>
> I think the problem is with virtual Provides, but from the text here, I cannot figure out what was providing that Provides, and why it stopped being provided. Maybe if the output included the name of the actual rpm, i.e. rpm -q --whatprovides snapd-login-service?

Changing the format of the error message is not part of this Change. Additionally, these errors are already printed by DNF, and not only during upgrades, so nothing changes there.

> I'm more in @zbyszek's category: that error message is not obvious to me. Could I track it down and figure it out? Sure, but it would take me 5-10 minutes.
> And what are we expecting the user to do? I think the hope is that they will a) file a bug report and b) add --nobest. But most likely a) their machine will not be updated today b) they will get a bad impression of Fedora.
> Barring emergency action, the turnaround time on getting a fixed update out is ~24 hours at best, and likely more, so any repository breakage like this will result in a lot of people seeing errors and getting a bad impression of Fedora, just to get one bug report about something we could have detected server-side before pushing anything.
> (If this was the only type of problem, I'd be super negative on this proposal - but problems could also involve 3rd-party repositories or local packages, so I'm simply negative.)
> If we add --best to the dnf client, then we'll also, as pointed out earlier, have a disjunction between GNOME Software and the command line. I cannot imagine in any circumstance adding a:
> Problem with updating packages, do you want to repeat the transaction allowing older
> versions of packages
> dialog to GNOME Software.

My vision is to ship a perfect open source operating system: Fedora. This is a long-term mission, and the first step is to know what's broken. Then we can fix it and make it better. Personally, I don't like reading lines like the ones above, which amount to "Fedora is broken and we are afraid that someone could notice." I am a proud Fedora developer and I am looking forward.

So something that confuses me here... Are we not running install checks on updates-testing and failing update composes when they are unresolvable? I thought this was part of what we're supposed to do with Taskotron and such.

If we're not doing that, and we're definitely not spamming people about dependency errors, what are we doing to prevent breakages in the distribution?

> So something that confuses me here... Are we not running install checks on updates-testing and failing update composes when they are unresolvable? I thought this was part of what we're supposed to do with Taskotron and such.
> If we're not doing that, and we're definitely not spamming people about dependency errors, what are we doing to prevent breakages in the distribution?

We are not currently doing this. Also, as @adamwill pointed out, it's also not as simple as it might seem to catch all the kinds of problems that could occur, though I do think we could catch a lot of them with a repo test.


I voted -1 on this proposal as it stands, but I would be willing to change my vote to a +1 if the proposal were amended to describe an acceptable user experience when a repository is broken. My concern is that a user could miss an important security update due to a broken repository that today's behavior would still install.

If dnf's --best flag fails to install updates, the user should see something like this:

$ sudo dnf upgrade
<snip>
Error: Unable to update to latest packages due to <errors>
Hint: You can use --no-best to ask dnf to install the latest packages it is able to install.
WARNING: libfoo-1.3.2-2.fc30 has an important security fix: it is recommended that you consider using --no-best to get it.

What do others think?
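The proposed output could be produced by a small formatting helper like the hypothetical one below. Its inputs (a list of error strings and the NEVRAs of held-back packages that carry security fixes) are assumptions about what the solver and update metadata would supply, which is the hard part. The sketch spells the flag --nobest, the name used elsewhere in this ticket, rather than --no-best.

```python
# Hypothetical helper; not part of DNF.


def format_report(errors, held_back_security):
    """Build the error/hint/warning text proposed in this comment."""
    lines = [
        "Error: Unable to update to latest packages due to " + "; ".join(errors),
        "Hint: You can use --nobest to ask dnf to install the latest packages "
        "it is able to install.",
    ]
    for nevra in held_back_security:
        lines.append(
            f"WARNING: {nevra} has an important security fix: it is recommended "
            "that you consider using --nobest to get it."
        )
    return "\n".join(lines)


# Hypothetical error text and package, for illustration only.
print(format_report(["nothing provides libbar needed by libfoo-1.3.2-2.fc30"],
                    ["libfoo-1.3.2-2.fc30"]))
```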

> I voted -1 on this proposal as it stands, but I would be willing to change my vote to a +1 if the proposal were amended to describe an acceptable user experience when a repository is broken. My concern is that a user could miss an important security update due to a broken repository that today's behavior would still install.
> If dnf's --best flag fails to install updates, the user should see something like this:
> $ sudo dnf upgrade
> <snip>
> Error: Unable to update to latest packages due to <errors>
> Hint: You can use --no-best to ask dnf to install the latest packages it is able to install.
> WARNING: libfoo-1.3.2-2.fc30 has an important security fix: it is recommended that you consider using --no-best to get it.
>
> What do others think?

It sounds good, but the implementation will not be easy. I'm not even sure whether it is deliverable.

https://meetbot.fedoraproject.org/fedora-meeting/2019-05-10/fesco.2019-05-10-15.00.html

FESCo rejects the change proposal as is. Please come back with a new proposal when ready. Move the discussion to the devel ML. (+5, 1, -0)

Metadata Update from @churchyard:
- Issue close_status updated to: Rejected
- Issue status updated to: Closed (was: Open)

2 months ago

