#1974 Problematic blocker for F29: dnf 'offline' module tracking
Closed: Invalid 4 years ago by sgallagh. Opened 5 years ago by adamwill.

Hi, FESCo. There's a currently-accepted Beta blocker bug for what's basically a significant missing 'safety valve' feature in dnf. To summarize quickly, if in any dnf transaction repodata from a module repo which is active on the system is missing, any packages installed from that module will be treated as non-modular.

The test scenario is to run dnf --enablerepo=updates-testing-modular module install dwm:6.0/user then run dnf update without the --enablerepo, at which point dnf will offer an update to the 6.1 version of dwm that is in the main F29 repos, which it should not really do. But there are several other ways this basic problem could be encountered.
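
A toy model of the failure mode may help when reading the scenario above. This is only a sketch, not dnf's actual resolver: the `Package` class, `visible_streams`, and `pick_update` are hypothetical names, and it assumes (for simplicity) that a module and its package share a name and that stream information is only known while the repo carrying the module metadata is enabled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Package:
    name: str
    version: Tuple[int, ...]
    repo: str
    stream: Optional[str] = None  # None means a non-modular package


def visible_streams(module_metadata: Dict[str, Dict[str, str]],
                    enabled: Set[str]) -> Dict[str, str]:
    """Collect name -> stream only from repos that are currently enabled."""
    streams: Dict[str, str] = {}
    for repo, modules in module_metadata.items():
        if repo in enabled:
            streams.update(modules)
    return streams


def pick_update(installed: Package, candidates: List[Package],
                module_metadata: Dict[str, Dict[str, str]],
                enabled: Set[str]) -> Optional[Package]:
    """Return the candidate a naive update would select, or None."""
    pinned = visible_streams(module_metadata, enabled).get(installed.name)
    pool = [c for c in candidates
            if c.name == installed.name and c.repo in enabled]
    if pinned is not None:
        # Module metadata is visible: stay inside the installed stream.
        pool = [c for c in pool if c.stream == pinned]
    # Metadata missing: the package is treated as non-modular, highest wins.
    newer = [c for c in pool if c.version > installed.version]
    return max(newer, key=lambda c: c.version, default=None)


installed = Package("dwm", (6, 0), "updates-testing-modular", "6.0")
candidates = [installed, Package("dwm", (6, 1), "fedora")]
metadata = {"updates-testing-modular": {"dwm": "6.0"}}

with_repo = pick_update(installed, candidates, metadata,
                        {"fedora", "updates-testing-modular"})
without_repo = pick_update(installed, candidates, metadata, {"fedora"})
```

In this model `with_repo` is `None` (no update offered), while `without_repo` is the non-modular 6.1 package, which mirrors the unwanted update described in the test scenario.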

AIUI, our latest status update from the dnf team is that 'fixing' this requires significant engineering work and they really can't be sure when it will be done, but it's highly likely it can't be done for Beta and may even not be done for Final. Tagging @dmach so he can correct this if I'm wrong.

To me, this is kind of a major problem, and it feels bad to say that we're shipping F29 with modularity-for-all as a major feature if we don't have a safety valve as basic as this sorted out. So I figured rather than just decide what to do about this via the blocker process it would be appropriate to escalate it to FESCo for consideration at that level, the question being: what should we do about this? Do we want to hold Beta for it, no matter how long it takes? Hold Final? Not hold the release for it, but change our messaging around how 'done' Modularity is, if this isn't done?

Thanks!

@sgallagh @psabata


FYI: @langdon mentioned at the Council meeting today that there might be a shorter workaround.

@psabata and I have been working this problem with the DNF folks for the last couple days. We are still designing the solution, but we are also in the process of trying to determine which use-cases we think need to be addressed for Beta and which for Final. We will try to have a better report to provide early next week.

@sgallagh can you provide an update on current status, even if things are still being figured out?

After a good bit of investigation we think that it's not going to affect very many people during the Beta cycle. The most likely situation to encounter this issue would be in cases where a user did dnf update --enablerepo=updates-testing-modular and then later did dnf update before the module they had updated had ended up in the stable repo. Since the u-t repos are available by default in Beta, we probably won't have people actually hitting this.

We considered two workarounds:
1. Ensure that the repodata includes all modulemd for any module that has ever been present in the stream. This would address cases where users who don't update frequently-enough might end up with packages on their system from a release that was post-GA but not the latest module update, and thus DNF could lose track of it if the new module no longer included that package and the non-modular repos did. The downsides here were complexity in maintaining that information and a potential to rapidly grow the repodata size.
2. Include a special tag in all packages built as part of a module so that DNF could check and refuse to replace any package containing this tag with a package lacking it. This would require specialized handling in DNF, would break the planned "hotfix repo" functionality and would also require a mass-rebuild of all modules post-Freeze, so that was not an option either. (Note that we HAVE implemented that tag for F30 and it was part of the mass-rebuild I did there just after we branched, so this remains in our back pocket for future Fedora releases.)
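
The tag-based workaround can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not DNF code: the `Rpm` class and its `modular_tag` field are hypothetical stand-ins for whatever header the packages would carry, and `replacement_allowed` is an invented helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rpm:
    name: str
    evr: str
    # Hypothetical field: set for packages built as part of a module,
    # None for ordinary (non-modular) packages.
    modular_tag: Optional[str] = None


def replacement_allowed(installed: Rpm, candidate: Rpm) -> bool:
    """Refuse to replace a tagged package with one lacking the tag."""
    if installed.modular_tag is not None and candidate.modular_tag is None:
        return False
    return True


modular_dwm = Rpm("dwm", "6.0-1", modular_tag="dwm:6.0")
plain_dwm = Rpm("dwm", "6.1-1")
hotfix_dwm = Rpm("dwm", "6.0-2")  # untagged build from a hotfix repo
```

Here `replacement_allowed(modular_dwm, plain_dwm)` is `False`, which is the protection wanted. The same check also rejects `hotfix_dwm`, which illustrates why this approach would conflict with the planned "hotfix repo" functionality.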

The full and proper solution will be to have DNF keep a copy of the modulemd matching any installed RPMs in a local database (similar to how it retains traditional packages in the RPMDB) so it can always identify which RPMs on the system belong to a module stream and act accordingly.
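
As a rough sketch of the local-database idea, the following stores the modulemd alongside each installed modular RPM so the stream can be looked up even when no repo metadata is available. The schema, the function names, and the use of SQLite are all assumptions made for illustration; nothing here reflects how DNF actually stores this data.

```python
import sqlite3
from typing import Optional, Tuple


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical local module-tracking database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS installed_modular ("
        " nevra TEXT PRIMARY KEY,"
        " module TEXT NOT NULL,"
        " stream TEXT NOT NULL,"
        " modulemd TEXT NOT NULL)"
    )
    return db


def record_install(db: sqlite3.Connection, nevra: str, module: str,
                   stream: str, modulemd_text: str) -> None:
    """Remember which module stream an installed RPM came from."""
    db.execute(
        "INSERT OR REPLACE INTO installed_modular VALUES (?, ?, ?, ?)",
        (nevra, module, stream, modulemd_text),
    )
    db.commit()


def stream_of(db: sqlite3.Connection, nevra: str) -> Optional[Tuple[str, str]]:
    """Return (module, stream) for an installed RPM, or None if non-modular."""
    return db.execute(
        "SELECT module, stream FROM installed_modular WHERE nevra = ?",
        (nevra,),
    ).fetchone()


db = open_db()
# The NEVRA and modulemd text below are made-up placeholder values.
record_install(db, "dwm-6.0-1.x86_64", "dwm", "6.0", "placeholder modulemd")
```

With a record like this kept locally, a lookup such as `stream_of(db, "dwm-6.0-1.x86_64")` does not depend on which repositories happen to be enabled for a given transaction.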

So, my recommendation would be for us not to block Beta on this issue, but we should consider blocking GA on it.

How about the case I suggested in the bug, where a modular repo is misbehaving for some reason and a user figures it'd be safe to update non-modular content with dnf --disablerepo=(modularrepo) update? Or the case of doing dnf --disablerepo=* --enablerepo=somerepo something, which is something I use myself occasionally after picking it up on IRC or in the wiki or something, so it may be one of those 'cargo cult' things...

I'm personally comfortable logging that as a Common Bugs entry for Beta. Those are power-user operations from my perspective.

The full solution and proper solution will be to have DNF keep a copy of the modulemd matching any installed RPMs in a local database (similar to how it retains traditional packages in the RPMDB) so it can always identify which RPMs on the system belong to a module stream and act accordingly.

So, my recommendation would be for us not to block Beta on this issue, but we should consider blocking GA on it.

Pfff. This sounds like a huge change and yet another thing which should not be landing in one of the most crucial packages between beta and final.

I agree that the impact of the bug itself does not seem to be too great, so it would be OK to delay the fix until final, iff the fix was clearly in sight and simple. But instead the fix looks rather complicated.

Pfff. This sounds like a huge change and yet another thing which should not be landing in one of the most crucial packages between beta and final.

It's likely to be a fair bit of effort, I agree. I do think it's fairly important to Modularity, but I don't think we're untestable at this point.

I agree that the impact of the bug itself does not seem to be too great, so it would be OK to delay the fix until final, iff the fix was clearly in sight and simple. But instead the fix looks rather complicated.

Yeah, there's definitely some schedule risk here. I won't deny that.

We will discuss this during Monday's meeting at 15:00UTC in #fedora-meeting-1 on irc.freenode.net.

Metadata Update from @bowlofeggs:
- Issue tagged with: meeting

5 years ago
AGREED: we move this to a final blocker and also add a common bugs for beta and hope for the best (+7, 1, -0)

Metadata Update from @bowlofeggs:
- Issue untagged with: meeting
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

5 years ago

There seems to be debate going on in the bug about the extent to which dnf can/should do anything here, and what if anything it ought to do:

https://bugzilla.redhat.com/show_bug.cgi?id=1616167#c12
https://bugzilla.redhat.com/show_bug.cgi?id=1616167#c13

we may need FESCo to take another look at this to have clarity on the expectations for Final.

Metadata Update from @zbyszek:
- Issue status updated to: Open (was: Closed)

5 years ago

Metadata Update from @zbyszek:
- Issue tagged with: meeting

5 years ago

I'm +1 for FESCo getting more involved in finding a reasonable solution to this, but otherwise I'm not sure what we're being asked to do here.

@jsmith: I think we're being asked to define exactly which problems must be solved in order for the blocker to be considered resolved.

I think for me, we need to handle at minimum the following case:

I want to use a modular package for an older version than is in the standard repo. I install it from updates-testing because the testing version fixes a problem I know I would otherwise hit. I do dnf install --enablerepo=*-testing mypackage. Later, I want to update the rest of my system, but I don't want to update everything from u-t, so I just do dnf update. I do NOT want my package to be updated incorrectly to the non-modular version because DNF no longer recognizes that I have a modular RPM installed and selects the higher NVR of the non-modular version for an update.

@jsmith right, as @sgallagh said. From the QA side our last understanding here was that FESCo and DNF team broadly agreed that something akin to @sgallagh 's proposal above could/should be implemented for F29 Final, and the bug was set to block F29 Final on that basis. The recent discussion in the bug suggests that is not the case, so it's no longer clear to us what would constitute "resolution" of this bug as an F29 Final blocker. Thus we'd like FESCo to look at it again and make another call on what is expected to be achieved in this area before F29 can go out. Thanks.

After considerable discussion in the meeting today, I'm going to propose a more constrained requirement that we will make for blocking F29:

Proposal: At any time, a user must be able to run dnf distro-sync and the result must be that while RPMs (modular and non-modular) may be upgraded or downgraded based on the repositories that are enabled and accessible (taking into account skip-if-unavailable settings), they may NOT change streams. A change of stream is defined as follows: "An RPM installed as part of a particular module stream must not update to a different stream of that module, a stream of a different module or to a non-modular RPM." Additionally, non-modular RPMs are permitted to update into a default module stream.

Edit: Added note about non-modular RPMs being permitted to update to a default module stream.
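
The "change of stream" definition in the proposal can be expressed as a small predicate. This is a sketch only: `Origin` and `changes_stream` are hypothetical names, and treating a move from a non-modular RPM into a non-default stream as a disallowed change is an assumption read into the proposal rather than something it states outright.

```python
from typing import Optional, Tuple

# (module, stream) for a modular RPM, or None for a non-modular RPM.
Origin = Optional[Tuple[str, str]]


def changes_stream(installed: Origin, target: Origin,
                   target_is_default: bool = False) -> bool:
    """Return True if replacing `installed` with `target` changes streams."""
    if installed is None:
        # Non-modular RPMs may stay non-modular or move into a default
        # stream. Assumption: a non-default stream counts as a change.
        return target is not None and not target_is_default
    # A modular RPM must stay in exactly the same module and stream; a
    # different stream, a different module, or a non-modular RPM all count.
    return target != installed
```

For example, `changes_stream(("dwm", "6.0"), None)` is `True` (modular to non-modular), while `changes_stream(None, ("dwm", "6.0"), target_is_default=True)` is `False`. Upgrades and downgrades within the same stream are not stream changes under this predicate.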

+1 to @sgallagh's most recent proposal.

+1 to @sgallagh's amended proposal.

This was discussed in today's FESCo meeting.
* AGREED: We table this for a week to give modularity+dnf folks time
to discuss and respond to the proposed requirement. We will revisit
next meeting. (+6, 0, 0) (zbyszek, 15:54:31)
* proposal for requirements for DNF behaviour at GA:
https://pagure.io/fesco/issue/1974#comment-534269 (zbyszek,
15:54:34)
* given time constraints, we ask for the discussion to start now
(zbyszek, 15:54:37)

AGREED: file fesco special final blocker on this, try and get more estimate input from dnf team (+7,0,-0) (jforbes, 15:18:50)

Metadata Update from @jforbes:
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

5 years ago

(In reply to Daniel Mach from comment #18)

Estimated delivery for this feature is end of January.

Metadata Update from @zbyszek:
- Issue status updated to: Open (was: Closed)

5 years ago

From the meeting today:

* #1974 Problematic blocker for F29: dnf 'offline' module tracking
  (contyk, 15:02:46)
  * LINK: https://pagure.io/fesco/issue/1974   (contyk, 15:02:52)
  * AGREED: drop blocker status on bug, document in release notes,
    common bugs and anywhere else we can (+7, 0, -0)  (contyk, 15:21:05)

Who will handle this? Any action on our side?

Metadata Update from @psabata:
- Issue untagged with: meeting

5 years ago

Metadata Update from @psabata:
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

5 years ago

Metadata Update from @adamwill:
- Issue status updated to: Open (was: Closed)

5 years ago

So this one has come up again for F30. The DNF feature is still not implemented. See recent comments on https://bugzilla.redhat.com/show_bug.cgi?id=1616167 . Once again, we (the blocker bug review teams) are kicking this to FESCo to decide what action is appropriate here for F30.

Apparently, DNF team is now not sure they can implement this feature for F30 either.

Metadata Update from @psabata:
- Issue tagged with: meeting

5 years ago
  * AGREED: As we're fairly sure this wouldn't be fixed in time for F30,
    we're moving this to F31. Modularity & DNF folks will prepare a
    specific implementation plan and will update the ticket (+6, 0, -0)
    (contyk, 15:20:34)

Metadata Update from @psabata:
- Issue untagged with: meeting
- Issue assigned to psabata

5 years ago

Any response from the DNF team on this?

We agreed with the modularity team to deliver the feature in F31.

Metadata Update from @zbyszek:
- Issue tagged with: next release, pending announcement

5 years ago

The postponement was announced along with today's meeting agenda.

Metadata Update from @zbyszek:
- Issue untagged with: pending announcement

5 years ago

Do we need to keep this ticket open?

I don't see a reason to keep it open. If we get close to F31 and it's not ready, then we can open a new ticket to review it.

From my point of view the original issue should not be marked as a Fedora blocker. Fedora 29 and 30 showed that this feature is nice to have and not a blocker. Playing with "soft blockers" is something that we should avoid.

Metadata Update from @sgallagh:
- Issue close_status updated to: Invalid
- Issue status updated to: Closed (was: Open)

4 years ago
