#4866 mirror manager issue? / dnf --refresh reverts to older metadata
Closed: Fixed None Opened 8 years ago by mschwendt.

Filing this per suggestion from Honza Šilhan (jsilhan):

= bug description =

There has been another thread on users@ list about dnf not working on fresh repo metadata and whether/how it can be told to download latest repo metadata reliably. Some users found out that not even "dnf clean all ; dnf update" gives them access to the latest metadata.

Even worse, when testing it in Rawhide, consecutive runs of "dnf update --refresh" with a time difference of only one minute made the tool revert to older metadata.

My other reports with the subject "dnf --refresh reverts to older metadata" prefix can be found in the archives easily.

= bug analysis =

I've suggested that the Mirror Manager and DNF developers sit together and look into it to rule out that the metalink server sends out information about old metadata.

In addition to that, there may be a bug in DNF, too. Expired metadata should remain available for the tool to be verified/confirmed (via a timestamp/checksum check), so DNF would not need to redownload them if they are still considered "current", "fresh" or "latest". If, however, "dnf --refresh …" not only expires the cache but deletes it, they must be downloaded. In that case, mirror manager takes precedence and decides what metadata will be used, because dnf cannot access cached (and possibly more recent) metadata anymore. Can it happen that mirror manager advertises mirrors carrying older metadata than before?

= fix recommendation =

Verify the metalink checksum/timestamp mechanism, so it is never reverted to older metadata compared with what has been offered earlier.


We've looked deeper into this issue, and the mirror distribution is designed well. The freshest metadata should be identified by verifying the checksum of repomd against the metalink, and then verifying the checksums of the remaining files (primary, filelists, etc.) of the freshest repomd. DNF verifies the repomd checksum and the remaining files too. The problem comes when there are some metadata cached. It is probably a bug in DNF. We already have a patch for the "metadata not synchronized properly" issue, which should fix this. Let's wait for the next release of DNF. Once you confirm that everything works fine, we will be able to close this; otherwise we will reinvestigate. A bug in the infrastructure could occur only when the checksums in the metalink don't match the recent repomd, or when the other files on a mirror are not from the same metadata creation date.

So, I am hard pressed to think of how the mirrorlists servers could issue older metalink info. They are all updated from the same source at the same time.

Patrick wrote a tool to help try and isolate these problems:

https://github.com/puiterwijk/check_metalink

If we could get the output from that we could see what metalink(s) they got and what the master mirrors had.

But yes, if there was a fix in dnf related to this, let's see if that's the issue...

Happy to help explain the process or provide further info to help track this down.

The problem comes when there are some metadata cached. It is probably a bug in DNF.

https://lists.fedoraproject.org/pipermail/users/2015-August/464144.html

The first run of "dnf update --refresh" downloaded metadata, although it didn't result in any changes compared with earlier. It's safe to assume that it put these downloaded metadata into the cache afterwards. Or is that a wrong assumption?

Under the assumption, the second "dnf update --refresh" run also downloaded metadata, but there were fewer updates available afterwards. Old metadata.

So, where did the old metadata come from? Why would it even be possible to think that it could be cached metadata? It had newer metadata in the cache. Then it downloaded something and ended up with older metadata.

What's the theory here?

So, I have a theory...

Mirrormanager when constructing the metalinks adds the previous repomd.xml in for a few days as an alternate. This is to prevent a new updates push going out, mm updating to it, and 0 mirrors have synced it yet, so the user gets errors.

So, perhaps the metalink had an alternate in it, and the first time it got a mirror using the new one and the second time it got a mirror using the older one.

So, perhaps here dnf could see that there's multiple repomd.xml's in the metalink, try the newest one first, if that fails, then try the older one, etc.

And all that within a minute. That is, it had metadata for more updates an hour before already. Then two consecutive runs changed all that? Why? Because mirror manager offered a new metalink file? Or because DNF didn't evaluate the timestamps properly?

About the alternates, I've asked about them before. There are timestamps on all the metadata references. It should be easy to tell which one is "latest", shouldn't it?

Reading a four-year-old ticket, https://fedorahosted.org/mirrormanager/ticket/35, which talks about "timestamps increasing monotonically", I see that in the cached Rawhide metalink.xml file the timestamps are higher for the alternates:

<file name="repomd.xml">
<mm0:timestamp>1439884975</mm0:timestamp>

    <mm0:alternate>
        <mm0:timestamp>1440006338</mm0:timestamp>

    <mm0:alternate>
        <mm0:timestamp>1440055923</mm0:timestamp>

Is that correct?
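
For readability, the three epoch timestamps quoted above can be converted to UTC dates. This is only a minimal sketch using the Python standard library; the labels "file", "alternate 1" and "alternate 2" are illustrative names for the entries in the excerpt, not anything MirrorManager emits.

```python
from datetime import datetime, timezone

# Timestamps copied from the Rawhide metalink excerpt above.
# The dictionary keys are illustrative labels only.
timestamps = {
    "file": 1439884975,
    "alternate 1": 1440006338,
    "alternate 2": 1440055923,
}


def to_utc(epoch):
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


for label, ts in timestamps.items():
    print(f"{label}: {ts} = {to_utc(ts)}")

# The entry with the highest timestamp refers to the newest repomd.xml.
newest = max(timestamps, key=timestamps.get)
print("newest entry:", newest)
```

In this excerpt the newest entry is the second alternate, not the `<file>` entry itself.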

Well, looking at a fedora 22 updates-testing x86_64 right now:

{{{
<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/" type="dynamic" pubdate="Sun, 23 Aug 2015 17:15:07 GMT" generator="mirrormanager" xmlns:mm0="http://fedorahosted.org/mirrormanager">
<files>
<file name="repomd.xml">
<mm0:timestamp>1440197844</mm0:timestamp>
<size>4834</size>
<verification>
<hash type="md5">757cdede5a4d931e5173ccb034c4681f</hash>
<hash type="sha1">0fbb56c81958f01bf474d28551043cb56a834221</hash>
<hash type="sha256">9951c1532e3cf63e7a7e94e9ee5c7653c002a74b41ef4380168c368f67461eba</hash>
<hash type="sha512">f911f6e78348372a2130722a379cf06f1391e79ce623904d016d23409304267208f4b20a90558fd0b0f0af76222917d28bdd4c64f68ceedea4db7f4b56dddc73</hash>
</verification>
<mm0:alternates>
<mm0:alternate>
<mm0:timestamp>1440293340</mm0:timestamp>
<size>4833</size>
<verification>
<hash type="md5">73a6e8c15b9e38ea24de7cec9c88466f</hash>
<hash type="sha1">3b083e6aa1f6c358124e328d67fcb56ce366588e</hash>
<hash type="sha256">f8f4a2dc0a3fad0fa45a03a2b2864b105d8df8c86576e068c4c89cab6c37f103</hash>
<hash type="sha512">e2fd4e9ab1c5066e5a9d653c373b11036581d4d097ccc3c2a870850634b3ff2ab93b254d2e583dda3cc1b9d1228c4eb0be11abb55c17064d3ae2380830dbc7eb</hash>
</verification>
</mm0:alternate>
<mm0:alternate>
<mm0:timestamp>1440099783</mm0:timestamp>
<size>4832</size>
<verification>
<hash type="md5">c6646e40581048381910b00b6249be4d</hash>
<hash type="sha1">e8d03199b95009f9b3107f83b68665c12c3d3e69</hash>
<hash type="sha256">987841ead41ba3576037e39041dbd827a00008cfd2b6971a11fe0b9616346bd1</hash>
<hash type="sha512">6993e6a03ec44ea3e212f2620147733b64e46a6b05fbdcecbc5a9c6243889be21253f7c2f0aa5a552603da41063fa29ffd30b520280aee846d4260cf2b74262f</hash>
</verification>
</mm0:alternate>
</mm0:alternates>
}}}
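
To see which entry in such a metalink is actually the newest, the `<file>` entry and its alternates can be collected and sorted by timestamp. The following is a rough sketch with Python's standard `xml.etree.ElementTree`; the sample is an abbreviated copy of the excerpt above (timestamps and sha256 values only, with the closing tags added so it parses), and `repomd_candidates` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Namespaces taken from the metalink header shown above.
NS = {
    "ml": "http://www.metalinker.org/",
    "mm0": "http://fedorahosted.org/mirrormanager",
}

# Abbreviated sample: only timestamps and sha256 hashes from the excerpt,
# with closing tags added so that the document is well-formed.
SAMPLE = """<metalink version="3.0" xmlns="http://www.metalinker.org/"
          xmlns:mm0="http://fedorahosted.org/mirrormanager">
<files>
<file name="repomd.xml">
<mm0:timestamp>1440197844</mm0:timestamp>
<verification>
<hash type="sha256">9951c1532e3cf63e7a7e94e9ee5c7653c002a74b41ef4380168c368f67461eba</hash>
</verification>
<mm0:alternates>
<mm0:alternate>
<mm0:timestamp>1440293340</mm0:timestamp>
<verification>
<hash type="sha256">f8f4a2dc0a3fad0fa45a03a2b2864b105d8df8c86576e068c4c89cab6c37f103</hash>
</verification>
</mm0:alternate>
<mm0:alternate>
<mm0:timestamp>1440099783</mm0:timestamp>
<verification>
<hash type="sha256">987841ead41ba3576037e39041dbd827a00008cfd2b6971a11fe0b9616346bd1</hash>
</verification>
</mm0:alternate>
</mm0:alternates>
</file>
</files>
</metalink>
"""


def repomd_candidates(metalink_xml):
    """Return (timestamp, kind, sha256) tuples for repomd.xml, newest first."""
    root = ET.fromstring(metalink_xml)
    file_el = root.find("ml:files/ml:file[@name='repomd.xml']", NS)
    entries = [("file", file_el)]
    entries += [
        ("alternate", alt)
        for alt in file_el.findall("mm0:alternates/mm0:alternate", NS)
    ]
    result = []
    for kind, el in entries:
        # Only direct children are searched, so the <file> entry yields its
        # own timestamp and hash, not those of its alternates.
        timestamp = int(el.find("mm0:timestamp", NS).text)
        sha256 = el.find("ml:verification/ml:hash[@type='sha256']", NS).text
        result.append((timestamp, kind, sha256))
    return sorted(result, reverse=True)


candidates = repomd_candidates(SAMPLE)
for timestamp, kind, sha256 in candidates:
    print(timestamp, kind, sha256[:12])
```

For this sample the first tuple printed is an alternate, i.e. the newest repomd.xml is not the one in the `<file>` entry.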

And all that within a minute. That is, it had metadata for more updates an hour before already. Then two consecutive runs changed all that? Why? Because mirror manager offered a new metalink file? Or because DNF didn't evaluate the timestamps properly?

Hard to say on the information here.

  • Mirrorlists update once per hour, so if they got the metalink right before an update and then right after it could have changed.

  • Metalinks also change from time to time as mirrormanager tries to randomize the list of up to date mirrors so all users don't hit the exact same one(s). So the list could have changed order.

  • They could have gotten the exact same metalink, but dnf hit a mirror with an alternate but different repomd first. I don't know what method it uses here, dnf folks would have to answer...

Well, looking at a fedora 22 updates-testing x86_64 right now:

How is that an answer to the question?

In what you've shown, the first of the two alternates has a timestamp higher than the primary repomd.xml timestamp. That doesn't match your earlier explanation in comment 4.

It looks much more like all that matters is the timestamps, and one of the alternates may refer to the most recent metadata.

In my most recent Rawhide metalink, the three timestamps are increasing, too. Second alternate is highest. Which mirror carries the metadata this second alternate refers to?

Where is the connection between these three different repodata "releases" in the metalink and the long list of mirrors with its "preference" attribute?

Is mirror manager accurate/careful enough when creating that list, and the mirror with highest preference carries the most recent metadata?

Or may it carry either one of the three metadata releases mentioned in the metalink? Then DNF would need to do some mirror crawling to find a mirror that really carries the most recent metadata. Especially after one of those infamous "dnf clean all" runs or after deleting the cache via other options.

Anyway, if the DNF developers think it's a simple bug in "dnf", then I don't understand the suggestion to "report it to Fedora infrastructure". The feedback here so far confirms that the matter is more complex than everyone may believe it to be.

I've talked to the librepo developer (librepo is the library DNF uses for downloads), and the issue could be that the first mirror whose repomd.xml checksum matches either the {{{<file name="repomd.xml">}}} entry or any of the alternates is used. We can order the candidates by timestamp. The question is what to do if the newest metadata cannot be downloaded (continue with the second newest, or fail?). Also, why is the most recent update not in the {{{<file name="repomd.xml">}}} section but in the alternates instead?

Hi,
the bug I've mentioned to Jan is: https://bugzilla.redhat.com/show_bug.cgi?id=1019103

The issue there was that the metalink XML contained the checksum of a repomd.xml version that wasn't available on any mirror, and DNF (librepo) was trying hundreds of mirrors without success. This is the reason why librepo currently accepts any repomd.xml.

Do you have any proposals how to solve this?

The solution Jan mentioned seems reasonable to me: sort the repomd checksums by timestamp, try to download the most recent one first, and try the older ones only if no mirror has the newest. (Of course, the number of mirrors tried would be limited, e.g. try to get the most recent repomd from the first five mirrors and, if that fails, move on to the next checksum in the sorted sequence of checksums.)

Replying to [comment:7 mschwendt]:

Well, looking at a fedora 22 updates-testing x86_64 right now:

How is that an answer to the question?

No, it's a description of what (as far as I know) happens. Which specific question were you wanting answered?

In what you've shown, the first of the two alternates has a timestamp higher than the primary repomd.xml timestamp. That doesn't match your earlier explanation in comment 4.

I didn't say anything about ordering in comment 4. I said there would be alternate repomd's with different timestamps. I have no idea how dnf handles them.

It looks much more like all that matters is the timestamps, and one of the alternates may refer to the most recent metadata.

Right.

In my most recent Rawhide metalink, the three timestamps are increasing, too. Second alternate is highest. Which mirror carries the metadata this second alternate refers to?

No idea. At least the master mirrors since it's the newest timestamp, but could be any number of others too, depending on how long ago that landed and when the mirrors synced.

Where is the connection between these three different repodata "releases" in the metalink and the long list of mirrors with its "preference" attribute?

The repodata "releases" are each changes in the repomd.xml that mm has detected. For rawhide this happens... every day. So it's "today's rawhide", "yesterdays rawhide" and "the day before's rawhide".

The list of mirrors are those mirrors that are up to date by the crawler. The crawler runs every day, but has 300+ mirrors to crawl. I don't know if it considers a mirror up to date that has just "today's rawhide" or any of them.

Is mirror manager accurate/careful enough when creating that list, and the mirror with highest preference carries the most recent metadata?

No. The preference isn't based on that. It's based on the bandwidth that mirrors say they have available, the mirrors marked 'up to date' in the database for that thing, and a randomness to prevent every single user from mobbing a single mirror.

Or may it carry either one of the three metadata releases mentioned in the metalink? Then DNF would need to do some mirror crawling to find a mirror that really carries the most recent metadata. Especially after one of those infamous "dnf clean all" runs or after deleting the cache via other options.

I don't know for sure, would have to investigate the code, but it could well be any of the repomd's, current or alternates.

Anyway, if the DNF developers think it's a simple bug in "dnf", then I don't understand the suggestion to "report it to Fedora infrastructure". The feedback here so far confirms that the matter is more complex than everyone may believe it to be.

yes, it is complex. I am perfectly happy to adjust things in mm if it makes sense to do so.

Replying to [comment:9 tmlcoch]:

Hi,
the bug I've mentioned to Jan is: https://bugzilla.redhat.com/show_bug.cgi?id=1019103

The issue there was that the metalink XML contained the checksum of a repomd.xml version that wasn't available on any mirror, and DNF (librepo) was trying hundreds of mirrors without success. This is the reason why librepo currently accepts any repomd.xml.

Was it the 'oldest' of them?
It's possible that mirrors had all either synced to newer or been dropped out as not up to date.

ie, using the rawhide example, there's 3 alternates, it might be sometimes the oldest of those no longer matches any mirror in the metalink.

Do you have any proposals how to solve this?

The solution Jan mentioned seems reasonable to me: sort the repomd checksums by timestamp, try to download the most recent one first, and try the older ones only if no mirror has the newest. (Of course, the number of mirrors tried would be limited, e.g. try to get the most recent repomd from the first five mirrors and, if that fails, move on to the next checksum in the sorted sequence of checksums.)

Well, the alternates were added as a way to prevent the master mirrors and mm from updating and there being no mirrors synced yet. (ie the time right after a change).

I suppose sorting them by time (or fixing mm to output them in time sort order?) and then trying each might work.

'''Repomd downloading in Librepo'''

'''Current logic:'''
Parse metalink.xml
Download repomd.xml from mirrors in order as listed in metalink
* Use the first repomd.xml whose checksum matches a checksum listed in the metalink (regardless of whether the checksum is from .../file/verification or from .../alternate/verification)

'''Notes for the current logic:'''
Usually the first mirror in the list is able to satisfy our needs (so only one repomd.xml is downloaded)
Max number of mirrors that are tried can be (and usually it is) limited by librepo's option LRO_MAXMIRRORTRIES
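
As an illustration only (this is not librepo's actual code), the current logic could be sketched in Python as below. `fetch_checksum` is a hypothetical stand-in for "download repomd.xml from this mirror and hash it", and `max_tries` models LRO_MAXMIRRORTRIES with 0 meaning "all mirrors".

```python
def pick_repomd_current(mirrors, known_checksums, fetch_checksum, max_tries=0):
    """Return (mirror, checksum) for the first mirror whose repomd.xml matches
    ANY checksum listed in the metalink, or None if no mirror matches."""
    # max_tries models LRO_MAXMIRRORTRIES; 0 means "try every mirror".
    limit = len(mirrors) if max_tries == 0 else min(max_tries, len(mirrors))
    for mirror in mirrors[:limit]:
        # Hypothetical helper: downloads repomd.xml and returns its checksum,
        # or None if the download failed.
        checksum = fetch_checksum(mirror)
        if checksum in known_checksums:
            return mirror, checksum
    return None


# Toy data (assumed): the first mirror still serves the older repomd.xml.
served = {"mirror-a": "sum-old", "mirror-b": "sum-new"}
known = {"sum-new", "sum-old"}
result = pick_repomd_current(["mirror-a", "mirror-b"], known, served.get)
print(result)
```

With this toy data the older repomd.xml from mirror-a is accepted, because its checksum is one of those listed in the metalink and mirror-a happens to come first.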

'''Proposed logic:'''
Parse metalink.xml
Sort all available checksums for repomd.xml by timestamp (checksums from .../file/verification and from .../alternate/verification)
Download repomd.xml from mirrors in order as listed in metalink
Use the first repomd.xml whose checksum matches the newest checksum from the metalink
* If number of mirrors specified by LRO_MAXMIRRORTRIES (0 for all mirrors in metalink) were tried with no success, remove the newest checksum from the list and try to find a repomd.xml that matches the next checksum.

This proposed logic should guarantee (to some extent limited by LRO_MAXMIRRORTRIES value) that Librepo will use the most recent available repomd.xml.
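
A rough Python sketch of the proposed logic, under the same assumptions as the description (again, not librepo code): `fetch_checksum` is a hypothetical "download and hash repomd.xml" helper, `candidates` is a list of (timestamp, checksum) pairs taken from the metalink, and `max_tries` models LRO_MAXMIRRORTRIES.

```python
def pick_repomd_proposed(mirrors, candidates, fetch_checksum, max_tries=0):
    """Try the newest checksum first; fall back to older ones only if none of
    the tried mirrors serves it.

    Returns (mirror, timestamp, downloads), or (None, None, downloads)."""
    # max_tries models LRO_MAXMIRRORTRIES; 0 means "try every mirror".
    limit = len(mirrors) if max_tries == 0 else min(max_tries, len(mirrors))
    downloads = 0
    # Sort (timestamp, checksum) pairs so that the newest comes first.
    for timestamp, wanted in sorted(candidates, reverse=True):
        for mirror in mirrors[:limit]:
            downloads += 1  # every probe costs one repomd.xml download
            if fetch_checksum(mirror) == wanted:
                return mirror, timestamp, downloads
    return None, None, downloads


# Toy data (assumed): only the third mirror has synced the newest repomd.xml.
served = {"mirror-a": "sum-old", "mirror-b": "sum-old", "mirror-c": "sum-new"}
mirrors = ["mirror-a", "mirror-b", "mirror-c"]
candidates = [(1, "sum-old"), (2, "sum-new")]

print(pick_repomd_proposed(mirrors, candidates, served.get))
print(pick_repomd_proposed(mirrors, candidates, served.get, max_tries=2))
```

The first call finds the newest repomd.xml on the third mirror after three downloads; the second call, limited to two mirrors, falls back to the older one. The sketch re-downloads on every pass for simplicity; a real implementation could remember which checksum each mirror served during the first pass.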

The negative effect of this approach is that multiple repomd.xml files can be downloaded (even hundreds if LRO_MAXMIRRORTRIES is set to 0 and a large share of the mirrors is not synced yet).

I don't know how hard it would be to update mm to put the most up-to-date mirrors first in the metalink, but IMHO it would be the most universal solution, because it would solve the problem not only for Librepo but for all other download managers that honor the order of mirrors in the metalink.
Also, it would save bandwidth and time for every user updating metadata during the mirror-sync period.

But I don't know if mm has information about which mirrors are synced and which not yet.
Also I'm worried that sorting metalink by up-to-date data could cause heavy load to the master mirrors during synchronization.

What do you think?

It would be better to move this topic onto a mailing-list, IMO.

///

Which specific question were you wanting answered?

Kevin, you've written that mirror manager "adds the previous repomd.xml in for a few days as an alternate". Given the wording in the XML metalink file, using the word "alternate" has been ambiguous.

If there were documentation using unclear terminology like that, it would be bad documentation. Fact is, the metalink contains multiple references to the last three (or possibly more?) repodata releases in no particular order. It can be one of the alternates that refers to the newest repodata.

I've pointed that out and asked whether what is found in the Rawhide metalink file is correct. An easy question, but not if the term "alternates" is used in different ways. It's only a simple matter, but I had hoped for some accurate information and a clear idea how to do it and how to put the metalink file contents to good use. After all, package tool developers need to know what to expect from the metalink data.

///

So far, this ticket confirms all the mirroring problems users have been facing over a long time back to Yum era. Especially the crawling of dozens of mirrors only to discover they are not up-to-date or are incomplete (broken repodata or missing packages). And the mirror propagation times that confuse the users, if bodhi/bugzilla/announcements mention new updates, but the package tools just don't "see" them.

How long on average does it take for repo updates to appear on mirrors? For instance, an update released in bodhi, how long does it take for it to become available to users in another country?

Users don't wait for such updates to become available automatically, if hours after an announcement, the updates are still not ready to be downloaded. They get the impression they need to fix things Fedora doesn't seem to get right. Making users play with "yum clean all" has been the worst idea ever. Users, who haven't switched to "dnf clean all", play with the --refresh option without being successful. With no safe way to fetch the latest repodata (or at least refetch some that have been cleaned from the cache), wiping out the local metadata cache bears high risks (such as unresolvable deps, and imagine what a distro-sync would do when switching back to older metadata!).

///

Indeed, when restarting with an empty metadata cache, a fatal error condition can be reached: failure to download any metadata that are at least as recent as the metadata that have been used before to install updates or packages. This can happen if all the tried mirrors either carry older metadata or are broken while syncing newer metadata. The same can happen if the local metadata cache is invalidated when discovering that all of the previously current mirrors are not usable anymore => no known place that still carries the currently active metadata => need to search for other mirrors and/or newer metadata. Lots of mirror crawling again.

Whether package tools even remember the age/timestamp of metadata that have been used before to update the installation, I dunno. DNF doesn't display such a detail for the cached metadata either. Instead, it displays when the metadata have been checked last, which is a meaningless detail and only encourages users to force a refresh more often, even if it leads to nothing. More relevant, IMO, would be to display the date/timestamp of the active/cached metadata, so users can compare it with the date of announced updates. That would also clear the confusion caused by a "dnf clean all" or --refresh still not causing a download of the very latest metadata (since it has not arrived on the mirrors yet).

///

Proposed logic:

What do you think?

Some good thoughts in there.

As I understand the information about mm that has been provided in this ticket, you cannot avoid crawling multiple mirrors in search of any metadata that matches the metalink details. The mirrors at the top of the list may carry latest metadata, or not, or may be broken again already being busy syncing even newer updates. Mirrors further down on the list may be more up-to-date meanwhile.

When sorting by timestamps, the timestamp of the cached metadata needs to be considered, too. Visiting the first few mirrors, it could be that they carry newer metadata than what's cached locally, but not the very latest advertised in the metalink. A "dnf clean metadata" run complicates matters a lot, because then only downloading the very latest might be sufficient. The second latest may be too old already.
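
To make the point about the cached timestamp concrete, here is a small sketch of a "never go backwards" filter. It assumes the client remembers the timestamp of the metadata it last used even after the cache itself has been cleaned; `last_used_timestamp` and `acceptable_candidates` are made-up names, and the timestamps are the ones from the Fedora 22 updates-testing metalink quoted earlier in this ticket.

```python
def acceptable_candidates(candidates, last_used_timestamp=None):
    """Return repomd timestamps newest first, dropping anything older than
    the metadata the client has already used."""
    ordered = sorted(candidates, reverse=True)
    if last_used_timestamp is None:
        # Nothing remembered: every advertised repomd.xml is acceptable.
        return ordered
    # An equal timestamp means the previously used metadata is still current.
    return [ts for ts in ordered if ts >= last_used_timestamp]


# Timestamps from the Fedora 22 updates-testing metalink quoted earlier.
advertised = [1440197844, 1440293340, 1440099783]

print(acceptable_candidates(advertised))
print(acceptable_candidates(advertised, last_used_timestamp=1440197844))
print(acceptable_candidates(advertised, last_used_timestamp=1440293340))
```

In the last call only the very latest repomd.xml remains acceptable, which is the situation described above: the second latest is already too old.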

Replying to [comment:13 mschwendt]:

It would be better to move this topic onto a mailing-list, IMO.

Personally I like this way better. At least we have it tracked.

Whether package tools even remember the age/timestamp of metadata that have been used before to update the installation, I dunno.

Yes, DNF remembers it.

DNF doesn't display such a detail for the cached metadata either. Instead, it displays when the metadata have been checked last, which is a meaningless detail and only encourages users to force a refresh more often, even if it leads to nothing. More relevant, IMO, would be to display the date/timestamp of the active/cached metadata, so users can compare it with the date of announced updates. That would also clear the confusion caused by a "dnf clean all" or --refresh still not causing a download of the very latest metadata (since it has not arrived on the mirrors yet).

The last local file modification was shown for each repo in verbose mode (-v). I've changed it to show the real metadata timestamp on the server [1].

[1] https://github.com/rpm-software-management/dnf/commit/7f7adb78678b21817c9baedba42315fa0cab181a

Proposed logic:

What do you think?

Some good thoughts in there.

As I understand the information about mm that has been provided in this ticket, you cannot avoid crawling multiple mirrors in search of any metadata that matches the metalink details. The mirrors at the top of the list may carry latest metadata, or not, or may be broken again already being busy syncing even newer updates. Mirrors further down on the list may be more up-to-date meanwhile.

Kevin, can mm recheck the synced mirrors and add new "latest=1/2/3..." attribute to url tags? So it would still honor the mirror preference (will not overwhelm just some of them) while knowing before accessing the url whether they are most up-to-date or not.
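
To illustrate what a client could do with such a hint: the "latest" attribute below is purely hypothetical (it is the proposal above, not something MirrorManager provides), and the URLs and preference values are made up. A stable sort keeps mm's own ordering within each freshness rank.

```python
# Hypothetical data: "latest" is the attribute proposed above
# (1 = newest repomd.xml, 2 = previous one, ...). The list order is the
# order in which the mirrors appear in the metalink.
urls = [
    {"url": "http://mirror-a.example/repodata/repomd.xml", "preference": 100, "latest": 2},
    {"url": "http://mirror-b.example/repodata/repomd.xml", "preference": 99, "latest": 1},
    {"url": "http://mirror-c.example/repodata/repomd.xml", "preference": 98, "latest": 3},
    {"url": "http://mirror-d.example/repodata/repomd.xml", "preference": 97, "latest": 1},
]


def order_by_freshness(urls):
    """Freshest rank first; Python's sort is stable, so the metalink order
    (and with it the mirror preference) is kept within each rank."""
    return sorted(urls, key=lambda u: u["latest"])


ordered = order_by_freshness(urls)
for entry in ordered:
    print(entry["latest"], entry["preference"], entry["url"])
```

As noted elsewhere in this ticket, such a hint would only be as accurate as the last crawl of each mirror.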

Replying to [comment:12 tmlcoch]:

...snip...

This proposed logic should guarantee (to some extent limited by LRO_MAXMIRRORTRIES value) that Librepo will use the most recent available repomd.xml.

One possible negative here is if mirrormanager has just updated for a new repomd.xml, the only mirror that might have that repomd.xml is the master mirror, resulting in everyone hitting the master mirror and swamping it. (The master mirror is listed at or near the very bottom with a low priority).

The negative effect of this approach is that multiple repomd.xml files can be downloaded (even hundreds if LRO_MAXMIRRORTRIES is set to 0 and a large share of the mirrors is not synced yet).

yeah.

I don't know how hard it would be to update mm to put the most up-to-date mirrors first in the metalink, but IMHO it would be the most universal solution, because it would solve the problem not only for Librepo but for all other download managers that honor the order of mirrors in the metalink.
Also, it would save bandwidth and time for every user updating metadata during the mirror-sync period.

We can look at doing that.

But I don't know if mm has information about which mirrors are synced and which not yet.

Some. There's a crawler that checks them, but it is by no means instant.

Also I'm worried that sorting metalink by up-to-date data could cause heavy load to the master mirrors during synchronization.

Right.

What do you think?

Can we check what yum did in the past here?

Replying to [comment:14 jsilhan]:

Replying to [comment:13 mschwendt]:

It would be better to move this topic onto a mailing-list, IMO.

Personally I like this way better. At least we have it tracked.

You're welcome to open a thread on the infrastructure list if you like.

Whether package tools even remember the age/timestamp of metadata that have been used before to update the installation, I dunno.

Yes, DNF remembers it.

DNF doesn't display such a detail for the cached metadata either. Instead, it displays when the metadata have been checked last, which is a meaningless detail and only encourages users to force a refresh more often, even if it leads to nothing. More relevant, IMO, would be to display the date/timestamp of the active/cached metadata, so users can compare it with the date of announced updates. That would also clear the confusion caused by a "dnf clean all" or --refresh still not causing a download of the very latest metadata (since it has not arrived on the mirrors yet).

The last local file modification was shown for each repo in verbose mode (-v). I've changed it to show the real metadata timestamp on the server [1].

[1] https://github.com/rpm-software-management/dnf/commit/7f7adb78678b21817c9baedba42315fa0cab181a

Proposed logic:

What do you think?

Some good thoughts in there.

As I understand the information about mm that has been provided in this ticket, you cannot avoid crawling multiple mirrors in search of any metadata that matches the metalink details. The mirrors at the top of the list may carry latest metadata, or not, or may be broken again already being busy syncing even newer updates. Mirrors further down on the list may be more up-to-date meanwhile.

Kevin, can mm recheck the synced mirrors and add new "latest=1/2/3..." attribute to url tags? So it would still honor the mirror preference (will not overwhelm just some of them) while knowing before accessing the url whether they are most up-to-date or not.

So you mean add additional data to the metalink as to which repomd.xml the mirror had at the time it was last crawled?
That might be possible, but I suspect it would be pretty error prone. Mirrors could well update after they were crawled...

It would be great if we could think of a simpler process instead of making it more complex, but I'm not sure that's going to be possible. It may also not be possible to guarantee that 'dnf --refresh update' runs will always show the same pending updates.

Replying to [comment:13 mschwendt]:

Which specific question were you wanting answered?

Kevin, you've written that mirror manager "adds the previous repomd.xml in for a few days as an alternate". Given the wording in the XML metalink file, using the word "alternate" has been ambiguous.

Right, as with most free software there's no detailed spec available that defines all terms and how everything works.

If there were documentation using unclear terminology like that, it would be bad documentation. Fact is, the metalink contains multiple references to the last three (or possibly more?) repodata releases in no particular order. It can be one of the alternates that refers to the newest repodata.

Right. Feel completely free to make some more detailed documentation. I was simply trying to explain what I know of the way mirrormanager works. I didn't know I was writing documentation, I thought I was trying to help solve some issue.

I've pointed that out and asked whether what is found in the Rawhide metalink file is correct. An easy question, but not if the term "alternates" is used in different ways. It's only a simple matter, but I had hoped for some accurate information and a clear idea how to do it and how to put the metalink file contents to good use. After all, package tool developers need to know what to expect from the metalink data.

"what is found in the rawhide metalink is correct". I don't understand the question then. What does "correct" mean?

I didn't intend to use alternates in different ways.

Let me try again:

The repomd.xml descriptions in the "file" and "alternates" sections in the metalink are the most recent repomd.xml and up to 2 previous ones, in no particular order.

///

So far, this ticket confirms all the mirroring problems users have been facing over a long time back to Yum era. Especially the crawling of dozens of mirrors only to discover they are not up-to-date or are incomplete (broken repodata or missing packages). And the mirror propagation times that confuse the users, if bodhi/bugzilla/announcements mention new updates, but the package tools just don't "see" them.

I am not sure I agree.

How long on average does it take for repo updates to appear on mirrors? For instance, an update released in bodhi, how long does it take for it to become available to users in another country?

The question is indeterminate. We don't run mirrors. Each mirror decides when and how often they sync. We provide tools to allow them to do so quite quickly (they can trigger syncs based on fedmsgs now).

...snip...

constructive suggestions for improvement very welcome.

Yum has had lots of problems finding a working mirror, crawling over a seemingly never-ending list of mirrors and/or encountering 404 Not Found errors for packages later. This has been a topic on users@ list regularly.

Pretending that Yum has worked fine, would be a mistake. Downloading new metadata from a mirror is only safe, if the mirror carries all packages already, too. That has not been the case always either.

[...]

I didn't intend to use alternates in different ways.

I had hoped this would have been cleared already. Do you agree, at least, that it has been confusing to claim that //"it adds the previous repomd.xml in for a few days as an alternate"// when the <mm0:alternate> entries in the metalink file may refer to the latest repodata instead of the "previous" one?

Anyway, your most recent try is more accurate and confirms what I've pointed out.

[...]

Each mirror decides when and how often they sync.

That's not much of a basis for a repodata download tool/library. Any statistics known about the mirrors? Are any high profile mirrors known, which sync frequently or quickly? Else repodata downloading cannot be done without lots of crawling. And not without more frequent automatic refresh attempts either (i.e. metalink refers to new repomd, which is not found on any mirror yet, so it would be wrong to wait hours before crawling once more).

[...]

It would be great if we could think of a more simple process

If mm doesn't give any hint about which mirror to choose, not much can be done other than implementing a more complex repodata caching tool to be run on each client. It keeps track of the currently and previously assigned mirrors, their metadata timestamps, and works based on that. Instead of relying entirely on mm and its list of mirrors.

You're assigned to Mirror N, you learn via mm that the master mirror has been updated, then eventually Mirror N will offer the updates, too. If nobody knows when that may be, crawling cannot be avoided. So, additionally, do a bit of crawling of a few mirrors from mm's mirrorlist to search for a mirror, which may be up-to-date already, and update the local cache details about the visited mirrors. Switch to Mirror X, if it's up-to-date, else stick to Mirror N and check it more frequently till it's found to be up-to-date.
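
A rough sketch of that idea in Python, just to make the bookkeeping concrete. Everything here is hypothetical: `probe` stands for "ask a mirror which repomd.xml timestamp it currently serves", `max_probe` limits the crawling, and `MirrorState` is the small per-client cache described above.

```python
from dataclasses import dataclass, field


@dataclass
class MirrorState:
    """Per-client cache: the assigned mirror and what was last seen where."""
    assigned: str
    seen: dict = field(default_factory=dict)  # mirror -> repomd timestamp


def choose_mirror(state, latest_timestamp, probe, mirrorlist, max_probe=3):
    """Check the assigned mirror first, then crawl a few others.

    Returns (mirror, up_to_date). If up_to_date is False, the caller keeps
    the assigned mirror and should recheck it more frequently."""
    others = [m for m in mirrorlist if m != state.assigned][:max_probe]
    for mirror in [state.assigned] + others:
        timestamp = probe(mirror)  # hypothetical: None if unreachable
        state.seen[mirror] = timestamp
        if timestamp is not None and timestamp >= latest_timestamp:
            state.assigned = mirror  # switch to (or stay on) a current mirror
            return mirror, True
    return state.assigned, False


# Toy data (assumed): only Mirror X has the metadata advertised by mm.
serving = {"mirror-n": 1, "mirror-y": 1, "mirror-x": 2}
state = MirrorState(assigned="mirror-n")
print(choose_mirror(state, 2, serving.get, ["mirror-y", "mirror-x", "mirror-n"]))
print(state.seen)
```

With the toy data the client switches from mirror-n to mirror-x after three probes and records what each visited mirror was serving.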

Replying to [comment:16 kevin]:

Replying to [comment:14 jsilhan]:

Replying to [comment:13 mschwendt]:

As I understand the information about mm that has been provided in this ticket, you cannot avoid crawling multiple mirrors in search of any metadata that matches the metalink details. The mirrors at the top of the list may carry latest metadata, or not, or may be broken again already being busy syncing even newer updates. Mirrors further down on the list may be more up-to-date meanwhile.

Kevin, can mm recheck the synced mirrors and add a new "latest=1/2/3..." attribute to the url tags? That way it would still honor the mirror preference (and not overwhelm just some of them), while the client would know before accessing a url whether the mirror is the most up-to-date or not.

So you mean add additional data to the metalink as to which repomd.xml the mirror had at the time it was last crawled?
That might be possible, but I suspect it would be pretty error prone. Mirrors could well update after they were crawled...

It would be great if we could think of a simpler process instead of making it more complex, but I'm not sure that's going to be possible. It may also not be possible to guarantee that 'dnf --refresh update' runs will always show the same pending updates.

IMO it's better to have a more complex server side that gives some hints; otherwise the clients would try every mirror in search of the latest metadata, and that would overwhelm the mirrors even more.

Let's say it straight: we cannot deal with the fact that the fastest-synced mirrors will be overwhelmed at first. We could distribute the mirrors' bandwidth fairly using P2P, though, with no need to determine the mirror preference. Has anyone created a proof of concept of P2P metadata exchange? What are your opinions?

Kevin, we really think having the file and alternates checksums sorted by timestamp would make sense, plus letting the package manager (PM) know which mirror is up-to-date and which is not before fetching the repomd.

In DNF we can have two policies for metadata loading:
best - will try only mirrors with the most recent updates
any (some updates are better than none) - will try any available mirror from file and alternates

In DNF it could work like this, as a real example:

{{{dnf update}}}
- only expired mirrors are updated, with the ANY policy, informing the user if the mirror is not the most recent one (a mirror is expired if the local metadata timestamp exceeds some DNF time constant)
- if some package URL returns 404 during package download, DNF will fail and the repo will be marked as expired (this could happen if the metadata is outdated)
{{{dnf update --refresh}}}
- all metadata are fetched with the BEST policy; if that is not reachable, then with the ANY policy, informing the user that the mirror is not the most recent one.
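As a rough sketch of how the two policies could fit together (this is not DNF code; `pick_mirror` and the `repomd_checksum` callback are hypothetical, and the cache-expiry check for plain {{{dnf update}}} is left out for brevity):

```python
from typing import Callable, Iterable, Optional, Tuple


def pick_mirror(
    mirrors: Iterable[str],
    latest: str,
    alternates: set,
    repomd_checksum: Callable[[str], Optional[str]],
    refresh: bool,
) -> Tuple[Optional[str], bool]:
    """Return (mirror_url, is_latest); (None, False) if nothing matches.

    `latest` is the checksum of the current repomd from the metalink,
    `alternates` the checksums of the older, still accepted ones.
    """
    mirrors = list(mirrors)
    cache = {}

    def checksum(url: str) -> Optional[str]:
        # Ask each mirror at most once, even if both passes visit it.
        if url not in cache:
            cache[url] = repomd_checksum(url)
        return cache[url]

    if refresh:
        # BEST: accept only mirrors that carry the most recent repomd.
        for url in mirrors:
            if checksum(url) == latest:
                return url, True
    # ANY: some updates are better than none.
    for url in mirrors:
        found = checksum(url)
        if found == latest or found in alternates:
            return url, found == latest
    return None, False
```

When `is_latest` comes back `False`, the caller would inform the user that the mirror is not the most recent one.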

Replying to [comment:19 jsilhan]:

IMO it's better to have a more complex server side that gives some hints; otherwise the clients would try every mirror in search of the latest metadata, and that would overwhelm the mirrors even more.

Let's say it straight: we cannot deal with the fact that the fastest-synced mirrors will be overwhelmed at first. We could distribute the mirrors' bandwidth fairly using P2P, though, with no need to determine the mirror preference. Has anyone created a proof of concept of P2P metadata exchange? What are your opinions?

We don't control the mirrors directly; they are all volunteers who carry our content. I am pretty sure that they would not be willing to run special P2P stuff for Fedora.

Kevin, we really think having the file and alternates checksums sorted by timestamp would make sense, plus letting the package manager (PM) know which mirror is up-to-date and which is not before fetching the repomd.

ok. Filed https://github.com/fedora-infra/mirrormanager2/issues/116 to ask for that.

In DNF we can have two policies for metadata loading:
best - will try only mirrors with the most recent updates
any (some updates are better than none) - will try any available mirror from file and alternates

In DNF it could work like this, as a real example:

{{{dnf update}}}
- only expired mirrors are updated, with the ANY policy, informing the user if the mirror is not the most recent one (a mirror is expired if the local metadata timestamp exceeds some DNF time constant)
- if some package URL returns 404 during package download, DNF will fail and the repo will be marked as expired (this could happen if the metadata is outdated)
{{{dnf update --refresh}}}
- all metadata are fetched with the BEST policy; if that is not reachable, then with the ANY policy, informing the user that the mirror is not the most recent one.

That seems overcomplex to me...

Perhaps more data would help. I talked to the mirrormanager developers yesterday and they are going to see if we can gather some info about update times for mirrors, i.e., when we update master, how many mirrors update within an hour, 3 hours, 6 hours, a day, etc. That might tell us how long we should keep alternates or whether we could take some other approach.

how long we should keep alternates

That's not so interesting, since the answer depends on various factors, such as how often a repo is updated per day and per week.

If you ever need to update a repo thrice a day, you likely want to put the checksum of more than the last three repodata releases in the metalink.

And since those checksums are the only way for metadata download tools to learn which metadata may be trusted, it would be plausible to delete old checksums after 4-7 days and not already after 24 hours. It doesn't make the metalink xml much larger.
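The arithmetic behind this is simple; a tiny sketch (the function name is made up for illustration) shows how many repomd checksums the metalink would have to list for a given update rate and retention period:

```python
import math


def checksums_to_keep(updates_per_day: float, retention_days: float) -> int:
    """Number of repomd checksums listed for the given retention window."""
    return math.ceil(updates_per_day * retention_days)


# Three updates a day: 24 hours of history vs. 4 to 7 days of history.
print(checksums_to_keep(3, 1))  # 3
print(checksums_to_keep(3, 4))  # 12
print(checksums_to_keep(3, 7))  # 21
```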

Far more important would be MM giving a hint about which mirrors may carry the most recent metadata. I see preference= values from 100 down to 64 without any gaps. Is this only used for basic sorting by priority? The values could be split into well-defined groups: 100-75 could mean "mirror usually fetches the latest metadata multiple times per day", 75-50 "at least every 24 hours", 50-25 "at most once per day", and so on, depending on what is known about the mirrors.
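On the client side, such groups could be interpreted along these lines. This is purely a sketch of the proposal, not how MirrorManager uses preference= today; `sync_band` is a hypothetical name, assigning the boundary values 75 and 50 to the higher group is an assumption, and values below 25 are left unspecified because the proposal does not define them:

```python
def sync_band(preference: int) -> str:
    """Map a metalink preference value to the proposed sync-frequency group."""
    if not 0 <= preference <= 100:
        raise ValueError(f"preference out of range: {preference}")
    if preference >= 75:
        return "usually fetches the latest metadata multiple times per day"
    if preference >= 50:
        return "at least every 24 hours"
    if preference >= 25:
        return "at most once per day"
    return "unspecified"
```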

The repomd.xml alternates sorting by timestamp is now running on the MirrorManager production systems:

https://github.com/fedora-infra/mirrormanager2/issues/116

So, what more can we do here?

1) There's discussion in ticket #4882 about timeouts getting the metalink after 30s (mostly from yum-cron/dnf automatic). I've not yet been able to isolate the cause of those. Do we want to make some change in dnf for this? Increase the timeout, or retry (mirrors.fedoraproject.org is a round robin dns)?

2) Aside from sorting the repomd.xml / alternatives, are there any concrete proposals for changes we could make in mirrormanager / metalink output?

Replying to [comment:25 kevin]:

So, what more can we do here?

1) There's discussion in ticket #4882 about timeouts getting the metalink after 30s (mostly from yum-cron/dnf automatic). I've not yet been able to isolate the cause of those. Do we want to make some change in dnf for this? Increase the timeout, or retry (mirrors.fedoraproject.org is a round robin dns)?

It depends on whether you want to, and on whether it would increase the unavailability of the servers in the long run or not. It could be slightly bigger than 30 sec, say 1 min? (The dnf conf option is 30 sec by default.)

2) Aside from sorting the repomd.xml / alternatives, are there any concrete proposals for changes we could make in mirrormanager / metalink output?

Maybe sort mirrors by the most recent synced repodata.

Replying to [comment:26 jsilhan]:

It depends on whether you want to, and on whether it would increase the unavailability of the servers in the long run or not. It could be slightly bigger than 30 sec, say 1 min? (The dnf conf option is 30 sec by default.)

Well, right now it resolves mirrors.fedoraproject.org, sends a request and waits 30s for a reply, right? Could we instead have it resolve mirrors.fedoraproject.org, send a request, wait 30s, then retry (and most likely get another IP) and wait another 30s?

2) Aside from sorting the repomd.xml / alternatives, are there any concrete proposals for changes we could make in mirrormanager / metalink output?

Maybe sort mirrors by the most recent synced repodata.

well, when new repodata appears in the database it will have just 1 (master mirror) synced. If we put that at the top everyone will mob it and it will run out of BW. As the crawler runs it could add more, but again, if there were say 3 mirrors synced if we put them at the top everyone would mob them. ;( So I guess there's a balance here between getting people up to date mirrors and not causing them undue problems with too many requests.

Replying to [comment:27 kevin]:

Well, right now it resolves mirrors.fedoraproject.org, sends a request and waits 30s for a reply, right? Could we instead have it resolve mirrors.fedoraproject.org, send a request, wait 30s, then retry (and most likely get another IP) and wait another 30s?

DNF currently applies the timeout (30 sec) per mirror, not per whole session. When a connection to a mirror times out, it tries another mirror with another 30 sec period. IIUC this is what you want and no change is needed.
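For illustration only, a per-mirror timeout with fallback to the next mirror looks roughly like this in plain Python. It is not the actual DNF/librepo implementation; `fetch_first_available` and `default_fetch` are made-up names, and only the 30 second default comes from the discussion above:

```python
import urllib.request
from typing import Callable, Iterable, Optional


def default_fetch(url: str, timeout: float) -> bytes:
    """Download one URL, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_first_available(
    urls: Iterable[str],
    timeout: float = 30.0,
    fetch: Callable[[str, float], bytes] = default_fetch,
) -> Optional[bytes]:
    """Try each URL in turn; every attempt gets its own full timeout."""
    for url in urls:
        try:
            return fetch(url, timeout)
        except OSError:
            # URLError and socket timeouts are OSError subclasses:
            # move on to the next mirror with a fresh timeout.
            continue
    return None  # every mirror failed or timed out
```

The point is that the timeout budget is per attempt, so a slow first mirror costs at most 30 seconds before the next one is tried.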

well, when new repodata appears in the database it will have just 1 (master mirror) synced. If we put that at the top everyone will mob it and it will run out of BW. As the crawler runs it could add more, but again, if there were say 3 mirrors synced if we put them at the top everyone would mob them. ;( So I guess there's a balance here between getting people up to date mirrors and not causing them undue problems with too many requests.

Then we will keep it as it is and you can close this ticket if there are no objections. Thanks.

ok. If you can think of anything further we can improve or change from the server side, please let us know in a new ticket, etc.
