#10801 mirrormanager doesn't return correct metalink file for 9-stream
Closed: Fixed 2 years ago by kevin. Opened 2 years ago by markmartirosian.

same as #10279

error: Downloading successful, but checksum doesn't match. Calculated: 92f44febb5fbf0f731af498feefb653d5b516e7184a23ab6e63b71c1faa43f18cb10a39bea249ee9467dfc010a51acd9c0314dcd69ea5e1d1e86c5bd48ed94db(sha512)  Expected: 7ce2adbd012d321cabb2bf28fb3fa67e4b31f8f1f21e91643b8b79d8ea18215711e6eadb72f792e1eeff2a39bb22346eb9594078b824ebc45599962a431995e2(sha512)  (http://mirror.siena.edu/centos-stream/9-stream/BaseOS/aarch64/os/repodata/repomd.xml).
error: Downloading successful, but checksum doesn't match. Calculated: 92f44febb5fbf0f731af498feefb653d5b516e7184a23ab6e63b71c1faa43f18cb10a39bea249ee9467dfc010a51acd9c0314dcd69ea5e1d1e86c5bd48ed94db(sha512)  Expected: 7ce2adbd012d321cabb2bf28fb3fa67e4b31f8f1f21e91643b8b79d8ea18215711e6eadb72f792e1eeff2a39bb22346eb9594078b824ebc45599962a431995e2(sha512)  (http://mirror.shastacoe.net/centos-stream/9-stream/BaseOS/aarch64/os/repodata/repomd.xml).
error: Downloading successful, but checksum doesn't match. Calculated: 92f44febb5fbf0f731af498feefb653d5b516e7184a23ab6e63b71c1faa43f18cb10a39bea249ee9467dfc010a51acd9c0314dcd69ea5e1d1e86c5bd48ed94db(sha512)  Expected: 7ce2adbd012d321cabb2bf28fb3fa67e4b31f8f1f21e91643b8b79d8ea18215711e6eadb72f792e1eeff2a39bb22346eb9594078b824ebc45599962a431995e2(sha512)  (https://mirror.shastacoe.net/centos-stream/9-stream/BaseOS/aarch64/os/repodata/repomd.xml).
error: Downloading successful, but checksum doesn't match. Calculated: 92f44febb5fbf0f731af498feefb653d5b516e7184a23ab6e63b71c1faa43f18cb10a39bea249ee9467dfc010a51acd9c0314dcd69ea5e1d1e86c5bd48ed94db(sha512)  Expected: 7ce2adbd012d321cabb2bf28fb3fa67e4b31f8f1f21e91643b8b79d8ea18215711e6eadb72f792e1eeff2a39bb22346eb9594078b824ebc45599962a431995e2(sha512)  (http://mirror.net.cen.ct.gov/centos-stream/9-stream/BaseOS/aarch64/os/repodata/repomd.xml).
$ curl --silent http://mirror.stream.centos.org/9-stream/BaseOS/aarch64/os/repodata/repomd.xml|sha512sum
92f44febb5fbf0f731af498feefb653d5b516e7184a23ab6e63b71c1faa43f18cb10a39bea249ee9467dfc010a51acd9c0314dcd69ea5e1d1e86c5bd48ed94db  -
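The errors above come from the client hashing the repomd.xml it downloaded and comparing the result with the SHA-512 value the metalink advertised. The curl | sha512sum check does the same thing by hand. Below is a minimal Python sketch of that comparison. It assumes the file content is already in memory, and the function names are hypothetical, not librepo's API.

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Lowercase hex SHA-512 digest, the same value `sha512sum` prints."""
    return hashlib.sha512(data).hexdigest()


def verify_repomd(data: bytes, expected_sha512: str) -> bool:
    """Check a downloaded repomd.xml against the checksum taken from the metalink.

    Returns True on a match. Otherwise prints a message shaped like the
    errors above and returns False, so a caller could try the next mirror.
    """
    calculated = sha512_hex(data)
    if calculated == expected_sha512.strip().lower():
        return True
    print(f"checksum doesn't match. Calculated: {calculated}(sha512)  "
          f"Expected: {expected_sha512}(sha512)")
    return False


if __name__ == "__main__":
    # Hypothetical stand-ins: what a lagging mirror serves vs. what the
    # metalink checksum was generated from.
    mirror_copy = b"<repomd><revision>old</revision></repomd>"
    metalink_source = b"<repomd><revision>new</revision></repomd>"
    verify_repomd(mirror_copy, sha512_hex(metalink_source))  # prints a mismatch
```

Both sides can be "correct" at the same time. The mirror serves a valid repomd.xml and the metalink carries a valid checksum, but they describe different snapshots of the repository.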

cc @bstinson @adrian @carlgeorge @kevin


We might need Adrian to take a look here.
We pushed this morning using the expected sync_in_progress directory stamp.

OK, something is odd with DNS and what is being presented. If I do this from the MirrorManager crawlers, I get:

[root@mm-crawler02 crawler][PROD-IAD2]# curl --silent http://mirror.stream.centos.org/9-stream/BaseOS/aarch64/os/repodata/repomd.xml|sha512sum
7ce2adbd012d321cabb2bf28fb3fa67e4b31f8f1f21e91643b8b79d8ea18215711e6eadb72f792e1eeff2a39bb22346eb9594078b824ebc45599962a431995e2  -
[root@mm-crawler02 crawler][PROD-IAD2]# date
Tue  5 Jul 21:11:18 UTC 2022
# host mirror.stream.centos.org
mirror.stream.centos.org has address 136.144.60.219

So MirrorManager is expecting the checksum starting with 7c and not the one starting with 92.

Seems like it's fixed now. Seeing a lot of 404s, but eventually a working mirror is found.

#8 204.8 [MIRROR] tar-1.34-5.el9.aarch64.rpm: Status code: 404 for http://mirror.net.cen.ct.gov/centos-stream/9-stream/BaseOS/aarch64/os/Packages/tar-1.34-5.el9.aarch64.rpm (IP: 72.10.120.178)

At this point I think I understand all the problems we currently have with the MirrorManager Stream setup, but I have no real idea how to solve them. The core of the problem is that repository inter-dependencies, like the ones we have between baseos, appstream and crb, are something our tools cannot really handle, neither on the MirrorManager side nor on the DNF side.

Another problem is that, whatever the cause, the user sees the same error message: checksums do not match.

Today's error happened because @arrfab and I tried to improve things for the SIG content. SIG content had the same checksum problem because MirrorManager took a couple of hours to pick up changes. So we are now scanning the primary mirror more often, and we are scanning the real primary mirror. Unfortunately, we are now in a situation where MirrorManager knows about new content too early, long before it hits the mirrors. Previously it was the other way round.

We are basically facing a big synchronization problem: how do we synchronize data distribution worldwide? This is probably the reason many projects are using fewer and fewer public mirrors and are relying on CDNs instead.

I will now switch MirrorManager to scan the previous primary mirror instead of the real primary mirror, but more often. Hopefully that way new content (checksums) will not get into the database too early.
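The timing problem described above can be modelled with a toy clock. The metalink advertises the newest checksum its scan source has seen, while each mirror serves whatever it has synced so far. This sketch assumes one linear history of composes and a fixed per-mirror sync delay. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    checksum: str      # repomd.xml checksum of one published compose
    published_at: int  # hour the compose reached the primary mirror (hypothetical clock)


def latest(history: List[Snapshot], cutoff: int) -> Optional[str]:
    """Newest checksum published at or before `cutoff` (history sorted by time)."""
    visible = [s.checksum for s in history if s.published_at <= cutoff]
    return visible[-1] if visible else None


def advertised(history: List[Snapshot], now: int, scan_lag: int) -> Optional[str]:
    """Checksum the metalink carries if the crawler's source trails the primary by scan_lag hours."""
    return latest(history, now - scan_lag)


def served(history: List[Snapshot], now: int, sync_delay: int) -> Optional[str]:
    """Checksum a mirror serves if it picks up a compose sync_delay hours after publication."""
    return latest(history, now - sync_delay)
```

Consider history = [Snapshot("old", 0), Snapshot("new", 10)] and a mirror that lags 4 hours. At hour 11, scanning the real primary (scan_lag=0) advertises "new" while the mirror still serves "old". A scan lag of 6 fixes hour 11 but breaks hour 14: by then the mirror already has "new", and the metalink still says "old". No single lag fits every mirror.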

I still have a couple of ideas on how to improve the situation, but it will never be the perfect solution.

If DNF had a way to detect whether the current baseos, appstream and crb belong together, that would also make our lives a bit easier. Maybe that is something that can be worked on as well.

Metadata Update from @mobrien:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: high-gain, high-trouble, mirrorlists

2 years ago

I don't understand how DNF could help you. Moreover, DNF can barely help. It only sees repositories as independent entities; there is no relation between them. Whatever URL DNF gets, it will fetch.

That was kind of my point: there would need to be something telling DNF which repository depends on which version of another repository. That would be something completely new. The repodata would need to include the checksums of other repositories. Then, if the metalink offered multiple checksums, DNF would know which ones can be combined.
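On the client side, this idea could look roughly like the sketch below. Each repository's metadata records the checksums of the companion repositories it was composed with. The client then picks a combination from the metalink candidates in which all those cross-references agree. This is purely hypothetical: no such field exists in today's repodata, and the data layout is an assumption.

```python
from itertools import product
from typing import Dict, Iterator, List


def consistent_combinations(
    candidates: Dict[str, List[str]],
    companions: Dict[str, Dict[str, str]],
) -> Iterator[Dict[str, str]]:
    """Yield repo -> checksum picks whose cross-references all agree.

    candidates: repo name -> repomd checksums the metalink offers for it.
    companions: checksum -> {other repo: checksum it was composed with}
                (hypothetical metadata; references to repos that are not
                in `candidates`, e.g. disabled ones, are ignored).
    """
    repos = sorted(candidates)
    for choice in product(*(candidates[r] for r in repos)):
        picked = dict(zip(repos, choice))
        if all(
            picked.get(other, wanted) == wanted
            for checksum in choice
            for other, wanted in companions.get(checksum, {}).items()
        ):
            yield picked
```

Take candidates = {"baseos": ["b1", "b2"], "appstream": ["a1", "a2"]} and companions = {"a1": {"baseos": "b1"}, "a2": {"baseos": "b2"}}. The sketch yields appstream a1 with baseos b1, and a2 with b2, and rejects the mixed pairs. Brute force over the product is only reasonable for a handful of repositories with two or three candidates each.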

I think I see what adrian is saying, but the fix needs to go into the repository metadata.

repository metadata strawman which is so thin you can tear it apart without teeth:

repo_uuid: 88404d0d-ef19-40b7-a1b5-eae050fc4c07
repo_signature: [some sort of key which says this repo is this id]
repo_signature_chain_url: [blah]
repo_timestamp:
repo_associated_with:
- [repo_uuid of baseos]
-- repo_sha256 to use
- [repo_uuid of appstream]
-- repo_sha256 to use
- [repo_uuid of crb]
-- repo_sha256 to use
- [other ones which could be used also]
repo_requires:
- repo_uuid of baseos

then dnf can go through and say

EPEL needs CRB, do I know that?
No: tell them we can't activate this repo.
Yes: is it enabled? No? Can it be enabled?
etc etc
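The check above could be implemented as a walk over the strawman's repo_requires field. This minimal sketch assumes two things: repositories are identified by plain names instead of repo_uuid values, and DNF knows which disabled repositories it is allowed to turn on. All names are hypothetical, not a DNF API.

```python
from typing import Dict, List, Optional, Set, Tuple


def can_activate(
    repo: str,
    requires: Dict[str, List[str]],  # strawman repo_requires: repo -> repos it needs
    known: Set[str],                 # repos DNF has configuration for
    enabled: Set[str],               # repos currently enabled
    enableable: Set[str],            # disabled repos DNF may turn on
    _seen: Optional[Set[str]] = None,
) -> Tuple[bool, List[str]]:
    """Return (ok, repos that would have to be enabled first)."""
    seen = set() if _seen is None else _seen
    to_enable: List[str] = []
    for dep in requires.get(repo, []):
        if dep in seen:              # already checked (also guards against cycles)
            continue
        seen.add(dep)
        if dep not in known:         # "No, tell them we can't activate this repo."
            return False, []
        if dep not in enabled:
            if dep not in enableable:
                return False, []     # disabled and not allowed to be enabled
            to_enable.append(dep)    # "Can it be enabled?" -> yes
        ok, more = can_activate(dep, requires, known, enabled, enableable, seen)
        if not ok:
            return False, []
        to_enable.extend(more)
    return True, to_enable
```

With requires = {"epel": ["crb"], "crb": ["baseos"]}, baseos enabled and crb disabled but enableable, can_activate("epel", ...) returns (True, ["crb"]). If crb is unknown, or disabled by policy, it returns (False, []).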

mirror crawlers can go through and say 'oh all those are the right sigs, they are in sync'
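On the crawler side, the same strawman would let a check compare what a mirror actually serves against the sha256 values listed under repo_associated_with. This sketch assumes the crawler has already fetched each repomd.xml from the mirror. The function and data layout are hypothetical.

```python
import hashlib
from typing import Dict, List


def out_of_sync(associated_with: Dict[str, str], mirror_files: Dict[str, bytes]) -> List[str]:
    """Return repos whose repomd.xml on the mirror is missing or doesn't match.

    associated_with: repo -> expected sha256 of its repomd.xml (strawman field).
    mirror_files:    repo -> repomd.xml bytes as fetched from one mirror.
    """
    bad = []
    for repo, expected in associated_with.items():
        data = mirror_files.get(repo)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            bad.append(repo)
    return sorted(bad)
```

An empty result would mean "all those are the right sigs, they are in sync". Any non-empty result names the repositories that would make clients see checksum errors on that mirror.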

Yes. Thanks for the clarification. The idea was that if we can track repository dependencies on the DNF level we can be more flexible with what we return in the metalink. Being more flexible in the metalink means we do not need to perfectly synchronize the mirror update and the metalink update. It is just one idea how to avoid the problems we currently have.

It looks like you are moving the problem of freshness of a mirror from mirrormanager to DNF. So far mirrormanager sent the latest hash to prevent DNF from using outdated repositories (as a security measure against denying updates to clients). If you want to change the paradigm, I recommend talking to DNF (@jmracek) and librepo (@tmlcoch) people.

Hello,
I understand that the synchronization problem of mirrors and related repositories is quite important, but moving the problem to DNF does not seem right to me. I could also suggest merging the content of dependent repositories into one (the easiest working solution), but I immediately have to reject that idea, because there are multiple good reasons for splitting packages into multiple repositories, including download size requirements, the resources required to process repositories, and different update strategies and update times for particular repositories. If one repository contains a checksum of the second one, then any update of one repository will require an update of the other. If someone else modifies one of the dependent repositories, the whole chain will stop working. In short, I see your point, but the proposed solution does not sound right to me: it tries to fix a complex problem on a side where it cannot be solved.

@ppisar No, I am talking more belt and suspenders. MirrorManager has to use various 'short-cuts' to confirm that a mirror is up2date, because walking 4 to 80 TB of data per mirror across a thousand or so servers would take much longer than a day and be a bigger drag on each mirror's I/O than the clients are. Some of those short-cuts use rsync to compare the time/datestamps of various files against the master, and some of them, I believe, use dnf itself to confirm that things are aligned. Most of the time those short-cuts work, but when they don't, we are sending clients data which isn't correct.
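The timestamp short-cut mentioned above boils down to comparing a handful of files on the mirror against the master. This sketch assumes both listings have already been parsed into path -> mtime maps, for example from rsync output. The parsing and the choice of sampled files are left out, and the names are hypothetical.

```python
from typing import Dict, List


def stale_paths(master: Dict[str, int], mirror: Dict[str, int], tolerance: int = 0) -> List[str]:
    """Return sampled paths that are missing on the mirror or older than on the master.

    master, mirror: path -> modification time in seconds.
    tolerance:      seconds of clock skew to forgive.
    """
    stale = []
    for path, master_mtime in master.items():
        mirror_mtime = mirror.get(path)
        if mirror_mtime is None or mirror_mtime + tolerance < master_mtime:
            stale.append(path)
    return sorted(stale)
```

The check is cheap because only the sampled paths are compared, and that is also its weakness. A mirror whose sampled files are current but whose other repodata is only half-synced still passes, which is the case that ends up as checksum errors for clients.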

dnf and GNOME Software themselves have this problem if a site has its own repositories but not all of them have been properly configured. [Customers may have AppStream turned on but BaseOS turned off. It's rare, but it does happen; more likely they have something like EPEL but not CRB.]

This could be used to help cut down on those problems. It could also make them worse, which is why, yes, having tmlcoch and jmracek on it is a good idea.

So, what's the status here? Should we close this now, or...?

I'm going to go ahead and close this... I don't think we can fix the 'linkage' problem, it's just a balancing act. :(

Please re-open if there's more that we can do here.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago
