CentOS Stream users, both CentOS Stream 10 and 9, have been affected by bad hashes being given by the mirrors. They are getting errors such as " - Downloading successful, but checksum doesn't match. Calculated: 09f9f3040a68ccceb349f04463cad23b3af1dc36ccf61c1fada52b9955a441742d8434dcb66f265924aed2d2b9dee3edaf16e0534fe475c6a5faf45db021d9a1(sha512) Expected: f3765f12125ff6f09532cd5e74a0ab0bf0b275aa28794c1171b11ed247ed46fe60a7869d2ddc4658e31ac155632b4def908fd473050e6f68c32322212b921f62(sha512)"
We verified that our sync to the master mirror was correct. For CentOS Stream 10, we had a new compose that we pushed out, and that fixed the problems for CentOS Stream 10. But for CentOS Stream 9, we didn't have such a compose, and although we "re-pushed" our old compose, it didn't fix the problem.
So, I need two things.
First, can we get the CentOS Stream 9 repos fixed, especially the aarch64 and s390x repositories?
Second, can we get some access to something, so that we can check that the mirror-manager has the right hash. This is so we don't have to "wait 12 hours and see if any users scream".
2024-10-16
Note: I know that sounds like a rush, but our CentOS Stream 9 aarch64/s390x users have been without repos for 2 days.
Related CentOS Stream issue: https://pagure.io/centos-infra/issue/1521
Here is a way to check: This is for CentOS Stream 9 aarch64, the sha256 sums do not match.
$ curl -s 'https://mirrors.centos.org/metalink?repo=centos-appstream-9-stream&arch=aarch64&protocol=https,http' | grep sha256
<hash type="sha256">546bd50f911c7f58c0cd4cccf94974276e3ef1e74d183248415a46b51c7af1b2</hash>
$ curl -s https://mirror.stream.centos.org/9-stream/AppStream/aarch64/os/repodata/repomd.xml | sha256sum
dc7f653bba0866e3407f925319825b6ae9bdfb87078dc1b4d60c807b5a66188c
This is for CentOS Stream 10 aarch64. Today, the sha256 sums match:
$ curl -s 'https://mirrors.centos.org/metalink?repo=centos-appstream-10-stream&arch=aarch64&protocol=https,http' | grep sha256
<hash type="sha256">993f88cf2ed0a643e8bbce859f83c63af2c6ad0a85214cb3a9d7414af146abc3</hash>
$ curl -s https://mirror.stream.centos.org/10-stream/AppStream/aarch64/os/repodata/repomd.xml | sha256sum
993f88cf2ed0a643e8bbce859f83c63af2c6ad0a85214cb3a9d7414af146abc3
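The manual check above can be scripted. What follows is a minimal Python sketch using only the standard library, not the script anyone in this thread actually wrote. The function names are made up for illustration, and matching on the local tag name is a choice made here so the code does not depend on the metalink XML namespace.

```python
import hashlib
import urllib.request
import xml.etree.ElementTree as ET


def metalink_sha256s(metalink_xml: bytes) -> set:
    """Return every sha256 hash listed anywhere in a metalink document."""
    root = ET.fromstring(metalink_xml)
    hashes = set()
    for elem in root.iter():
        # Compare the local tag name only, so the XML namespace does not matter.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "hash" and elem.get("type") == "sha256":
            hashes.add((elem.text or "").strip().lower())
    return hashes


def repomd_matches(metalink_xml: bytes, repomd_bytes: bytes) -> bool:
    """True if the sha256 of repomd.xml is one of the hashes in the metalink."""
    digest = hashlib.sha256(repomd_bytes).hexdigest()
    return digest in metalink_sha256s(metalink_xml)


def fetch(url: str) -> bytes:
    """Download a URL and return the raw body (the same bytes sha256sum would see)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()
```

To run it against the live services, pass the result of `fetch()` on the metalink URL and on the `repomd.xml` URL shown above to `repomd_matches()`; a `False` result corresponds to the mismatch reported for CentOS Stream 9 aarch64.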
Not sure you are looking for this but running:
curl -s "https://mirrors.fedoraproject.org/metalink?repo=centos-baseos-9-stream&arch=s390x" | head -15
will tell you what MirrorManager currently has in the database.
That does make things easier to check. After we push a compose to the correct mirror area, how long does it usually take before that database is updated?
ELN has similar problems and we discussed this just a couple of days ago.
According to https://pagure.io/fedora-infra/ansible/blob/main/f/vars/apps/mirrormanager.yml#_72 the primary mirror is checked every 15 minutes. I am not sure if we are scanning a public facing system or if we are scanning the content slightly before it is pushed to the public. Once the database has the new content the data is written to a cache file and pushed to the mirrorlist servers (I guess about 20 systems worldwide today). Updating the mirrorlist servers happens only once an hour (:20 after the hour).
The reason why we still do it only once an hour is that the old system, a couple of years ago, took almost 50 minutes to create the cachefile from the database. Today this is finished in 3 to 4 minutes, so we could do it much more often. We have not changed this, however, because it works pretty good most of the time. It could be decreased. But this means that after 60 minutes MirrorManager should be up to date.
If it takes longer to fix then something was not working as expected and for some reason MirrorManager did not pick up the changes from the primary mirror. This might mean manual changes somewhere.
Cool, an hour is a short enough time for us.
If we find that only some of the repos are updated (that is what I'm guessing happened), what can we do to fix it?
I am currently trying to see how to access the new MirrorManager system to have a look why it fails.
In Fedora, a common reason for failures like this was that an old version of the repository was restored; MirrorManager does not handle ctime going backward. But I guess you didn't copy an old version of the repository back.
I found the reason. Going to the internal mirror that MirrorManager is scanning I see the same checksum as in MirrorManager:
$ curl -s http://mref1-priv.iad2.centos.org/9-stream/AppStream/aarch64/os/repodata/repomd.xml | md5sum
8bdd17ea95090cd8561fee80ab09f4f9 -
$ curl -s https://mirror.stream.centos.org/9-stream/AppStream/aarch64/os/repodata/repomd.xml | md5sum
b4cc3852ad28bb615e515824316f569f -
So it looks like the data on the internal mirror we are scanning is different from the public data. The data on the public mirror is from October 14th. The internal mirror has data from October 16th.
That's probably because I pushed a new compose to 9-stream right before I opened this ticket, and it hasn't made it to all the mirrors yet. I realize that it makes it hard to find the problem, but our users have been broken for 2 days and I needed it fixed as soon as possible.
Looking now, I see:
% curl -s https://mirror.stream.centos.org/9-stream/AppStream/aarch64/os/repodata/repomd.xml | md5sum
8bdd17ea95090cd8561fee80ab09f4f9 -
...so I assume this is temporarily fixed. Is there anything else infra. needs to do?
Although internal has changed, again (it's now d41d8cd98f00b204e9800998ecf8427e)
Metadata Update from @phsmoura: - Issue priority set to: Waiting on Reporter (was: Needs Review) - Issue tagged with: medium-gain, medium-trouble, ops
From my tracking of the new CentOS Stream 9 compose through the mirror system, I believe everything will be settled soon. "Soon" is relative, since things take so long to propagate out.
I wanted to thank you for your information about the mirror database. It has allowed me to write a checking script so if this happens again, we (the CentOS Stream team) should be able to see the problem early, rather than having to wait 10-12 hours. That should allow us (CentOS Stream team and the Mirror team) to figure out what has gone wrong. And hopefully fix it before it affects end users.
Ping Ping Ping!!!
How long does it take between when we (CentOS Stream) push our repos to the mirror-master and when the correct hashes show up in the mirror database?
From the discussion above, it sounded like 15 minutes to possibly an hour and a half.
I've been waiting and watching for 2 1/2 to 3 hours and I'm starting to get worried that we're going to have the same problem we had on Monday, but on a Friday.
Can someone look and see if the database update service is running?
A quick glance I see:
mm updated from the private mirror about 12min ago
mm pushed an updated mirrorlist to mirrorlist servers 10min ago (but it might be the update missed that train and will be in the one in ~45m?)
When did the push to master repo happen on your end?
It depends on the timezone. But according to Jenkins the push ended 3 Hours and 15 Minutes ago.
Just so people know, at about the 4 hour mark, the database updated and everything was good.
I am hitting this today all the time from my region
❯ podman run --pull=always -it quay.io/centos/centos:stream9 dnf -y -v makecache
Trying to pull quay.io/centos/centos:stream9...
Getting image source signatures
Copying blob 49248b282aaf skipped: already exists
Copying config 0157981aad done |
Writing manifest to image destination
Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, groups-manager, needs-restarting, playground, repoclosure, repodiff, repograph, repomanage, reposync, system-upgrade
DNF version: 4.14.0
cachedir: /var/cache/dnf
Making cache files for all metadata files.
baseos: has expired and will be refreshed.
appstream: has expired and will be refreshed.
extras-common: has expired and will be refreshed.
repo: downloading from remote: baseos
countme: no event for baseos: budget to spend: 2
error: Downloading successful, but checksum doesn't match. Calculated: 1e037ac2c89e1b1cccad4de51bf179186b11641b1ef0f6e5376cd243ca7020e731d794716a3db514ed7b582973ea882200488fbc5f4c4190c5545c98ff5c7432(sha512) Expected: f251fa02fa992c1929bcb5b743a430f6db42f8e4d5dc49ed400962659e39cc5fd7e5f3156fb8543a3488a30322a1480740917dfd7a5dca6bd4d115b73a5cac81(sha512) (http://ftp.sh.cvut.cz/centos-stream/9-stream/BaseOS/x86_64/os/repodata/repomd.xml).
error: Downloading successful, but checksum doesn't match. Calculated: 1e037ac2c89e1b1cccad4de51bf179186b11641b1ef0f6e5376cd243ca7020e731d794716a3db514ed7b582973ea882200488fbc5f4c4190c5545c98ff5c7432(sha512) Expected: f251fa02fa992c1929bcb5b743a430f6db42f8e4d5dc49ed400962659e39cc5fd7e5f3156fb8543a3488a30322a1480740917dfd7a5dca6bd4d115b73a5cac81(sha512) (https://ftp.sh.cvut.cz/centos-stream/9-stream/BaseOS/x86_64/os/repodata/repomd.xml).
error: Downloading successful, but checksum doesn't match. Calculated: 1e037ac2c89e1b1cccad4de51bf179186b11641b1ef0f6e5376cd243ca7020e731d794716a3db514ed7b582973ea882200488fbc5f4c4190c5545c98ff5c7432(sha512) Expected: f251fa02fa992c1929bcb5b743a430f6db42f8e4d5dc49ed400962659e39cc5fd7e5f3156fb8543a3488a30322a1480740917dfd7a5dca6bd4d115b73a5cac81(sha512) (http://ftp.fi.muni.cz/pub/linux/centos-stream/9-stream/BaseOS/x86_64/os/repodata/repomd.xml).
error: Downloading successful, but checksum doesn't match. Calculated: 1e037ac2c89e1b1cccad4de51bf179186b11641b1ef0f6e5376cd243ca7020e731d794716a3db514ed7b582973ea882200488fbc5f4c4190c5545c98ff5c7432(sha512) Expected: f251fa02fa992c1929bcb5b743a430f6db42f8e4d5dc49ed400962659e39cc5fd7e5f3156fb8543a3488a30322a1480740917dfd7a5dca6bd4d115b73a5cac81(sha512) (https://ftp.fi.muni.cz/pub/linux/centos-stream/9-stream/BaseOS/x86_64/os/repodata/repomd.xml).
error: Downloading successful, but checksum doesn't match. Calculated: 1e037ac2c89e1b1cccad4de51bf179186b11641b1ef0f6e5376cd243ca7020e731d794716a3db514ed7b582973ea882200488fbc5f4c4190c5545c98ff5c7432(sha512) Expected: f251fa02fa992c1929bcb5b743a430f6db42f8e4d5dc49ed400962659e39cc5fd7e5f3156fb8543a3488a30322a1480740917dfd7a5dca6bd4d115b73a5cac81(sha512) (https://mirror.karneval.cz/pub/linux/centos-stream/9-stream/BaseOS/x86_64/os/repodata/repomd.xml).
error: Downloading successful, but checksum doesn't match. Calculated: 1e037ac2c89e1b1cccad4de51bf179186b11641b1ef0f6e5376cd243ca7020e731d794716a3db514ed7b582973ea882200488fbc5f4c4190c5545c98ff5c7432(sha512) Expected: f251fa02fa992c1929bcb5b743a430f6db42f8e4d5dc49ed400962659e39cc5fd7e5f3156fb8543a3488a30322a1480740917dfd7a5dca6bd4d115b73a5cac81(sha512) (http://linuxsoft.cern.ch/centos-stream/9-stream/BaseOS/x86_64/os/repodata/repomd.xml).
error: Downloading successful, but checksum doesn't match. Calculated: 1e037ac2c89e1b1cccad4de51bf179186b11641b1ef0f6e5376cd243ca7020e731d794716a3db514ed7b582973ea882200488fbc5f4c4190c5545c98ff5c7432(sha512) Expected: f251fa02fa992c1929bcb5b743a430f6db42f8e4d5dc49ed400962659e39cc5fd7e5f3156fb8543a3488a30322a1480740917dfd7a5dca6bd4d115b73a5cac81(sha512) (https://linuxsoft.cern.ch/centos-stream/9-stream/BaseOS/x86_64/os/repodata/repomd.xml).
error: Downloading successful, but checksum doesn't match. Calculated: 1e037ac2c89e1b1cccad4de51bf179186b11641b1ef0f6e5376cd243ca7020e731d794716a3db514ed7b582973ea882200488fbc5f4c4190c5545c98ff5c7432(sha512) Expected: f251fa02fa992c1929bcb5b743a430f6db42f8e4d5dc49ed400962659e39cc5fd7e5f3156fb8543a3488a30322a1480740917dfd7a5dca6bd4d115b73a5cac81(sha512) (https://mirror.netzwerge.de/centos-stream/9-stream/BaseOS/x86_64/os/repodata/repomd.xml).
This seems like an issue with the quay.io containers and not with the Fedora mirrors.
So, I'm looking at the various delays we have here:
- scanning the primary mirror: every 15 minutes, takes ~1m
- updating the mirrorlist server: currently every 15 minutes, takes ~5m
So in theory it shouldn't take more than 35 minutes between the moment the fullfiletimelist file is updated on the primary mirror and the new checksums are served by the mirrorlist server. How could it go up to 4 hours?
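As a worked version of that estimate, here is a small Python sketch that adds up the worst case: a change that lands just after a scan started and then just misses a mirrorlist push. The function and parameter names are made up for illustration, and the 1-minute scan duration in the second call is borrowed from the list above as an assumption.

```python
def worst_case_minutes(scan_interval, scan_duration, push_interval, push_duration):
    """Upper bound, in minutes, between a change on the primary mirror and
    the mirrorlist servers handing out the new checksums."""
    # Wait for the next scan, let it finish, wait for the next push, let it finish.
    return scan_interval + scan_duration + push_interval + push_duration


# Current schedule from the list above: scan every 15m (~1m), push every 15m (~5m).
print(worst_case_minutes(15, 1, 15, 5))  # 36

# Schedule described earlier in the thread: push once an hour, cache built in ~4m.
print(worst_case_minutes(15, 1, 60, 4))  # 80
```

Both figures are far below the roughly 4 hours observed, so the delay has to come from the change not being detected rather than from the schedule itself.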
CentOS Stream is not using the fullfiletimelist method. It is checking http://${CENTOS_PRIMARY}/9-stream/COMPOSE_ID and http://mref1-priv.iad2.centos.org/SIGs/9-stream/COMPOSE_ID and only scans the primary mirror if that file has changed.
See /opt/scripts/primary-mirror-wrapper.sh.
Is the openshift console the right place to look for the output of that script?
Oh OK good point, thanks. Well, it shouldn't take more than 35 minutes after that file has changed then. I'm assuming it changes only at the end of a push? Or can it change and the files still be in their old version?
Yes, you can look at the output of the jobs created by the primary-mirror-centos cronjob.
@zlopez what do you mean? it is having trouble in dnf?
@mvadkert
My guess is that the podman pull is confusing the issue.
dnf makecache
@zlopez I asked @mvadkert to look for a ticket and add info as it was causing issues. The method they used was to try and show a repeatable way in case it was not a transient problem which was showing up outside of containers at the time.
That was just a minimal reproducer to show the problem from my location (CZ) at that time. It was a problem with dnf makecache (mirrors); I thought it was clear from the output :)
One thing I'm noticing here: the metalinks for Fedora consider the last three instances of the metadata that mirrormanager knows about to be 'valid'. e.g. right now the Fedora 41 x86_64 metalink has this block:
<file name="repomd.xml">
  <mm0:timestamp>1729673406</mm0:timestamp>
  <size>5959</size>
  <verification>
    <hash type="md5">5ede176164d13163e9bafa0adf9a4e17</hash>
    <hash type="sha1">0c056f4a869e236f9db60781eff25210d7ddb4ff</hash>
    <hash type="sha256">e0385859ee68565e7e2975cbb230f120646c5da5c85e01769475113baa03f611</hash>
    <hash type="sha512">5b70dd4b1e01069a1c4747248e50c76be837d710c4485fc3cc2064d7be7c5c1cb59d732b3e6c816e916e67969cea07d6b2c40d168094b0961cc0f7230942647e</hash>
  </verification>
  <mm0:alternates>
    <mm0:alternate>
      <mm0:timestamp>1729586874</mm0:timestamp>
      <size>5959</size>
      <verification>
        <hash type="md5">eda7be1d490b4b703bc8417a5858b988</hash>
        <hash type="sha1">c923e11707d667d535df9d9cfdf01387dec5feeb</hash>
        <hash type="sha256">d3bf8bd6e4e65a684926b641c1cf7fcb15943f9de45c46e18141ac002e4b6115</hash>
        <hash type="sha512">a909bcf06fa3b8d451bd65847ec459da2f2dc7422b08c1658bbb8c3567779ef44759f5ea87de6ce4fc2edda5498752377c73e8efc7d29bb131938a3a9b08906e</hash>
      </verification>
    </mm0:alternate>
    <mm0:alternate>
      <mm0:timestamp>1729500389</mm0:timestamp>
      <size>5959</size>
      <verification>
        <hash type="md5">8a38282bdedf612b0e32063ac265c8b2</hash>
        <hash type="sha1">cbc2009fe80f3762d931e65690ad9311e52eba44</hash>
        <hash type="sha256">529d4f0e17b8cd57bf1ee6260013b457450509cbf67fb13bf001a555d0efa1db</hash>
        <hash type="sha512">fb8bb0ecef362d14847c2b56e34b9314779d9d2f05a0c583b1fb4deb3a2c4b924d438e3a033fc73b21d716c18e60b8a1a8d04faf1ed8e3b14fc4037480605f81</hash>
      </verification>
    </mm0:alternate>
  </mm0:alternates>
but the CentOS Stream metalinks don't seem to do this. e.g. right now the CentOS Appstream 9 metalink has this block:
<files>
  <file name="repomd.xml">
    <mm0:timestamp>1729516899</mm0:timestamp>
    <size>4469</size>
    <verification>
      <hash type="md5">a45efcd05f426f7aaa169d8a3b595d50</hash>
      <hash type="sha1">2d03b4b6e48e705ad856e2eaa381a487822ed897</hash>
      <hash type="sha256">ab030d7065ef723f3ff1f82ed42201177f680115f63d4508bffe53e377fd72a7</hash>
      <hash type="sha512">2b24bee0edb01591a02e244effbfdc2ae441c91d416210232c9475d79b8fb39eb64c2e8b32efc04fcaf205a87c03884b45f9d16e0f4e309363fad02c6c857cd8</hash>
    </verification>
no 'alternates' as in the Fedora metalink. So effectively, for Stream, only the most recent metadata that mirrormanager knows about is considered 'valid'.
Is there a reason we're not considering the last two or three instances of the metadata valid as we do for Fedora?
Yes. The interdependency of BaseOS, AppStream and CRB makes it impossible to use multiple checksums. We had this turned on in the beginning, but because DNF does not select one single mirror for all three repositories, it can end up using three different mirrors. The problems we have seen are with specific dependencies between repositories: glibc from BaseOS requires a specific version of glibc-devel from AppStream, for example. That breaks with multiple checksums, because DNF might select a mirror which is not up to date but still valid from the metalink point of view.
Fedora does not have this problem as the main repository never changes and only the update repository changes.
ah, I see. mmmf. well, in that case, it seems critical to have mirrormanager only update the metadata when the new one is present on public mirrors...
So, can we make MM ignore the private mirror and only update when the primary public mirror updates?
That would mean any systems with the private mirror in their metalink response would not be able to use it in the time between it being updated, and the public mirror being updated and MM noticing. Those systems would fall through to using a public mirror at those times. But that seems better than the alternative, which is that any system which does not have the private mirror in its metalink response is completely broken during periods when the private mirror has been updated and MM has noticed, but no public mirror has yet updated.
That comes with the risk that, if all public mirrors update during the same 15 minute period between MM refreshes, nobody can use dnf till MM notices. But hopefully we have enough mirrors that that isn't likely.
It might be worth getting some current DNF people in here, because this seems like something that would be much easier to solve if the only solution we have isn't pick one of these N bad options.
But some extra data, some of which is out of date because I'm old:
This has been a problem in Fedora, debuginfo repos. were the biggest problem (-updates and -updates-debuginfo being out of sync) and were probably the big motivation for abrt not using the repos. Other instances are/were source repos. and codec downloads.
Yum (and I assume DNF) won't downgrade the repo. metadata (unless you call clean explicitly), so while you can't get DNF to force upgrade AppStream after you've upgraded BaseOS it's not like they will ping/pong.
A lot of things can still be done if the repos. need to be in sync for some packages, but aren't in sync. ... nothing can be done if we only show one valid repo. metadata and it isn't available. Yes, the UX for "repos. aren't in sync and it caused a problem" is much worse but as we get closer to release and 666 things don't change each hour that gets hit less often.
Bringing in some dnf folks. CC @jmracek @mdomonko @jkolarik
Thanks for making it a DNF problem. This "problem" is not news to DNF maintainers.
The situation is that DNF handles every repository independently because there is no relation between the repositories written anywhere. It's also not clear how the dependency should work (require the same server hostname, require the same repodata ID, require the repodata ID not to be older).
Once the design would be clear, one would need to enhance the YUM repository configuration file, implement it in librepo, DNF4, and DNF5, get the updates into the CentOS distribution, deploy the changes to users' systems and to the CentOS mirror master, and only then benefit.
As you can see, this is not something you can achieve within a hot-fix time frame. So I recommend not making it a DNF problem and focusing on improving the mirror manager reliability.
The idea was more that maybe you could offer advice on how best to solve the problem, esp. as you are apparently aware of the problem for a long time.
Once the design would be clear, one would need to...
You couldn't tie repos. that have the same <distro cpeid=XXX> data together?
You couldn't just look at the repomd timestamps and for any that are within 5m of each other look at multiple mirrors if one of them updates?
You couldn't at least spit out a better error message when only one of them updates and weird dependency problems make the install/upgrade command fail?
You couldn't spit out a generic error message saying multiple repos. might be a problem and automatically run "clean expire-cache" when a dependency problem occurs?
Assuming no changes are happening to dnf, I guess someone should look at what is different that this problem happens a lot more on s390x than x86_64 (I would assume it's number of mirrors or speed of updates).
I think that the problem window is larger than a few minutes. If you sync different updates of a dependent repository, you might need 48 hours for an automatic resync, because by default DNF will consider the repository up to date for that long (it is configurable in the .repo file).
I believe that this is unreasonable from DNF side. What can help - distribute all related updates in one repository, but there are some drawbacks - Big repository requires more resources (download, HDD, RAM, CPU).
Crazy ideas - create new DNF/DNF5 plugins and in repository URLs use a variable. The plugin will ask a mirror manager for a hash and plugin will set variable prior repository loading. Similar solution is in use for some cloud providers to connect to correct region server. But there might be some related problems - It requires installation of that plugin on system (it cannot be a dependency of DNF/DNF5), connection to the mirror every time DNF runs (plus unknown sets of additional problems). You have to also ensure that all original URLs will continue to work (compatibility).
I already wrote that: it must be defined how to recognize that a repository needs another repository and how to recognize that the dependent repository is outdated.
This information should not depend on the mirror manager because it's desired to work even if repositories are defined with a direct baseurl.
My idea was adding a new key to the /etc/yum.repos.d files to define the relation and using /repomd/revision from repomd.xml for the age comparison. Examples:
CentOS appstream needs exactly the same baseos revision:
[baseos]
baseurl=...
[appstream]
baseurl=...
required_repo="=baseos"
EPEL needs enabling CRB
[epel]
baseurl=...
required_repo="crb"
Originally, I thought that ">foo" would be useful for highly conservative projects which want to be extra sure that foo did not make an incompatible change. But that would be pretty obtrusive for frequently updated (foo) repositories.
The downside is that the current meaning of /repomd/revision is undefined. The only thing we are sure of is that it's a string. Fedora has a Unix timestamp there. CentOS Stream has a fixed string there. We could add a completely new field to repomd.xml, but I think it would heavily overlap with the revision.
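To make the proposal concrete, here is a Python sketch of how such a check could behave. It is purely illustrative: `required_repo` is the hypothetical key from the examples above and does not exist in DNF, the function names are invented, and only the two forms shown above ("=name" and plain "name") are handled.

```python
import xml.etree.ElementTree as ET


def repomd_revision(repomd_xml: bytes):
    """Return the text of the top-level <revision> element, or None if absent."""
    root = ET.fromstring(repomd_xml)
    for child in root:
        # Compare the local tag name only, so the XML namespace does not matter.
        if child.tag.rsplit("}", 1)[-1] == "revision":
            return (child.text or "").strip()
    return None


def required_repo_satisfied(spec: str, own_revision: str, enabled_revisions: dict) -> bool:
    """Evaluate a hypothetical required_repo value.

    enabled_revisions maps each enabled repo id to its /repomd/revision string.
    "=baseos" means baseos must be enabled with exactly the same revision.
    "crb" means crb only has to be enabled.
    """
    if spec.startswith("="):
        other = spec[1:]
        return enabled_revisions.get(other) == own_revision
    return spec in enabled_revisions
```

For instance, `required_repo_satisfied("=baseos", "1729516899", {"baseos": "1729430000"})` returns `False`, which is the situation where a client would need to look for another mirror or report that the repositories are out of sync. Because the revision is only compared for equality here, this works even when it is an opaque string.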
Once the design would be clear, one would need to... You couldn't tie repos. that have the same <distro cpeid=XXX> data together?
Good idea. I did not know about this field. However, I worry that there can be products with a system of base and updates repositories like Fedora, but with identical CPE ID in repomd.xml because naturally those are updates for the same product. Suddenly adding the age requirement would break these products.
You couldn't just look at the repomd timestamps
I don't trust file system timestamps. People do weird things with them. Moreover, not all transports support them (e.g. FTP). It reminds me why web developers place versions into URL query parameters instead of relying on HTTP ETags.
and for any that are within 5m of each other look at multiple mirrors if one of them updates?
Why 5 minutes? I checked an internal RHEL Pulp server and there is almost 4-minute difference between baseos and appstream. I don't like these magical constants.
You couldn't at least spit out a better error message when only one of them updates and weird dependency problems make the install/upgrade command fail?
That sounds overly complicated. How would that differ from a situation when CentOS delivered 3 acceptable hashes?
Wouldn't people get tired of the suggestion on every broken dependency?
I admit that /repomd/tags/distro/@cpeid is appealing to me, especially if we only focus on CentOS, where we can tell people to run createrepo_c with the right --revision option.
I don't trust file system timestamps.
I meant the number within the <timestamp> node of the repomd.xml file, which, although it comes from the filesystem at some point, doesn't require trusting the user's filesystem.
I'm a bit lost here. Is there something we can actually do? Or will it need some new work upstream from dnf?
Or has it not been a problem in the last month?
Ping! Ping! Ping! I've waited several hours for this to clear up, but it isn't, and I'm worried we're about ready to break the EPEL 10 buildroot.
Some of the CentOS Stream 10 repodata is out of sync on the mirrors. The biggest problem is that we just had an update to rpm. rpm is in BaseOS and rpm-build is on AppStream. For both x86_64 and aarch64, one repo is synced correctly, and one isn't. Oddly enough, it's different on each repo. Thus, rpm-build cannot be installed on x86_64.
I don't care whose fault it is, but can this be fixed before it goes into the EPEL 10 buildroot?
# x86_64 appstream
$ curl -s 'https://mirrors.centos.org/metalink?repo=centos-appstream-10-stream&arch=x86_64&protocol=https,http' | grep sha256
<hash type="sha256">4928194ec1d1a87b6075c9dc6f2184b05e87b4de6bc85ee449b908985c04caf6</hash>
$ curl -s https://mirror.stream.centos.org/10-stream/AppStream/x86_64/os/repodata/repomd.xml | sha256sum
4928194ec1d1a87b6075c9dc6f2184b05e87b4de6bc85ee449b908985c04caf6
# x86_64 baseos
$ curl -s 'https://mirrors.centos.org/metalink?repo=centos-baseos-10-stream&arch=x86_64&protocol=https,http' | grep sha256
<hash type="sha256">9bf4875627ea3947b7af864e7a9bd874bff515951e96532efde3ee59e780e2af</hash>
$ curl -s https://mirror.stream.centos.org/10-stream/BaseOS/x86_64/os/repodata/repomd.xml | sha256sum
aab1d3e70df94318b3d1acc8a91f38e64bfa2b40833bbe6e5f9dfb915042c6e2
# aarch64 appstream
$ curl -s 'https://mirrors.centos.org/metalink?repo=centos-appstream-10-stream&arch=aarch64&protocol=https,http' | grep sha256
<hash type="sha256">15a30b82388edf5e47b6ef5593e788b3d87f1830f5de36d7501fc7a8775dcccb</hash>
$ curl -s https://mirror.stream.centos.org/10-stream/AppStream/x86_64/os/repodata/repomd.xml | sha256sum
4928194ec1d1a87b6075c9dc6f2184b05e87b4de6bc85ee449b908985c04caf6
# aarch64 baseos
$ curl -s 'https://mirrors.centos.org/metalink?repo=centos-baseos-10-stream&arch=aarch64&protocol=https,http' | grep sha256
<hash type="sha256">53d3260477e4c7bb818a32262c6c643344eb93a7a3d875e476275963c759fb39</hash>
$ curl -s https://mirror.stream.centos.org/10-stream/BaseOS/aarch64/os/repodata/repomd.xml | sha256sum
53d3260477e4c7bb818a32262c6c643344eb93a7a3d875e476275963c759fb39
Looking at it now.
Rescanned primary mirror and pushing changes to mirrorlist servers now.
Should be fixed now.
@adrian : Thanks, I confirm that it has now up2date baseos/appstream repodata. Now I'd like to ask you : were the crawlers stuck on something ? (reason why it wasn't updating mirrormanager DB and so was still relying on cached - and so wrong - metadata )
as this problem is recurring, would be good to work towards a long term solution for this problem, as it can affect a lot of people and break CI.
Now I'd like to ask you : were the crawlers stuck on something ? (reason why it wasn't updating mirrormanager DB and so was still relying on cached - and so wrong - metadata )
Unfortunately I do not know where the OpenShift logs are stored to have a look at the previous runs. So I don't really know why it was stuck.
I looked at the logs (on log01, but it's not easy to extract). The script was still not detecting any changes at 13:57:15, but it did detect changes before, at 13:12:06. I'd like to ask a question: does the COMPOSE_ID file only change at the end of a push? Or can it change while the files are still in their old version during the push? If that's the case, we could have mirrormanager detecting a change in COMPOSE_ID and scanning old files, and then not detecting the new files until the COMPOSE_ID changes again, which could explain the delays.
I just double checked. COMPOSE_ID is not excluded and done separately at the end. So it is possible that it is updated before everything else is finished. That would explain why this happens so often. I will update our sync script so COMPOSE_ID only changes at the end.
Is there a difference between 9 and 10 in the sync scripts? Using --delay-updates should already help here. Don't we also have a file marking that an update is happening? I thought we included a file that exists during sync and is removed at the end.
The CentOS Stream syncs always put a file .sync_in_progress in the base directory before the sync to the mirrors, and take it out after the sync. For both 9 and 10. While I thought we also did the COMPOSE_ID last as well, it turns out we only did that when doing our update from the compose to our staging area.
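One way to guarantee that COMPOSE_ID only changes at the end is to sync in two passes. The Python sketch below only builds the rsync command lines; it is an assumption about how such a sync could be arranged, not the actual CentOS Stream sync script, and the function name and paths are made up. The rsync options used (`-a`, `--delay-updates`, `--exclude`) are standard ones.

```python
def build_sync_commands(src: str, dest: str, marker: str = "COMPOSE_ID") -> list:
    """Return two rsync invocations: everything except the marker, then the marker."""
    src = src.rstrip("/") + "/"
    dest = dest.rstrip("/") + "/"
    return [
        # Pass 1: all content. The leading "/" anchors the exclude at the top of
        # the transfer, so only the top-level marker file is skipped.
        ["rsync", "-a", "--delay-updates", f"--exclude=/{marker}", src, dest],
        # Pass 2: the marker file alone, once everything else is in place.
        ["rsync", "-a", f"{src}{marker}", dest],
    ]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` in order; if the first pass fails, the second is never run and the old COMPOSE_ID stays in place, so a scanner keyed on that file does not see a half-finished push as new.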
The script to scan the primary mirror checks for .sync_in_progress. So we should not scan the primary mirror while being updated.
Actually, the script is only checking for the presence of http://${CENTOS_PRIMARY}/9-stream/.sync_in_progress. Is it the right place? Is it the same file for the CentOS Stream 10 pushes? I would guess it's a different one for 10.
Right, the COMPOSE_ID is also only looked at in http://${CENTOS_PRIMARY}/9-stream/COMPOSE_ID. Good point. We need to check both files. The scanning script is basically only 9-stream aware.
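The per-release check discussed here could look roughly like the following Python sketch. It is not the contents of primary-mirror-wrapper.sh; the function name, the `fetch` callable and the `last_seen` mapping are all assumptions made for illustration. `fetch` is expected to return the body of a URL as text, or `None` when the file does not exist.

```python
def releases_needing_scan(base_url, releases, last_seen, fetch):
    """Return {release: new_compose_id} for every release that should be scanned.

    A release is skipped while its .sync_in_progress file exists, and when its
    COMPOSE_ID is unchanged since the last successful scan.
    """
    base = base_url.rstrip("/")
    pending = {}
    for release in releases:
        prefix = f"{base}/{release}"
        if fetch(f"{prefix}/.sync_in_progress") is not None:
            continue  # a push is still running for this release
        compose_id = fetch(f"{prefix}/COMPOSE_ID")
        if compose_id is None:
            continue  # nothing published yet, or the file is unreachable
        compose_id = compose_id.strip()
        if compose_id != last_seen.get(release):
            pending[release] = compose_id
    return pending
```

Calling it with `["9-stream", "10-stream"]` covers both releases instead of only 9-stream. Returning the new IDs rather than updating `last_seen` in place lets the caller record them only after a scan has actually succeeded.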
Ah yeah that may be the cause of all this. I'll update the script.
OK it's updated, hopefully this'll solve this entire thing. I have currently hardcoded versions 9 and 10 in the script, is there a place where I could get all the current versions of CentOS Stream? More reliably than parsing the output of Apache's index page on the mirror I mean.
@abompard : Stream is "slow" moving if you want to compare with Fedora releases, so Stream 11 will not be appearing before a long time (when some Fedora release will be branched for next RHEL, aka 11) .. There is no api to check for releases, but just wanted to add that Stream distributions themselves are pushed to root dir, there are other composes for the community builds under SIGs : https://mirror.stream.centos.org/SIGs/ .. and COMPOSE_ID files are incremented/updated each time there is a push to one of these SIGs directories (building for 8/9/9-stream/10-stream - and sooner 10) .. I don't know if that's also another script to modify at your side
Yeah the script is already checking those SIGs subdirectories.
An alternative would be to add a mention of this script in the CentOS Stream releng docs, because I'm pretty sure we're going to forget about it when that happens.
So, I think this is all solved until stream 11 right?
So, closing. Please reopen if there's anything left to do here.
Metadata Update from @kevin: - Issue close_status updated to: Fixed with Explanation - Issue status updated to: Closed (was: Open)