#12571 centos-resilientstorage-9-stream sha512 mismatch
Closed: Fixed 6 months ago by adrian. Opened 9 months ago by dupondje.

Describe what you would like us to do:

CentOS Stream 9 - ResilientStorage              929  B/s | 3.9 kB     00:04    
Errors during downloading metadata for repository 'resilientstorage':
  - Downloading successful, but checksum doesn't match. Calculated: 45c8d2dfc4fe245817d3f5ce5be99505a841b1d328ef14c12ba3b789277bf271a2e36cb981ce66cd0c30f652670504f40907dc05aa8bc9858d506b1989093846(sha512)  Expected: 17ef9ff2c8fc04667ed55a33dc02476a59e236e7da7f19d86c586e2fbab75fa1bd7b1e68d1a1ad4657e84c2791f5f0bf78f30be53db8b5eb7fc83227c819185f(sha512) 
  - Downloading successful, but checksum doesn't match. Calculated: 87c76b87fe36bfa60fcd8edbc52934559d8af59850efa3f0c04904e0a757d35d9d9bc0edb6114126971709afa4670d77d2de501c949a4f5e5b6144409f7efa77(sha512)  Expected: 17ef9ff2c8fc04667ed55a33dc02476a59e236e7da7f19d86c586e2fbab75fa1bd7b1e68d1a1ad4657e84c2791f5f0bf78f30be53db8b5eb7fc83227c819185f(sha512) 
Error: Failed to download metadata for repo 'resilientstorage': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

It seems the hash returned by https://mirrors.centos.org/metalink?repo=centos-resilientstorage-9-stream&arch=x86_64&protocol=https,http does not match the correct hash:

$ wget -qO - https://mirror.stream.centos.org/9-stream/ResilientStorage/x86_64/os/repodata/repomd.xml |sha512sum 
87c76b87fe36bfa60fcd8edbc52934559d8af59850efa3f0c04904e0a757d35d9d9bc0edb6114126971709afa4670d77d2de501c949a4f5e5b6144409f7efa77  -
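
The comparison above can also be scripted. A minimal sketch, assuming the metalink response is XML that carries its checksums in `<hash type="sha512">` elements; tags are matched by local name so the exact namespace does not matter. The function names are hypothetical and not part of any CentOS or MirrorManager tooling.

```python
import hashlib
import xml.etree.ElementTree as ET


def sha512_hex(data: bytes) -> str:
    """Return the hex SHA-512 digest of data, as sha512sum would print it."""
    return hashlib.sha512(data).hexdigest()


def metalink_sha512_hashes(metalink_xml: str) -> set:
    """Collect every sha512 hash advertised in a metalink document.

    Assumption: hashes appear as <hash type="sha512">...</hash> elements.
    """
    root = ET.fromstring(metalink_xml)
    hashes = set()
    for elem in root.iter():
        # Strip a "{namespace}" prefix, if any, to get the local tag name.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "hash" and elem.get("type") == "sha512" and elem.text:
            hashes.add(elem.text.strip().lower())
    return hashes


def repomd_matches(repomd: bytes, metalink_xml: str) -> bool:
    """True if the repomd.xml digest is among the advertised hashes."""
    return sha512_hex(repomd) in metalink_sha512_hashes(metalink_xml)
```

Fetching the two URLs from this report (for example with `urllib.request.urlopen`) and passing the bodies to `repomd_matches` would reproduce the check; a `False` result corresponds to the dnf error above.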

I'm not even sure what this repository is. Is that something provided by CentOS or some third party repository?

This is a repository from the main CentOS Stream with packages which don't fit into the 'BaseOS/AppStream/CRB' set but built inside of the Stream system. What seems to be happening is that mirrormanager keeps getting the wrong repodata stored for these repositories. Possible reasons I can think of:
- It scans this repository less often and so keeps older versions longer. This repository may update more often than other CS repos and a default scan time is not catching it.
- It is keeping an older version stuck in its db for some reason.
- It is looking in the wrong place for the data.
- Some other tool is feeding it the wrong data for some reason.

It's been broken for at least 24 hours now, btw.

There were other reports on internal Slack about the same issue, but for other repositories.
I just checked at the source, and the checksum 87c76b87fe36bfa60fcd8edbc52934559d8af59850efa3f0c04904e0a757d35d9d9bc0edb6114126971709afa4670d77d2de501c949a4f5e5b6144409f7efa77 is the correct one, so it seems Fedora mirrormanager is still serving cached information, or something else is wrong?

Metadata Update from @james:
- Issue tagged with: medium-gain, medium-trouble

9 months ago

Metadata Update from @james:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

9 months ago

I also hit a similar issue with the metadata for repository 'baseos-source':

bash-5.1# dnf builddep -y rust-bootupd
enabling baseos-source repository
enabling appstream-source repository
enabling extras-common-source repository
CentOS Stream 9 - BaseOS                                                                                                                     4.0 MB/s | 8.7 MB     00:02    
CentOS Stream 9 - BaseOS - Source                                                                                                            386  B/s | 3.0 kB     00:08    
Errors during downloading metadata for repository 'baseos-source':
  - Downloading successful, but checksum doesn't match. Calculated: 886c66d8a1a66f7788984c019771daa65cb052f84bc2d16169f941b4df992a205e923aa1fba0aa0fe59440d086099168f7815551b1fbcee3e43053511bef93c0(sha512)  Expected: a3535f07216e2f1b7491ea4ec89c61a7b460659e9ad6249089087b623774e54dbb9b6b25f33fc04b809052210d917fda7e61530c9206af0d8dd88f015330e3dc(sha512) 
Error: Failed to download metadata for repo 'baseos-source': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

The sha512sum in the DB is correct; the mirrorlist cache may not have been distributed to the proxies correctly. I'll check that.

I fixed it. I was not aware that something was not working correctly.

Oh, OK, what did you fix, Adrian?
Btw, the mirrorlist is now responding with the correct sha512.

I updated the checksums in the database. The database had the wrong information. I set the ctime of all stream directories to 0 and then the script re-detected all the correct information.

Is there a possibility to add some monitoring on the Fedora side to detect a delta between the "in DB" checksum and what's available on the "proxies"? That would be awesome, as it's not the first time users have reported issues like this, so adding some monitoring endpoints would be appreciated :)

I updated the checksums in the database. The database had the wrong information. I set the ctime of all stream directories to 0 and then the script re-detected all the correct information.

OK, thanks! Any idea why it had the wrong information? Would that be in the scan-primary-mirror log?
Is this an operation that we can end up doing multiple times in the future? If so I could write a script to automate it.

I haven't checked the logs, but it is probably in there somewhere. I suspect a race condition that updates the timestamps in the database while the checksums have not been updated. There are already lots of checks in the script to avoid such conditions, but it seems we missed one. It might happen if the primary mirror is being updated while a scan is running.

Is this an operation that we can end up doing multiple times in the future? If so I could write a script to automate it.

There are a couple of options; I'm not sure which is best. Many of the current shortcuts exist because the Fedora trees are so huge. For example, if the ctime has not changed we do not look any further, because looking at all files for Fedora is just too expensive. For CentOS it is not that expensive, so we could have a mode that always reads all checksums. Not very elegant.
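
To illustrate the trade-off, here is a toy model of a ctime short-circuit. It is not MirrorManager's code: the `Directory` record, `scan_directory`, and the `read_checksum` callback are hypothetical names, and the only behaviour taken from this thread is "if the ctime has not changed, do not look any further".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Directory:
    """Hypothetical database row: what the scanner remembers about a directory."""
    name: str
    ctime: int
    sha512: str


def scan_directory(row: Directory, current_ctime: int,
                   read_checksum: Callable[[str], str]) -> bool:
    """Refresh row from disk; return True if the checksum was re-read.

    The expensive read_checksum call is skipped when the stored ctime
    matches the on-disk ctime.
    """
    if row.ctime == current_ctime:
        return False  # cheap path: trust the stored checksum
    row.sha512 = read_checksum(row.name)
    row.ctime = current_ctime
    return True
```

If a row ever ends up with an up-to-date ctime but an outdated checksum, every later scan takes the cheap path and the stale value persists. Setting the stored ctime to 0 makes the comparison fail, so the next scan re-reads the checksum.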

Or we could have an outside check that goes over all CentOS checksums and compares them with the metalink values. If it finds a mismatch, we reset all ctimes in the database and the next scan will fix it.
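
Such an outside check could look roughly like the sketch below. How the advertised and actual hashes are obtained is left to two caller-supplied functions, and every name here is hypothetical. The exit codes follow the common Nagios-style convention (0 for OK, 2 for critical), which is only an assumption about what a monitoring system would expect.

```python
from typing import Callable, Iterable, List, Set


def find_stale_repos(repos: Iterable[str],
                     advertised: Callable[[str], Set[str]],
                     actual: Callable[[str], str]) -> List[str]:
    """Return repos whose real repomd.xml hash is not advertised by the metalink."""
    return [repo for repo in repos if actual(repo) not in advertised(repo)]


def monitoring_status(stale: List[str]) -> int:
    """Map the result to an exit code: 0 if all match, 2 if any repo is stale."""
    return 2 if stale else 0
```

A non-empty result would be the signal to reset the ctimes and rerun the scan, and the same result could feed a monitoring endpoint.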

Metadata Update from @adrian:
- Issue untagged with: medium-gain, medium-trouble

9 months ago

Metadata Update from @zlopez:
- Issue tagged with: medium-gain, medium-trouble

7 months ago

So, I guess this hasn't happened again... perhaps improvements could be tracked upstream?

Metadata Update from @kevin:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

7 months ago

Unfortunately, the issue is occurring again.

Metadata Update from @dupondje:
- Issue status updated to: Open (was: Closed)

6 months ago

@adrian, could you please write down here the SQL query you used to set the ctimes to zero? Thanks.

It has been suggested that the centos-9 images should be built to use mirror.stream.centos.org instead of mirror.centos.org

SQL command: update directory set ctime=0 where name like '9-stream/%';

update command in openshift: scan-primary-mirror -c /etc/mirrormanager/scan-primary-mirror-centos.toml -d --category CentOS

Should be correct in about 30 minutes.
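
To see which rows the `like '9-stream/%'` pattern touches, the statement can be tried against a throwaway SQLite table. This is only a sketch: the two-column `directory` table is a simplified assumption, the sample paths are made up, and the real MirrorManager database schema and engine may differ.

```python
import sqlite3

# In-memory stand-in for the real database (simplified, assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (name TEXT PRIMARY KEY, ctime INTEGER)")
conn.executemany(
    "INSERT INTO directory (name, ctime) VALUES (?, ?)",
    [
        ("9-stream/BaseOS/x86_64/os", 1700000000),
        ("9-stream/ResilientStorage/x86_64/os", 1700000001),
        ("10-stream/BaseOS/x86_64/os", 1700000002),
    ],
)

# Same statement as in the comment above.
cur = conn.execute("update directory set ctime=0 where name like '9-stream/%'")
reset_count = cur.rowcount
conn.commit()

# Map each directory name to its ctime after the update.
rows = dict(conn.execute("SELECT name, ctime FROM directory"))
```

Only the rows whose name starts with `9-stream/` are reset; the next `scan-primary-mirror` run then treats those directories as changed and re-detects their checksums.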

Metadata Update from @adrian:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

6 months ago
