Learn more about these different git repos.
Other Git URLs
We have an issue on Bodhi where sometimes automatic updates fail to get the changelog in notes because for unknown reason Koji's getRPMHeaders is returning an empty result.
This happens sometimes with real user made builds, while automatically generated test packages like dummy-test-package-gloster are always missing the changelog.
https://github.com/fedora-infra/bodhi/issues/4146 is an old log entry of the problem, we're now handling the ValueError case by ignoring generating the changelog, but that causes bugs referenced in the changelog to not be added to the update.
The code responsible for downloading the headers from Koji is at https://github.com/fedora-infra/bodhi/blob/f7d80356e5a8333bd790d84e1aff2d51af3c9186/bodhi/server/util.py#L68-L103 As you can see, Koji is not throwing any exception, or the outcome would be different. It is reporting an empty result and accordingly to Koji code that means it is not finding the relevant build.
Usually an automatic update is created only few seconds after the build completion time. Can a too short delay be the cause? I see updates created with the same delay (3 seconds) having the changelog, while others haven't. Another thing to mention is that manually running some time later get_rpm_headers or get_changelog for the same buggy build returns the correct output.
get_rpm_headers
get_changelog
Metadata Update from @tkopecek: - Custom field Size adjusted to None - Issue tagged with: bug
Metadata Update from @tkopecek: - Issue set to the milestone: 1.28
I would expect that it could be caused by some NFS configuration. I've not seen such an issue in any other koji instance. I would try to update bodhi's get_rpm_header to 1) retry also on empty result (not only exception) 2) add some short sleep before retry (if it is possible)
get_rpm_header
Filed https://github.com/fedora-infra/bodhi/pull/4326
It will take some time to get it applied to prod, though.
Today I noticed one problem with the build dummy-test-package-gloster-0-6961.fc36.
dummy-test-package-gloster-0-6961.fc36
It took 10 seconds (2 retries, since I've put a 5 second sleep) from receiving the build tagged message before Koji was able to return RPM headers:
2022-01-30 14:57:26,061 INFO [bodhi][PoolThread-twisted.internet.reactor-1] dummy-test-package-gloster-0-6961.fc36 tagged into f36-updates-candidate 2022-01-30 14:57:26,614 WARNI [bodhi][PoolThread-twisted.internet.reactor-1] No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36' 2022-01-30 14:57:31,621 ERROR [bodhi][PoolThread-twisted.internet.reactor-1] No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36': Unable to handle message in Automatic Update handler: Id: 50f69b81-72a2-44a2-9336-88e2b276ec9a Topic: org.fedoraproject.prod.buildsys.tag Headers: { "fedora_messaging_schema": "base.message", "fedora_messaging_severity": 20, "sent-at": "2022-01-30T14:57:26+00:00" } Body: { "build_id": 1910199, "instance": "primary", "name": "dummy-test-package-gloster", "owner": "packagerbot/os-master02.iad2.fedoraproject.org", "release": "6961.fc36", "tag": "f36-updates-candidate", "tag_id": 44419, "user": "packagerbot/os-master02.iad2.fedoraproject.org", "version": "0" } Traceback (most recent call last): File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/__init__.py", line 81, in __call__ handler_info.handler(msg) File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/automatic_updates.py", line 151, in __call__ changelog = build.get_changelog(lastupdate=True) File "/usr/lib/python3.9/site-packages/bodhi/server/models.py", line 1759, in get_changelog rpm_header = get_rpm_header(self.nvr) File "/usr/lib/python3.9/site-packages/bodhi/server/util.py", line 103, in get_rpm_header raise ValueError("No rpm headers found in koji for %r" % nvr) ValueError: No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36' 2022-01-30 14:57:31,736 WARNI [fedora_messaging.twisted.consumer][MainThread] Returning message id 50f69b81-72a2-44a2-9336-88e2b276ec9a to the queue 2022-01-30 14:57:31,739 INFO [fedora_messaging.twisted.consumer][MainThread] Consuming message from topic org.fedoraproject.prod.buildsys.tag (message id 50f69b81-72a2-44a2-9336-88e2b276ec9a) 2022-01-30 14:57:31,739 INFO [bodhi][PoolThread-twisted.internet.reactor-0] Received message from fedora-messaging with topic: org.fedoraproject.prod.buildsys.tag 2022-01-30 14:57:31,740 INFO [bodhi][PoolThread-twisted.internet.reactor-0] dummy-test-package-gloster-0-6961.fc36 tagged into f36-updates-candidate 2022-01-30 14:57:31,875 WARNI [bodhi][PoolThread-twisted.internet.reactor-0] No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36' 2022-01-30 14:57:36,880 ERROR [bodhi][PoolThread-twisted.internet.reactor-0] No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36': Unable to handle message in Automatic Update handler: Id: 50f69b81-72a2-44a2-9336-88e2b276ec9a Topic: org.fedoraproject.prod.buildsys.tag Headers: { "fedora_messaging_schema": "base.message", "fedora_messaging_severity": 20, "sent-at": "2022-01-30T14:57:26+00:00" } Body: { "build_id": 1910199, "instance": "primary", "name": "dummy-test-package-gloster", "owner": "packagerbot/os-master02.iad2.fedoraproject.org", "release": "6961.fc36", "tag": "f36-updates-candidate", "tag_id": 44419, "user": "packagerbot/os-master02.iad2.fedoraproject.org", "version": "0" } Traceback (most recent call last): File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/__init__.py", line 81, in __call__ handler_info.handler(msg) File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/automatic_updates.py", line 151, in __call__ changelog = build.get_changelog(lastupdate=True) File "/usr/lib/python3.9/site-packages/bodhi/server/models.py", line 1759, in get_changelog rpm_header = get_rpm_header(self.nvr) File "/usr/lib/python3.9/site-packages/bodhi/server/util.py", line 103, in get_rpm_header raise ValueError("No rpm headers found in koji for %r" % nvr) ValueError: No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36' 2022-01-30 14:57:36,898 WARNI [fedora_messaging.twisted.consumer][MainThread] Returning message id 50f69b81-72a2-44a2-9336-88e2b276ec9a to the queue 2022-01-30 14:57:36,901 INFO [fedora_messaging.twisted.consumer][MainThread] Consuming message from topic org.fedoraproject.prod.buildsys.tag (message id 50f69b81-72a2-44a2-9336-88e2b276ec9a) 2022-01-30 14:57:36,901 INFO [bodhi][PoolThread-twisted.internet.reactor-2] Received message from fedora-messaging with topic: org.fedoraproject.prod.buildsys.tag 2022-01-30 14:57:36,902 INFO [bodhi][PoolThread-twisted.internet.reactor-2] dummy-test-package-gloster-0-6961.fc36 tagged into f36-updates-candidate 2022-01-30 14:57:37,582 INFO [fedora_messaging.twisted.consumer][MainThread] Successfully consumed message from topic org.fedoraproject.prod.buildsys.tag (message id 50f69b81-72a2-44a2-9336-88e2b276ec9a)
@kevin Could this be related to fedora's NFS volume setup?
Metadata Update from @tkopecek: - Issue set to the milestone: None (was: 1.28)
Metadata Update from @tkopecek: - Issue untagged with: bug
I wouldn't see how... there's no errors I am aware of.
File: /mnt/koji/packages/dummy-test-package-gloster/0/6961.fc36/noarch/dummy-test-package-gloster-0-6961.fc36.noarch.rpm Size: 433228 Blocks: 856 IO Block: 65536 regular file Device: 32h/50d Inode: 1469603766 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 48/ apache) Gid: ( 48/ apache) Context: system_u:object_r:nfs_t:s0 Access: 2022-01-30 23:49:27.094831000 +0000 Modify: 2022-01-30 14:56:49.366298000 +0000 Change: 2022-01-30 14:57:24.344221000 +0000 Birth: -
@mikem Any idea? It looks weird as koji already written that file before sending the message. I'm not sure how we can get to state that the same hub can't see that file for a few seconds after that.
This reminds me of issues we had years ago before the fedmsg plugin deferred sending until postCommit, but the plugin is definitely doing that now.
(side note: it looks like the fedmsg plugin on koji.fp.o does not match the code from https://pagure.io/koji-fedmsg-plugin, and isn't part of an rpm. How is this plugin managed?) @kevin?
As I understand it, you have two (or more?) hub systems behind a proxy. Best guess is that there is some occasional lag between one hub writing a file and the other hub being able to see it. The first call failure from the recent example given by @mattia is about about 2 seconds after the build completion timestamp, which is very close to both the time of the write and the time of the message sending.
I don't think it makes sense to add a seconds long retry to the hub code for this.
A trivially naive experiment detects no lag between writing a file to scratch on koji01 and reading it on koji02, but I suspect that if it is an fs lag issue, it is probably random and intermittent, likely triggered by heavy io load.
In the getRPMHeaders code, there appear to be only a few ways that the code could return an empty result.
getRPMHeaders
headers=[]
I believe that the missing file is the most likely, but it would be good to confirm. Perhaps we could stick some log lines in the handler for debugging?
It would also be really helpful to understand how often this is happening. Between this issue and the linked bodhi one from 2020, I can see two examples.
Yeah, the fedmsg plugin is deployed directly from ansible: https://pagure.io/fedora-infra/ansible/blob/main/f/roles/koji_hub/templates/fedmsg-koji-plugin.py we try and keep that other one in sync as a 'upstream', but sometimes that fails. ;(
I can actually see more info on frequency because bodhi sends an error when it hits this.
I saw this problem last on 2020-12-01 Then it started again on 2022-01-26
There seems to be between 0 and 5 of these a day.
The first one:
2022-01-26 14:11:59,470 ERROR [bodhi][PoolThread-twisted.internet.reactor-1] No rpm headers found in koji for 'fapolicyd-1.1-1.fc36': Unable to handle message in Automatic Update handler: Id: d1aae630-9ff0-4647-8841-b7a051bc34f3 Topic: org.fedoraproject.prod.buildsys.tag Headers: { "fedora_messaging_schema": "base.message", "fedora_messaging_severity": 20, "sent-at": "2022-01-26T14:11:54+00:00" } Body: { "build_id": 1908650, "instance": "primary", "name": "fapolicyd", "owner": "rsroka", "release": "1.fc36", "tag": "f36-updates-candidate", "tag_id": 44419, "user": "rsroka", "version": "1.1" } Traceback (most recent call last): File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/__init__.py", line 81, in __call__ handler_info.handler(msg) File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/automatic_updates.py", line 151, in __call__ changelog = build.get_changelog(lastupdate=True) File "/usr/lib/python3.9/site-packages/bodhi/server/models.py", line 1759, in get_changelog rpm_header = get_rpm_header(self.nvr) File "/usr/lib/python3.9/site-packages/bodhi/server/util.py", line 103, in get_rpm_header raise ValueError("No rpm headers found in koji for %r" % nvr) ValueError: No rpm headers found in koji for 'fapolicyd-1.1-1.fc36'
The most recent one:
2022-02-15 19:35:28,843 ERROR [bodhi][PoolThread-twisted.internet.reactor-1] No rpm headers found in koji for 'help2man-1.49.1-1.fc36': Unable to handle message in Automatic Update handler: Id: e8c5b28e-283c-4495-9b7f-0502961eff0f Topic: org.fedoraproject.prod.buildsys.tag Headers: { "fedora_messaging_schema": "base.message", "fedora_messaging_severity": 20, "sent-at": "2022-02-15T19:35:23+00:00" } Body: { "build_id": 1918627, "instance": "primary", "name": "help2man", "owner": "corsepiu", "release": "1.fc36", "tag": "f36-updates-candidate", "tag_id": 44419, "user": "corsepiu", "version": "1.49.1" } Traceback (most recent call last): File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/__init__.py", line 81, in __call__ handler_info.handler(msg) File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/automatic_updates.py", line 151, in __call__ changelog = build.get_changelog(lastupdate=True) File "/usr/lib/python3.9/site-packages/bodhi/server/models.py", line 1759, in get_changelog rpm_header = get_rpm_header(self.nvr) File "/usr/lib/python3.9/site-packages/bodhi/server/util.py", line 103, in get_rpm_header raise ValueError("No rpm headers found in koji for %r" % nvr) ValueError: No rpm headers found in koji for 'help2man-1.49.1-1.fc36'
Happy to try debug to get to the bottom of this.
The patch in #3256 might help with debugging. It adds a strict option, which bodhi might want to start using if we merge it, but even without strict it will still log an error line for missing files.
Metadata Update from @tkopecek: - Issue set to the milestone: 1.29
I saw this problem last on 2020-12-01 Then it started again on 2022-01-26 There seems to be between 0 and 5 of these a day.
The problem stopped in late 2020 because we stopped trying to get rpm headers when Koji returned an error (in a way which also disabled logging the error). Until we re-enabled it with https://github.com/fedora-infra/bodhi/pull/4326
ok. I have rolled that out. Let me know what I should gather after it's been running a while...
Looking the above Koji's patch, Bodhi will log the error about missing files only when Koji returns an exception, not an empty result. For this to happen, we must patch also Bodhi to use strict.
However, you should find some error in Koji's log about this one:
2022-02-20 09:32:17,587 INFO [bodhi][PoolThread-twisted.internet.reactor-0] rust-uncased-0.9.6-1.fc37 tagged into f37-updates-candidate 2022-02-20 09:32:17,593 INFO [bodhi][PoolThread-twisted.internet.reactor-0] Build was not submitted, skipping 2022-02-20 09:32:17,824 WARNI [bodhi][PoolThread-twisted.internet.reactor-0] No rpm headers found in koji for 'rust-uncased-0.9.6-1.fc37' 2022-02-20 09:32:22,833 ERROR [bodhi][PoolThread-twisted.internet.reactor-0] No rpm headers found in koji for 'rust-uncased-0.9.6-1.fc37': Unable to handle message in Automatic Update handler: Id: 85d38520-4725-4bbe-ba56-378cc04159ed Topic: org.fedoraproject.prod.buildsys.tag Headers: { "fedora_messaging_schema": "base.message", "fedora_messaging_severity": 20, "sent-at": "2022-02-20T09:32:17+00:00" } Body: { "build_id": 1920598, "instance": "primary", "name": "rust-uncased", "owner": "ignatenkobrain", "release": "1.fc37", "tag": "f37-updates-candidate", "tag_id": 50463, "user": "ignatenkobrain", "version": "0.9.6" } Traceback (most recent call last): File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/__init__.py", line 81, in __call__ handler_info.handler(msg) File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/automatic_updates.py", line 151, in __call__ changelog = build.get_changelog(lastupdate=True) File "/usr/lib/python3.9/site-packages/bodhi/server/models.py", line 1759, in get_changelog rpm_header = get_rpm_header(self.nvr) File "/usr/lib/python3.9/site-packages/bodhi/server/util.py", line 103, in get_rpm_header raise ValueError("No rpm headers found in koji for %r" % nvr) ValueError: No rpm headers found in koji for 'rust-uncased-0.9.6-1.fc37' 2022-02-20 09:32:23,104 WARNI [fedora_messaging.twisted.consumer][MainThread] Returning message id 85d38520-4725-4bbe-ba56-378cc04159ed to the queue 2022-02-20 09:32:23,108 INFO [fedora_messaging.twisted.consumer][MainThread] Consuming message from topic org.fedoraproject.prod.buildsys.tag (message id 85d38520-4725-4bbe-ba56-378cc04159ed) 2022-02-20 09:32:23,108 INFO [bodhi][PoolThread-twisted.internet.reactor-1] Received message from fedora-messaging with topic: org.fedoraproject.prod.buildsys.tag 2022-02-20 09:32:23,108 INFO [bodhi][PoolThread-twisted.internet.reactor-1] rust-uncased-0.9.6-1.fc37 tagged into f37-updates-candidate 2022-02-20 09:32:23,112 INFO [bodhi][PoolThread-twisted.internet.reactor-1] Build was not submitted, skipping 2022-02-20 09:32:23,777 INFO [bodhi.server][PoolThread-twisted.internet.reactor-1] Sending mail to igor.raits@gmail.com: [Fedora Update] [comment] rust-uncased-0.9.6-1.fc37 2022-02-20 09:32:24,360 INFO [fedora_messaging.twisted.consumer][MainThread] Successfully consumed message from topic org.fedoraproject.prod.buildsys.tag (message id 85d38520-4725-4bbe-ba56-378cc04159ed)
Is this what you expect:
[Sun Feb 20 09:32:17.817514 2022] [wsgi:error] [pid 653390:tid 653418] [client 10.3.163.75:59638] 2022-02-20 09:32:17,817 [ERROR] m=getRPMHeaders u=bodhi p=653390 r=10.3.163.75:59638 koji.hub: Missing rpm file: /mnt/koji/packages/rust-uncased/0.9.6/1.fc37/src/rust-uncased-0.9.6-1.fc37.src.rpm [Sun Feb 20 09:32:17.817551 2022] [wsgi:error] [pid 653390:tid 653418] [client 10.3.163.75:59638] 817 [ERROR] m=None u=None p=653390 r=?:? koji.hub: Missing rpm file: /mnt/koji/packages/rust-uncased/0.9.6/1.fc37/src/rust-uncased-0.9.6-1.fc37.src.rpm
Of course the file does exist now:
File: /mnt/koji/packages/rust-uncased/0.9.6/1.fc37/src/rust-uncased-0.9.6-1.fc37.src.rpm Size: 18852 Blocks: 40 IO Block: 65536 regular file Device: 32h/50d Inode: 1367133619 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 48/ apache) Gid: ( 48/ apache) Context: system_u:object_r:nfs_t:s0 Access: 2022-02-20 09:47:45.869440000 +0000 Modify: 2022-02-20 09:30:53.483315000 +0000 Change: 2022-02-20 09:32:14.089576000 +0000 Birth: -
Metadata Update from @tkopecek: - Issue tagged with: testing-ready
@kevin the timestamp on the error you posted is about 4s after the build completion time, or 3s after the ctime of the file (this is updated when the rpm is linked into its final location, whereas the mtime is set when the file is initially uploaded under work/).
work/
This appears to be some sort of NFS lag (likely caused by caching). I'd guess the completeBuild call that moved the build into place was handled by a different hub than than the one that is handling this getRPMHeaders call.
I'm not sure what Koji can reasonably do here. The fs is basically lying to us. I don't think it is safe for the hub to wait for this file to appear.
Ideally, we want to fix the underlying nfs lag issue.
Short of that, the only way to work around this is to add some sort of delay, perhaps selectively. This could be done on the bodhi end, e.g. check build completion time and add a wait if it was mere seconds ago.
Yeah, I agree...
After our current freeze is over, I would like to try lookupcache=positive here. That should cause it to only cache positive (ie, it does exist) results, never negative (ie, does not exist(. That should make the other hub send a NFS_LOOKUP for the file, which should find it...
But I'm not sure what the perf impact will be or if I have fully understanding the problem. :)
Metadata Update from @jobrauer: - Issue tagged with: testing-done
I was wondering if there was a way we could tell the message bus to add a short delay before delivering some messages.
Interesting thought. I think it should be possible... rabbitmq has a 'Delayed Message Plugin' that lets you set a header on a message and delays it that long. I guess fedora-messaging would need to expose this somehow.
Filed RFE: https://github.com/fedora-infra/fedora-messaging/issues/254
Commit 433e741 relates to this ticket
Commit fcce0b9 relates to this ticket
Metadata Update from @tkopecek: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
Login to comment on this ticket.