#3178 getRPMHeaders sometimes returning empty result
Closed: Fixed 2 years ago by tkopecek. Opened 2 years ago by mattia.

We have an issue on Bodhi where sometimes automatic updates fail to get the changelog in notes because for unknown reason Koji's getRPMHeaders is returning an empty result.

This happens sometimes with real user made builds, while automatically generated test packages like dummy-test-package-gloster are always missing the changelog.

https://github.com/fedora-infra/bodhi/issues/4146 is an old log entry of the problem, we're now handling the ValueError case by ignoring generating the changelog, but that causes bugs referenced in the changelog to not be added to the update.

The code responsible for downloading the headers from Koji is at https://github.com/fedora-infra/bodhi/blob/f7d80356e5a8333bd790d84e1aff2d51af3c9186/bodhi/server/util.py#L68-L103
As you can see, Koji is not throwing any exception, or the outcome would be different. It is reporting an empty result and accordingly to Koji code that means it is not finding the relevant build.

Usually an automatic update is created only few seconds after the build completion time. Can a too short delay be the cause? I see updates created with the same delay (3 seconds) having the changelog, while others haven't.
Another thing to mention is that manually running some time later get_rpm_headers or get_changelog for the same buggy build returns the correct output.


Metadata Update from @tkopecek:
- Custom field Size adjusted to None
- Issue tagged with: bug

2 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.28

2 years ago

I would expect that it could be caused by some NFS configuration. I've not seen such an issue in any other koji instance. I would try to update bodhi's get_rpm_header to
1) retry also on empty result (not only exception)
2) add some short sleep before retry (if it is possible)

Filed https://github.com/fedora-infra/bodhi/pull/4326

It will take some time to get it applied to prod, though.

Today I noticed one problem with the build dummy-test-package-gloster-0-6961.fc36.

It took 10 seconds (2 retries, since I've put a 5 second sleep) from receiving the build tagged message before Koji was able to return RPM headers:

2022-01-30 14:57:26,061 INFO  [bodhi][PoolThread-twisted.internet.reactor-1] dummy-test-package-gloster-0-6961.fc36 tagged into f36-updates-candidate
2022-01-30 14:57:26,614 WARNI [bodhi][PoolThread-twisted.internet.reactor-1] No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36'
2022-01-30 14:57:31,621 ERROR [bodhi][PoolThread-twisted.internet.reactor-1] No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36': Unable to handle message in Automatic Update handler: Id: 50f69b81-72a2-44a2-9336-88e2b276ec9a
Topic: org.fedoraproject.prod.buildsys.tag
Headers: {
    "fedora_messaging_schema": "base.message",
    "fedora_messaging_severity": 20,
    "sent-at": "2022-01-30T14:57:26+00:00"
}
Body: {
    "build_id": 1910199,
    "instance": "primary",
    "name": "dummy-test-package-gloster",
    "owner": "packagerbot/os-master02.iad2.fedoraproject.org",
    "release": "6961.fc36",
    "tag": "f36-updates-candidate",
    "tag_id": 44419,
    "user": "packagerbot/os-master02.iad2.fedoraproject.org",
    "version": "0"
}
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/__init__.py", line 81, in __call__
    handler_info.handler(msg)
  File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/automatic_updates.py", line 151, in __call__
    changelog = build.get_changelog(lastupdate=True)
  File "/usr/lib/python3.9/site-packages/bodhi/server/models.py", line 1759, in get_changelog
    rpm_header = get_rpm_header(self.nvr)
  File "/usr/lib/python3.9/site-packages/bodhi/server/util.py", line 103, in get_rpm_header
    raise ValueError("No rpm headers found in koji for %r" % nvr)
ValueError: No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36'
2022-01-30 14:57:31,736 WARNI [fedora_messaging.twisted.consumer][MainThread] Returning message id 50f69b81-72a2-44a2-9336-88e2b276ec9a to the queue
2022-01-30 14:57:31,739 INFO  [fedora_messaging.twisted.consumer][MainThread] Consuming message from topic org.fedoraproject.prod.buildsys.tag (message id 50f69b81-72a2-44a2-9336-88e2b276ec9a)
2022-01-30 14:57:31,739 INFO  [bodhi][PoolThread-twisted.internet.reactor-0] Received message from fedora-messaging with topic: org.fedoraproject.prod.buildsys.tag
2022-01-30 14:57:31,740 INFO  [bodhi][PoolThread-twisted.internet.reactor-0] dummy-test-package-gloster-0-6961.fc36 tagged into f36-updates-candidate
2022-01-30 14:57:31,875 WARNI [bodhi][PoolThread-twisted.internet.reactor-0] No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36'
2022-01-30 14:57:36,880 ERROR [bodhi][PoolThread-twisted.internet.reactor-0] No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36': Unable to handle message in Automatic Update handler: Id: 50f69b81-72a2-44a2-9336-88e2b276ec9a
Topic: org.fedoraproject.prod.buildsys.tag
Headers: {
    "fedora_messaging_schema": "base.message",
    "fedora_messaging_severity": 20,
    "sent-at": "2022-01-30T14:57:26+00:00"
}
Body: {
    "build_id": 1910199,
    "instance": "primary",
    "name": "dummy-test-package-gloster",
    "owner": "packagerbot/os-master02.iad2.fedoraproject.org",
    "release": "6961.fc36",
    "tag": "f36-updates-candidate",
    "tag_id": 44419,
    "user": "packagerbot/os-master02.iad2.fedoraproject.org",
    "version": "0"
}
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/__init__.py", line 81, in __call__
    handler_info.handler(msg)
  File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/automatic_updates.py", line 151, in __call__
    changelog = build.get_changelog(lastupdate=True)
  File "/usr/lib/python3.9/site-packages/bodhi/server/models.py", line 1759, in get_changelog
    rpm_header = get_rpm_header(self.nvr)
  File "/usr/lib/python3.9/site-packages/bodhi/server/util.py", line 103, in get_rpm_header
    raise ValueError("No rpm headers found in koji for %r" % nvr)
ValueError: No rpm headers found in koji for 'dummy-test-package-gloster-0-6961.fc36'
2022-01-30 14:57:36,898 WARNI [fedora_messaging.twisted.consumer][MainThread] Returning message id 50f69b81-72a2-44a2-9336-88e2b276ec9a to the queue
2022-01-30 14:57:36,901 INFO  [fedora_messaging.twisted.consumer][MainThread] Consuming message from topic org.fedoraproject.prod.buildsys.tag (message id 50f69b81-72a2-44a2-9336-88e2b276ec9a)
2022-01-30 14:57:36,901 INFO  [bodhi][PoolThread-twisted.internet.reactor-2] Received message from fedora-messaging with topic: org.fedoraproject.prod.buildsys.tag
2022-01-30 14:57:36,902 INFO  [bodhi][PoolThread-twisted.internet.reactor-2] dummy-test-package-gloster-0-6961.fc36 tagged into f36-updates-candidate
2022-01-30 14:57:37,582 INFO  [fedora_messaging.twisted.consumer][MainThread] Successfully consumed message from topic org.fedoraproject.prod.buildsys.tag (message id 50f69b81-72a2-44a2-9336-88e2b276ec9a)

@kevin Could this be related to fedora's NFS volume setup?

Metadata Update from @tkopecek:
- Issue set to the milestone: None (was: 1.28)

2 years ago

Metadata Update from @tkopecek:
- Issue untagged with: bug

2 years ago

I wouldn't see how... there's no errors I am aware of.

  File: /mnt/koji/packages/dummy-test-package-gloster/0/6961.fc36/noarch/dummy-test-package-gloster-0-6961.fc36.noarch.rpm
  Size: 433228          Blocks: 856        IO Block: 65536  regular file
Device: 32h/50d Inode: 1469603766  Links: 1
Access: (0644/-rw-r--r--)  Uid: (   48/  apache)   Gid: (   48/  apache)
Context: system_u:object_r:nfs_t:s0
Access: 2022-01-30 23:49:27.094831000 +0000
Modify: 2022-01-30 14:56:49.366298000 +0000
Change: 2022-01-30 14:57:24.344221000 +0000
 Birth: -

@mikem Any idea? It looks weird as koji already written that file before sending the message. I'm not sure how we can get to state that the same hub can't see that file for a few seconds after that.

This reminds me of issues we had years ago before the fedmsg plugin deferred sending until postCommit, but the plugin is definitely doing that now.

(side note: it looks like the fedmsg plugin on koji.fp.o does not match the code from https://pagure.io/koji-fedmsg-plugin, and isn't part of an rpm. How is this plugin managed?) @kevin?

As I understand it, you have two (or more?) hub systems behind a proxy. Best guess is that there is some occasional lag between one hub writing a file and the other hub being able to see it. The first call failure from the recent example given by @mattia is about about 2 seconds after the build completion timestamp, which is very close to both the time of the write and the time of the message sending.

I don't think it makes sense to add a seconds long retry to the hub code for this.

A trivially naive experiment detects no lag between writing a file to scratch on koji01 and reading it on koji02, but I suspect that if it is an fs lag issue, it is probably random and intermittent, likely triggered by heavy io load.

In the getRPMHeaders code, there appear to be only a few ways that the code could return an empty result.

  1. if the rpm is not in the db
  2. if the rpm db entry doesn't have a build id
  3. if rpm path does not exist
  4. if headers=[] is passed in

I believe that the missing file is the most likely, but it would be good to confirm. Perhaps we could stick some log lines in the handler for debugging?

It would also be really helpful to understand how often this is happening. Between this issue and the linked bodhi one from 2020, I can see two examples.

Yeah, the fedmsg plugin is deployed directly from ansible:
https://pagure.io/fedora-infra/ansible/blob/main/f/roles/koji_hub/templates/fedmsg-koji-plugin.py
we try and keep that other one in sync as a 'upstream', but sometimes that fails. ;(

I can actually see more info on frequency because bodhi sends an error when it hits this.

I saw this problem last on 2020-12-01
Then it started again on 2022-01-26

There seems to be between 0 and 5 of these a day.

The first one:

2022-01-26 14:11:59,470 ERROR [bodhi][PoolThread-twisted.internet.reactor-1] No rpm headers found in koji for 'fapolicyd-1.1-1.fc36': Unable to handle message in Automatic Update handler: Id:
d1aae630-9ff0-4647-8841-b7a051bc34f3
Topic: org.fedoraproject.prod.buildsys.tag
Headers: {
    "fedora_messaging_schema": "base.message",
    "fedora_messaging_severity": 20,
    "sent-at": "2022-01-26T14:11:54+00:00"
}
Body: {
    "build_id": 1908650,
    "instance": "primary",
    "name": "fapolicyd",
    "owner": "rsroka",
    "release": "1.fc36",
    "tag": "f36-updates-candidate",
    "tag_id": 44419,
    "user": "rsroka",
    "version": "1.1"
}
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/__init__.py", line 81, in __call__
    handler_info.handler(msg)
  File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/automatic_updates.py", line 151, in __call__
    changelog = build.get_changelog(lastupdate=True)
  File "/usr/lib/python3.9/site-packages/bodhi/server/models.py", line 1759, in get_changelog
    rpm_header = get_rpm_header(self.nvr)
  File "/usr/lib/python3.9/site-packages/bodhi/server/util.py", line 103, in get_rpm_header
    raise ValueError("No rpm headers found in koji for %r" % nvr)
ValueError: No rpm headers found in koji for 'fapolicyd-1.1-1.fc36'

The most recent one:

2022-02-15 19:35:28,843 ERROR [bodhi][PoolThread-twisted.internet.reactor-1] No rpm headers found in koji for 'help2man-1.49.1-1.fc36': Unable to handle message in Automatic Update handler: Id:
e8c5b28e-283c-4495-9b7f-0502961eff0f
Topic: org.fedoraproject.prod.buildsys.tag
Headers: {
    "fedora_messaging_schema": "base.message",
    "fedora_messaging_severity": 20,
    "sent-at": "2022-02-15T19:35:23+00:00"
}
Body: {
    "build_id": 1918627,
    "instance": "primary",
    "name": "help2man",
    "owner": "corsepiu",
    "release": "1.fc36",
    "tag": "f36-updates-candidate",
    "tag_id": 44419,
    "user": "corsepiu",
    "version": "1.49.1"
}
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/__init__.py", line 81, in __call__
    handler_info.handler(msg)
  File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/automatic_updates.py", line 151, in __call__
    changelog = build.get_changelog(lastupdate=True)
  File "/usr/lib/python3.9/site-packages/bodhi/server/models.py", line 1759, in get_changelog
    rpm_header = get_rpm_header(self.nvr)
  File "/usr/lib/python3.9/site-packages/bodhi/server/util.py", line 103, in get_rpm_header
    raise ValueError("No rpm headers found in koji for %r" % nvr)
ValueError: No rpm headers found in koji for 'help2man-1.49.1-1.fc36'

Happy to try debug to get to the bottom of this.

The patch in #3256 might help with debugging. It adds a strict option, which bodhi might want to start using if we merge it, but even without strict it will still log an error line for missing files.

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.29

2 years ago

I saw this problem last on 2020-12-01
Then it started again on 2022-01-26

There seems to be between 0 and 5 of these a day.

The problem stopped in late 2020 because we stopped trying to get rpm headers when Koji returned an error (in a way which also disabled logging the error). Until we re-enabled it with https://github.com/fedora-infra/bodhi/pull/4326

ok. I have rolled that out. Let me know what I should gather after it's been running a while...

Looking the above Koji's patch, Bodhi will log the error about missing files only when Koji returns an exception, not an empty result. For this to happen, we must patch also Bodhi to use strict.

However, you should find some error in Koji's log about this one:

2022-02-20 09:32:17,587 INFO  [bodhi][PoolThread-twisted.internet.reactor-0] rust-uncased-0.9.6-1.fc37 tagged into f37-updates-candidate
2022-02-20 09:32:17,593 INFO  [bodhi][PoolThread-twisted.internet.reactor-0] Build was not submitted, skipping
2022-02-20 09:32:17,824 WARNI [bodhi][PoolThread-twisted.internet.reactor-0] No rpm headers found in koji for 'rust-uncased-0.9.6-1.fc37'
2022-02-20 09:32:22,833 ERROR [bodhi][PoolThread-twisted.internet.reactor-0] No rpm headers found in koji for 'rust-uncased-0.9.6-1.fc37': Unable to handle message in Automatic Update handler: Id: 85d38520-4725-4bbe-ba56-378cc04159ed
Topic: org.fedoraproject.prod.buildsys.tag
Headers: {
    "fedora_messaging_schema": "base.message",
    "fedora_messaging_severity": 20,
    "sent-at": "2022-02-20T09:32:17+00:00"
}
Body: {
    "build_id": 1920598,
    "instance": "primary",
    "name": "rust-uncased",
    "owner": "ignatenkobrain",
    "release": "1.fc37",
    "tag": "f37-updates-candidate",
    "tag_id": 50463,
    "user": "ignatenkobrain",
    "version": "0.9.6"
}
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/__init__.py", line 81, in __call__
    handler_info.handler(msg)
  File "/usr/lib/python3.9/site-packages/bodhi/server/consumers/automatic_updates.py", line 151, in __call__
    changelog = build.get_changelog(lastupdate=True)
  File "/usr/lib/python3.9/site-packages/bodhi/server/models.py", line 1759, in get_changelog
    rpm_header = get_rpm_header(self.nvr)
  File "/usr/lib/python3.9/site-packages/bodhi/server/util.py", line 103, in get_rpm_header
    raise ValueError("No rpm headers found in koji for %r" % nvr)
ValueError: No rpm headers found in koji for 'rust-uncased-0.9.6-1.fc37'
2022-02-20 09:32:23,104 WARNI [fedora_messaging.twisted.consumer][MainThread] Returning message id 85d38520-4725-4bbe-ba56-378cc04159ed to the queue
2022-02-20 09:32:23,108 INFO  [fedora_messaging.twisted.consumer][MainThread] Consuming message from topic org.fedoraproject.prod.buildsys.tag (message id 85d38520-4725-4bbe-ba56-378cc04159ed)
2022-02-20 09:32:23,108 INFO  [bodhi][PoolThread-twisted.internet.reactor-1] Received message from fedora-messaging with topic: org.fedoraproject.prod.buildsys.tag
2022-02-20 09:32:23,108 INFO  [bodhi][PoolThread-twisted.internet.reactor-1] rust-uncased-0.9.6-1.fc37 tagged into f37-updates-candidate
2022-02-20 09:32:23,112 INFO  [bodhi][PoolThread-twisted.internet.reactor-1] Build was not submitted, skipping
2022-02-20 09:32:23,777 INFO  [bodhi.server][PoolThread-twisted.internet.reactor-1] Sending mail to igor.raits@gmail.com: [Fedora Update] [comment] rust-uncased-0.9.6-1.fc37
2022-02-20 09:32:24,360 INFO  [fedora_messaging.twisted.consumer][MainThread] Successfully consumed message from topic org.fedoraproject.prod.buildsys.tag (message id 85d38520-4725-4bbe-ba56-378cc04159ed)

Is this what you expect:

[Sun Feb 20 09:32:17.817514 2022] [wsgi:error] [pid 653390:tid 653418] [client 10.3.163.75:59638] 2022-02-20 09:32:17,817 [ERROR] m=getRPMHeaders u=bodhi p=653390 r=10.3.163.75:59638 koji.hub: Missing rpm file: /mnt/koji/packages/rust-uncased/0.9.6/1.fc37/src/rust-uncased-0.9.6-1.fc37.src.rpm
[Sun Feb 20 09:32:17.817551 2022] [wsgi:error] [pid 653390:tid 653418] [client 10.3.163.75:59638] 817 [ERROR] m=None u=None p=653390 r=?:? koji.hub: Missing rpm file: /mnt/koji/packages/rust-uncased/0.9.6/1.fc37/src/rust-uncased-0.9.6-1.fc37.src.rpm

Of course the file does exist now:

stat /mnt/koji/packages/rust-uncased/0.9.6/1.fc37/src/rust-uncased-0.9.6-1.fc37.src.rpm

File: /mnt/koji/packages/rust-uncased/0.9.6/1.fc37/src/rust-uncased-0.9.6-1.fc37.src.rpm
Size: 18852 Blocks: 40 IO Block: 65536 regular file
Device: 32h/50d Inode: 1367133619 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 48/ apache) Gid: ( 48/ apache)
Context: system_u:object_r:nfs_t:s0
Access: 2022-02-20 09:47:45.869440000 +0000
Modify: 2022-02-20 09:30:53.483315000 +0000
Change: 2022-02-20 09:32:14.089576000 +0000
Birth: -

Metadata Update from @tkopecek:
- Issue tagged with: testing-ready

2 years ago

@kevin the timestamp on the error you posted is about 4s after the build completion time, or 3s after the ctime of the file (this is updated when the rpm is linked into its final location, whereas the mtime is set when the file is initially uploaded under work/).

This appears to be some sort of NFS lag (likely caused by caching). I'd guess the completeBuild call that moved the build into place was handled by a different hub than than the one that is handling this getRPMHeaders call.

I'm not sure what Koji can reasonably do here. The fs is basically lying to us. I don't think it is safe for the hub to wait for this file to appear.

Ideally, we want to fix the underlying nfs lag issue.

Short of that, the only way to work around this is to add some sort of delay, perhaps selectively. This could be done on the bodhi end, e.g. check build completion time and add a wait if it was mere seconds ago.

Yeah, I agree...

After our current freeze is over, I would like to try lookupcache=positive here. That should cause it to only cache positive (ie, it does exist) results, never negative (ie, does not exist(.
That should make the other hub send a NFS_LOOKUP for the file, which should find it...

But I'm not sure what the perf impact will be or if I have fully understanding the problem. :)

Metadata Update from @jobrauer:
- Issue tagged with: testing-done

2 years ago

I was wondering if there was a way we could tell the message bus to add a short delay before delivering some messages.

Interesting thought. I think it should be possible... rabbitmq has a 'Delayed Message Plugin' that lets you set a header on a message and delays it that long. I guess fedora-messaging would need to expose this somehow.

Filed RFE: https://github.com/fedora-infra/fedora-messaging/issues/254

Commit 433e741 relates to this ticket

Commit fcce0b9 relates to this ticket

Metadata Update from @tkopecek:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Login to comment on this ticket.

Metadata
Related Pull Requests
  • #3256 Merged 2 years ago