#6715 kerberos credentials cache error during f26 nightly pungi run
Closed: Fixed 7 years ago Opened 7 years ago by dustymabe.

We keep seeing Kerberos failures during our nightly pungi runs in f26. It doesn't happen every time, but it has happened a couple of times in f26 so far. The log from the 03/20 ostree tree compose shows the error:

COMMAND: koji --profile=compose_koji runroot --new-chroot --use-shell --task-id --channel-override=compose --package=pungi --package=ostree --package=rpm-ostree --mount=/mnt/koji/compose/branched/Fedora-26-20170320.n.0 --mount=/mnt/koji/compose/ostree/26/ f26-build x86_64 'rm -f /var/lib/rpm/__db*; rm -rf /var/cache/yum/*; set -x; pungi-make-ostree tree --repo=/mnt/koji/compose/ostree/26/ --log-dir=/mnt/koji/compose/branched/Fedora-26-20170320.n.0/logs/x86_64/ostree/ostree-3 --treefile=/mnt/koji/compose/branched/Fedora-26-20170320.n.0/work/ostree-3/config_repo/fedora-ostree-workstation.json --extra-config=/mnt/koji/compose/branched/Fedora-26-20170320.n.0/work/ostree-3/extra_config.json'
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Kerberos authentication failed: Internal credentials cache error (-1765328188)

@parasense suggested that it's possible the pungi compose took longer than the krb5 ticket lifespan. This run started at 2017-03-20 07:17:28 and the failure happened around 2017-03-20 12:25:57.

@parasense thinks "we might want keytabs" for this?


The compose_koji profile is using a keytab.

@lsedlar is it possible that pungi is not calling with the correct profile?

@dustymabe: that profile (1) does use keytabs, and (2) gets a ticket that is valid for 24 hours, so there definitely would not be a ticket timeout within those 5 hours. On top of that, the koji client takes care of renewing the ticket if it has the keytab.

I suspect this is the bug where pungi does not always use the correct profile for some operations, or something else where it doesn't pass everything needed to the koji client.

I'm not sure what the cause is, was just trying to report the issue. Please assign this bug to the appropriate party that should investigate.

While it's definitely possible there is some bug in Pungi, in this case the log shows that the command did use --profile=compose_koji, which looks correct to me.

@puiterwijk Could it be a race condition when multiple koji commands are invoked in parallel?

I think it is indeed a race condition. I managed to replicate it with for x in $(seq 1 100) ; do sudo koji -p compose_koji hello & done >/dev/null. Every now and then some of the commands fail and print the error.
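For anyone who wants to count the failures instead of eyeballing the output, here is a rough Python sketch of the same reproducer. The helper name `count_failures` is made up for illustration, and it assumes a failed login shows up either as a non-zero exit status or as the "Kerberos authentication failed" text on stderr; neither is confirmed in this ticket.

```python
import subprocess

# Error text as it appears in the compose log quoted in this ticket.
ERROR_MARKER = "Kerberos authentication failed"


def count_failures(cmd, copies=100):
    """Start `copies` instances of `cmd` at once and count the failed ones.

    Hypothetical helper: a run counts as failed if it exits non-zero or
    prints ERROR_MARKER on stderr (an assumption about how koji reports it).
    """
    # Launch every copy before waiting on any of them, so they overlap in time.
    procs = [
        subprocess.Popen(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            text=True,
        )
        for _ in range(copies)
    ]
    failures = 0
    for proc in procs:
        _, stderr = proc.communicate()
        if proc.returncode != 0 or ERROR_MARKER in stderr:
            failures += 1
    return failures
```

Calling `count_failures(["sudo", "koji", "-p", "compose_koji", "hello"])` would mirror the shell loop. Since this is a race, the count will vary from run to run.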

This should be fixable on the Pungi side by setting the KRB5CCNAME env var to a fresh directory (but only if a keytab is used for authentication).
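A minimal sketch of that idea, to make it concrete. This is not the code from the Pungi pull request. The names `private_ccache_env` and `run_isolated` are hypothetical, and the `DIR:` prefix assumes MIT Kerberos directory-type caches are available on the host.

```python
import os
import shutil
import subprocess
import tempfile
from contextlib import contextmanager


@contextmanager
def private_ccache_env(base_env=None):
    """Yield a copy of the environment with KRB5CCNAME set to a fresh directory.

    Hypothetical helper. The temporary directory is removed on exit, so each
    command gets a credentials cache that no other command shares.
    """
    env = dict(os.environ if base_env is None else base_env)
    ccache_dir = tempfile.mkdtemp(prefix="krb_ccache_")
    # Assumption: "DIR:<path>" selects a directory-based cache (MIT krb5).
    env["KRB5CCNAME"] = "DIR:" + ccache_dir
    try:
        yield env
    finally:
        shutil.rmtree(ccache_dir, ignore_errors=True)


def run_isolated(cmd):
    """Run `cmd` with its own credentials cache and return the exit status."""
    with private_ccache_env() as env:
        return subprocess.run(cmd, env=env, check=False).returncode
```

A caller would apply this only when the profile authenticates with a keytab, as noted above. With a ticket obtained by other means, pointing KRB5CCNAME at an empty directory would hide the existing credentials.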

Right. After looking further, this is because when you provide a keytab, you bypass the GSSAPI code paths (that path doesn't support keytabs).

The krbv codepath does a new init_creds_keytab every single time, which gets a new credential every time, regardless of whether or not one is already in the credential cache.
As a result, when two koji instances perform krb_login with keytabs at the same time, one will erase the credentials the other one has gotten, while the other tries to use that credential to log in.
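The interleaving described here can be shown with a toy model. It does not use koji or krbv at all: `SharedCache`, `Client`, and the split of a login into wipe, store, and use steps are assumptions made purely for illustration.

```python
class SharedCache:
    """Stand-in for a credentials cache that clients read and write."""

    def __init__(self):
        self.credential = None


class Client:
    """Toy client whose keytab login always re-initialises its cache."""

    def __init__(self, name, cache):
        self.name = name
        self.cache = cache

    def wipe(self):
        # First half of a fresh login: discard whatever is cached.
        self.cache.credential = None

    def store(self):
        # Second half: write the newly obtained credential.
        self.cache.credential = "ticket-for-" + self.name

    def use(self):
        if self.cache.credential is None:
            raise RuntimeError("Internal credentials cache error")
        return self.cache.credential


def login_interleaved(cache_a, cache_b):
    """Replay one unlucky schedule: B re-initialises while A is mid-login."""
    a, b = Client("a", cache_a), Client("b", cache_b)
    a.wipe()
    a.store()  # A has obtained its credential
    b.wipe()   # B starts its own login and clears its cache
    try:
        result = a.use()  # A now tries to use the credential it stored
    except RuntimeError as exc:
        result = "failed: " + str(exc)
    b.store()
    return result
```

With one cache passed for both clients, A's credential is gone by the time it is used and the call reports the failure. With two separate caches, the same schedule succeeds, which is the effect of giving each command its own KRB5CCNAME.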

This should fix it from Pungi side: https://pagure.io/pungi/pull-request/607
The reason why we only ever see this in ostree tasks is that all other phases that start koji commands in parallel already have a protection (of sorts) against it: there are sleeps, so the commands don't actually run at the same time.

I deployed the new pungi on branched-composer last night.

It still failed, but it's a new error now, so we can probably close this issue.

...
DEBUG util.py:439:  ERROR running command: rpm-ostree compose tree --repo=/mnt/koji/compose/atomic/26/ --write-commitid-to=/mnt/koji/compose/branched/Fedora-26-20170514.n.0/logs/x86_64/Atomic/ostree-2/commitid.log /mnt/koji/compose/branched/Fedora-26-20170514.n.0/work/ostree-2/config_repo/fedora-atomic-docker-host.json
DEBUG util.py:439:  COMMAND: rpm-ostree compose tree --repo=/mnt/koji/compose/atomic/26/ --write-commitid-to=/mnt/koji/compose/branched/Fedora-26-20170514.n.0/logs/x86_64/Atomic/ostree-2/commitid.log /mnt/koji/compose/branched/Fedora-26-20170514.n.0/work/ostree-2/config_repo/fedora-atomic-docker-host.json
DEBUG util.py:439:  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DEBUG util.py:439:  Previous commit: 347a653bc1b2d81cd807e0d956ac3a43411e773715064a6ef93a2184099a04c6
DEBUG util.py:439:  error: cannot update repo 'source_repo_from-20170514181412': Cannot prepare internal mirrorlist: Cannot resolve path for: "None"
DEBUG util.py:439:  Traceback (most recent call last):
DEBUG util.py:439:    File "/usr/bin/pungi-make-ostree", line 15, in <module>
DEBUG util.py:439:      ostree.main()
DEBUG util.py:439:    File "/usr/lib/python2.7/site-packages/pungi/ostree/__init__.py", line 89, in main
DEBUG util.py:439:      func()
DEBUG util.py:439:    File "/usr/lib/python2.7/site-packages/pungi/ostree/tree.py", line 101, in run
DEBUG util.py:439:      self._make_tree()
DEBUG util.py:439:    File "/usr/lib/python2.7/site-packages/pungi/ostree/tree.py", line 46, in _make_tree
DEBUG util.py:439:      shortcuts.run(cmd, show_cmd=True, stdout=True, logfile=log_file)
DEBUG util.py:439:    File "/usr/lib/python2.7/site-packages/kobo/shortcuts.py", line 335, in run
DEBUG util.py:439:      raise RuntimeError(err_msg)
DEBUG util.py:439:  RuntimeError: ERROR running command: rpm-ostree compose tree --repo=/mnt/koji/compose/atomic/26/ --write-commitid-to=/mnt/koji/compose/branched/Fedora-26-20170514.n.0/logs/x86_64/Atomic/ostree-2/commitid.log /mnt/koji/compose/branched/Fedora-26-20170514.n.0/work/ostree-2/config_repo/fedora-atomic-docker-host.json
...

I believe that is caused by mismatch between pungi version on the composer (4.1.15) and in the buildroot (4.1.13).

Yep. Kevin and I figured that out last night and he submitted a buildroot override for the new version of pungi. We'll see if it works this time.

Also, we should probably go through our pungi configuration and update all the repo_from/source_repo_from to just repo now that those are deprecated.

Metadata Update from @dustymabe:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

7 years ago
