#8228 Nightly failure in backup/restore while calling 'id admin'
Closed: fixed 4 years ago by abbra. Opened 4 years ago by frenaud.

Issue

The nightly backup/restore tests sometimes fail when checking that 'id admin' finds the admin user after a restore. This happens because the SSSD backend may still be offline at that point.
For an example, see PR 4368, whose logs show:

self = <ipatests.test_integration.test_backup_and_restore.TestBackupReinstallRestoreWithDNS object at 0x7f575263f090>

    def test_full_backup_reinstall_restore_with_DNS_zone(self):
        """backup, uninstall, reinstall, restore"""
>       self._full_backup_restore_with_DNS_zone(reinstall=True)

test_integration/test_backup_and_restore.py:352: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_integration/test_backup_and_restore.py:340: in _full_backup_restore_with_DNS_zone
    tasks.resolve_record(self.master.ip, self.example2_test_zone)
/usr/lib64/python3.7/contextlib.py:119: in __exit__
    next(self.gen)
test_integration/test_backup_and_restore.py:158: in restore_checker
    got = check(host)
test_integration/test_backup_and_restore.py:89: in check_admin_in_id
    result = host.run_command(['id', 'admin'])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ipatests.pytest_ipa.integration.host.Host master.ipa.test (master)>
argv = ['id', 'admin'], set_env = True, stdin_text = None, log_stdout = True
raiseonerr = True, cwd = None, bg = False, encoding = 'utf-8', ok_returncode = 0

    def run_command(self, argv, set_env=True, stdin_text=None,
                    log_stdout=True, raiseonerr=True,
                    cwd=None, bg=False, encoding='utf-8', ok_returncode=0):
        """Wrapper around run_command to log stderr on raiseonerr=True

        :param ok_returncode: return code considered to be correct,
                              you can pass an integer or sequence of integers
        """
        result = super().run_command(
            argv, set_env=set_env, stdin_text=stdin_text,
            log_stdout=log_stdout, raiseonerr=False, cwd=cwd, bg=bg,
            encoding=encoding
        )
        # in FIPS mode SSH may print noise to stderr, remove the string
        # "FIPS mode initialized" + optional newline.
        result.stderr_bytes = FIPS_NOISE_RE.sub(b'', result.stderr_bytes)
        try:
            result_ok = result.returncode in ok_returncode
        except TypeError:
            result_ok = result.returncode == ok_returncode
        if not result_ok and raiseonerr:
            result.log.error('stderr: %s', result.stderr_text)
            raise subprocess.CalledProcessError(
                result.returncode, argv,
>               result.stdout_text, result.stderr_text
            )
E           subprocess.CalledProcessError: Command '['id', 'admin']' returned non-zero exit status 1.

Metadata Update from @frenaud:
- Issue tagged with: test-failure, tests

4 years ago

Note: the issue has been happening since commit 1eb6a9b "ipa-restore: restart services at the end".
A restore now performs a double ipactl restart, so the LDAP service restarts at the end and SSSD does not reconnect to LDAP immediately.

@frenaud I commented on a different nightly test about it and provided a solution:

The following three failed tests seem to have a race condition with the 389-ds restart:

fedora-latest/test_backup_and_restore_TestBackupReinstallRestoreWithDNS
fedora-latest/test_backup_and_restore_TestBackupReinstallRestoreWithDNSSEC 
fedora-latest/test_backup_and_restore_TestBackupReinstallRestoreWithKRA 

All of them have a sequence that tries to ensure certain operations continue working after a backup has been restored. However, nothing waits for the IPA services to be fully running before the checks start. It can take some time for 389-ds to reach a functioning state, and SSSD then needs additional time to mark the IPA domain as online.

The best way to solve it is to have each check wait for the service to be operational. Right now, it seems, we are hitting SSSD not recovering from the offline LDAP server 'soon enough'. This can be ensured by calling sssctl domain-status ipa.test -o and waiting until it returns

Online status: Online

The loop might run up to 10 times, with one or two seconds in between.

In short, I believe commit 1eb6a9b actually uncovered a real issue in the test.
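
The polling loop suggested above could be sketched roughly as follows. This is only an illustration, not the code from the actual fix: the function name wait_for_sssd_domain_online and its parameters are hypothetical, and it runs sssctl locally through subprocess instead of through the test framework's host.run_command. The command line and the "Online status: Online" string are taken from the comment above.

```python
import subprocess
import time


def wait_for_sssd_domain_online(domain, attempts=10, delay=2.0,
                                run=subprocess.run, sleep=time.sleep):
    """Poll 'sssctl domain-status <domain> -o' until SSSD reports Online.

    Hypothetical helper: 'run' and 'sleep' are injectable so the loop
    can be exercised without a real SSSD installation.
    Returns True once the domain is online, False if all attempts fail.
    """
    cmd = ["sssctl", "domain-status", domain, "-o"]
    for attempt in range(attempts):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0 and "Online status: Online" in result.stdout:
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A check such as 'id admin' would then only run once wait_for_sssd_domain_online("ipa.test") has returned True; a False result points at SSSD still being offline rather than at a broken restore.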

Metadata Update from @frenaud:
- Issue assigned to frenaud

4 years ago

Metadata Update from @frenaud:
- Custom field on_review adjusted to https://github.com/freeipa/freeipa/pull/4383

4 years ago

master:

  • 3753862 ipatests: wait for SSSD to become online in backup/restore tests

ipa-4-8:

  • ebb3c22 ipatests: wait for SSSD to become online in backup/restore tests

Metadata Update from @abbra:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

4 years ago

ipa-4-7:

  • dcdab7b ipatests: wait for SSSD to become online in backup/restore tests
