#3463 TESTS: make intgcheck is not always passing in the internal CI (enumeration tests)
Closed: Fixed 2 years ago by mzidek. Opened 3 years ago by mzidek.

The enumeration tests in make intgcheck are sometimes failing (last two weeks quite often).

It happens mostly on the Debian machine, but other machines can fail too. Lukas mentioned that it also happened while running the tests locally.

It may be bug in the enumeration tests or in the SSSD's enumeration code.


Failures in tests are not acceptable. Therefore changing priority to blocker.

Metadata Update from @lslebodn:
- Issue priority set to: blocker

3 years ago

There is no reason for triage.
Failure in test (which is far from rare) is automatically a blocker for next release.

Metadata Update from @lslebodn:
- Issue set to the milestone: SSSD 1.15.4

3 years ago

I've set up a Debian testing machine on my personal server and could reproduce the issue within a few runs of make intgcheck-run INTGCHECK_PYTEST_ARGS="-k test_add_remove_user".

At the first look, the problem seems to occur because the time we wait between adding the user to LDAP and testing that the operation is reflected on SSSD may not be enough.

I've tweaked the test a little bit, as shown below, and left the test running till it fails ...

root@testing:~/sssd/x86_64# git diff
diff --git a/src/tests/intg/test_enumeration.py b/src/tests/intg/test_enumeration.py
index 47772dea2..32e666e4b 100644
--- a/src/tests/intg/test_enumeration.py
+++ b/src/tests/intg/test_enumeration.py
@@ -406,7 +406,7 @@ def user_and_groups_rfc2307_bis(request, ldap_conn):
 def test_add_remove_user(ldap_conn, blank_rfc2307):
     """Test user addition and removal are reflected by SSSD"""
     e = ldap_ent.user(ldap_conn.ds_inst.base_dn, "user", 2001, 2000)
-    time.sleep(INTERACTIVE_TIMEOUT/2)
+    time.sleep(INTERACTIVE_TIMEOUT)
     # Add the user
     ent.assert_passwd(ent.contains_only())
     ldap_conn.add_s(*e)

I'll update the status here whenever it fails or after a day or two of non-failure runs.

It's been running for the last 8+ hours without any failure.

Changing hardcoded timeout without explanation "why" is not sufficient solution. (because you might hide a bug)

The first problem is that current value in test is not described. So we cannot easily say whether issue is in sssd or in test.

Example of description in other place:
https://pagure.io/SSSD/sssd/blob/master/f/src/tests/intg/test_secrets.py#_428

Here is closed PR with attempt to fix this issue by modifying timeout/sleep values. It may be of some value for the person that will end up looking into this:

https://github.com/SSSD/sssd/pull/345

Temporary workaround which should be reverted before 1.15.4
master:

Unfortunately we haven't been able to figure out the root cause and at the same time we must release 1.16.0 no later than tomorrow (Oct-20). Therefore moving to 1.16.1

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.16.1 (was: SSSD 1.15.4)

3 years ago

Since the commit that removed the tests was reverted in 26803ff I'm closing this ticket.

@lslebodn please feel free to reopen this ticket, i would have asked you before closing it in the first place normally, but since we're about to release the new tarball and you're away, I'm closing without asking :)

Metadata Update from @jhrozek:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago

Since the commit that removed the tests was reverted in 26803ff I'm closing this ticket.

I do not think that this ticket is fixed by commit 26803ff.
Therefore re-openning.

Metadata Update from @lslebodn:
- Issue set to the milestone: SSSD Continuous integration (was: SSSD 1.16.1)
- Issue status updated to: Open (was: Closed)

3 years ago

For the reference: https://github.com/SSSD/sssd/pull/956

Testing periodic tasks (which is also enumeration) is quite tricky.
Because ti is not easy to find out when the 1st enumeration was triggered.
And testing should be ideally done in the middle between two periodic tasks.
INTERACTIVE_TIMEOUT/2 is quite confusing and should be probably changed to some constant e.g. (INITIAL_DELAY).

Ideal would be to have initial delay flexible instead of hard-coded. cause different architectures/distributions might trigger tasks slightly differently. Tests passed without any issue with {0,1,2,3,4} as initial delay delay on bare metal machine. But virtual machines are more problematic IIRC. (And i could see issues sometime on arm32)

my 2c

Testing periodic tasks (which is also enumeration) is quite tricky.
Because ti is not easy to find out when the 1st enumeration was triggered.
And testing should be ideally done in the middle between two periodic tasks.

Why is it necessary to do in the middle?

Why checking after (ENUMERATION_PERIOD + DELAY),
where ENUMERATION_PERIOD = https://github.com/SSSD/sssd/blob/master/src/tests/intg/test_enumeration.py#L151 , and DELAY is probably equal to ENUMERATION_PERIOD,
wont work?

INTERACTIVE_TIMEOUT/2 is quite confusing and should be probably changed to some constant e.g. (INITIAL_DELAY).

If I understood Michal's explanation correctly (probably I didn't), for most of the tests this is not "initial" but merely periodic task.
I mean, test should expect that next task will only be run in ENUMERATION_PERIOD + DELAY in a worst case.

Commit 116b144 relates to this ticket

Some new patches have landed in the master. Closing this for now. We can open a new issue or re-open this one if the problem still persists.

master:
116b144
3477f2c

Metadata Update from @mzidek:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/4489

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Login to comment on this ticket.

Metadata