#49013 Fix nunc-stans 0.2.0 update issues in DS
Closed: Fixed None Opened 2 years ago by firstyear.

It seems updating nunc-stans and enabling it causes some issues (which we expected)

We should run the full test suites over the code, find the issues, and fix them.

The first detected issue was:

py.test dirsrvtests/tests/suites/paged_results/paged_results_test.py::test_multi_suffix_search

Nunc-stans causes the logs to not be flushed correctly.


Cool, I'll run these and see if I get similar results. Sorry it may take a few days to complete the fix though if that's okay.

That's ok, we're just trying to catch bugs earlier, before they land in downstream.

I think we are in agreement then, I've been making it one of my goals to find issues before they get released. It's probably worth my time to just fix the issues I find in the tests regardless of this is the source of the issue.

"""The server does not shutdown cleanly - it shuts down too fast
actually. No guardian file gets written and we get a disorderly
shutdown after every restart:

[20/Oct/2016:10:49:21.029568919 -0400] - INFO - main -
389-Directory/1.3.6.0.20161020git61c72f9 B2016.294.1449 starting up
[20/Oct/2016:10:49:21.046759149 -0400] - NOTICE - dblayer_start -
Detected Disorderly Shutdown last time Directory Server was running,
recovering database.
[20/Oct/2016:10:49:21.190043099 -0400] - INFO - slapd_daemon - slapd
started. Listening on All Interfaces port 389 for LDAP requests"""

Right, it looks like in daemon.c when we are in the nunc-stans loop at line 1190, we are all great. But when we exit ns_thrpool_wait(tp), we skip the remained of slapd_daemon. This causes:

  • conn table clean up to not happen.
  • db lock files aren't added or cleaned up properly.
  • we don't run log_access_flush() at the end.

So there are a heap of little flow ons. I don't think NS 0.2.0 is the culprit, but more the enabling by default of NS.

So now to work out why. But progress is happening on this issue.

These changes pass the majority of the test suites. I will investigate the failures.

Okay, it looks like the failures are:

  • Missing path for ns-newaccount.pl
  • valgrind failure (because I don't have valgrind ...)
  • replication related cleaning issues
  • False negatives from the DSE search ...
  • Which is triggered by a memleak in snmpagent.

So I think I'll be submitting fixes to these in other tickets, not this one.

Looks good except for one area in daemon.c:

{{{
1167 for (ii = 0; ii < listeners; ++ii) {
1176 for (size_t ii = 0; ii < listeners; ++ii) {
1168 1177 listener_idxs[ii].ct = the_connection_table; / to pass to handle_new_connection /
1169 1178 ns_add_io_job(tp, listener_idxs[ii].listenfd, NS_JOB_ACCEPT|NS_JOB_PERSIST|NS_JOB_PRESERVE_FD,
1170 1179 ns_handle_new_connection, &listener_idxs[ii], &listener_idxs[ii].ns_job);
}}}

listeners is an int (and listener_idxs is calloced using listeners), so why change ii to size_t (and why declare size_t in the for statement - I thought this causes compiler errors/warnings)?

Thanks,
Mark

It's an array, so it's indexed by a type of size_t. So this change is correct.

As well, we have enabled C99 mode across the board recently, so this is now valid. :)

commit 50f6881
Writing objects: 100% (7/7), 2.16 KiB | 0 bytes/s, done.
Total 7 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
49ac334..50f6881 master -> master

Thanks Mark!

Metadata Update from @firstyear:
- Issue assigned to firstyear
- Issue set to the milestone: 1.3.6.0

2 years ago

Login to comment on this ticket.

Metadata