I have a 389 DS with 154 replication agreements, and in the future I'll have 400+. In the current configuration, DS takes 5 minutes to shut down. If I disable the replication plugin (Multimaster), shutdown takes 3 seconds.
I need the agreements because in my scenario I have two servers in the company headquarters with the whole tree, and 150 servers each replicating two specific subtrees.
It looks like you've done some testing with this. Did you test with zero load on the server, and it still takes a long time to shut down? Or does the server need to be under some type of "update" load?
I just want to set up an easy/efficient testcase.
I'm going to be focusing on this next, but I'm not sure how much of an improvement can be made in this scenario. Each replication agreement creates a new thread, so when we shut down, each thread has to be notified. This just takes time with this many agreements/threads. I hope to improve the shutdown time, but it will still take much longer than on a server with no replication agreements.
I found one bottleneck when shutting down the repl agreements. In my "smaller testcase" of 25 agreements, I saw the shutdown go from around 24 seconds to 3 seconds when there is no load. Under load, the shutdown went from 40 seconds to around 10 seconds.
I am getting my numbers from the error log:
[12/Mar/2012:12:43:00 -0400] - slapd shutting down - closing down internal subsystems and plugins
-----> Agreements being shut down
[12/Mar/2012:12:43:09 -0400] - Waiting for 4 database threads to stop
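For anyone reproducing this, the shutdown time can be computed directly from the two error-log timestamps. The snippet below is a hypothetical helper, not part of 389 DS; it assumes the error-log timestamp format shown above and relies on GNU `date -d` (a GNU extension, not POSIX):

```shell
# Hypothetical helper: measure shutdown duration from two 389 DS error-log
# lines. Assumes GNU date and the "[DD/Mon/YYYY:HH:MM:SS ZZZZ]" log format.
start='[12/Mar/2012:12:43:00 -0400] - slapd shutting down - closing down internal subsystems and plugins'
end='[12/Mar/2012:12:43:09 -0400] - Waiting for 4 database threads to stop'

to_epoch() {
    # Pull out the bracketed timestamp, turn "12/Mar/2012:12:43:00" into
    # "12 Mar 2012 12:43:00" so GNU date can parse it, then print epoch secs.
    ts=$(printf '%s\n' "$1" | sed -n 's/^\[\([^]]*\)\].*/\1/p' \
        | tr '/' ' ' | sed 's/:/ /')
    date -d "$ts" +%s
}

echo "shutdown took $(( $(to_epoch "$end") - $(to_epoch "$start") )) seconds"
# → shutdown took 9 seconds
```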
I would very much be interested in seeing the numbers from the system that has the 150+ agreements.
Diegows, is this something you would be willing to test?
Patch looks good - I'd like to know if this is sufficient.
git merge ticket271
.../plugins/replication/repl5_inc_protocol.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
[mareynol@localhost servers]$ git push origin master
Counting objects: 13, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 956 bytes, done.
Total 7 (delta 5), reused 0 (delta 0)
68c4ee3..3f960dc master -> master
Diegows, I too would like to see the impact this patch has on your environment.
I'm going to close this bug, but if you could email both Rich and me with your results when you get them, that would be great. I'm sure the stop is still going to be well over a minute, but it should be much better than 5 minutes.
originally targeted for 1.2.11.rc1, but actually in the 1.2.11.a1 release
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=830348
Added initial screened field value.
Metadata Update from @rmeggins:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.2.11.a1
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here:
If you want to receive further updates on the issue, please navigate to the github issue
and click on the subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)