#49257 Manual resetting of nsslapd-dbcachesize using ldapmodify: value reverts to optimal value after restart of service.
Closed: fixed 2 years ago Opened 3 years ago by mreynolds.

Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 7): Bug 1450896

Please note that this Bug is private and may not be accessible as it contains confidential Red Hat customer information.

Description of problem:

Manually resetting nsslapd-dbcachesize with ldapmodify does not persist: the
value reverts to the optimal value after the service is restarted.

------------------------------------------------------------------------------
[root@qe-blade-01 ~]# ldapsearch -h 127.0.0.1 -b "cn=config" -D "cn=Directory Manager" \
    -w Secret123 nsslapd-cache-autosize nsslapd-dbcachesize \
    | grep -e nsslapd-cache-autosize -e nsslapd-dbcachesize
# requesting: nsslapd-cache-autosize nsslapd-dbcachesize
nsslapd-dbcachesize: 160999144


[root@qe-blade-01 ~]# ldapmodify -h 127.0.0.1 -D "cn=Directory Manager" -w Secret123 -x << EOF
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-dbcachesize
nsslapd-dbcachesize: 260999144
EOF

modifying entry "cn=config,cn=ldbm database,cn=plugins,cn=config"

[root@qe-blade-01 ~]# ldapsearch -h 127.0.0.1 -b "cn=config" -D "cn=Directory Manager" \
    -w Secret123 nsslapd-cache-autosize nsslapd-dbcachesize \
    | grep -e nsslapd-cache-autosize -e nsslapd-dbcachesize
# requesting: nsslapd-cache-autosize nsslapd-dbcachesize
nsslapd-dbcachesize: 260999144

[root@qe-blade-01 ~]# restart-dirsrv
Restarting instance "qe-blade-01"

[root@qe-blade-01 ~]# ldapsearch -h 127.0.0.1 -b "cn=config" -D "cn=Directory Manager" \
    -w Secret123 nsslapd-cache-autosize nsslapd-dbcachesize \
    | grep -e nsslapd-cache-autosize -e nsslapd-dbcachesize
# requesting: nsslapd-cache-autosize nsslapd-dbcachesize
nsslapd-dbcachesize: 160999144
------------------------------------------------------------------------------

The above issue occurs because "nsslapd-dbcachesize" depends on the
"nsslapd-cache-autosize" value. The admin can change the value only if
"nsslapd-cache-autosize" is 0.

The logs contain no warning or information pointing to this dependency.

A warning about the dependency should be logged when the "nsslapd-dbcachesize"
value is changed.

Version-Release number of selected component (if applicable):

389-ds-base-1.3.6.1-13.el7.x86_64

How reproducible:

Always

Steps to Reproduce:

Given above

Actual results:

Not able to change the nsslapd-dbcachesize value manually, and there is no
dependency alert.

Expected results:

Should be able to change the nsslapd-dbcachesize value manually after a
dependency alert.


Additional info:

Metadata Update from @mreynolds:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1450896

3 years ago

Metadata Update from @mreynolds:
- Issue assigned to mreynolds

3 years ago

This behaviour is documented here:

http://www.port389.org/docs/389ds/design/autotuning.html

However, I agree there should be a warning in the errors log.

Metadata Update from @mreynolds:
- Custom field type adjusted to defect

3 years ago

According to http://www.port389.org/docs/389ds/design/autotuning.html#what-about-admins-who-want-to-manually-tune, I think manually set values should be respected. In addition it makes sense to preserve admin settings. So I agree that some additional warnings will be welcome but IMHO it looks more like a bug.

Actually it says:

If autosize value has been set by the admin, we ignore dbcachesize and always use the autotuned value.

So in this case nsslapd-cache-autosize is set, so it will override the nsslapd-dbcachesize at the next startup. So setting the dbcache size while auto-sizing is turned on has no real effect.

The patch I am about to submit for review will actually reject nsslapd-dbcachesize updates if autotuning is enabled. At first I thought a warning was okay, but it's not. If the update has no effect, and it is overwritten anyway, then it should be rejected so the client knows the update did not work as intended. Thoughts?

Sorry, my mistake. You are right, the code follows the design.

Now I tend to disagree with the design on that specific point.
Only the admin knows the constraints of DS running on the host. If the admin prefers a given value, he should be allowed to set it accordingly. I would just reject extreme values like <10Mb.

The admin just has to turn autosizing off, then they can set it to whatever they want. That does seem reasonable to me. And in my patch (yet to be provided) it explicitly states in the client error message and errors log that the autosizing setting needs to be removed in order to set the dbcachesize manually.

Or, if we allow dbcachesize to be set, then we need to automatically turn off autosizing so the cachesize is preserved and takes effect through the next server startup. Personally I don't like this option, but perhaps I could be persuaded with a really good argument :)

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to review

3 years ago

I think this looks okay to me. It would be worth writing a lib389 test to show the different behaviours, but I can do that sometime if you like. ack

Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to ack (was: review)

3 years ago

I agree the patch is valid. Ack

Now just to clarify my thought: autotuning is a feature to adapt the value of a set of attributes to the machine DS is running on. I think we should have the following order of priority in settings for each tunable value: default, autotune, admin settings. So we can imagine, with autotuning on, some attributes being set by the admin and others left to autotune.

For example, some customers using an external database may need to increase the number of workers (for example if the external DB is slow) and so force the threadnumber value. But at the same time the customer wants to leave to autotune the responsibility of tuning the entry/db caches.
Is that possible with the current implementation?

I think this looks okay to me. It would be worth writing a lib389 test to show the different behaviours, but I can do that sometime if you like. ack

Simon is writing up testcases for this right now. I'll work with him to add the proper tests

I agree the patch is valid. Ack
Now just to clarify my thought: autotuning is a feature to adapt the value of a set of attributes to the machine DS is running on. I think we should have the following order of priority in settings for each tunable value: default, autotune, admin settings. So we can imagine, with autotuning on, some attributes being set by the admin and others left to autotune.
For example, some customers using an external database may need to increase the number of workers (for example if the external DB is slow) and so force the threadnumber value. But at the same time the customer wants to leave to autotune the responsibility of tuning the entry/db caches.
Is that possible with the current implementation?

Yes. The cache sizing is strictly handled by this attribute: nsslapd-cache-autosize. This has no effect on thread numbers. Thread auto tuning is only applied if the nsslapd-threadnumber is set to "-1". So they are independent "auto tunings".

71a98aa..910a4ce 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

04635e4..9327a99 master -> master

We need to also reject the entry cache settings when using autosizing

0001-Ticket-49257-Reject-nsslapd-cachememsize-nsslapd-cac.patch

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to review (was: ack)

3 years ago

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.6.0 (was: 0.0 NEEDS_TRIAGE)

3 years ago

The patch looks good but I think that fix limits the flexibility of the admin.

So far, changes to the entry cache size have been taken into account dynamically.
So if an admin wanted to "play" with the entry cache, he was able to reduce/increase it and the change took effect immediately. IMHO this is useful for tuning/testing.

If autotune is set up, whatever the admin sets will be overwritten at startup. So the initial value is autotuned. Is it a problem to keep the ability to tune the cache while running?

Actually, looking at the code I don't think the dbcache/entry cache is supposed to get reset at every startup, only if the dbcache is set to 0. But there is a bug that resets it regardless of its value. In that case we should be allowing these updates! So I might have to revert part of my last commit. I've sent @firstyear an email to confirm the desired behavior, so I'm putting this issue on hold until I hear from him.

@mreynolds and @tbordaz I just responded to Mark. I think that there isn't a bug in the value code in start.c.

The statement which Mark called into question is:

    if (li->li_dbcachesize == 0 || li->li_cache_autosize > 0) {

So here, if li_dbcachesize is 0, we want to trigger the autotuning - this is reasonable, we can't start with a 0 value.

The question is about the "||" and cache_autosize having a value.

So the logic is that if you have said "Yes, I really have a cache_autosize value" then you want the dbcache tuning to always apply.

In start.c, we never actually change this value, even when autotuning. We use a separate external value called autosize_percentage, to avoid messing with the user setting.

As a result, I suspect if you are seeing an issue here it could be that li_cache_autosize is not 0.

I think I would need to see a reproducer case to really know more, but having traced all the logic and the code, I can't see a fault, I'm sorry :(

I hope this helps,

@mreynolds and @firstyear I finally changed my mind and agree with you. If autotuning is set, cache values will be in the hands of autotuning both at startup and while the server is running.
If the admin wants to test other values, autotuning needs to be disabled first.

Ack.

Metadata Update from @tbordaz:
- Custom field reviewstatus adjusted to ack (was: review)

3 years ago

What I'm seeing is that we run "autotune" at every startup if autosizing is not zero, regardless of whether dbcachesize is set or not. But according to the comments in the code, if dbcachesize is set we don't autosize. So which is it?

I'm fine either way. I just want to know because I'm writing fixes to reject updates to dbcachesize/cachememsize if autosizing is enabled, because we always autosize at startup. But if there are cases where we don't autosize because dbcachesize is already set, then we should allow updates to the dbcache/entry cache, and then there is also a bug in the logic as I previously mentioned (because we ALWAYS autosize at startup). So I just need clarification on the design :)

Or let me ask you this way: if autosizing is set, are we supposed to always autosize at startup? If yes, great, I'll continue my work. If not, then we have a bug in start.c and I also need to revert my "rejection" fix.

So there are "three cases".

  • You have a new server. autosize is 0, dbcache is 0. autotune will run once and set dbcache to non-zero. Done.
  • You have a server where autosize is 0, and dbcache > 0. Autotune won't run.
  • You have a server where autosize is > 0. regardless of dbcache setting, you will have autosizing run.
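For reference, the three cases reduce to the condition quoted from start.c earlier in this ticket. A minimal Python sketch of that decision follows; the function name `autotune_runs` is hypothetical, and it only models whether autotuning is triggered at startup, not what the server does with the result.

```python
def autotune_runs(cache_autosize: int, dbcachesize: int) -> bool:
    """Model of: li->li_dbcachesize == 0 || li->li_cache_autosize > 0."""
    return dbcachesize == 0 or cache_autosize > 0


# Case 1: new server, nothing set yet, so autotune runs once.
print(autotune_runs(cache_autosize=0, dbcachesize=0))          # True
# Case 2: autosize off and a manual dbcache value, so autotune is skipped.
print(autotune_runs(cache_autosize=0, dbcachesize=260999144))  # False
# Case 3: autosize on, so autotune runs regardless of dbcache.
print(autotune_runs(cache_autosize=10, dbcachesize=260999144)) # True
```

The third call is the situation from the original report: a manual dbcachesize with autosizing still enabled is recomputed at the next startup.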

Does that clear it up?

If you say so :) The comments in the code are vague so I wanted your confirmation on what it's supposed to be doing.

 Second, once the admin sets a value, or autotuning set a value, it sticks.

But it doesn't stick if autotuning is set. This comment implies that you can override the autosizing by setting dbcachesize manually and it sticks, but in fact autosizing must be disabled, otherwise it will always overwrite it at the next startup.

As long as it's doing what you designed it to do then I'm fine, I just wanted your confirmation.

ca548a7..35c0834 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

b81c8ba..f39cc62 master -> master

Okay, it may be a communication issue. Think of it as

  • the new autotuning feature is actually a once-off manual tune
  • manual tuning still works
  • autosizing still over-rides all if you enable it.

Does that make more sense?

I've added the reproducer for the feature to https://pagure.io/389-ds-base/issue/49021

Partly, the test case fails because of the issue we discuss here. I'll copy my comment from there here:

"I've applied the comments to the new patch.
It has some failures now though that relate to the fact we can't edit 'nsslapd-cachememsize'.

In http://www.port389.org/docs/389ds/design/autotuning.html, under the Manual tuning in detail section, we can find the information that the values can be autotuned again (not only at instance installation time).
It can be done by setting nsslapd-cachememsize and nsslapd-cache-autosize to 0 (and restarting).

if (cachememsize == 0 && nsslapd-cache-autosize == 0) || nsslapd-cache-autosize > 0:
    cachememsize = auto entry cachesize value, and write to dse.ldif

Now this operation works for nsslapd-dbcachesize only.

if (dbcachesize == 0 && nsslapd-cache-autosize == 0) || nsslapd-cache-autosize > 0:
    dbcachesize = auto db cachesize value, and write to dse.ldif

You can verify all of this with my patch."

@spichugi this ticket now rejects updates to dbcachesize if nsslapd-cache-autosize is set. So the failures in your testcase are expected. You have to first set nsslapd-cache-autosize to zero before you can update nsslapd-dbcachesize and nsslapd-cachememsize.
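As a rough illustration of that order of operations, here is a Python sketch that builds the LDIF for ldapmodify: first turn autosizing off, then set the DB cache size. The helper names are hypothetical. It also assumes that nsslapd-cache-autosize lives on the same entry as nsslapd-dbcachesize (the DN used in the description) and that no restart is needed between the two changes, neither of which is confirmed in this ticket.

```python
# DN taken from the ldapmodify example in the description.
LDBM_CONFIG_DN = "cn=config,cn=ldbm database,cn=plugins,cn=config"


def replace_record(dn: str, attr: str, value: int) -> str:
    """Return one LDIF modify record that replaces a single attribute."""
    return (
        f"dn: {dn}\n"
        "changetype: modify\n"
        f"replace: {attr}\n"
        f"{attr}: {value}\n"
    )


def manual_dbcache_ldif(dbcachesize: int) -> str:
    """Disable autosizing first, then set the DB cache size by hand."""
    records = [
        # ASSUMPTION: autosize is configured on the same entry as dbcachesize.
        replace_record(LDBM_CONFIG_DN, "nsslapd-cache-autosize", 0),
        replace_record(LDBM_CONFIG_DN, "nsslapd-dbcachesize", dbcachesize),
    ]
    # Each record ends with a newline, so joining with "\n" leaves the
    # blank line that separates LDIF records.
    return "\n".join(records)


print(manual_dbcache_ldif(260999144))
```

The printed text could be fed to the same ldapmodify command shown in the description in place of the here-document.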

Okay, it may be a communication issue. Think of it as

the new autotuning feature is actually a once-off manual tune
manual tuning still works
autosizing still over-rides all if you enable it.

Does that make more sense?

You didn't answer my questions though, and to me it's still vague. But you didn't reject my patches so I'm assuming I was correct:

  • If autosizing is on, you CAN NOT set the dbcachesize
  • If autosizing is on, we ALWAYS resize at every startup

This is the current behaviour, and I'll assume it's by design.

We have an inconsistency then (you can reproduce it with my test suite from https://pagure.io/389-ds-base/issue/49021):
- trying to set dbcachesize while nsslapd-cache-autosize is set - no UNWILLING_TO_PERFORM and operation is successful;
- trying to set cachememsize while nsslapd-cache-autosize is set - UNWILLING_TO_PERFORM happens.

I reproduced this with your test script, investigating...

Okay, the script should not test for UNWILLING_TO_PERFORM when using "0" for dbcachesize. This is allowed, as it also forces the cache size to be retuned at startup.

Once I fixed that, I got another error when setting autosize to 20. The issue is that autosizing caps out at 536870912, so even though you set the autosize percentage to 20, that's still more than the cap, so it gets reset.
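To make the cap effect concrete, here is a small Python sketch. It assumes the uncapped target is simply a percentage of total memory, which is a simplification and not taken from the server code; only the 536870912 cap comes from this ticket. The function name is hypothetical.

```python
DBCACHE_CAP_BYTES = 536870912  # cap quoted in this ticket (512 MiB)


def autosized_dbcache(total_memory_bytes: int, autosize_percent: int) -> int:
    """Percentage-of-memory target, limited by the DB cache cap."""
    # ASSUMPTION: the real server may split this budget between the DB
    # cache and the entry caches; that split is ignored here.
    target = total_memory_bytes * autosize_percent // 100
    return min(target, DBCACHE_CAP_BYTES)


eight_gib = 8 * 1024**3
one_gib = 1024**3
print(autosized_dbcache(eight_gib, 20))  # 536870912 (20% exceeds the cap)
print(autosized_dbcache(one_gib, 20))    # 214748364 (20% is below the cap)
```

On a host with enough memory, the value read back after restart is therefore the cap rather than the requested percentage, which would explain the test failure.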

Okay, the script should not test for UNWILLING_TO_PERFORM when using "0" for dbcachesize. This is allowed, as it also forces the cache size to be retuned at startup.

If autosize is non 0, it's going to be reset at startup anyway, so it's a moot-ish point.

Once I fixed that, I got another error when setting autosize to 20. The issue is that autosizing caps out at 536870912, so even though you set the autosize percentage to 20, that's still more than the cap, so it gets reset.

For the DB cache it has a cap based on lkrispens advice. This is well commented in the code :)

Okay, the script should not test for UNWILLING_TO_PERFORM when using "0" for dbcachesize. This is allowed, as it also forces the cache size to be retuned at startup.

If autosize is non 0, it's going to be reset at startup anyway, so it's a moot-ish point.

Right...

Once I fixed that, I got another error when setting autosize to 20. The issue is that autosizing caps out at 536870912, so even though you set the autosize percentage to 20, that's still more than the cap, so it gets reset.

For the DB cache it has a cap based on lkrispens advice. This is well commented in the code :)

I know...

My last update was directed at Simon about his testcase - not about the autosize design.

I've added a patch that passes to https://pagure.io/389-ds-base/issue/49021
So currently I've put the following checks in the test case (the one where we have autosizing 'on'):

  • trying to set nsslapd-dbcachesize to 0 while nsslapd-cache-autosize is set - no UNWILLING_TO_PERFORM and operation is successful;

  • trying to set nsslapd-dbcachesize to 3333333 while nsslapd-cache-autosize is set - no UNWILLING_TO_PERFORM and operation is successful;

  • trying to set nsslapd-cachememsize to 0 while nsslapd-cache-autosize is set - UNWILLING_TO_PERFORM happens;

  • trying to set nsslapd-cachememsize to 3333333 while nsslapd-cache-autosize is set - no UNWILLING_TO_PERFORM and operation is successful;

My last update was directed at Simon about his testcase - not about the autosize design.

I've added a patch that passes to https://pagure.io/389-ds-base/issue/49021
So currently I've put the following checks in the test case (the one where we have autosizing 'on'):

trying to set nsslapd-dbcachesize to 0 while nsslapd-cache-autosize is set - no UNWILLING_TO_PERFORM and operation is successful;

Correct

trying to set nsslapd-dbcachesize to 3333333 while nsslapd-cache-autosize is set - no UNWILLING_TO_PERFORM and operation is successful;

This should be rejected with UNWILLING_TO_PERFORM

trying to set nsslapd-cachememsize to 0 while nsslapd-cache-autosize is set - UNWILLING_TO_PERFORM happens;

Correct

trying to set nsslapd-cachememsize to 3333333 while nsslapd-cache-autosize is set - no UNWILLING_TO_PERFORM and operation is successful;

This should be rejected with UNWILLING_TO_PERFORM
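Summarising the four answers above as a Python sketch of the expected results while autosizing is enabled. The function name is hypothetical, the result codes are the standard LDAP values, and any other server-side validation (minimum sizes and so on) is ignored.

```python
LDAP_SUCCESS = 0
LDAP_UNWILLING_TO_PERFORM = 53  # unwillingToPerform result code (RFC 4511)


def expected_result(attr: str, new_value: int, cache_autosize: int) -> int:
    """Expected LDAP result for a cache-size modify, per the replies above."""
    if cache_autosize == 0:
        # Autosizing is off: manual tuning is allowed.
        return LDAP_SUCCESS
    if attr == "nsslapd-dbcachesize" and new_value == 0:
        # Allowed: a zero dbcachesize just asks for a retune at startup.
        return LDAP_SUCCESS
    # Any other manual cache size would be overwritten, so it is rejected.
    return LDAP_UNWILLING_TO_PERFORM


checks = [
    ("nsslapd-dbcachesize", 0),
    ("nsslapd-dbcachesize", 3333333),
    ("nsslapd-cachememsize", 0),
    ("nsslapd-cachememsize", 3333333),
]
for attr, value in checks:
    # cache_autosize=10 is an arbitrary non-zero value for illustration.
    print(attr, value, expected_result(attr, value, cache_autosize=10))
```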

Ok, I've updated the ticket, and the test execution now has failures.

The failures we have now:

  • trying to set nsslapd-cachememsize to 0 while nsslapd-cache-autosize is set - UNWILLING_TO_PERFORM isn't raised now;

  • trying to set nsslapd-dbcachesize to 3333333 while nsslapd-cache-autosize is set - UNWILLING_TO_PERFORM isn't raised now;

Found a regression with my fix that caused backend creation to fail; this patch resolves that issue.

0001-Ticket-49257-only-register-modify-callbacks.patch

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to review (was: ack)

2 years ago

Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to ack (was: review)

2 years ago

d74ba63..32d46d2 master -> master

d77e285..3c8affb 389-ds-base-1.3.6 -> 389-ds-base-1.3.6 (CI test)

Metadata Update from @mreynolds:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Missing a commit:

3c8affb..22f4326 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

Metadata Update from @mreynolds:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1450896 https://bugzilla.redhat.com/show_bug.cgi?id=1492821 (was: https://bugzilla.redhat.com/show_bug.cgi?id=1450896)
- Custom field version adjusted to None

2 years ago
