#49028 Autotune cache sizes by default
Closed: wontfix 5 years ago. Opened 5 years ago by firstyear.

Right now, our backends are statically tuned by default.

We should change to automatic tuning by default. This will help projects like FreeIPA that currently do no memory tuning, and also make admins' lives easier, as systems will be "performant" out of the box.


Note: That was the original intention, but we decided not to because of the many failures observed in the automated tests. Back then, we had to support several different OSes, which was definitely one of the reasons the automatic style was disliked. So the situation might be improved now, but we still have to be careful, IMHO.

Also, the best configuration at startup may not be the best in the long run. We have to account for the server's growth and some fragmentation.

We may want the ability to check the cache size periodically and adjust it to the physical memory available at check time.

On the other hand, the entry cache may disappear once we have an improved backend (though we have no solid timetable yet...).

I think we have to be careful, yes, but I have put a lot of work into fixing the memory detection code. I think it's very possible to realise this now.

I think we just need to be careful about the "percentage" of memory that we set initially. If we set it too high, we will hit fragmentation; too low, and we won't use the hardware properly. But right now anything is better than nothing.

When we derive the value from the percentage, we update the dbcachesize values in cn=config to show "what" the server selected, so it's visible to the admin. Turning off autotuning would then be a matter of removing the autotuning percentage.

Per weekly meeting, we need more discussion for the autotuning overall.

Setting the ticket to FUTURE for now.

{{{
[09/Jan/2017:12:58:45.633322976 +1000] - INFO - main - 389-Directory/1.3.6.1 B2017.09.256 starting up
[09/Jan/2017:12:58:45.727736258 +1000] - ERR - ldbm_back_start - cache autosizing. found 12015832k physical memory
[09/Jan/2017:12:58:45.746115777 +1000] - ERR - ldbm_back_start - cache autosizing. found 1201580k avaliable
[09/Jan/2017:12:58:45.758882865 +1000] - ERR - ldbm_back_start - cache autosizing: db cache: 480632k, each entry cache (1 total): 720948k
[09/Jan/2017:12:58:46.089389544 +1000] - INFO - slapd_daemon - slapd started. Listening on All Interfaces port 389 for LDAP requests
}}}

Log output from a fresh install of Directory Server on my laptop. Note the use of 10% of system memory, and the split to dbcache. This patch adds a cap to the dbcachesize as per Ludwig's advice.
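The arithmetic behind those log lines is simple enough to sketch. The following is an illustrative Python fragment (not the actual C implementation; the function name and the kilobyte units are assumptions) reproducing the split of the available budget between the db cache and the entry cache:

```python
def autosize_caches(avail_kb, split_pct=40):
    """Split an autosized memory budget between caches.

    avail_kb  -- the slice of physical memory granted to the server
                 (10% of RAM in the log above)
    split_pct -- percentage of that budget given to the db cache;
                 the remainder goes to the entry cache(s)
    """
    db_cache_kb = avail_kb * split_pct // 100
    entry_cache_kb = avail_kb - db_cache_kb
    return db_cache_kb, entry_cache_kb

# Using the 1201580k "available" figure from the log:
db, ec = autosize_caches(1201580)
print(db, ec)  # 480632 720948, matching the db/entry cache lines above
```

With a single suffix, the whole entry-cache share goes to that one backend, which is why the log reports one entry cache of 720948k.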

The patch looks good

Reading autotune design/code there is something I am not sure regarding the entry cache.

My understanding is that the autotuned entry cache (li_cache_autosize_ec) is not capped. I think it should be, for example at 300MB. IMHO, having a cache larger than 300-400MB will consume free memory and reduce the file system cache, but will likely not give additional performance. If you really need to cache most of the db entries, that needs to be tested and the value set explicitly; autotune will then keep it.

Also, what happens if there are 2 or 3 suffixes? Is the autosize-split amount of memory shared among all suffixes?

Replying to [comment:7 tbordaz]:

The patch looks good

Based on emails overnight, I'm going to change the patch somewhat from this point.

Reading autotune design/code there is something I am not sure regarding the entry cache.

My understanding is that the autotuned entry cache (li_cache_autosize_ec) is not capped. I think it should be, for example at 300MB. IMHO, having a cache larger than 300-400MB will consume free memory and reduce the file system cache, but will likely not give additional performance. If you really need to cache most of the db entries, that needs to be tested and the value set explicitly; autotune will then keep it.

The current behavior is that when we set the value it's written to dbcachesize and cachememsize, so they are set once and left alone.

I thought the advice was to cap the BDB cache size at ~500MB, as after that we don't gain a lot; the remaining allocated space then goes to the entry cache.

Additionally, my default is 10% of system free RAM, so we are not sacrificing the VFS cache for the DS entry cache. Were we setting, say, 50% or more, this would be a concern, but here on a 16GB system we only allocate 1.6GB of entry cache. So I'm not sure I share your concern, because in my experience, the more entries you can keep in the entry cache, the better the performance.

Also, what happens if there are 2 or 3 suffixes? Is the autosize-split amount of memory shared among all suffixes?

It's a bit annoying, but if there are 2 suffixes we'll see:

16GB of RAM, 1.6GB of "cache".

~500MB to db cache.
~1100MB for entry cache.
entry cache size = 1100 / (number of suffixes)

So for two suffixes, each one receives ~550MB; for three, we would see ~366MB each.

We could do something smarter than this, but I think for now it's not a big deal.
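The even division described above can be sketched as follows (illustrative Python with a made-up function name; the real code works in bytes rather than megabytes, using the ~1100MB left for entry caches after the db cache takes its share):

```python
def per_suffix_entry_cache(entry_cache_total_mb, n_suffixes):
    """Divide the entry-cache budget evenly across all suffixes."""
    return entry_cache_total_mb // n_suffixes

print(per_suffix_entry_cache(1100, 2))  # 550
print(per_suffix_entry_cache(1100, 3))  # 366
```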

This patch makes the autotuning a "one-off" at a new install. So when we run DS for the first time, we see:

{{{
[10/Jan/2017:14:53:47.535435145 +1000] - INFO - main - 389-Directory/1.3.6.1 B2017.010.450 starting up
[10/Jan/2017:14:53:47.630742922 +1000] - NOTICE - ldbm_back_start - found 12015832k physical memory
[10/Jan/2017:14:53:47.649688422 +1000] - NOTICE - ldbm_back_start - found 1201580k avaliable
[10/Jan/2017:14:53:47.666961469 +1000] - NOTICE - ldbm_back_start - cache autosizing: db cache: 480632k
[10/Jan/2017:14:53:47.679660999 +1000] - NOTICE - ldbm_back_start - cache autosizing: userRoot entry cache (1 total): 786432k
[10/Jan/2017:14:53:47.706106896 +1000] - NOTICE - ldbm_back_start - total cache size: 1215817318 B;
[10/Jan/2017:14:53:48.102515612 +1000] - INFO - slapd_daemon - slapd started. Listening on All Interfaces port 389 for LDAP requests
}}}

These values are then written to dse.ldif. For example:

{{{
nsslapd-dbcachesize: 393733734
nsslapd-cachememsize: 805306368
}}}

Now, when we restart the server, because we have values in dbcachesize and cachememsize, they are used. We can see the "lack of autosizing" on the next restart.

{{{
[10/Jan/2017:14:54:30.591478249 +1000] - INFO - main - 389-Directory/1.3.6.1 B2017.010.450 starting up
[10/Jan/2017:14:54:30.718371542 +1000] - NOTICE - ldbm_back_start - found 12015832k physical memory
[10/Jan/2017:14:54:30.731066219 +1000] - NOTICE - ldbm_back_start - found 1201580k avaliable
[10/Jan/2017:14:54:30.746010062 +1000] - NOTICE - ldbm_back_start - total cache size: 1215817318 B;
[10/Jan/2017:14:54:31.273191968 +1000] - INFO - slapd_daemon - slapd started. Listening on All Interfaces port 389 for LDAP requests
}}}

If we reset one of these values to 0, it's autosized again based on the hardware; i.e. we can trigger a reset and recalculation.

{{{
nsslapd-dbcachesize: 0
}}}

{{{
[10/Jan/2017:15:10:14.136445574 +1000] - INFO - main - 389-Directory/1.3.6.1 B2017.010.459 starting up
[10/Jan/2017:15:10:14.224748531 +1000] - NOTICE - ldbm_back_start - found 12015832k physical memory
[10/Jan/2017:15:10:14.243204142 +1000] - NOTICE - ldbm_back_start - found 1201580k avaliable
[10/Jan/2017:15:10:14.259212657 +1000] - NOTICE - ldbm_back_start - cache autosizing: db cache: 480632k
[10/Jan/2017:15:10:14.272267758 +1000] - NOTICE - ldbm_back_start - total cache size: 1215817318 B;
[10/Jan/2017:15:10:14.596502795 +1000] - INFO - slapd_daemon - slapd started. Listening on All Interfaces port 389 for LDAP requests
}}}

Note how we only tune the dbcache in this instance.

Finally, if the user sets the autosize value to a custom value (i.e. not 0), it is used at every startup, regardless of hardcoded values for dbcache and cachememsize.

So the order of behaviour is:

  • If dbcachesize or cachememsize has a non-zero value, it is used.
  • If dbcachesize or cachememsize is 0, generate an autosize value and store it to dbcachesize / cachememsize.
  • If autosize is > 0, always use the autosized values, regardless of the presence of dbcachesize / cachememsize.

This is nearly identical to the current server behaviour, the exception being that a fresh install gets a scaled default rather than a static one.
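The precedence above can be sketched as a small decision function (hypothetical Python; the names and the 10% stand-in default are illustrative, not the actual config-parsing code):

```python
def resolve_cache_size(stored_kb, autosize_pct, avail_kb):
    """Apply the three precedence rules listed above.

    stored_kb    -- dbcachesize / cachememsize as found in cn=config
    autosize_pct -- the configured autosize percentage (0 = not set)
    avail_kb     -- memory available to the server
    """
    if autosize_pct > 0:
        # Rule 3: an explicit autosize percentage always wins.
        return avail_kb * autosize_pct // 100
    if stored_kb > 0:
        # Rule 1: a stored non-zero value is used unchanged.
        return stored_kb
    # Rule 2: 0 means "generate a default and write it back";
    # 10% of available memory stands in for that default here.
    return avail_kb * 10 // 100

print(resolve_cache_size(500, 0, 12015832))  # 500 (stored value wins)
print(resolve_cache_size(0, 10, 12015832))   # 1201583 (autosized)
```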

Passes the basic and dynamic test suites.

Looks very cool.

Some minor questions from my curiosity... :)

When you install a new server, what values do dbcachesize, cachememsize and autosize have?
Based upon the #comment:9, {0, 0, 40%}, respectively?

Can we remove these config attributes? If we can, does that work the same as setting them to their initial values? (I remember config params in cn=config can be deleted... I forget whether we allow the same for the ones in the backend...)

Do you have an idea how we could describe the recommended scenario in the doc? Could most customers just set {0, 0, 40%} in the beginning? (The first 2 are filled automatically, right?)

Maybe, this design doc could have the new info you implemented? ;) Thanks!!
http://www.port389.org/docs/389ds/design/autotuning.html

Anyway, you have my ack.

Replying to [comment:11 nhosoi]:

Looks very cool.

Some minor questions from my curiosity... :)

When you install a new server, what values do dbcachesize, cachememsize and autosize have?
Based upon the #comment:9, {0, 0, 40%}, respectively?

Almost.

dbcachesize: 0
cachememsize: 0
autosize: 10
autosize-split: 40

So we only use 10% of system memory, to avoid fragmentation growth or conflicts with IPA components.
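For reference, these defaults map onto the ldbm config entries roughly as follows (a hedged sketch: nsslapd-cache-autosize and nsslapd-cache-autosize-split are the attribute names used by the autotuning code, but the exact entries and values may differ by version):

{{{
# Global ldbm config entry:
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
nsslapd-cache-autosize: 10
nsslapd-cache-autosize-split: 40
nsslapd-dbcachesize: 0

# Per-backend entry (one per suffix):
dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 0
}}}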

Can we remove these config attributes? If we can, does that work the same as setting them to their initial values? (I remember config params in cn=config can be deleted... I forget whether we allow the same for the ones in the backend...)

Yes, you can just delete them the same way.

Do you have an idea how we could describe the recommended scenario in the doc? Could most customers just set {0, 0, 40%} in the beginning? (The first 2 are filled automatically, right?)

Yeah, I think I can describe this. How about I put that in the doc (as below)?

Maybe, this design doc could have the new info you implemented? ;) Thanks!!
http://www.port389.org/docs/389ds/design/autotuning.html

Sure, I'll update this page shortly.

Anyway, you have my ack.

Thanks! I'm really glad you are happy with this, because I know you had input about the topic.

{{{
commit 28e5e15
Writing objects: 100% (13/13), 5.46 KiB | 0 bytes/s, done.
Total 13 (delta 10), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
f0fdc4d..28e5e15 master -> master
}}}

Metadata Update from @nhosoi:
- Issue assigned to firstyear
- Issue set to the milestone: 1.3.6 backlog

5 years ago

Issue: a lot of errors and warnings in the errors log.

Build tested: 389-ds-base-1.3.6.1-9.el7.x86_64

Steps to reproduce:
1) Install an instance:
[root@qeos-246 upstream]# setup-ds.pl

2) Check the errors log:
[root@qeos-246 upstream]# tail /var/log/dirsrv/slapd-qeos-246/errors

{{{
[02/May/2017:07:19:51.303486751 -0400] - WARN - spal_meminfo_get - Unable to retrieve /sys/fs/cgroup/memory/memory.limit_in_bytes. There may be no cgroup support on this platform
[02/May/2017:07:19:51.304190154 -0400] - NOTICE - ldbm_back_start - total cache size: 212714127 B;
[02/May/2017:07:19:51.304821907 -0400] - ERR - _spal_get_uint64_t_file - Unable to open file "/sys/fs/cgroup/memory/memory.soft_limit_in_bytes". errno=13
[02/May/2017:07:19:51.305735526 -0400] - WARN - spal_meminfo_get - Unable to retrieve /sys/fs/cgroup/memory/memory.soft_limit_in_bytes. There may be no cgroup support on this platform
[02/May/2017:07:19:51.306290656 -0400] - ERR - _spal_get_uint64_t_file - Unable to open file "/sys/fs/cgroup/memory/memory.limit_in_bytes". errno=13
[02/May/2017:07:19:51.306856087 -0400] - WARN - spal_meminfo_get - Unable to retrieve /sys/fs/cgroup/memory/memory.limit_in_bytes. There may be no cgroup support on this platform
[02/May/2017:07:19:51.308164630 -0400] - ERR - _spal_get_uint64_t_file - Unable to open file "/sys/fs/cgroup/memory/memory.usage_in_bytes". errno=13
[02/May/2017:07:19:51.308736617 -0400] - WARN - spal_meminfo_get - Unable to retrieve /sys/fs/cgroup/memory/memory.limit_in_bytes. There may be no cgroup support on this platform
[02/May/2017:07:19:51.309812171 -0400] - INFO - dblayer_start - Resizing db cache size: 639334809 -> 61719183
[02/May/2017:07:19:51.457095244 -0400] - INFO - slapd_daemon - slapd started.  Listening on All Interfaces port 389 for LDAP requests
}}}

The issue is present only on RHEL 7.4; Fedora is okay.
[root@qeos-246 upstream]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 Beta (Maipo)
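The failing calls here return errno=13 (EACCES), so any consumer of these cgroup files has to tolerate the error and fall back to another source such as /proc/meminfo. A minimal sketch of that pattern (illustrative Python only, not the actual `_spal_get_uint64_t_file` C code):

```python
def read_uint64_file(path):
    """Read a single unsigned integer from a sysfs/cgroup file.

    Returns None when the file is missing, unreadable (e.g. EACCES
    from an SELinux denial), or malformed, so the caller can log a
    warning and fall back instead of failing startup.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

limit = read_uint64_file("/sys/fs/cgroup/memory/memory.limit_in_bytes")
if limit is None:
    print("WARN: no usable cgroup limit, falling back to /proc/meminfo")
```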

Metadata Update from @spichugi:
- Custom field reviewstatus reset (from review?)
- Issue close_status updated to: None (was: Fixed)

5 years ago

Metadata Update from @spichugi:
- Issue status updated to: Open (was: Closed)

5 years ago

The errors reported while accessing /sys/fs/cgroup/memory/ are likely related to BZ#1444864: autotune is broken because of some invalid SELinux rules.

The errors reported while accessing /sys/fs/cgroup/memory/ are likely related to BZ#1444864: autotune is broken because of some invalid SELinux rules.

Thank you! It is.

Going to close this because it's BZ related.

Metadata Update from @firstyear:
- Issue status updated to: Closed (was: Open)

5 years ago

Just a small side note: this was discussed in earlier comments (by Thierry, https://pagure.io/389-ds-base/issue/49028#comment-122428), but I would like to emphasize it. 389ds always has at least two default backends/suffixes (userRoot & NetscapeRoot). The current design gives exactly the same entry cache size (maxentrycachesize) to both of them. That is a complete waste of memory: NetscapeRoot is never larger than 2-5MB (it is 350k in our production environment, for example), while userRoot may be huge (in our production, currententrycachesize is ~300MB).

So while I would really like to use this new feature to avoid manual memory tuning, with more aggressive settings than 10%/40% (our LDAP servers are dedicated to ns-slapd), the huge memory chunk allocated to the NetscapeRoot suffix is a showstopper for me.

Well, the dbcache is global among all backends anyway; this cannot be changed. So it's really only the entry cache that should "not" be evenly divided among all backends. Not all backends are equal :)

I think the best approach here might be to make the entry cache autosizing configurable per backend instance (on or off), "on" by default. Then the admin can turn it off for NetscapeRoot, etc. This might add some complexity, as I don't think the entry cache for the other backends will autosize itself once it's already set (unless we always autosize at every startup). Thoughts, William?

I think the best approach here might be to make the entry cache autosizing configurable per backend instance (on or off), "on" by default. Then the admin can turn it off for NetscapeRoot, etc. This might add some complexity, as I don't think the entry cache for the other backends will autosize itself once it's already set (unless we always autosize at every startup).

+1
Another question:
Would "on (by default) -> off -> on" re-calculate the optimal entry cache size if the available memory has changed since the DS started? If so, that might be a bonus...

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2087

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix

2 years ago
