Description of problem: From https://bugzilla.redhat.com/show_bug.cgi?id=1349577 There is another overflow in performance counters: ./ldap/servers/slapd/back-ldbm/perfctrs.h: 33 PRUint32 page_access_rate; ./ldap/servers/slapd/back-ldbm/perfctrs.c: 167 perf->cache_size_bytes = mpstat->st_gbytes * ONEG + mpstat->st_bytes; 168 perf->page_access_rate = mpstat->st_cache_hit + mpstat->st_cache_miss; <=== 169 perf->cache_hit = mpstat->st_cache_hit; 170 perf->cache_try = mpstat->st_cache_hit + mpstat->st_cache_miss; <=== Also objects-locked has the value of page_access_rate: 260 { SLAPI_LDBM_PERFCTR_AT_PREFIX "objects-locked", 261 offsetof( performance_counters, page_access_rate ) }, 262 { SLAPI_LDBM_PERFCTR_AT_PREFIX "page-create-rate", 263 offsetof( performance_counters, page_create_rate ) }, ldapsearch -x -D "cn=Directory Manager" -w Secret123 -b "cn=monitor,cn=ldbm database,cn=plugins,cn=config" -LLL dbcachetries nsslapd-db-objects-locked dn: cn=monitor,cn=ldbm database,cn=plugins,cn=config dbcachetries: 509565 dn: cn=database,cn=monitor,cn=ldbm database,cn=plugins,cn=config nsslapd-db-objects-locked: 509565 Version-Release number of selected component (if applicable): 389-ds-base-1.3.5.10-11.el7.x86_64
Metadata Update from @nhosoi: - Issue set to the milestone: 1.3.6.0
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to new - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1377452 https://bugzilla.redhat.com/show_bug.cgi?id=1349577 (was: https://bugzilla.redhat.com/show_bug.cgi?id=1377452) - Issue close_status updated to: None
Metadata Update from @mreynolds: - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1377452, https://bugzilla.redhat.com/show_bug.cgi?id=1349577 (was: https://bugzilla.redhat.com/show_bug.cgi?id=1377452 https://bugzilla.redhat.com/show_bug.cgi?id=1349577)
Metadata Update from @mreynolds: - Issue assigned to mreynolds
<img alt="0001-Issue-48989-Overflow-in-counters-and-monitor.patch" src="/389-ds-base/issue/raw/files/915a0801849bba2f6097bc6ea356b0d0257b2bbc57a85e1d90b4107e9fad04f9-0001-Issue-48989-Overflow-in-counters-and-monitor.patch" />
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to review (was: new)
Don't use slapi_long, just use uint64_t from inttypes.h please. We should avoid long long, as it's not deterministic.
We don't need 1866 -#define NSPRIu64» "llu", because inttypes.h provides PRIu64 for string formats, so we should use this instead.
For now, don't use __ATOMIC_RELAXED, __ATOMIC_RELAXED, use SEQ_CST, we can relax these as we go with tests to be sure of the effect.
Avoid ,"%llu", use the PRI types from inttypes.h
Sorry to be so difficult about this, I think if we are going to do this, we should get it right. Thanks for your time on this.
Neither is uint64_t - we already talked about this. We NEED to do this for printf'ing. That is the whole point of all of this - otherwise we get overflows or compiler errors.
Yeah I left that as is to keep the patch small - figured all of this could be replaced by another patch.
I was also wondering about __ATOMIC_SEQ_CST in slapi_counters.c. I saw that you changed this to atomic release/acquire in nunc-stans. Is that something we want to do here as well?
I can do that __PRI64_PREFIX + PRIu64 = "llu" - this needs testing though...
You're not, but I don't think you know what's going on exactly. If we don't use "%llu" for our logging functions/printfs we overflow. So although PRUint64 and uint64_t behave as unsigned long longs, the compiler thinks they are just unsigned longs. So to avoid the overflows we must use "%llu" for these types, but... this will cause compiler warnings about the wrong type being used because gcc thinks unint64_t is a unsigned long, etc. So to solve both the issues we have to cast. I would rather cast when we return the counter value, than cast every time we use printf formatting. Hence, slapi_long to make things consistent and easy.
I'm all ears if you can tell me how to get this working without casting, AND without overflows & compiler warnings.
Anyway I need to test inttypes.h string formats to see if it works as expected.
Thanks. Mark
I found the root cause of ALL of this... NSPR printf functions!
The standard printf functions work correctly with uint64_t/PRUint64 (no overflows - no compiler warnings). This means we can remove slapi_long and all the casting. So I need to redo the entire fix, but this is good news and it will be significantly cleaner.
New patch
<img alt="0001-Issue-48989-Overflow-in-counters-and-monitor.patch" src="/389-ds-base/issue/raw/files/a5aa4324a43c9d29af5ee7a42f2b9176ccf9492dcffbb6adb745573bac49b3f7-0001-Issue-48989-Overflow-in-counters-and-monitor.patch" />
Ah hah! That would explain it.
I think we should be reducing our reliance on NSPR, it adds a lot of overhead and it's relevance is not as high these days.
I'm much happier with this patch: I think one thing is that we need to be sure we only read from this struct via the functions rather than the direct access, but I think this is a great start for now.
Metadata Update from @firstyear: - Custom field reviewstatus adjusted to ack (was: review)
fdf78dc..434a92f master -> master
ee63e40..70084f1 389-ds-base-1.3.5 -> 389-ds-base-1.3.5
Metadata Update from @mreynolds: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
<img alt="0001-Ticket-48989-Improve-counter-overflow-fix.patch" src="/389-ds-base/issue/raw/files/e44f83238e462ba9d0d5428c7cecb0df95cfe4602bf7ff2fc9c6b72f211d37dd-0001-Ticket-48989-Improve-counter-overflow-fix.patch" />
Metadata Update from @firstyear: - Issue status updated to: Open (was: Closed)
Metadata Update from @firstyear: - Custom field reviewstatus adjusted to review (was: ack)
You removed the atomic counter assembly code for 32bit archs on linux, so is __atomic_store_8() actually atomic on 32bit envs? Meaning does it use its "backup" internal mutex, or is it truly atomic? I don't want to remove functionality for code cleanliness.
The rest looks fine :)
It depends on the platform. IIRC there is capability to do 64bit atomics on x86 systems in some cases, but I may be remembering incorrectly.
Regardless, the whole point of this patch is to fix the overflow, which is a 32bit limit. The hardcoded inline asm only works on a 32bit int. So it's wrong anyway. This way we have a guarantee that:
This is exactly what we want :)
OK, ack
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to ack (was: review)
commit 15ce646 To ssh://git@pagure.io/389-ds-base.git 85807be..15ce646 master -> master
Metadata Update from @firstyear: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
commit 1155d50 To ssh://git@pagure.io/389-ds-base.git 15ce646..1155d50 master -> master
One line fix for missing return.
We have portability issues on RHEL ppc:
libtool: link: gcc -std=gnu99 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m32 -mcpu=power7 -mtune=power7 -Wl,-z -Wl,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o .libs/pwdhash-bin ldap/servers/slapd/tools/pwdhash_bin-pwenc.o ./.libs/libslapd.so -L/usr/lib -ltcmalloc -lkrb5 -lk5crypto -lcom_err -lpcre -lpthread -lsystemd -lplc4 -lplds4 -lnspr4 -lssl3 -lnss3 -lsvrcore -lldap_r -llber -lsasl2 -ldl -Wl,-rpath -Wl,/usr/lib/dirsrv ./.libs/libslapd.so: undefined reference to __atomic_fetch_sub_8' ./.libs/libslapd.so: undefined reference to__atomic_load_8' ./.libs/libslapd.so: undefined reference to __atomic_store_8' ./.libs/libslapd.so: undefined reference to__atomic_fetch_add_8' collect2: error: ld returned 1 exit status
__atomic_fetch_sub_8' ./.libs/libslapd.so: undefined reference to
__atomic_store_8' ./.libs/libslapd.so: undefined reference to
libtool: link: gcc -std=gnu99 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m32 -mcpu=power7 -mtune=power7 -Wl,-z -Wl,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o .libs/pwdhash-bin ldap/servers/slapd/tools/pwdhash_bin-pwenc.o ./.libs/libslapd.so -L/usr/lib -ltcmalloc -lkrb5 -lk5crypto -lcom_err -lpcre -lpthread -lsystemd -lplc4 -lplds4 -lnspr4 -lssl3 -lnss3 -lsvrcore -lldap_r -llber -lsasl2 -ldl -Wl,-rpath -Wl,/usr/lib/dirsrv ./.libs/libslapd.so: undefined reference to `__atomic_fetch_sub_8' ./.libs/libslapd.so: undefined reference to `__atomic_load_8' ./.libs/libslapd.so: undefined reference to `__atomic_store_8' ./.libs/libslapd.so: undefined reference to `__atomic_fetch_add_8' collect2: error: ld returned 1 exit status
Metadata Update from @mreynolds: - Issue status updated to: Open (was: Closed)
You mean 32bit ppc? or ppc64? Is there a link to a build you can send me?
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12870341
<img alt="0001-Ticket-48989-Re-implement-lock-counter.patch" src="/389-ds-base/issue/raw/files/2b89fc694f537fc77e9966ca1430016e7af2a7b9b9f4d5372637aec57117758d-0001-Ticket-48989-Re-implement-lock-counter.patch" />
3c869cc..e59420e master -> master
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to ack (was: review) - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
Reopening, since one of the origin l issues was not addressed.
<img alt="0001-Ticket-48989-fix-perf-counters.patch" src="/389-ds-base/issue/raw/files/02153967e1e4bd9097c7ea97494372a0c19bbd9baa32b454d249d80ebb385e35-0001-Ticket-48989-fix-perf-counters.patch" />
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to review (was: ack) - Issue status updated to: Open (was: Closed)
ack! I'm all for removing code like this. We should really check all our counters and see if they make sense, because there is a cost to using atomics that is not free. :)
I hear you, but the rest of the stats are mostly pure Berkeley DB stats - which should not be removed.
770bf3a..18a77e9 master -> master
0b116ee..f84cdfd 389-ds-base-1.3.6 -> 389-ds-base-1.3.6
The backport to 1.3.5 was incomplete :(
commit 0b5b193 commit abe6d51 commit 0ff9bb3 To ssh://git@pagure.io/389-ds-base.git e2ee1c1..0ff9bb3 389-ds-base-1.3.5 -> 389-ds-base-1.3.5
Currently test for this issue is the slowest, it runs for 10-15 minutes. We can cheat a little bit and change the counter value to something close to 2^32 than 0. This change requires installed gdb and debuginfo packages on a system under test.
It's not an elegant solution, so I'm open to other opinions on how we can improve this test run time. <img alt="0001-Issue-48989-Improve-test-execution-time.patch" src="/389-ds-base/issue/raw/files/2540553868d83f47f6bdca4fa4e5a807bef8eefc6887f1d5c07cca562c640adb-0001-Issue-48989-Improve-test-execution-time.patch" />
Metadata Update from @vashirov: - Custom field version adjusted to None - Issue status updated to: Open (was: Closed)
Metadata Update from @vashirov: - Custom field reviewstatus adjusted to review (was: ack)
@vashirov There is a cmocka test for counters in the code base: we don't need to test them from the lib389 tools. I think we can remove the lib389 test instead because this is really slow.
Ah, great! I agree, cmocka is a better tool for testing this. New patch is attached. <img alt="0001-Issue-48989-Delete-slow-lib389-test.patch" src="/389-ds-base/issue/raw/files/137207fab94bc33141f543adb42aa5e15b6978580847127089e11239df480887-0001-Issue-48989-Delete-slow-lib389-test.patch" />
To ssh://pagure.io/389-ds-base.git 01272f7..e9dd75d master -> master
Metadata Update from @vashirov: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/2048
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)
Login to comment on this ticket.