#48989 Integer overflow in performance counters
Closed: wontfix 6 years ago Opened 7 years ago by nhosoi.

Description of problem:

From https://bugzilla.redhat.com/show_bug.cgi?id=1349577

There is another overflow in performance counters:
./ldap/servers/slapd/back-ldbm/perfctrs.h:
 33     PRUint32    page_access_rate;

./ldap/servers/slapd/back-ldbm/perfctrs.c:
167             perf->cache_size_bytes = mpstat->st_gbytes * ONEG +
mpstat->st_bytes;
168             perf->page_access_rate = mpstat->st_cache_hit +
mpstat->st_cache_miss;       <===
169             perf->cache_hit = mpstat->st_cache_hit;
170             perf->cache_try = mpstat->st_cache_hit + mpstat->st_cache_miss;
<===

Also objects-locked has the value of page_access_rate:
260     { SLAPI_LDBM_PERFCTR_AT_PREFIX "objects-locked",
261             offsetof( performance_counters, page_access_rate ) },
262     { SLAPI_LDBM_PERFCTR_AT_PREFIX "page-create-rate",
263             offsetof( performance_counters, page_create_rate ) },

ldapsearch -x -D "cn=Directory Manager" -w Secret123  -b "cn=monitor,cn=ldbm
database,cn=plugins,cn=config"  -LLL dbcachetries  nsslapd-db-objects-locked
dn: cn=monitor,cn=ldbm database,cn=plugins,cn=config
dbcachetries: 509565

dn: cn=database,cn=monitor,cn=ldbm database,cn=plugins,cn=config
nsslapd-db-objects-locked: 509565

Version-Release number of selected component (if applicable):
389-ds-base-1.3.5.10-11.el7.x86_64

Metadata Update from @nhosoi:
- Issue set to the milestone: 1.3.6.0

7 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to new
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1377452 https://bugzilla.redhat.com/show_bug.cgi?id=1349577 (was: https://bugzilla.redhat.com/show_bug.cgi?id=1377452)
- Issue close_status updated to: None

7 years ago

Metadata Update from @mreynolds:
- Issue assigned to mreynolds

7 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to review (was: new)

7 years ago

Don't use slapi_long, just use uint64_t from inttypes.h please. We should avoid long long, as it's not deterministic.

We don't need 1866 -#define NSPRIu64» "llu", because inttypes.h provides PRIu64 for string formats, so we should use this instead.

For now, don't use __ATOMIC_RELAXED, __ATOMIC_RELAXED, use SEQ_CST, we can relax these as we go with tests to be sure of the effect.

Avoid ,"%llu", use the PRI types from inttypes.h

Sorry to be so difficult about this, I think if we are going to do this, we should get it right. Thanks for your time on this.

Don't use slapi_long, just use uint64_t from inttypes.h please. We should avoid long long, as it's not deterministic.

Neither is uint64_t - we already talked about this. We NEED to do this for printf'ing. That is the whole point of all of this - otherwise we get overflows or compiler errors.

We don't need 1866 -#define NSPRIu64» "llu", because inttypes.h provides PRIu64 for string formats, so we should use this instead.

Yeah I left that as is to keep the patch small - figured all of this could be replaced by another patch.

For now, don't use __ATOMIC_RELAXED, __ATOMIC_RELAXED, use SEQ_CST, we can relax these as we go with tests to be sure of the effect.

I was also wondering about __ATOMIC_SEQ_CST in slapi_counters.c. I saw that you changed this to atomic release/acquire in nunc-stans. Is that something we want to do here as well?

Avoid ,"%llu", use the PRI types from inttypes.h

I can do that __PRI64_PREFIX + PRIu64 = "llu" - this needs testing though...

Sorry to be so difficult about this, I think if we are going to do this, we should get it right. Thanks for your time on this.

You're not, but I don't think you know what's going on exactly. If we don't use "%llu" for our logging functions/printfs we overflow. So although PRUint64 and uint64_t behave as unsigned long longs, the compiler thinks they are just unsigned longs. So to avoid the overflows we must use "%llu" for these types, but... this will cause compiler warnings about the wrong type being used because gcc thinks unint64_t is a unsigned long, etc. So to solve both the issues we have to cast. I would rather cast when we return the counter value, than cast every time we use printf formatting. Hence, slapi_long to make things consistent and easy.

I'm all ears if you can tell me how to get this working without casting, AND without overflows & compiler warnings.

Anyway I need to test inttypes.h string formats to see if it works as expected.

Thanks.
Mark

I found the root cause of ALL of this... NSPR printf functions!

The standard printf functions work correctly with uint64_t/PRUint64 (no overflows - no compiler warnings). This means we can remove slapi_long and all the casting. So I need to redo the entire fix, but this is good news and it will be significantly cleaner.

Ah hah! That would explain it.

I think we should be reducing our reliance on NSPR, it adds a lot of overhead and it's relevance is not as high these days.

I'm much happier with this patch: I think one thing is that we need to be sure we only read from this struct via the functions rather than the direct access, but I think this is a great start for now.

Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to ack (was: review)

7 years ago

fdf78dc..434a92f master -> master

ee63e40..70084f1 389-ds-base-1.3.5 -> 389-ds-base-1.3.5

Metadata Update from @mreynolds:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

7 years ago

Metadata Update from @firstyear:
- Issue status updated to: Open (was: Closed)

7 years ago

Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to review (was: ack)

7 years ago

You removed the atomic counter assembly code for 32bit archs on linux, so is __atomic_store_8() actually atomic on 32bit envs? Meaning does it use its "backup" internal mutex, or is it truly atomic? I don't want to remove functionality for code cleanliness.

The rest looks fine :)

It depends on the platform. IIRC there is capability to do 64bit atomics on x86 systems in some cases, but I may be remembering incorrectly.

Regardless, the whole point of this patch is to fix the overflow, which is a 32bit limit. The hardcoded inline asm only works on a 32bit int. So it's wrong anyway. This way we have a guarantee that:

  • it's 64bit capable on all platforms
  • it's always atomic (by cpu instruction, or mutex)

This is exactly what we want :)

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to ack (was: review)

7 years ago

commit 15ce646
To ssh://git@pagure.io/389-ds-base.git
85807be..15ce646 master -> master

Metadata Update from @firstyear:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

7 years ago

commit 1155d50
To ssh://git@pagure.io/389-ds-base.git
15ce646..1155d50 master -> master

One line fix for missing return.

We have portability issues on RHEL ppc:

libtool: link: gcc -std=gnu99 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m32 -mcpu=power7 -mtune=power7 -Wl,-z -Wl,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o .libs/pwdhash-bin ldap/servers/slapd/tools/pwdhash_bin-pwenc.o ./.libs/libslapd.so -L/usr/lib -ltcmalloc -lkrb5 -lk5crypto -lcom_err -lpcre -lpthread -lsystemd -lplc4 -lplds4 -lnspr4 -lssl3 -lnss3 -lsvrcore -lldap_r -llber -lsasl2 -ldl -Wl,-rpath -Wl,/usr/lib/dirsrv
./.libs/libslapd.so: undefined reference to __atomic_fetch_sub_8' ./.libs/libslapd.so: undefined reference to__atomic_load_8'
./.libs/libslapd.so: undefined reference to __atomic_store_8' ./.libs/libslapd.so: undefined reference to__atomic_fetch_add_8'
collect2: error: ld returned 1 exit status

We have portability issues on RHEL ppc:

libtool: link: gcc -std=gnu99 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m32 -mcpu=power7 -mtune=power7 -Wl,-z -Wl,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o .libs/pwdhash-bin ldap/servers/slapd/tools/pwdhash_bin-pwenc.o  ./.libs/libslapd.so -L/usr/lib -ltcmalloc -lkrb5 -lk5crypto -lcom_err -lpcre -lpthread -lsystemd -lplc4 -lplds4 -lnspr4 -lssl3 -lnss3 -lsvrcore -lldap_r -llber -lsasl2 -ldl -Wl,-rpath -Wl,/usr/lib/dirsrv
./.libs/libslapd.so: undefined reference to `__atomic_fetch_sub_8'
./.libs/libslapd.so: undefined reference to `__atomic_load_8'
./.libs/libslapd.so: undefined reference to `__atomic_store_8'
./.libs/libslapd.so: undefined reference to `__atomic_fetch_add_8'
collect2: error: ld returned 1 exit status

Metadata Update from @mreynolds:
- Issue status updated to: Open (was: Closed)

7 years ago

You mean 32bit ppc? or ppc64? Is there a link to a build you can send me?

You mean 32bit ppc? or ppc64? Is there a link to a build you can send me?

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12870341

Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to review (was: ack)

7 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to ack (was: review)
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

7 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to review (was: ack)
- Issue status updated to: Open (was: Closed)

6 years ago

ack! I'm all for removing code like this. We should really check all our counters and see if they make sense, because there is a cost to using atomics that is not free. :)

Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to ack (was: review)

6 years ago

ack! I'm all for removing code like this. We should really check all our counters and see if they make sense, because there is a cost to using atomics that is not free. :)

I hear you, but the rest of the stats are mostly pure Berkeley DB stats - which should not be removed.

770bf3a..18a77e9 master -> master

0b116ee..f84cdfd 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

Metadata Update from @mreynolds:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 years ago

The backport to 1.3.5 was incomplete :(

commit 0b5b193
commit abe6d51
commit 0ff9bb3
To ssh://git@pagure.io/389-ds-base.git
e2ee1c1..0ff9bb3 389-ds-base-1.3.5 -> 389-ds-base-1.3.5

Currently test for this issue is the slowest, it runs for 10-15 minutes. We can cheat a little bit and change the counter value to something close to 2^32 than 0. This change requires installed gdb and debuginfo packages on a system under test.

It's not an elegant solution, so I'm open to other opinions on how we can improve this test run time.
0001-Issue-48989-Improve-test-execution-time.patch

Metadata Update from @vashirov:
- Custom field version adjusted to None
- Issue status updated to: Open (was: Closed)

6 years ago

Metadata Update from @vashirov:
- Custom field reviewstatus adjusted to review (was: ack)

6 years ago

@vashirov There is a cmocka test for counters in the code base: we don't need to test them from the lib389 tools. I think we can remove the lib389 test instead because this is really slow.

Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to ack (was: review)

6 years ago

To ssh://pagure.io/389-ds-base.git
01272f7..e9dd75d master -> master

Metadata Update from @vashirov:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2048

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago

Login to comment on this ticket.