On a system with >= 32 GB of RAM, autotuning sets nsslapd-cachememsize to a value larger than INT_MAX:
sh-4.4# grep Mem /proc/meminfo
MemTotal: 65401208 kB
MemFree: 43524404 kB
MemAvailable: 58409212 kB
In the logs:
[10/May/2018:20:44:47.490690337 +0000] - ERR - ldbm_config_set - Value 4630511616 for attr nsslapd-cachememsize is greater than the maximum 2147483647
[10/May/2018:20:44:47.494356162 +0000] - ERR - parse_ldbm_instance_config_entry - Error with config attribute nsslapd-cachememsize : Error: value 4630511616 for attr nsslapd-cachememsize is greater than the maximum 2147483647
[10/May/2018:20:44:47.498026129 +0000] - ERR - ldbm_instance_config_load_dse_info - Error parsing the config DSE
During normal operation it is corrected, for example on a different system:
[10/May/2018:16:33:13.489337009 -0400] - WARN - ldbm_instance_config_cachememsize_set - delta +22407994880 of request 22408506880 reduced to 7872421888
But when ns-slapd is running in archive2db mode, we don't correct the value and a segmentation fault can occur:
#0 0x00007ffff4646d2e in __strcmp_sse2_unaligned () at /lib64/libc.so.6
#1 0x00007ffff796bcc0 in slapd_comp_path () at /usr/lib64/dirsrv/libslapd.so.0
#2 0x00007fffea9ab630 in dblayer_restore (li=0x555556084c80, src_dir=0x55555634c6c0 "/var/lib/dirsrv/slapd-standalone1/bak/backup_test", task=0x0, bename=0x0) at ldap/servers/slapd/back-ldbm/dblayer.c:6569
#3 0x00007fffea99c75a in ldbm_back_archive2ldbm (pb=0x555556390000) at ldap/servers/slapd/back-ldbm/archive.c:165
#4 0x0000555555565011 in slapd_exemode_archive2db (mcfg=0x7fffffffe880) at ldap/servers/slapd/main.c:2571
#5 0x0000555555565011 in main (argc=6, argv=0x7fffffffec58) at ldap/servers/slapd/main.c:953
Apparently inst->inst_dir_name is NULL, because we bail out before it is set to a valid value.
Fedora 28 389-ds-base-1.4.0.8-1.fc28.x86_64
With the fix from 5d700cc, basic_test.py::test_basic_import_export no longer crashes the server, but it would still be good to make backup/restore more robust to avoid potential crashes.
Metadata Update from @vashirov: - Custom field component adjusted to None - Custom field origin adjusted to None - Custom field reviewstatus adjusted to None - Custom field type adjusted to None - Custom field version adjusted to None
Metadata Update from @mreynolds: - Issue assigned to mreynolds
I have a fix for this, but I don't know if it's right. Is the bug that we are not correcting the cache size? Or should we just abort the bak2db when the inst struct has a NULL parent dir because of the invalid cache size? I need to look into this more...
Metadata Update from @mreynolds: - Assignee reset
I'd say it's a combination of three bugs:
1. We should correct the cache size if it's larger than available RAM in archive2db/db2archive modes.
2. We should not continue with archive2db if some of the required parameters are invalid or NULL.
3. slapd_comp_path() should be more robust and check for a NULL pointer.
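To illustrate points 2 and 3, here is a minimal sketch of the kind of defensive checks being suggested. It is not the actual 389-ds-base code: `demo_instance`, `demo_path_cmp()` and `demo_restore_precheck()` are hypothetical names, and the real slapd_comp_path() may do more than a plain string comparison.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the backend instance struct; only the
 * field mentioned in this ticket is modelled. */
struct demo_instance {
    const char *inst_dir_name; /* stays NULL if config parsing bailed out early */
};

/* NULL-tolerant comparison: returns 0 when both paths are equal
 * (or both NULL), nonzero otherwise. Never passes NULL to strcmp(). */
int demo_path_cmp(const char *a, const char *b)
{
    if (a == NULL || b == NULL) {
        return (a == b) ? 0 : 1;
    }
    return strcmp(a, b) != 0;
}

/* Pre-flight check before a restore: returns 0 if it is safe to
 * continue, -1 if a required parameter is missing. */
int demo_restore_precheck(const struct demo_instance *inst, const char *src_dir)
{
    if (inst == NULL || inst->inst_dir_name == NULL || src_dir == NULL) {
        return -1; /* abort instead of dereferencing NULL later */
    }
    return 0;
}
```

With checks like these, a restore against an instance whose inst_dir_name was never set would fail with an error code rather than crash inside strcmp().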
Okay, so the problem only occurs when the value is greater than the integral type (unsigned 64-bit integer). This check is done right before any of the backend config set functions are called. We error out before we can adjust the value, as we never actually call the set function.
Autotuning would never set a cache size higher than a uint64_t can hold (it's impossible), and that's the only condition that would trigger this error. It would be very invasive to the backend code to try to resize the cache if it's larger than a uint64 in this scenario.
So [1] will be addressed if the value is a valid number (in the range of uint64), but if you manually set the cache size to 10000000000000000000000000000000, the restore/backup will fail. Otherwise it will resize, if needed, and complete.
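The behaviour described above (reject values that do not fit in a uint64, otherwise resize if needed) can be sketched roughly as follows. This is only an illustration, not the code from the fix: `demo_parse_cachesize()` and its `limit` parameter are hypothetical, and it assumes `unsigned long long` is 64 bits wide, as on the x86_64 system in this report.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal cache size and clamp it to `limit`.
 * Returns 0 and stores the (possibly reduced) size in *out,
 * or -1 if the text is not a valid number in uint64 range. */
int demo_parse_cachesize(const char *text, uint64_t limit, uint64_t *out)
{
    char *end = NULL;
    unsigned long long v;

    /* Require a leading digit so that signs and whitespace are rejected. */
    if (text == NULL || out == NULL || *text < '0' || *text > '9') {
        return -1;
    }

    errno = 0;
    v = strtoull(text, &end, 10);
    if (errno == ERANGE || *end != '\0') {
        return -1; /* overflowed the integral type, or trailing garbage */
    }

    /* In range: reduce to the limit if needed instead of failing. */
    *out = (v > limit) ? limit : (uint64_t)v;
    return 0;
}
```

Under these assumptions, a request such as 22408506880 is accepted and reduced to the limit, while 10000000000000000000000000000000 is rejected before any resizing can happen.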
PR to come soon...
https://pagure.io/389-ds-base/pull-request/49688
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to review (was: None)
master:
commit 15ff2e3
Metadata Update from @mreynolds: - Issue close_status updated to: fixed - Issue set to the milestone: 1.4.0 - Issue status updated to: Closed (was: Open)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/2728
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)