#7813 FREEIPA CRASH due to error - BDB0060 PANIC : fatal region error detected; run recovery.
Opened 9 months ago by deepaklo. Modified 7 months ago

Issue

ns-slapd: [16/Dec/2018:13:19:34.679806367 +0000] libdb: BDB0060 PANIC: fatal region error detected; run recovery

The error log contains the message above.

Steps to Reproduce

  1. NA

Actual behavior

The LDAP service (IPA) crashes.

Expected behavior

It should not crash.

Version/Release/Distribution

$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server

package freeipa-server is not installed
package freeipa-client is not installed
ipa-server-4.4.0-14.el7.centos.4.x86_64
ipa-client-4.4.0-14.el7.centos.4.x86_64
389-ds-base-1.3.5.10-15.el7_3.x86_64
pki-ca-10.3.3-16.el7_3.noarch
krb5-server-1.14.1-27.el7_3.x86_64
[root@prd-ldpaut01 ~]#

Additional info:

Log file locations: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Linux_Domain_Identity_Authentication_and_Policy_Guide/config-files-logs.html
Troubleshooting guide: https://www.freeipa.org/page/Troubleshooting

https://access.redhat.com/solutions/3098131

As per the Red Hat KB,

nsslapd-dncachememsize should be increased to 150GB or more. How can this be achieved? A step-by-step explanation would be great.

Should ldm be restored from another replica or from a backup? How is this done? A step-by-step explanation would be great.


Were any of the filesystems on that server full during the event?

Do you have current FreeIPA backups or a working replica at this time?

Note that the KB recommends setting nsslapd-dncachememsize above 150 MB, not 150 GB. IMHO a valid range would be [150 MB - 400 MB].
The setting is

dn: cn=<backend>,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-dncachememsize
nsslapd-dncachememsize: <value>

where <backend> is either ipaca or userRoot
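
For anyone who wants the step spelled out, here is a minimal sketch that generates the LDIF above for both backends. It assumes the attribute value is given in bytes and that "150 MB" means 150 * 1024 * 1024. The helper name `dncache_ldif` is made up for illustration.

```python
def dncache_ldif(backend: str, size_bytes: int) -> str:
    """Return an LDIF modify record that sets nsslapd-dncachememsize."""
    return (
        f"dn: cn={backend},cn=ldbm database,cn=plugins,cn=config\n"
        "changetype: modify\n"
        "replace: nsslapd-dncachememsize\n"
        f"nsslapd-dncachememsize: {size_bytes}\n"
    )

# Assumption: 150 MB expressed in bytes as 150 * 1024 * 1024.
SIZE = 150 * 1024 * 1024

# One record per backend; LDIF records are separated by a blank line.
ldif = "\n".join(dncache_ldif(b, SIZE) for b in ("userRoot", "ipaca"))
print(ldif)
```

The output can be saved to a file and applied with something like `ldapmodify -x -D "cn=Directory Manager" -W -f dncache.ldif`. The file name is a placeholder, and the server may need a restart before the new size takes effect, so check the 389-ds documentation for your version.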

Oops, clicked too fast. Is there anything in the /var/log/dirsrv/slapd-xxx/errors file about the reason for the DB panic?
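
A rough sketch of one way to pull the relevant lines out of the errors file: it keeps only lines that carry a Berkeley DB message code of the form `BDBnnnn`. The function name `bdb_lines` is hypothetical, and the sample line is the one quoted in the issue description.

```python
import re

# Matches Berkeley DB message codes such as "BDB0060".
BDB_CODE = re.compile(r"\bBDB\d{4}\b")

def bdb_lines(lines):
    """Yield (code, line) for each line that carries a BDB message code."""
    for line in lines:
        m = BDB_CODE.search(line)
        if m:
            yield m.group(0), line.rstrip("\n")

sample = [
    "ns-slapd: [16/Dec/2018:13:19:34.679806367 +0000] libdb: BDB0060 PANIC: "
    "fatal region error detected; run recovery\n",
    "some unrelated line\n",
]
hits = list(bdb_lines(sample))
for code, line in hits:
    print(code, "|", line)
```

On a real server the `sample` list would be replaced by the open errors file for your instance. The lines just before the first PANIC are usually the interesting ones.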

Were any of the filesystems on that server full during the event?

No filesystem was full during the event.

Do you have current FreeIPA backups or a working replica at this time?

Yes, the replica on the second node is working.

Oops, clicked too fast. Is there anything in the /var/log/dirsrv/slapd-xxx/errors file about the reason for the DB panic?

BDB0689 changelog/id2entry.db page 3435 is on free list with type 5
BDB0061 PANIC: Invalid argument
BDB0060 PANIC: fatal region error detected; run recovery
Serious Error---Failed in dblayer_txn_abort, err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)

WARNING: changelog: entry cache size 2097152 B is less than db size 69763072 B; We recommend to increase the entry cache size nsslapd-cachememsize.
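
To put the WARNING line in perspective, here is a small sketch that extracts the two sizes and converts them to MiB: roughly 2 MiB of entry cache against a database of about 66.5 MiB. This only illustrates the cache sizing gap. Whether it has anything to do with the panic is not established here.

```python
import re

WARNING = ("WARNING: changelog: entry cache size 2097152 B is less than "
           "db size 69763072 B; We recommend to increase the entry cache "
           "size nsslapd-cachememsize.")

# Pull the two byte counts out of the message text.
m = re.search(r"entry cache size (\d+) B is less than db size (\d+) B", WARNING)
cache_bytes, db_bytes = int(m.group(1)), int(m.group(2))

MIB = 1024 * 1024
print(f"entry cache: {cache_bytes / MIB:.1f} MiB")
print(f"db size:     {db_bytes / MIB:.1f} MiB")
print(f"db is {db_bytes / cache_bytes:.1f}x the cache")
```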

I am really struggling with this issue. If any additional details are required, please ask and I will provide them.

Were any of the filesystems on that server full during the event?

No filesystem was full.

Do you have current FreeIPA backups or a working replica at this time?

Yes, we do have a replica, and it does not show the same errors.

@deepaklo, from the DB PANIC point of view this does not match any known issue. Investigating such a DB issue requires the db environment (/var/lib/dirsrv/slapd-xx/db/__db* + log.xxxx); the db files would also help. It then takes a deep investigation into why and how a used page was flagged as free, with only some chance of identifying the root cause. Does it occur frequently? Only on CentOS (no RHEL box)? Is it reproducible?
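
As a sketch of how the requested files could be listed before copying them somewhere safe: the helper below takes the db directory as a parameter (the instance name is left to the reader) and returns the region files (`__db*`) and transaction logs (`log.*`). The function name is hypothetical.

```python
from pathlib import Path

def db_environment_files(db_dir):
    """Return the region files (__db*) and transaction logs (log.*) in db_dir."""
    root = Path(db_dir)
    return sorted(
        p
        for pattern in ("__db*", "log.*")
        for p in root.glob(pattern)
        if p.is_file()
    )
```

Calling it on the instance's db directory gives the list of paths to archive. It is probably safest to copy them while the server is stopped, so the files are consistent with each other.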

Yes, it occurs frequently, almost every day or once every two days.

I think @tbordaz needs a private copy of your database in order to diagnose exactly what is wrong with it. Is this problem still occurring?
