Issue #7813: FREEIPA CRASH due to error - BDB0060 PANIC : fatal region error detected; run recovery. - freeipa


Closed: insufficientinfo 3 years ago by rcritten. Opened 5 years ago by deepaklo.


Issue

ns-slapd: [16/Dec/2018:13:19:34.679806367 +0000] libdb: BDB0060 PANIC: fatal region error detected; run recovery

The error log contains the line above.

Steps to Reproduce

NA

Actual behavior

The LDAP service (ipa) crashes.

Expected behavior

It should not crash.

Version/Release/Distribution

$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server

package freeipa-server is not installed
package freeipa-client is not installed
ipa-server-4.4.0-14.el7.centos.4.x86_64
ipa-client-4.4.0-14.el7.centos.4.x86_64
389-ds-base-1.3.5.10-15.el7_3.x86_64
pki-ca-10.3.3-16.el7_3.noarch
krb5-server-1.14.1-27.el7_3.x86_64
[root@prd-ldpaut01 ~]#

Additional info:


Log file locations: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Linux_Domain_Identity_Authentication_and_Policy_Guide/config-files-logs.html
Troubleshooting guide: https://www.freeipa.org/page/Troubleshooting

https://access.redhat.com/solutions/3098131

As per the Red Hat KB article, nsslapd-dncachememsize should be increased to 150GB or more. How can this be achieved? A step-by-step explanation would be great.

ldm should be restored from another replica or backup? How is this done? A step-by-step explanation would be great.

fcami commented 5 years ago

Were any of that server's filesystems full during the event?

Do you have actual FreeIPA backups or a working replica at this time?

tbordaz commented 5 years ago

Note that the KB recommends setting nsslapd-dncachememsize above 150 MB, not 150 GB. IMHO a valid range would be 150 MB to 400 MB.
The setting is:

dn: cn=<backend>,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-dncachememsize
nsslapd-dncachememsize: <value>

where <backend> is either ipaca or userRoot
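As a minimal sketch of the change described above, the following Python helper renders the same modify operation for both backends. It assumes that nsslapd-dncachememsize takes a plain byte count and that "150 MB" means 150 * 1024 * 1024 bytes; the function names are hypothetical and exist only for illustration.

```python
def mb_to_bytes(megabytes: int) -> int:
    """Convert megabytes (taken as MiB) to bytes.

    Assumption: the attribute value is expressed in bytes.
    """
    return megabytes * 1024 * 1024


def dncache_ldif(backend: str, megabytes: int) -> str:
    """Render the modify operation quoted in the thread for one backend."""
    return (
        f"dn: cn={backend},cn=ldbm database,cn=plugins,cn=config\n"
        "changetype: modify\n"
        "replace: nsslapd-dncachememsize\n"
        f"nsslapd-dncachememsize: {mb_to_bytes(megabytes)}\n"
    )


def dncache_ldif_all(megabytes: int, backends=("userRoot", "ipaca")) -> str:
    """Join one record per backend; a blank line separates LDIF records."""
    return "\n".join(dncache_ldif(backend, megabytes) for backend in backends)


if __name__ == "__main__":
    # 150 MB is the lower bound suggested in the comment above.
    print(dncache_ldif_all(150))
```

The printed text could be saved to a file and applied with something like `ldapmodify -D "cn=Directory Manager" -W -f <file>`. The bind DN shown is the usual FreeIPA default and is an assumption here, and a directory server restart may be needed before the new size takes effect.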

tbordaz commented 5 years ago

Oops, clicked too fast. Is there anything in the /var/log/dirsrv/slapd-xxx/errors file about the reason for the DB panic?

deepaklo commented 5 years ago

Were any of that server's filesystems full during the event?
No filesystem was full during the event.

Do you have actual FreeIPA backups or a working replica at this time?
Yes, the replica on the second node is working.

Oops, clicked too fast. Is there anything in the /var/log/dirsrv/slapd-xxx/errors file about the reason for the DB panic?

BDB0689 changelog/id2entry.db page 3435 is on free list with type 5
BDB0061 PANIC: Invalid argument
BDB0060 PANIC: fatal region error detected; run recovery
Serious Error---Failed in dblayer_txn_abort, err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)

WARNING: changelog: entry cache size 2097152 B is less than db size 69763072 B; We recommend to increase the entry cache size nsslapd-cachememsize.

I am really struggling with this issue. If any additional details are required, please ask and I will provide them.
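The log lines quoted above can be summarised mechanically. The sketch below pulls every BDBnnnn code out of a set of log lines and computes how far the entry cache falls short of the database size reported in the WARNING line. The regular expressions are assumptions based only on the lines shown in this thread, not on a documented log format, and the function names are hypothetical.

```python
import re

# Patterns inferred from the lines quoted in this thread (an assumption).
BDB_CODE = re.compile(r"\b(BDB\d{4})\b\s*(.*)")
CACHE_WARNING = re.compile(
    r"entry cache size (\d+) B is less than db size (\d+) B"
)

SAMPLE = [
    "BDB0689 changelog/id2entry.db page 3435 is on free list with type 5",
    "BDB0061 PANIC: Invalid argument",
    "BDB0060 PANIC: fatal region error detected; run recovery",
    "Serious Error---Failed in dblayer_txn_abort, err=-30973 "
    "(BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)",
    "WARNING: changelog: entry cache size 2097152 B is less than db size "
    "69763072 B; We recommend to increase the entry cache size "
    "nsslapd-cachememsize.",
]


def bdb_messages(lines):
    """Return (code, rest-of-line) pairs for lines carrying a BDBnnnn code."""
    found = []
    for line in lines:
        match = BDB_CODE.search(line)
        if match:
            found.append((match.group(1), match.group(2).strip()))
    return found


def cache_shortfall(line):
    """Return db size minus entry cache size in bytes, or None if no match."""
    match = CACHE_WARNING.search(line)
    if match is None:
        return None
    cache_size, db_size = int(match.group(1)), int(match.group(2))
    return db_size - cache_size


if __name__ == "__main__":
    for code, message in bdb_messages(SAMPLE):
        print(code, message)
    print("cache shortfall in bytes:", cache_shortfall(SAMPLE[-1]))
```

In the quoted warning the entry cache is 2097152 B (2 MiB) while the changelog database is 69763072 B (roughly 66.5 MiB), so the cache covers only a small fraction of the database.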

deepaklo commented 5 years ago

Was any of the filesystems of that server full during the event?

No filesystem was full.

Do you have actual FreeIPA backups or a working replica at this time?

Yes, we do have a replica, and it does not show the same errors.

tbordaz commented 5 years ago

@deepaklo From the DB PANIC point of view, this does not match any known issue. Investigating this kind of DB problem requires the db environment (/var/lib/dirsrv/slapd-xx/db/__db* + log.xxxx); the db files would also help. It then takes a deep investigation into why and how a used page was flagged as free, with some chance of identifying the root cause. Does it occur frequently? Does it happen only on CentOS (no RHEL box)? Is it reproducible?
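Gathering the db environment files requested above could be sketched as follows. The helper archives everything matching `__db*` and `log.*` in a directory supplied by the caller. The function name, the archive format, and the choice to pass the directory as a parameter are all assumptions made for illustration; the instance directory name is elided in the thread and is deliberately not filled in here.

```python
import glob
import os
import tarfile


def collect_db_environment(db_dir: str, archive_path: str) -> list:
    """Archive the __db* region files and log.* files found in db_dir.

    db_dir is the instance's db directory, supplied by the caller.
    Returns the base names of the files that were added.
    """
    members = []
    for pattern in ("__db*", "log.*"):
        members.extend(sorted(glob.glob(os.path.join(db_dir, pattern))))

    with tarfile.open(archive_path, "w:gz") as archive:
        for path in members:
            # Store flat names so the archive does not expose host paths.
            archive.add(path, arcname=os.path.basename(path))

    return [os.path.basename(path) for path in members]
```

Write the archive outside the db directory so it cannot match the patterns on a later run. The files may contain directory data, so they are best shared privately rather than attached to a public issue.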

deepaklo commented 5 years ago

Yes, it occurs frequently, roughly once every one or two days.

rcritten commented 5 years ago

I think @tbordaz needs a private copy of your database in order to diagnose exactly what is wrong with it. Is this problem still occurring?

Metadata Update from @rcritten:
- Issue close_status updated to: insufficientinfo
- Issue status updated to: Closed (was: Open)

3 years ago
