#2978 IPA system acting lethargic meaning slow, unresponsive and unusable at times running Command: "ipa sudorule-add-user sudo_rulename --groups=usergroupname"
Closed: Fixed None Opened 11 years ago by dpal.

https://bugzilla.redhat.com/show_bug.cgi?id=846333 (Red Hat Enterprise Linux 6)

Issue:
IPA system acting lethargic meaning slow, unresponsive and unusable at times
after running Command: "ipa sudorule-add-user sudo_rulename
--groups=usergroupname"

Overview:
I'm currently setting up a test environment to run authentication,
administration and runtime load against an IPA test environment on some
higher-end machines. Once the test environment is set up, the plan is to run
these tasks 24/7 at various load levels to test the reliability of the IPA
system under test.  While building the Sudo config environment for IPA, the
system becomes slow, then unresponsive and unusable. The issue is inadvertently
causing a denial of service.  Is it possible that the number of users in my
user groups is causing the problem, and that reducing it from 1000 to 100 per
group may work around the issue temporarily and allow me to continue?  Can an
administrator increase the query limits to avoid the error message
"ERROR: limits exceeded for this query"?

Task Script:
The script causing the IPA server havoc is responsible for building the sudo
objects needed to configure the system.  It is a single-threaded Python script
that calls the IPA CLI in the sequence defined below.  The command that stalls
and generates errors when executing is "ipa sudorule-add-user".  The command
initially had 5 user groups on the command line, so I reduced it to 1 and
called it 5 separate times, once for each user group.  It made no
difference...

rpm -qi 389-ds-base
Name        : 389-ds-base                  Relocations: (not relocatable)
Version     : 1.2.10.2                          Vendor: Red Hat, Inc.
Release     : 19.el6_3                      Build Date: Wed 20 Jun 2012 05:11:42 PM EDT
Install Date: Wed 01 Aug 2012 11:00:01 AM EDT      Build Host: x86-008.build.bos.redhat.com
Group       : System Environment/Daemons    Source RPM: 389-ds-base-1.2.10.2-19.el6_3.src.rpm
Size        : 4854889                          License: GPLv2 with exceptions
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://port389.org/
Summary     : 389 Directory Server (base)

-Env Preconditions:
10K users exist
10 User groups exist  (1k per group)
1550 sudo commands (/usr/bin/*)
100 sudo groups

Script that builds the Sudo rules via the CLI:
for 1 to 10:
-Add SudoRule
-Add SudoRule Hosts (qty2)
-Add SudoRule UserGroups1
-Add SudoRule UserGroups2
-Add SudoRule UserGroups3
-Add SudoRule UserGroups4
-Add SudoRule UserGroups5
-Add SudoRule Allow Command Groups (qty5)
-Add SudoRule Deny Command Groups (qty5)
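
The sequence above could be sketched in Python roughly as follows. This is an illustrative reconstruction, not the reporter's actual script. The rule, host and group names are hypothetical placeholders. Only "ipa sudorule-add-user ... --groups=" appears verbatim in this report; the other subcommands and the --hosts / --sudocmdgroups options are taken from the standard ipa sudorule-* command set, and repeating an option to pass several values is an assumption about the CLI version in use.

```python
import subprocess


def build_rule_commands(rule, hosts, user_groups, allow_groups, deny_groups):
    """Return the ipa CLI invocations for one sudo rule, in the order listed above."""
    cmds = [["ipa", "sudorule-add", rule]]
    # Both hosts in one call (assumption: the option may be repeated).
    cmds.append(["ipa", "sudorule-add-host", rule] + [f"--hosts={h}" for h in hosts])
    # One call per user group, as described in the report.
    for group in user_groups:
        cmds.append(["ipa", "sudorule-add-user", rule, f"--groups={group}"])
    cmds.append(["ipa", "sudorule-add-allow-command", rule]
                + [f"--sudocmdgroups={g}" for g in allow_groups])
    cmds.append(["ipa", "sudorule-add-deny-command", rule]
                + [f"--sudocmdgroups={g}" for g in deny_groups])
    return cmds


def build_all_commands(rule_count=10):
    """Build the full command list for rule_count rules (all names are hypothetical)."""
    commands = []
    for n in range(1, rule_count + 1):
        commands += build_rule_commands(
            rule=f"sudorule{n}",
            hosts=["host1.example.test", "host2.example.test"],
            user_groups=[f"usergroup{i}" for i in range(1, 6)],
            allow_groups=[f"allowcmdgroup{i}" for i in range(1, 6)],
            deny_groups=[f"denycmdgroup{i}" for i in range(1, 6)],
        )
    return commands


def run_all(commands):
    """Execute the commands one after another, single-threaded."""
    for cmd in commands:
        subprocess.run(cmd, check=False)
```

Calling run_all(build_all_commands()) would issue the commands back to back, which is the pattern reported here as driving ns-slapd to high CPU.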


Is it Repeatable:
Yes, consistently

Symptoms:
-Sudo cli client script:
Once the script starts it will inevitably have issues running the command "ipa
sudorule-add-user".  A delay of up to 7 minutes may occur, and messages
indicating "ERROR: limits exceeded for this query" are generated.  I never made
it past the creation of 7 sudo rules: it was taking too long and errors kept
appearing, so I terminated the script.

-IPA User Interface:
Once the script starts, the UI inevitably becomes unusable.  I have been getting
UI dialogs reporting "limits exceeded for this query" and internal server
errors.  At this point the UI is inoperable.

-Kinit:
Kinit fails to connect, so I cannot authenticate in this state. kinit: Cannot
contact any KDC for realm 'TESTRELM.COM' while getting initial credentials.

-CPU:
Once the script starts, the ns-slapd process jumps to 100% CPU and above on the
target IPA master and on the master replica server.
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20151 dirsrv    20   0 3069m 650m  20m S 99.2  4.1 172:11.83 ns-slapd

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1114 dirsrv    20   0 3069m 404m  20m S 589.6  2.5 125:13.71 ns-slapd

-IPA Master DirSec Error Log Snip:
[06/Aug/2012:14:47:56 -0400] slapd_ldap_sasl_interactive_bind - Error: could not
perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact
LDAP server) ((null)) errno 110 (Connection timed out)
[06/Aug/2012:14:47:56 -0400] slapi_ldap_bind - Error: could not perform
interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[06/Aug/2012:14:47:56 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with
GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[06/Aug/2012:14:50:51 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with
GSSAPI auth resumed
[06/Aug/2012:14:53:02 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Timed out sending
update operation to consumer (uniqueid c258d222-dc2f11e1-b897d99e-aafdf81b, CSN
501fe82f000400040000): Timeout.
[06/Aug/2012:14:58:04 -0400] - repl5_inc_waitfor_async_results timed out
waiting for responses: 4437 4795
[06/Aug/2012:15:00:04 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Warning: unable to send
endReplication extended operation (Timed out)
[06/Aug/2012:15:00:27 -0400] slapd_ldap_sasl_interactive_bind - Error: could
not perform interactive bind for id [] mech [GSSAPI]: LDAP error -2 (Local
error) (SASL(-1): generic failure: GSSAPI Error: An invalid name was supplied
(Hostname cannot be canonicalized)) errno 110 (Connection timed out)
[06/Aug/2012:15:00:27 -0400] slapi_ldap_bind - Error: could not perform
interactive bind for id [] mech [GSSAPI]: error -2 (Local error)
[06/Aug/2012:15:00:27 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with
GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic failure:
GSSAPI Error: An invalid name was supplied (Hostname cannot be canonicalized))
[06/Aug/2012:15:00:28 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with
GSSAPI auth resumed
[06/Aug/2012:15:03:33 -0400] slapd_ldap_sasl_interactive_bind - Error: could
not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't
contact LDAP server) ((null)) errno 110 (Connection timed out)
[06/Aug/2012:15:03:33 -0400] slapi_ldap_bind - Error: could not perform
interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[06/Aug/2012:15:03:33 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with
GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[06/Aug/2012:15:06:45 -0400] slapd_ldap_sasl_interactive_bind - Error: could
not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't
contact LDAP server) ((null)) errno 110 (Connection timed out)
[06/Aug/2012:15:06:45 -0400] slapi_ldap_bind - Error: could not perform
interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[06/Aug/2012:15:11:53 -0400] slapd_ldap_sasl_interactive_bind - Error: could
not perform interactive bind for id [] mech [GSSAPI]: LDAP error -2 (Local
error) (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.
Minor code may provide more information (Cannot contact any KDC for realm
'TESTRELM.COM')) errno 115 (Operation now in progress)
[06/Aug/2012:15:11:53 -0400] slapi_ldap_bind - Error: could not perform
interactive bind for id [] mech [GSSAPI]: error -2 (Local error)
[06/Aug/2012:15:11:53 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with
GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic failure:
GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information
(Cannot contact any KDC for realm 'TESTRELM.COM'))
...
...

Work Around the issues:
If there is no config setting that resolves this issue, I will attempt to work
around it by reducing the number of users in each user group.  I will reduce
the users from 1000 to 100 per group and give it a go.


Test Hardware:
Ipa Server 1&2 = Linux sti-high-1.testrelm.com 2.6.32-279.el6.x86_64 #1 SMP Wed
Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
Ipa Client 1&2 = Linux sti-high-3.testrelm.com 2.6.32-279.el6.x86_64 #1 SMP Wed
Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    4
CPU socket(s):         2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              2
CPU MHz:               1596.000
BogoMIPS:              4787.82
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15

Memory:
             total       used       free     shared    buffers     cached
Mem:      16316084    4678128   11637956          0     199440    2701480
-/+ buffers/cache:    1777208   14538876
Swap:      8224760          0    8224760

The problem is that much, if not most, of the work is done in postoperation plugins such as memberof, after the result has been returned to the client. The ipa sudo command therefore appears to return when the operation is complete, but the operation is not really complete. Over time, if many ipa sudo commands are executed one after the other, the server becomes completely bogged down doing postoperation processing such as memberof.

The short term solution is to put sleeps between ipa sudo cmds, especially those that trigger lots of memberof updates.
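
A minimal sketch of that pacing approach, assuming the caller already has the ipa commands as argument lists. The function name run_paced and the 30-second default are hypothetical; the ticket does not say how long the pause should be, so the value would need tuning against the observed ns-slapd load.

```python
import subprocess
import time


def run_paced(commands, pause_seconds=30.0, runner=subprocess.run, sleep=time.sleep):
    """Run each command in order, pausing between commands.

    The pause gives the directory server time to finish postoperation work
    (for example memberof updates) before the next change arrives.
    runner and sleep are injectable so the pacing can be tested without
    touching a real server. Return codes are not checked here.
    """
    results = []
    for index, cmd in enumerate(commands):
        results.append(runner(cmd))
        # No pause is needed after the final command.
        if index < len(commands) - 1:
            sleep(pause_seconds)
    return results
```

For example, run_paced([["ipa", "sudorule-add-user", "sudo_rulename", "--groups=usergroupname"]]) would run the command from this report through the same wrapper. A fixed sleep only reduces the pile-up; it does not guarantee that postoperation processing has finished.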

The longer term solution is to have all DS plugins execute inside the database transaction, so that when the ipa sudo command returns to the caller, all of the processing is complete. https://fedorahosted.org/389/ticket/351

Moving to 3.2 to align with 389-ds target of 1.3.0.a1

The expectation is that this will be resolved by 389-ds 1.3 with transactions enabled.

master: f1f1b4e

Metadata Update from @dpal:
- Issue assigned to rcritten
- Issue set to the milestone: FreeIPA 3.1 Stabilization

7 years ago
