https://bugzilla.redhat.com/show_bug.cgi?id=846333 (Red Hat Enterprise Linux 6)
Issue: The IPA system becomes lethargic (slow, unresponsive, and at times unusable) after running the command:
"ipa sudorule-add-user sudo_rulename --groups=usergroupname"

Overview: I'm currently setting up a test environment to perform some authentication, administration and runtime load against an IPA test environment on some higher end machines. Once the test environment is set up, the plan is to run these tasks 24/7 at various load levels to test the reliability of the IPA system under test. While building the Sudo config environment for IPA, the system becomes slow, then unresponsive and unusable. The issue is inadvertently causing DoS issues. Is it possible that the number of users in my user groups is causing the problems, and that reducing it from 1000 to 100 may solve the issue temporarily and allow me to continue? Can an administrator increase the query limits to suppress the error messages indicating "ERROR: limits exceeded for this query"?

Task Script: The script causing the ipa server havoc is responsible for building the proper sudo objects to get the system configured. The script is a single-threaded python script calling the IPA cli in the sequence defined below. The command that stalls and generates errors when executing is "ipa sudorule-add-user". The command initially had 5 user groups defined on the command line, so I reduced it to 1 and called it 5 separate times, once for each new user group. Made no difference...

rpm -qi 389-ds-base
Name        : 389-ds-base                   Relocations: (not relocatable)
Version     : 1.2.10.2                      Vendor: Red Hat, Inc.
Release     : 19.el6_3                      Build Date: Wed 20 Jun 2012 05:11:42 PM EDT
Install Date: Wed 01 Aug 2012 11:00:01 AM EDT   Build Host: x86-008.build.bos.redhat.com
Group       : System Environment/Daemons    Source RPM: 389-ds-base-1.2.10.2-19.el6_3.src.rpm
Size        : 4854889                       License: GPLv2 with exceptions
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://port389.org/
Summary     : 389 Directory Server (base)

-Env Preconditions:
10K users exist
10 User groups exist (1k per group)
1550 sudo commands (/usr/bin/*)
100 sudo groups

Script that builds the Sudo rules via cli:
for 1 to 10:
-Add SudoRule
-Add SudoRule Hosts (qty2)
-Add SudoRule UserGroups1
-Add SudoRule UserGroups2
-Add SudoRule UserGroups3
-Add SudoRule UserGroups4
-Add SudoRule UserGroups5
-Add SudoRule Allow Command Groups (qty5)
-Add SudoRule Deny Command Groups (qty5)

Is it Repeatable: Yes, consistently.

Symptoms:
-Sudo cli client script: Once the script starts it will inevitably have issues running the command "ipa sudorule-add-user". A delay of up to 7 minutes may occur, and messages indicating "ERROR: limits exceeded for this query" are generated. I never made it past the creation of 7 sudo rules since it was taking too long and errors kept appearing, so I terminated the script.
-IPA User Interface: Once the script starts the UI inevitably becomes unusable. I have been getting UI dialogs indicating "limits exceeded for this query" and internal server errors. At this point the UI is inoperable.
-Kinit: Kinit fails to connect, so I cannot authenticate in this state.
kinit: Cannot contact any KDC for realm 'TESTRELM.COM' while getting initial credentials.
-CPU: Once the script starts the ns-slapd process jumps up to and over 100% on the target Ipa master and master rep server.
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+  COMMAND
20151 dirsrv    20   0 3069m 650m  20m S  99.2  4.1 172:11.83  ns-slapd

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+  COMMAND
 1114 dirsrv    20   0 3069m 404m  20m S 589.6  2.5 125:13.71  ns-slapd

-IPA Master DirSec Error Log Snip:
[06/Aug/2012:14:47:56 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 110 (Connection timed out)
[06/Aug/2012:14:47:56 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[06/Aug/2012:14:47:56 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[06/Aug/2012:14:50:51 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth resumed
[06/Aug/2012:14:53:02 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Timed out sending update operation to consumer (uniqueid c258d222-dc2f11e1-b897d99e-aafdf81b, CSN 501fe82f000400040000): Timeout.
[06/Aug/2012:14:58:04 -0400] - repl5_inc_waitfor_async_results timed out waiting for responses: 4437 4795
[06/Aug/2012:15:00:04 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Warning: unable to send endReplication extended operation (Timed out)
[06/Aug/2012:15:00:27 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: An invalid name was supplied (Hostname cannot be canonicalized)) errno 110 (Connection timed out)
[06/Aug/2012:15:00:27 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -2 (Local error)
[06/Aug/2012:15:00:27 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: An invalid name was supplied (Hostname cannot be canonicalized))
[06/Aug/2012:15:00:28 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth resumed
[06/Aug/2012:15:03:33 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 110 (Connection timed out)
[06/Aug/2012:15:03:33 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[06/Aug/2012:15:03:33 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[06/Aug/2012:15:06:45 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 110 (Connection timed out)
[06/Aug/2012:15:06:45 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[06/Aug/2012:15:11:53 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Cannot contact any KDC for realm 'TESTRELM.COM')) errno 115 (Operation now in progress)
[06/Aug/2012:15:11:53 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -2 (Local error)
[06/Aug/2012:15:11:53 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Cannot contact any KDC for realm 'TESTRELM.COM'))
...
...

Work Around the issues: If there is not a magical config setting to resolve this issue, I will attempt to work around it by reducing the number of users in each user group. I will reduce the users from 1000 to 100 and give it a go.
Test Hardware:
Ipa Server 1&2 = Linux sti-high-1.testrelm.com 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
Ipa Client 1&2 = Linux sti-high-3.testrelm.com 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    4
CPU socket(s):         2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              2
CPU MHz:               1596.000
BogoMIPS:              4787.82
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15

Memory:
                      total       used       free     shared    buffers     cached
Mem:               16316084    4678128   11637956          0     199440    2701480
-/+ buffers/cache:             1777208   14538876
Swap:               8224760          0    8224760
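The rule-building sequence described in the report can be sketched as a function that only generates the CLI argument lists and does not run anything. This is an illustration, not the reporter's script: the function name and all rule, host, and group names are hypothetical, and only `ipa sudorule-add-user <rule> --groups=<group>` is taken verbatim from the report. The other subcommands and options are assumed to follow the usual `ipa sudorule-*` command family, and the "qty" steps are assumed to be single calls with repeated options.

```python
def build_sudo_rule_commands(rule, hosts, user_groups, allow_groups, deny_groups):
    """Return the ipa CLI calls (as argv lists) for one sudo rule.

    All names passed in are placeholders; nothing is executed here.
    """
    commands = [["ipa", "sudorule-add", rule]]

    # Assumption: both hosts are attached in a single call.
    commands.append(["ipa", "sudorule-add-host", rule]
                    + ["--hosts=" + host for host in hosts])

    # One call per user group, as in the report after the original
    # five-group command line was split into five separate calls.
    for group in user_groups:
        commands.append(["ipa", "sudorule-add-user", rule, "--groups=" + group])

    # Assumption: allow and deny command groups use --sudocmdgroups.
    commands.append(["ipa", "sudorule-add-allow-command", rule]
                    + ["--sudocmdgroups=" + g for g in allow_groups])
    commands.append(["ipa", "sudorule-add-deny-command", rule]
                    + ["--sudocmdgroups=" + g for g in deny_groups])
    return commands


def build_all_commands(rule_count=10):
    """Build the full sequence for rule_count rules with placeholder names."""
    commands = []
    for n in range(1, rule_count + 1):
        commands.extend(build_sudo_rule_commands(
            rule="sudo_rule_%d" % n,
            hosts=["host1.example.test", "host2.example.test"],
            user_groups=["usergroup%d" % i for i in range(1, 6)],
            allow_groups=["allow_cmds_%d" % i for i in range(1, 6)],
            deny_groups=["deny_cmds_%d" % i for i in range(1, 6)],
        ))
    return commands
```

With these assumptions each rule needs nine CLI calls, five of which are `sudorule-add-user`, the command the report singles out as the one that stalls.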
The problem is that much, if not most, of the work is done in post-operation plugins such as memberof, after the result has been returned to the client. The ipa sudo command therefore appears to return when the operation is complete, although the operation is not really complete. If many ipa sudo commands are executed one after the other, the server eventually becomes completely bogged down in post-operation processing such as memberof.
The short term solution is to put sleeps between ipa sudo cmds, especially those that trigger lots of memberof updates.
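A minimal sketch of that workaround, assuming the commands are already available as argv lists: run them one at a time and pause between calls so the server can finish its memberof processing. The function name `run_paced` and the 30-second default are hypothetical; the ticket does not say how long the sleeps need to be, so the value would have to be tuned against the server.

```python
import subprocess
import time


def run_paced(commands, pause_seconds=30.0, run=subprocess.call, sleep=time.sleep):
    """Run each command in order, sleeping between consecutive calls.

    `run` and `sleep` are injectable so the pacing logic can be tested
    without a real IPA server. Returns the commands that exited non-zero.
    """
    failed = []
    for index, argv in enumerate(commands):
        if index > 0:
            # Pause before every call except the first one.
            sleep(pause_seconds)
        if run(argv) != 0:
            failed.append(argv)
    return failed
```

A caller might use it as `run_paced([["ipa", "sudorule-add-user", "sudo_rulename", "--groups=usergroupname"]])`. A fixed sleep only spaces the requests out; it does not confirm that post-operation processing has actually finished.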
The longer term solution is to have all DS plugins execute inside the database transaction, so that when the ipa sudo cmd returns to the caller, all of the processing will be complete. https://fedorahosted.org/389/ticket/351
Moving to 3.2 to align with 389-ds target of 1.3.0.a1
The expectation is that this will be resolved by 389-ds 1.3 with transactions enabled.
master: f1f1b4e
Metadata Update from @dpal: - Issue assigned to rcritten - Issue set to the milestone: FreeIPA 3.1 Stabilization