#12158 setup ipa02.stg and ipa03.stg again as replicas
Opened a month ago by kevin. Modified 2 days ago

Staging was affected by the same thing that hit production, but in the staging case both ipa02.stg and ipa03.stg were uninstalled.

So, we need to set up ipa02.stg and ipa03.stg as replicas again.

I attempted to do this last week, but it failed with:

"Configuring Kerberos KDC (krb5kdc)",
"  [1/6]: configuring KDC",
"  [2/6]: adding the password extension to the directory",
"  [3/6]: creating anonymous principal",
"  [4/6]: starting the KDC",
"  [5/6]: configuring KDC to start on boot",
"  [6/6]: enable PAC ticket signature support",
"Done configuring Kerberos KDC (krb5kdc).",
"Configuring kadmin",
"  [1/2]: starting kadmin ",
"  [2/2]: configuring kadmin to start on boot",
"Done configuring kadmin.",
"Configuring directory server (dirsrv)",
"  [1/3]: configuring TLS for DS instance",
"  [error] RuntimeError: Certificate issuance failed (CA_UNREACHABLE: Server at https://ipa01.stg.iad2.fedoraproject.org/ipa/json failed request, will retry: 4016 (Failed to authenticate to CA REST API).)",
"Your system may be partly configured.",
"Run /usr/sbin/ipa-server-install --uninstall to clean up."]}

I am not sure why it reported ipa01.stg as unreachable there; it appears to be up and functioning fine.

Anyhow, we need to sort this out and perhaps add monitoring so that we know the next time it breaks.


Metadata Update from @zlopez:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: authentication, high-trouble, medium-gain, ops

a month ago

What kind of check would you like to add for this? Something like a topology check?

Good question. For starters just the same replication check we have in prod... I think we didn't have it in staging because something was wrong with it there and we couldn't figure out what?

Ok I'll take a look
Example check we have in prod

IPA Replication Status

    OK  09-05-2024 18:13:13     32d 16h 51m 32s     1/3     OK - Replica Status: (b'ipa03.iad2.fedoraproject.org', b'Error (0) Replica acquired successfully: Incremental update succeeded') 
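The status text in that check result follows the "Error (N) message" form that the directory server reports for a replication agreement. As a rough sketch of how a check could turn such a string into a Nagios state (this is not the actual prod plugin; the function name and the choice to treat every non-zero code as CRITICAL are assumptions made for illustration):

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches the leading "Error (N)" part of a replication status string.
STATUS_RE = re.compile(r"^Error \((-?\d+)\)\s*(.*)$")


def classify_replica_status(status):
    """Map one replication status string to (nagios_code, message).

    Hypothetical helper: accepts str or bytes, or None/empty when no
    agreement was visible to the check.
    """
    if not status:
        # Nothing returned at all, which is the staging symptom in this ticket.
        return UNKNOWN, "no replication status found"
    if isinstance(status, bytes):
        status = status.decode("utf-8", errors="replace")
    match = STATUS_RE.match(status.strip())
    if match is None:
        return UNKNOWN, f"unparseable status: {status!r}"
    code = int(match.group(1))
    if code == 0:
        return OK, match.group(2)
    # Assumption: any non-zero code is treated as a failure.
    return CRITICAL, match.group(2)
```

Returning UNKNOWN for an empty result, instead of silently printing nothing, would at least make the "nothing found" case in staging visible in monitoring.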

I have set up ipa02.stg/ipa03.stg again...

but the check for replication says nothing found. ;(

That is strange. I'm looking at the topology of staging and the IPA servers, and both look correct to me. Is it possible that the replication already happened, so there is nothing to replicate?

Did you try checking the replicas via ldapsearch? I think it will provide more information about them.
I don't think I have access to IPA in stg.

What's the ldapsearch call? The replication Nagios check we use in prod returns nothing in stg...

I suspect we have a permissions problem somewhere in staging... so the monitoring can't 'see' the replication agreements/status.

I think that without any filter it returns a lot of information about LDAP, FreeIPA membership, etc.

Try ldapsearch -x -H FREE_IPA_HOST

That spews out a bunch of accounts and then stops with a limit exceeded. ;)

We need to filter on something...
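One possible filter is the object class of the replication agreement entries, which live under cn=mapping tree,cn=config in the directory server. The sketch below only builds the ldapsearch argument list; the helper name, the use of ldaps://, and binding as cn=Directory Manager are assumptions (an anonymous -x bind, as tried above, normally cannot read cn=config, which may be why accounts show up but agreements do not).

```python
def build_agreement_search(host, bind_dn="cn=Directory Manager"):
    """Return an ldapsearch argv that lists replication agreements on host.

    Hypothetical helper for illustration; adjust the bind DN and URI
    scheme to whatever the environment actually allows.
    """
    return [
        "ldapsearch",
        "-x",                       # simple (non-SASL) bind
        "-H", f"ldaps://{host}",    # -H expects an LDAP URI, not a bare host name
        "-D", bind_dn,              # bind DN (assumption)
        "-W",                       # prompt for the bind password
        "-b", "cn=mapping tree,cn=config",
        "(objectClass=nsds5replicationagreement)",
        # Only return the attributes relevant to replication status.
        "nsDS5ReplicaHost",
        "nsds5replicaLastUpdateStatus",
    ]
```

The list can be passed to subprocess.run(), or joined with shlex.join() to get a command line to paste into a shell on one of the IPA hosts.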

Yes, it seems to be a problem reaching ipa01!
Maybe there is a permission issue with /etc/ipa/ca.crt and /var/lib/ipa/ra-agent.{key|pem}, so that the files cannot be read by non-root users?
One more thing to look at is how IPA was installed (with which umask). For example, with 0077 the replica install will fail, and in that case we should reinstall the master.

# ipa-server-install --uninstall
# umask 0022
# ipa-server-install

Backups on 02/03 are failing with:

/etc/cron.daily/data-only-backup.sh:

Error: Local roles CA do not match globally used roles CA, KRA. A backup done on this host would not be complete enough to restore a fully functional, identical cluster.
The ipa-backup command failed. See /var/log/ipabackup.log for more information
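The error message describes a comparison between the roles installed on the local host and the roles in use anywhere in the topology. As an illustration of that comparison only (this is not ipa-backup's code, and both function names are made up for the example):

```python
def missing_roles(local_roles, global_roles):
    """Return, sorted, the roles used in the topology but absent locally."""
    return sorted(set(global_roles) - set(local_roles))


def backup_is_complete(local_roles, global_roles):
    """A backup can restore the whole cluster only if no role is missing."""
    return not missing_roles(local_roles, global_roles)


if __name__ == "__main__":
    # Values taken from the error message above: the replica has CA only,
    # while the topology as a whole uses CA and KRA.
    print(missing_roles({"CA"}, {"CA", "KRA"}))
```

With the values from the message, the missing role is KRA, so a replica without KRA cannot produce a backup that restores the full cluster.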

We can't reinstall 01, as it's the CA server.

I don't think it's a file permission issue, as I ran it as root. ;)

@kevin what's the status of this? Do you still want to uninstall and reinstall the IPA replicas?
I think the same problem is reported here: https://pagure.io/fedora-infrastructure/issue/12149

I guess we are waiting for me or @zlopez to have time to completely reinstall ipa02.stg and ipa03.stg. I am not really convinced that this will fix the issue.

I guess we could look at the dirsrv logs and see if we can spot any permission issues...

The problem in 12149 is that we want to modify the playbook to NOT reinstall replicas, and move that out to a separate playbooks/manual/ playbook. :)

I totally agree
We'll try to work on a PR to modify the playbook.

Does this still need to be done?

I have some spare time to reinstall them now and already fixed 12149.

I was able to reinstall ipa02.stg and ipa03.stg. It took me most of today to tweak the confirmation dialog and add a task for removing the replication agreement before installing the replica.

But the error mentioned by @kevin with /etc/cron.daily/data-only-backup.sh is still there. I probably know how to fix that; it just needs one more reinstall of the replicas, which takes some time.

I created a PR to fix this error. It basically adds the KRA role to all the replicas.

I will try to run the playbook on staging first to test it.

I tried to deploy ipa03.stg with the KRA role enabled and it failed with a configuration error:

ERROR: CalledProcessError: Command '['pki', '-d', '/etc/pki/pki-tomcat/alias', '-f', '/etc/pki/pki-tomcat/password.conf', '-U', 'https://ipa03.stg.iad2.fedoraproject.org:443', '--ignore-banner', 'ca-kraconnector-add', '--url', 'https://ipa03.stg.iad2.fedoraproject.org:8443/kra/agent/kra/connector', '--subsystem-cert', '/tmp/tmptiisv2cu/subsystem.crt', '--transport-cert', '/tmp/tmptiisv2cu/transport.crt', '--transport-nickname', 'transportCert cert-pki-kra', '--install-token', '/tmp/tmptiisv2cu/install-token', '--debug']' returned non-zero exit status 255.

So it isn't as simple as adding --setup-kra to ipa-replica-install. I will revert the commit and reinstall the replica without --setup-kra, so that we at least have it set up.

I will look at this more on Monday and see if there is something else that could be done.

I tried to run the command that failed manually and got:

java.lang.Exception: Too many arguments specified.
        at com.netscape.cmstools.system.KRAConnectorAddCLI.execute(KRAConnectorAddCLI.java:112)
        at org.dogtagpki.cli.CommandCLI.execute(CommandCLI.java:58)
        at org.dogtagpki.cli.CLI.execute(CLI.java:353)
        at org.dogtagpki.cli.CLI.execute(CLI.java:353)
        at com.netscape.cmstools.cli.SubsystemCLI.execute(SubsystemCLI.java:79)
        at org.dogtagpki.cli.CLI.execute(CLI.java:353)
        at com.netscape.cmstools.cli.MainCLI.execute(MainCLI.java:659)
        at com.netscape.cmstools.cli.MainCLI.main(MainCLI.java:698)

This seems like a problem in IPA itself, as all of this is done by ipa-replica-install.

Metadata Update from @zlopez:
- Issue assigned to zlopez

2 days ago

Metadata
Boards 1
ops Status: Backlog