Consider adding a FreeIPA Status / Notification page for listing actionable events that might break a set configuration. Provisioning for a solutions page linked from said Status / Notification page would also be welcome.
There's currently no easy way to determine if a certain logged issue has broken IPA.
For example, if replication is broken between two masters, there is no way to know this from the UI unless the user knows exactly what to look for in the log files. A tab with red, amber or green notifications would help identify a broken configuration. (Possibly even include alerting.)
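To illustrate the idea, here is a minimal sketch of the traffic-light logic such a status page could use: each check reports a severity, and the page surfaces the worst one. The names (`Check`, `worst_status`) and the example checks are purely illustrative and do not correspond to any existing FreeIPA API.

```python
from dataclasses import dataclass

# Severity order, least to most severe.
SEVERITIES = ["green", "amber", "red"]

@dataclass
class Check:
    name: str
    severity: str  # one of SEVERITIES
    detail: str = ""

def worst_status(checks):
    """Return the most severe colour among all check results."""
    if not checks:
        return "green"
    return max(checks, key=lambda c: SEVERITIES.index(c.severity)).severity

# Hypothetical check results for a deployment with a broken agreement.
checks = [
    Check("replication master1<->master2", "red", "agreement broken"),
    Check("CA certificate expiry", "amber", "expires in 20 days"),
    Check("KDC responding", "green"),
]
print(worst_status(checks))  # prints: red
```

A UI tab could then render this single colour as the at-a-glance indicator, with the per-check details behind it.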
It's currently not possible to verify from within FreeIPA that all AD users were correctly mapped. At the moment, when running `ipa group-add-member` it's unclear whether the operation completed fully and which users / groups were actually added. This would help diagnose cases where a group is unexpectedly empty, which is why certain defined sudo rules do not work for a user. The User Group section currently does not list any AD users.
A third nice-to-have would be the ability to see an AD user's or group's properties from within FreeIPA after it has been mapped through external groups.
RFE
Currently using the following versions:

```
package freeipa-server is not installed
package freeipa-client is not installed
ipa-server-4.5.0-22.el7.centos.x86_64
ipa-client-4.5.0-22.el7.centos.x86_64
389-ds-base-1.3.6.1-24.el7_4.x86_64
pki-ca-10.4.1-17.el7_4.noarch
krb5-server-1.15.1-8.el7.x86_64
```
N/A
Log file locations: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Linux_Domain_Identity_Authentication_and_Policy_Guide/config-files-logs.html Troubleshooting guide: https://www.freeipa.org/page/Troubleshooting
This seems to be a superset request of https://pagure.io/freeipa/issue/4390, https://pagure.io/freeipa/issue/3068, https://pagure.io/freeipa/issue/2443 and perhaps others.
Metadata Update from @rcritten: - Issue set to the milestone: FreeIPA 4.8
@rcritten this also seems to have some overlap with #4008.
See also https://pagure.io/freeipa/issue/5829 ([RFE] topology analysis tool) which seems to have some overlap (especially with "architectural review" feature, if implemented).
I'm going to close this as a duplicate of the others.
Metadata Update from @rcritten: - Issue close_status updated to: duplicate - Issue status updated to: Closed (was: Open)
Hi there, I noticed many other tickets about this have been closed, but they relate to https://www.freeipa.org/page/V4/Healthcheck, which was shown to me by @mreynolds. CC @rcritten @dpal for visibility.
I really like the intent of this tool, but given my experience at Red Hat I think the approach to the design may be overlooking something really important. The tool is being designed to check all the things its developers think are important, but those aren't necessarily the things that matter to a system implementor or to support.
Honestly, the best way to approach this is to engage GSS (Global Support Services): get them in a meeting and gather the top five most common error cases. Then automate fixes for those cases and make those processes more robust; where that is not possible, write the healthcheck tool to detect only those cases, giving GSS a quicker diagnosis so they know how to follow up.
This gives a huge return on the engineering time invested: it helps GSS diagnose issues quickly, it finds the issues that deployments actually care about, focused on IPA specifics, and in many cases, because the fixes are automated, it will even reduce the support case load!
For example, you could spend a day writing disk monitoring into the tool, but no one will care, because Nagios does it better and it doesn't actually resolve any GSS cases.
A more complete case study of this comes from Directory Server. When I was employed by Red Hat I would arrange meetings with GSS independently to ask about common issues. Around 2016 the big one was performance, specifically directory tuning, affecting both IPA and DS. GSS told me that most customers didn't understand the required ratio of dbcache to entrycache, and there were many complaints of IPA/DS being slow, while we defaulted to 10 MB of cache on install. So I wrote an automatic tuning tool that detected the machine's memory limits and scaled the server's cache values appropriately, assuming IPA would be running on the same machine. After this was released, GSS reported that it was easier to help existing customers (run this script, restart the instance, performance fixed), and that new installs were never generating calls in the first place. GSS for IdM then saw a significant drop in case load related to DS performance as a result of this work.
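The shape of that autotuning logic can be sketched as below. The fractions and the MB units are made up for illustration; the real 389-ds autosizing uses its own heuristics and sets the attribute values in bytes. The attribute names (`nsslapd-cachememsize` for the per-backend entry cache, `nsslapd-dbcachesize` for the db cache) are the real 389-ds ones.

```python
def autotune_caches(total_ram_mb, ds_share=0.25, entry_ratio=0.75):
    """Give DS a share of total RAM, split between entry cache and db cache.

    Illustrative only: ds_share reserves headroom for IPA services on the
    same machine, and entry_ratio encodes the entrycache/dbcache ratio that
    customers reportedly got wrong when tuning by hand.
    """
    budget = int(total_ram_mb * ds_share)
    entry_cache = int(budget * entry_ratio)
    db_cache = budget - entry_cache
    return {
        "nsslapd-cachememsize": entry_cache,  # entry cache, MB (illustrative)
        "nsslapd-dbcachesize": db_cache,      # db cache, MB (illustrative)
    }

# On an 8 GB machine this yields 1536 MB entry cache and 512 MB db cache,
# instead of a fixed 10 MB default regardless of hardware.
print(autotune_caches(8192))
```

The point of the case study is exactly this move: replace a static default with a computed one, so the fix ships in the installer rather than in a support call.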
So my advice: if you want to write a healthcheck tool, focus on engaging with GSS and automating fixes to the issues they report, and ONLY if you can't automate a fix, add it to the healthcheck so that intervention and diagnosis can be improved. A healthcheck tool is not a replacement for fixing weak processes (e.g. certificate renewal); it should only exist where an automated fix can't be deployed (for example, weak DS ACIs, because we don't have the contextual business knowledge to know what the admin's intent was).
My concern, and the reason I give this advice, is that the current healthcheck design looks like it isn't engaging with the key stakeholders (GSS), and is more about reporting issues than about automatically fixing them and making the server's processes more robust. That is a short-term approach rather than long-term engineering.
Hope that advice helps on the approach.
What I forgot to say here is that you could also write monitoring for the CA renewal case, but that's not fixing the problem, it's just making diagnosis quicker. You're spending a lot of effort to see the problem faster, but that doesn't help, because you need to prevent the problem from occurring at all; it's already trivially easy to detect on a system. Don't think "how can we fix it after it explodes", think "how can it never explode at all".
@firstyear the support org was involved in developing Health Check requirements. We also continue to work on the robustness of the system, to avoid the problem occurring. I agree that prevention is better than cure (we all do). But prevention, diagnosis and treatment - all three are needed. Health Check is about diagnosis, and possibly in the future automatic remediation where possible.
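The diagnose-then-remediate split being discussed could be sketched like this. The class names, the registry of agreements, and the stubbed fix are all hypothetical; they do not reflect the actual ipa-healthcheck plugin API.

```python
class CheckResult:
    """One finding from a check: SUCCESS or ERROR plus a message."""
    def __init__(self, check, status, msg=""):
        self.check, self.status, self.msg = check, status, msg

class ReplicationAgreementCheck:
    """Diagnosis: report each replication agreement's health.

    Illustrative only: health state is passed in as a dict rather than
    queried from the directory server.
    """
    def __init__(self, agreements):
        self.agreements = agreements  # {agreement_name: is_healthy}

    def run(self):
        for name, healthy in self.agreements.items():
            yield CheckResult("replication", "SUCCESS" if healthy else "ERROR", name)

    def remediate(self):
        """Optional automatic fix: re-initialise broken agreements.

        Stubbed here; a real remediation would drive the topology plugin.
        Returns the list of agreements it repaired.
        """
        fixed = [name for name, ok in self.agreements.items() if not ok]
        self.agreements = {name: True for name in self.agreements}
        return fixed
```

The design question in this thread is which checks stop at `run()` (pure diagnosis, because no safe automatic fix exists) and which can also carry a `remediate()` step.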
@ftweedal That's not how the design reads though - I think that the design should explicitly state:
I think there are a lot of "hidden" design decisions in this case that aren't captured in the upstream design, so community members like me are not able to see that process, which is why it certainly looks like customer support was not included.