HAProxy can provide metrics on the hosts it runs on; it would be nice to get those into Zabbix.
E.g. https://www.haproxy.com/blog/exploring-the-haproxy-stats-page
Metadata Update from @james: - Issue assigned to gwmngilfen - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: low-gain, medium-trouble
That page is an HTML page, I don't think we want to scrape that ;)
However, we already have a stats socket enabled for NRPE:
```
[root@proxy01 ~][STG]# echo "show stat json" | socat stdio unix-connect:/var/run/haproxy-stat | jq
(pages of JSON)
```
So I think we could work with this, if we knew which things to extract... there's a lot of data!
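To give an idea of what extraction could look like, here's a minimal Python sketch that flattens `show stat json` output into plain `{name: value}` dicts and keeps only a couple of fields. The sample data is made up for illustration; it only mimics the typed field/value row structure that `show stat json` emits, and the proxy names (`main`, `app`) are hypothetical:

```python
import json

# Illustrative sample shaped like HAProxy's `show stat json` output:
# a list of rows, each row a list of typed field objects.
# Names and values here are invented for the sketch.
SAMPLE = json.dumps([
    [
        {"objType": "Frontend", "field": {"name": "pxname"},
         "value": {"type": "str", "value": "main"}},
        {"objType": "Frontend", "field": {"name": "scur"},
         "value": {"type": "u32", "value": 42}},
    ],
    [
        {"objType": "Backend", "field": {"name": "pxname"},
         "value": {"type": "str", "value": "app"}},
        {"objType": "Backend", "field": {"name": "scur"},
         "value": {"type": "u32", "value": 7}},
    ],
])

def rows_to_dicts(raw):
    """Flatten each row of typed fields into a {name: value} dict."""
    for row in json.loads(raw):
        yield {f["field"]["name"]: f["value"]["value"] for f in row}

# Keep only the fields we (hypothetically) care about.
wanted = {"pxname", "scur"}
stats = [{k: v for k, v in r.items() if k in wanted}
         for r in rows_to_dicts(SAMPLE)]
print(stats)
```

In practice the raw JSON would come from the socket (e.g. piped in via `socat` as above) rather than an embedded sample.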
Issue tagged with: sprint-0
OK, so it turns out there is a template maintained by Zabbix themselves for HAProxy - https://github.com/zabbix/zabbix/tree/release/7.0/templates/app/haproxy_agent
I set this up, with just a by-hand change to the HAProxy config on proxy02.stg (and a corresponding SELinux module that allows the agent to contact localhost:8404). Then I added the template and applied it to proxy02.stg ... and my, that's a lot of data:
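For reference, the by-hand config change was along these lines; this is a minimal sketch of a standard HAProxy HTTP stats listener, not the exact stanza used (the listener name, URI, and refresh interval are my choices; port 8404 matches the one mentioned above, and the URI needs to match what the Zabbix template expects):

```
# Sketch: expose the stats page over HTTP on localhost for the
# Zabbix agent to poll (details illustrative).
frontend stats
    bind 127.0.0.1:8404
    stats enable
    stats uri /stats
    stats refresh 10s
```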
(screenshot: proxy02 on zabbix.stg)
That's 21 items per backend, 16 items per frontend, 24 items per service, a pile of graph prototypes, and a bunch of dashboards. Oh, and triggers for some of those too.
This looks good to me; my only concern is how much load this adds to Zabbix... but I can see us incorporating some of these checks into our daily work, for sure. Thoughts @james?
Wow. That's a pile of stuff. ;(
But it could be useful, hard to say... we could just pull it and see if it's a problem? Or try and isolate what we care about?
Note that some backends (mirrorlists) go down all the time, because we have to restart things with new data every hour, so one backend restarts, then the other, over and over. And of course the TCP timeout issue is making everything with backends on the build network bounce. ;(
Max connections could be useful. Failures would normally be useful.
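If we do whittle things down to sessions and failures, the filtering is simple. A small sketch, assuming flattened stat rows keyed by HAProxy's `show stat` field names (`scur`/`smax` for current/max sessions, `econ`/`eresp` for connect/response errors); the proxy names and values below are made up:

```python
# Hypothetical flattened stat rows; field names follow HAProxy's
# `show stat` schema, values are invented for the sketch.
ROWS = [
    {"pxname": "app", "svname": "BACKEND", "scur": 7, "smax": 120,
     "econ": 3, "eresp": 1},
    {"pxname": "main", "svname": "FRONTEND", "scur": 42, "smax": 310},
]

# The session and failure counters we'd keep.
KEEP = ("scur", "smax", "econ", "eresp")

def trim(rows, keep=KEEP):
    """Reduce each row to just the counters we care about,
    keyed by proxy/server name."""
    return {
        f'{r["pxname"]}/{r["svname"]}': {k: r[k] for k in keep if k in r}
        for r in rows
    }

print(trim(ROWS))
```

Frontends have no `econ`/`eresp`, which is why the comprehension skips missing keys.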
Yeah, it's a lot. However, it's probably best to start big and then whittle it down if the regular "check the monitoring" keeps hitting this.
I'll prep a change for this week to implement it, and we can monitor.
Issue status updated to: Closed (was: Open) Issue close_status updated to: Fixed