#12856 haproxy provides a bunch of stats. ... would be nice to get them in zabbix
Closed: Fixed a month ago by gwmngilfen. Opened 2 months ago by james.

haproxy can provide metrics on the hosts it runs on; it would be nice to get those into zabbix

Eg.
https://www.haproxy.com/blog/exploring-the-haproxy-stats-page


Metadata Update from @james:
- Issue assigned to gwmngilfen
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-gain, medium-trouble

2 months ago

That page is an HTML page, I don't think we want to scrape that ;)

However, we already have a stats socket enabled for NRPE:

[root@proxy01 ~][STG]# echo "show stat json" | socat stdio unix-connect:/var/run/haproxy-stat|jq
(pages of json)

So, I think we could work with this, if we knew what things to extract.... there's a lot of data!
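For picking through that output, here is a minimal Python sketch that does the same thing as the socat one-liner and flattens the result into one dict per frontend/backend/server. The socket path is the one from the command above. The JSON shape (a list of entries, each a list of objects with `field.name` and `value.value`) is an assumption about what `show stat json` returns, and the function names are made up for illustration.

```python
import json
import socket

# Socket path taken from the socat command above; adjust if yours differs.
STATS_SOCKET = "/var/run/haproxy-stat"


def query_stats(path=STATS_SOCKET, command="show stat json"):
    """Send one command to the HAProxy stats socket and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # the connection is closed once the reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def flatten(raw_json):
    """Turn the per-field JSON objects into one {field_name: value} dict per entry.

    Assumes each entry is a list of objects shaped like
    {"field": {"name": ...}, "value": {"value": ...}}.
    """
    rows = []
    for entry in json.loads(raw_json):
        rows.append({f["field"]["name"]: f["value"]["value"] for f in entry})
    return rows
```

On a proxy host this could be used as `for row in flatten(query_stats()): print(row["pxname"], row["svname"], row.get("scur"))` to get a quick overview before deciding which fields are worth keeping.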

Issue tagged with: sprint-0

2 months ago

OK, so it turns out there is a template maintained by Zabbix themselves for HAProxy - https://github.com/zabbix/zabbix/tree/release/7.0/templates/app/haproxy_agent

I set this up, with just a by-hand change to the HAProxy config on proxy02.stg (and a corresponding selinux module that allows the agent to contact localhost:8404). Then I added the template and applied it to proxy02.stg ... and my, that's a lot of data:

proxy02 on zabbix.stg
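To sanity-check what the agent can see on localhost:8404, a rough Python sketch like the one below can pull the stats in CSV form (HAProxy serves CSV when `;csv` is appended to the stats URI) and turn them into dicts. Port 8404 comes from the comment above; the `/stats` path, the timeout, and the function names are assumptions, so adjust them to match the actual HAProxy config.

```python
import csv
import io
import urllib.request

# Port 8404 is from the comment above; the "/stats" path is an assumption.
STATS_URL = "http://localhost:8404/stats;csv"


def fetch_csv(url=STATS_URL, timeout=5):
    """Download the raw CSV stats text from the HAProxy stats endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode()


def parse_csv(text):
    """Parse HAProxy CSV stats into one dict per frontend/backend/server row."""
    # The header line starts with "# "; strip it so the column names are clean.
    reader = csv.DictReader(io.StringIO(text.lstrip("# ")))
    # Every line ends with a trailing comma, which yields an empty column
    # name; drop that (and any overflow columns keyed by None).
    return [{k: v for k, v in row.items() if k} for row in reader]
```

Running `parse_csv(fetch_csv())` on the proxy should return one row per object, which makes it easy to compare against what shows up in Zabbix.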

That's 21 items per backend, 16 items per frontend, 24 items per service, a pile of graph prototypes, and a bunch of dashboards. Oh, and triggers for some of those too.
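For a rough feel of the item count, the per-object numbers above can be multiplied out. This is only a back-of-the-envelope sketch: the 21/16/24 figures are the ones quoted above, while the object counts in the usage example are made-up placeholders, not the real numbers for any proxy.

```python
# Items per discovered object, as quoted in the comment above.
ITEMS_PER_BACKEND = 21
ITEMS_PER_FRONTEND = 16
ITEMS_PER_SERVICE = 24


def estimate_items(backends, frontends, services):
    """Estimate how many Zabbix items the template creates for one host."""
    return (
        backends * ITEMS_PER_BACKEND
        + frontends * ITEMS_PER_FRONTEND
        + services * ITEMS_PER_SERVICE
    )


# Hypothetical host with 10 backends, 5 frontends and 40 services.
print(estimate_items(backends=10, frontends=5, services=40))  # 1250
```

Graph prototypes, dashboards and triggers come on top of this, so the figure is a lower bound on what Zabbix has to track per host.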

This looks good to me, my only concern is how much load this adds to Zabbix... but I can see us incorporating some of these checks into our daily work, for sure. Thoughts @james?

Wow. That's a pile of stuff. ;(

But it could be useful, hard to say... we could just pull it and see if it's a problem? Or try and isolate what we care about?

Note that some backends (mirrorlists) go down all the time, because we have to restart things with new data every hour, so one backend restarts then the other one. Over and over.
And of course the tcp timeout issue is making all the stuff with backends on the build network bounce. ;(

max connections could be useful.
failures could normally be useful.
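If the goal is to isolate just those, a small filter over the stats rows might look like the sketch below. The column names are HAProxy stat fields, but mapping "max connections" to `scur`/`smax`/`slim` and "failures" to `econ`/`eresp`/`chkfail`/`chkdown` is a guess at what is meant here, and `pick_fields` is a hypothetical helper.

```python
# Guess at the columns behind "max connections" and "failures".
INTERESTING = ("scur", "smax", "slim", "econ", "eresp", "chkfail", "chkdown")


def pick_fields(row, fields=INTERESTING):
    """Keep the proxy/server names plus the selected counters.

    Counters are converted to int; fields that are empty or missing for this
    kind of row (not every field applies to every object type) become None.
    """
    picked = {"pxname": row["pxname"], "svname": row["svname"]}
    for name in fields:
        value = row.get(name)
        picked[name] = int(value) if value not in ("", None) else None
    return picked
```

It accepts any dict keyed by stat field name, so it works the same whether the rows came from the stats socket or from the CSV endpoint.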

Yeah, it's a lot. However, it's probably best to start big and then whittle it down if the regular "check the monitoring" keeps hitting this.

I'll prep a change for this week to implement it, and we can monitor.

Issue status updated to: Closed (was: Open)
Issue close_status updated to: Fixed

a month ago

Metadata
Boards 1
sprint-0 Status: Done