Monitoring HAProxy Metrics And How They Change With Time

Krzysztof Ksiazek

HAProxy is one of the most popular load balancers for MySQL and MariaDB.

Feature-wise, it cannot be compared to ProxySQL or MaxScale, but it is fast and robust and may work perfectly fine in any environment as long as the application can perform the read/write split and send SELECT queries to one backend and all writes and SELECT … FOR UPDATE to a separate backend.

Keeping track of the metrics made available by HAProxy is very important. You have to be able to know the state of your proxy, especially if you encountered any issues.

ClusterControl always made available an HAProxy status page, which showed the state in real time. With the new, Prometheus-based SCUMM (Severalnines ClusterControl Unified Monitoring & Management) dashboards, it is now possible to track how those metrics change in time.

In this blog post we will go over the different metrics presented in the HAProxy SCUMM dashboard.

First of all, by default Prometheus and SCUMM dashboards are disabled in ClusterControl. In that case, it’s just a matter of one click to deploy them for a given cluster. If you have multiple clusters monitored by ClusterControl you can reuse the same Prometheus instance for every cluster managed by ClusterControl.

Once deployed, we can access the HAProxy dashboard. We will go over the data shown in it.

As you can see, it starts with the information about the state of the backends. Here, please note this may depend on the cluster type and how you deployed HAProxy. In this case, it was a Galera cluster and HAProxy was deployed in a round-robin fashion; therefore you see three backends for reads and three for writes, six in total. It is also the reason why you see all backends marked as up. In case of the replication cluster, things are looking different as the HAProxy will be deployed in a read/write split, and the scripts will keep only one host (master) up and running in the writer’s backend:

This is why on the screen above, you can see two backend servers marked as “down”.

Next graph focuses on the data sent and received by both backend (from HAProxy to the database servers) and frontend (between HAProxy and client hosts).

We can also check the traffic distribution between backends that are configured in HAProxy. In this case we have two backends and the queries are sent via port 3308, which acts as the round-robin access point to our Galera Cluster.

We can also find graphs showing how the traffic was distributed across all backend servers. In this case, due to round-robin access pattern, data was more or less evenly distributed across all three backend Galera servers.

Next, we can see information about sessions. How many sessions were opened from HAProxy to the backend servers. We can also track how many times per second a new session was opened to the backend. You can also check how those metrics look like when you look at them on per backend server basis.

Next two graphs show what was the maximum number of sessions per backend server and also when some connectivity issues showed up. This can be quite useful for debugging purposes when you hit some configuration error on your HAProxy instance and connections started to be dropped.

Next graph might be even more valuable as it shows different metrics related to error handling - response errors, request errors, retries on the backend side etc. Then we have a Sessions graph, which shows the overview of the session metrics.

On the next graph we can track the connection errors in time, this can be also useful to pinpoint the time when the issue started to evolve.

Finally, two graphs related to queued requests. HAProxy queues requests to backend if the backend servers are oversaturated. This can point to, for example, the overloaded database servers, which cannot cope with more traffic.

As you can see, ClusterControl tracks the most important metrics of HAProxy and can show how they change in time. This is very useful in pinpointing when an issue started and, to some extent, what could be the root cause of it. Try it out (it’s free) for yourself.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.