Severalnines Blog
The automation and management blog for open source databases

ClusterControl Developer Studio: automatically scale your clusters

Severalnines
March 14, 2016

In previous blog posts, we gave a brief introduction to ClusterControl Developer Studio and the ClusterControl Domain Specific Language, and showed how to extract information from the Performance Schema. ClusterControl’s Developer Studio allows you to write your own scripts, advisors and alerts. With just a few lines of code, you can already automate your clusters!

In this blog post we will dive deeper into Developer Studio and show you how you can keep an eye on performance and at the same time scale out the number of read slaves in your replication topology whenever it is necessary.

CMON RPC

The key element in our advisor is talking to the CMON RPC: ClusterControl’s API that enables you to automate tasks. Many of ClusterControl’s components make use of this API as well, and a great deal of functionality is accessible through it.

To talk to the CMON RPC, we need to install/import the cmonrpc.js helper file from the Severalnines Developer Studio repository on GitHub into your own Developer Studio. We described this process briefly in our introductory blog post. Alternatively, you could create a new file named common/cmonrpc.js and paste the contents in there.

At the moment this helper file has only one usable function that interacts with the CMON RPC: addNode. All the other functions in the helper support this process; for instance, the setCmonrpcToken function adds the RPC token to the JSON body if RPC tokens are in use.
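To make the mechanics concrete, here is a minimal JavaScript sketch of how such a token helper could work. The request shape and the mapping of cluster ID to token index are assumptions for illustration, not the actual cmonrpc.js implementation:

```javascript
// Hypothetical sketch: merge an RPC token into the JSON request body
// before posting to CMON. Field names and the cluster-ID-to-token-index
// mapping are assumptions, not the real cmonrpc.js.
function setCmonrpcToken(body, tokens, clusterId) {
    // tokens[0] is the token from the main cmon.cnf; per-cluster tokens
    // may follow. Fall back to the main token when no specific one exists.
    var token = tokens[clusterId] || tokens[0];
    if (token !== undefined && token !== '') {
        body.token = token;
    }
    return body;
}
```

With an empty token list the body is left untouched, which matches the advice below to leave the tokens empty when RPC tokens are not configured.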

The cmonrpc helper expects the following variables to be present:

var CMONRPC_HOST = 'localhost';
var CMONRPC_PORT = '9500';
var CMONRPC_TOKEN = ["token0", "token1", "token2"];
var FREE_HOSTS = ["10.10.10.12", "10.10.10.13", "10.10.10.14"];

The FREE_HOSTS variable contains the IP addresses of the hosts we want to use as read slaves. The findUnusedHost function compares this list against the hosts already present in the cluster and returns an unused host, or false if no unused host is available.
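A minimal sketch of that lookup in plain JavaScript; the real helper queries the cluster itself, so here the cluster's current hosts are passed in as a parameter to keep the example self-contained:

```javascript
// Illustrative sketch: return the first host in FREE_HOSTS that is not
// already part of the cluster, or false when none is left.
var FREE_HOSTS = ["10.10.10.12", "10.10.10.13", "10.10.10.14"];

function findUnusedHost(clusterHosts) {
    for (var i = 0; i < FREE_HOSTS.length; i++) {
        if (clusterHosts.indexOf(FREE_HOSTS[i]) === -1) {
            return FREE_HOSTS[i];
        }
    }
    return false;
}
```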

The CMONRPC_TOKEN variable contains the RPC tokens, if RPC tokens are in use. The first token must be the one found in the main cmon.cnf. If you are not using RPC tokens in your configuration, you can leave the array empty.

NOTE: As of 1.2.12, the ClusterControl web application does not support having an RPC token in the cmon.cnf file. If you want to run this advisor and access the web application at the same time, comment out the RPC token in the cmon.cnf file and leave the CMONRPC_TOKEN variable empty.

Auto Scale

Our auto scaling advisor is a very basic one: we simply look at the number of connections on our master and slaves. If we find the number of connections excessive on the slaves, we need to scale out our reads and we can do this by adding fresh servers.

We will look at longer-term connection counts to prevent our advisor from scaling unnecessarily. Therefore we use the SQL statistics functionality of Developer Studio and determine the standard deviation for each node in the cluster. You could customize this to use the nth percentile, the average or the maximum number of connections if you like, but that last option could cause unnecessary scaling.

var endTime         = CmonDateTime::currentDateTime();
var startTime       = endTime - 3600;
var stats           = host.sqlStats(startTime, endTime);
var config          = host.config();
var max_connections = config.variable("max_connections")[0]['value'];

We retrieve the SQL statistics using the host.sqlStats function, passing it a start and end time, and we retrieve the configured maximum number of connections as well. The sqlStats function returns an array of maps containing all statistics collected during the selected period. Since the statistical functions of Developer Studio expect arrays containing only values, the array of maps isn’t usable in this form: we need to create a new array and copy into it the values for the number of connections.

var connections = [];
for(stx = 0; stx < stats.size(); ++stx) {
    connections[stx] = stats[stx]['connections'];
}

Then we can calculate the connections used during our selected period and express that as a percentage:

stdev_connections_pct = (stdev(connections) / max_connections) * 100;
if(stdev_connections_pct > WARNING_THRESHOLD) {
    THRESHOLD_MET = true;
}
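stdev() is a Developer Studio built-in; for reference, the same calculation in plain JavaScript, with made-up sample values, looks like this:

```javascript
// Plain-JavaScript version of the threshold check above, with made-up
// sample values. stdev() here is the population standard deviation.
function stdev(values) {
    var mean = values.reduce(function (a, b) { return a + b; }, 0) / values.length;
    var variance = values.reduce(function (acc, v) {
        return acc + (v - mean) * (v - mean);
    }, 0) / values.length;
    return Math.sqrt(variance);
}

var WARNING_THRESHOLD = 85;
var max_connections = 200;                   // illustrative config value
var connections = [120, 180, 150, 160, 140]; // sampled connection counts

var stdev_connections_pct = (stdev(connections) / max_connections) * 100;
var THRESHOLD_MET = stdev_connections_pct > WARNING_THRESHOLD; // 10% here, so false
```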

Once our threshold is met, we add a new node to our cluster, and this is where we call the cmonrpc helper functions. However, we only want to do this once per run, hence we set the THRESHOLD_MET variable. At the very end, we also add an extra line of advice to show that we are scaling out our cluster:

if (THRESHOLD_MET == true)
{
    /* find unused node */
    node = findUnusedHost();
    addNode(node);

    advice = new CmonAdvice();
    advice.setTitle(TITLE);
    advice.setAdvice("Scaling out cluster with new node: " + node);
    advice.setJustification("Scaling slave nodes is necessary");
    advisorMap[idx+1]= advice;
}

Conclusion

There are still a few shortcomings with this advisor. It should not run more frequently than the period used for the SQL statistics selection: in our example we use 1 hour of statistics, so do not run the advisor more often than once per hour.

Also, the advisor will put extra stress on the master by copying its dataset to the new slave, so you had better keep an eye on the master node in your master-slave topology as well. Advisors are currently limited to a runtime of 30 seconds, so if the curl calls respond slowly, the run could exceed that limit if you use the cmonrpc library for other purposes.

On the plus side, this advisor shows how easily you can use advisors beyond what they were designed for and have them trigger actions. Examples of such actions are scheduling backups or setting hints in your configuration management tool (Zookeeper/Consul). The possibilities with Developer Studio are limited almost only by your imagination!

The complete advisor:

#include "common/mysql_helper.js"
#include "common/cmonrpc.js"

var CMONRPC_HOST = 'localhost';
var CMONRPC_PORT = '9500';
var CMONRPC_TOKEN = ["test12345", "someothertoken"];
var FREE_HOSTS = ["10.10.19.12", "10.10.19.13", "10.10.19.14"];

/**
 * Checks the percentage of used connections and scales accordingly
 * 
 */ 
var WARNING_THRESHOLD=85;
var TITLE="Auto scaling read slaves";
var THRESHOLD_MET = false;
var msg = '';

function main()
{
    var hosts     = cluster::mySqlNodes();
    var advisorMap = {};

    for (idx = 0; idx < hosts.size(); ++idx)
    {
        host        = hosts[idx];
        map         = host.toMap();
        connected     = map["connected"];
        var advice = new CmonAdvice();
        var endTime         = CmonDateTime::currentDateTime();
        var startTime       = endTime - 3600; /* 1 hour of statistics */
        var stats           = host.sqlStats(startTime, endTime);
        var config          = host.config();
        var max_connections = config.variable("max_connections")[0]['value'];
        var connections = [];

        if(!connected)
            continue;
        if(checkPrecond(host) && host.role() != 'master')
        {
            /* Fetch the stats on connections over our selection period */
            for(stx = 0; stx < stats.size(); ++stx)
                connections[stx] = stats[stx]['connections'];
            stdev_connections_pct = (stdev(connections) / max_connections) * 100;
            if(stdev_connections_pct > WARNING_THRESHOLD)
            {
                THRESHOLD_MET = true;
                msg = "Slave node";
                advice.setJustification("Percentage of connections used (" + stdev_connections_pct + ") above " + WARNING_THRESHOLD + " so we need to scale out slaves.");
                advice.setSeverity(Warning); 
            }
            else
            {
                msg = "Slave node";
                advice.setJustification("Connections used ok.");
                advice.setSeverity(Ok);
            }
        }
        else
        {
            if (host.role() == 'master')
            {
                msg = "Master node";
                advice.setJustification("Master node will not be taken into consideration");
                advice.setSeverity(Ok);  
            }
            else
            {
                msg = "Cluster is not okay and there is no data";
                advice.setJustification("There is not enough load on the server or the uptime is too short.");
                advice.setSeverity(Ok);
            }
        }

        advice.setHost(host);
        advice.setTitle(TITLE);
        advice.setAdvice(msg);
        advisorMap[idx]= advice;
    }

    if (THRESHOLD_MET == true)
    {
        /* find unused node */
        var node = findUnusedHost();
        addNode(node);

        advice = new CmonAdvice();
        advice.setTitle(TITLE);
        advice.setAdvice("Scaling out cluster with new node: " + node);
        advice.setJustification("Scaling slave nodes is necessary");
        advisorMap[idx+1]= advice;
    }


    return advisorMap;
}
