
PagerDuty Incident Alerting for ClusterControl

Ashraf Sharif


Need to add phone and SMS alerting to ClusterControl? ClusterControl 1.2.8 introduces support for PagerDuty, an alerting service for Ops teams to schedule on-calls and add phone and SMS notifications to IT tools. By integrating PagerDuty with ClusterControl, you can start receiving phone, SMS and email notifications for all important database events as monitored by ClusterControl. Alerts go directly to the right person who can solve the issue.

This integration is possible thanks to a new plugin interface that takes ClusterControl alarms in JSON format and passes them to external systems via plugins. Plugins can be either scripts or executable binaries.

 

We have built a few example plugins that use this interface, available from our GitHub repository:

  • pagerduty.py: This plugin forwards the alarm raise/close events to the PagerDuty system
  • syslog.py: This plugin writes the new alarms instantly to the syslog

 

Plugin configuration options

Let’s have a quick look at how the plugin works, but feel free to go directly to the PagerDuty setup instructions.

The plugin directory can be set in the CMON configuration file (the plugin_dir entry) or in the cmon_configuration table of the CMON DB (the PLUGIN_DIR key).

When no value is set, /var/cmon/plugins is used. The controller executes every executable script or binary in that directory and skips non-executable files.
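To check which files the controller would pick up, a small script can list the executable entries in the plugin directory. This is a minimal sketch, not CMON's own selection logic: the helper name list_runnable_plugins is hypothetical, and it assumes "executable" means a regular file with execute permission for the current user.

```python
import os

DEFAULT_PLUGIN_DIR = "/var/cmon/plugins"  # default path named in this post


def list_runnable_plugins(plugin_dir=DEFAULT_PLUGIN_DIR):
    """Return sorted paths of regular files in plugin_dir that are executable.

    Hypothetical helper for illustration; CMON's real selection logic
    may differ.
    """
    if not os.path.isdir(plugin_dir):
        return []
    runnable = []
    for name in sorted(os.listdir(plugin_dir)):
        path = os.path.join(plugin_dir, name)
        # Files without execute permission are skipped.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            runnable.append(path)
    return runnable


for path in list_runnable_plugins():
    print(path)
```

If a plugin you copied into the directory does not show up in the output, its execute bit is probably missing.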

 

How does the plugin execute?

Whenever an alarm is raised, CMON executes every executable script or binary (plugin) under the plugin_dir path and writes the alarm event, as a JSON message, to each plugin’s standard input (stdin). The expected JSON message syntax is as follows:

{
    // currently only the "alarm" is supported
    "type": "alarm",
    // whether it is a new alarm, or an update for an old alarm or an alarm removal
    "action": "new|update|remove",
    // a JSON map which contains the alarm details
    "alarm": { },
    // the hostname of the cmon controller (could be used to determine the web-ui url)
    "cmon_hostname" : "ip-address/or-host-name"
}

An example JSON message:

{
    "type": "alarm",
    "action": "new",
    "alarm": 
    {
        "cid": 1,
        "count": 1,
        "hostid": 4,
        "hostname": "192.168.197.83",
        "id": 10736685724696984427,
        "message": "Server 192.168.197.83 reports: MySQL server disconnected.",
        "name": "MySQL server disconnected",
        "nodeid": 1,
        "recommendation": "Check error log of failed mysql server.",
        "severity": "ALARM_CRITICAL",
        "type": "MySqlDisconnected"
    },
    "cmon_hostname": "192.168.197.83"
}
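A custom plugin only has to read one such message from stdin and act on it. The following is a minimal sketch of that pattern, not the code of the shipped plugins: the function names describe_event and main are hypothetical, and every field is read with a fallback because the message format may change.

```python
import json
import sys


def describe_event(raw):
    """Parse one CMON message and return a one-line description.

    Returns None for anything that is not an alarm event.
    """
    event = json.loads(raw)
    if event.get("type") != "alarm":
        return None
    alarm = event.get("alarm", {})
    return "{action} alarm {id} on {host}: {message}".format(
        action=event.get("action", "unknown"),
        id=alarm.get("id"),
        host=alarm.get("hostname", "unknown host"),
        message=alarm.get("message", ""),
    )


def main():
    """Entry point for a plugin file: read stdin, log the result to stderr."""
    line = describe_event(sys.stdin.read())
    if line is not None:
        sys.stderr.write(line + "\n")
    return 0


# Demonstration with a trimmed copy of the example message above.
SAMPLE = """{"type": "alarm", "action": "new",
 "alarm": {"id": 10736685724696984427, "hostname": "192.168.197.83",
           "message": "Server 192.168.197.83 reports: MySQL server disconnected."},
 "cmon_hostname": "192.168.197.83"}"""
print(describe_event(SAMPLE))
```

A real plugin file would drop the demonstration lines, call main() at the bottom, and be made executable so that the controller runs it.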

 

The “alarm” map contains the following fields:

 

| Field | Description |
|---|---|
| cid | Cluster identification number |
| count | Alarm counter (how many times it was raised) |
| hostid | Host id (to map with hosts table) |
| nodeid | Internal id of the node |
| id | A unique identifier for this alarm (the plugin could use this to identify a specific alarm, for example for update and remove operations) |
| message | Message of the alarm |
| recommendation | Recommendation message on how to fix the problem |
| name | Name of the alarm (it’s simply based on the alarm type) |
| type | Alarm type |
| source | Alarm source (this might be removed soon) |
| severity | Alarm severity |

(Please note that these fields are subject to change in the future.)
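Because the field set may change, a plugin is safer if it checks what it actually received instead of assuming every field is present. The sketch below compares an alarm map against the field list above; the helper name compare_fields is hypothetical, and the sample data is the "alarm" map from the example message earlier in this post.

```python
# Field names copied from the field list above.
EXPECTED_FIELDS = (
    "cid", "count", "hostid", "nodeid", "id", "message",
    "recommendation", "name", "type", "source", "severity",
)


def compare_fields(alarm):
    """Return (missing, extra): documented fields that are absent from the
    alarm map, and fields in the alarm map that are not documented."""
    missing = [name for name in EXPECTED_FIELDS if name not in alarm]
    extra = sorted(name for name in alarm if name not in EXPECTED_FIELDS)
    return missing, extra


# The "alarm" map from the example message in this post.
EXAMPLE_ALARM = {
    "cid": 1,
    "count": 1,
    "hostid": 4,
    "hostname": "192.168.197.83",
    "id": 10736685724696984427,
    "message": "Server 192.168.197.83 reports: MySQL server disconnected.",
    "name": "MySQL server disconnected",
    "nodeid": 1,
    "recommendation": "Check error log of failed mysql server.",
    "severity": "ALARM_CRITICAL",
    "type": "MySqlDisconnected",
}

missing, extra = compare_fields(EXAMPLE_ALARM)
print("missing:", missing)  # missing: ['source']
print("extra:", extra)      # extra: ['hostname']
```

Even the example message differs from the field list (no "source", an additional "hostname"), which is a good reason to read fields with a fallback value.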

 

Setting up PagerDuty alerting

Log into your PagerDuty account and create a service for ClusterControl alarms. Go to PagerDuty > Services > Add New Service, specify a service name, and choose “Use our API directly” as the Integration Type.

 

Note the Service API Key shown on the Service summary page.

 

On the ClusterControl server, install the required Python Requests module:

$ yum install -y python-setuptools # Red Hat/CentOS
$ sudo apt-get install -y python-setuptools # Debian/Ubuntu
$ easy_install requests

Get the PagerDuty plugin from our GitHub repository and place it under /var/cmon/plugins:

$ git clone https://github.com/severalnines/s9s-admin

$ mkdir -p /var/cmon/plugins
$ cp s9s-admin/plugins/pagerduty/pagerduty.py /var/cmon/plugins

 

The plugin reads the Service API Key from a configuration file at /etc/cmon_plugins.ini:

[pagerduty]
api_key=YOUR_SERVICE_API_KEY 
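If you write your own plugin, it can read the same file. The following is a minimal sketch using Python 3's standard configparser module; the function name read_api_key is hypothetical, and the shipped pagerduty.py may parse the file differently.

```python
import configparser

CONFIG_PATH = "/etc/cmon_plugins.ini"  # path named in this post


def read_api_key(path=CONFIG_PATH):
    """Return the api_key value from the [pagerduty] section of the file."""
    parser = configparser.ConfigParser()
    # read() returns the list of files it managed to parse.
    if not parser.read(path):
        raise RuntimeError("cannot read config file: {}".format(path))
    # Raises configparser.NoSectionError or NoOptionError if the entry
    # is absent.
    return parser.get("pagerduty", "api_key").strip()
```

Calling read_api_key() without arguments reads the default path. Failing loudly when the file or the key is missing makes a misconfigured plugin easier to spot than one that silently sends nothing.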

Ensure the script is executable:

$ chmod 755 /var/cmon/plugins/pagerduty.py

Once configured, ClusterControl will execute the pagerduty.py script whenever an alarm event is raised.

 

Log in to your PagerDuty page and you will see that incidents have been triggered based on the alarm events.

Once ClusterControl has determined that the problem is resolved, it will raise a remove alarm event. The plugin will then send a resolve event through PagerDuty’s API and mark the incident as Resolved.
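The mapping from alarm events to PagerDuty events can be sketched as follows. This is an illustration under assumptions, not the code of pagerduty.py: the function name build_pagerduty_event and the "cmon-alarm-" key prefix are hypothetical, and the payload field names (service_key, event_type, incident_key, description, details) follow PagerDuty's generic Events API as we understand it, so check them against PagerDuty's current documentation. The sketch only builds the payload; sending it (for example with the Requests module installed earlier) is left out.

```python
import json

# CMON action -> PagerDuty event type. Other actions are ignored here.
ACTION_TO_EVENT_TYPE = {"new": "trigger", "remove": "resolve"}


def build_pagerduty_event(message, service_key):
    """Build a PagerDuty event payload from a CMON alarm message.

    Returns None when the message should not be forwarded.
    """
    if message.get("type") != "alarm":
        return None
    event_type = ACTION_TO_EVENT_TYPE.get(message.get("action"))
    if event_type is None:
        return None
    alarm = message.get("alarm", {})
    return {
        "service_key": service_key,
        "event_type": event_type,
        # Same key for trigger and resolve, derived from the alarm id,
        # so that both events refer to the same incident.
        "incident_key": "cmon-alarm-{}".format(alarm.get("id")),
        "description": alarm.get("message")
        or alarm.get("name")
        or "ClusterControl alarm",
        "details": alarm,
    }


demo = {
    "type": "alarm",
    "action": "remove",
    "alarm": {"id": 42, "message": "MySQL server disconnected."},
}
print(json.dumps(build_pagerduty_event(demo, "YOUR_SERVICE_API_KEY"),
                 sort_keys=True))
```

Deriving the incident key from the alarm id is the design choice that matters: the resolve event can only close the incident if it carries the same key as the trigger event.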

 

Known Issues and Limitations

  • Some alarms (with the same alarm id) may be raised multiple times.
  • Scripts in the plugin directory are executed only for two types of alarm events (new and remove), not for non-alarm events raised in ClusterControl.

Your feedback and suggestions for improving this new feature are most welcome. Happy alerting!
