Service Monitoring Examples

This page is a companion to the Managing Servers and Services with NMIS8 page, and provides some examples of how various services can be monitored with NMIS.

Web

Remote, port only

NMIS can monitor the accessibility of TCP ports (using the NMAP tool), and for a Web service that would tell you whether the server is reachable (but not whether it's fully working). This kind of monitoring does not require any software running on the target server, however.

Here is a configuration snippet for this level of monitoring, for the standard web ports 443 and 80, which you would activate for the server that you want to test:

Server Process

If SNMP is enabled for the system in question, if NMIS is polling that system and if the system and its model supports the Host Resources MIB, then NMIS can check process statuses and verify the existence of a specific process. The Service Type must be "service", the name of the process must be given as Service Name, and you need to activate the service for the node that you want to test.

For a CentOS box with Apache 2.2.x we'd be using the following service definition, which checks for processes named "httpd":

SAPI-script based

NMIS can also do a limited amount of interaction with a TCP-based service using SAPI scripts. Example scripts for POP3 and basic HTTP are shipped with NMIS in /usr/local/nmis8/conf/scripts. The default http script connects to the Web server in question and attempts to download the root index URL "/"; if this request succeeds or returns an HTTP redirect, then the  service is considered  to be ok.

To enable this kind of monitoring, you need to define the service with the Name matching the script file name. The Service Name can be a text of your choice, but the Service Type must be "script", and you must activate that service for the node that you want to communicate with:

end-to-end using a custom program

If you need more precise interaction with your web service than the SAPI scripts can provide (e.g. SSL/TLS or cookies or the like), then you'll need to use a custom script. NMIS 8.5.4g ships with an example script of that type in /usr/local/nmis8/install/scripts/webtest, which should to be moved to a directory meant for binaries (e.g. /usr/local/nmis8/bin or /usr/local/bin/) if you want to use it.

NOTE - NMIS9 ships this script in /usr/local/nmis9/conf-default/scripts/webtest.

The example script downloads a web page (optionally following a number of redirections) using http or https, and optionally checks that the document content matches a given regular expression. You need to define this service with Service Type "program", provide suitable Program settings for the program and activate the service for the server that you want to test (but please note: the custom program will always be run locally on your NMIS server!)

Here is how we verify that the Opmantek website is up and running: this downloads the page using https, then looks for the phrase "Opmantek Products":

DNS

remote, port only

NMIS can monitor the accessibility of TCP and UDP ports (using the NMAP tool), which in the case of DNS would give only a rough indication of whether the DNS server is reachable at all.

Here is a configuration snippet for this level of monitoring:

remote, protocol only

To verify the general operation of a remote DNS server, you can use the service 'dns' that's built into NMIS. This service will make a DNS request to the server in question and then triggers outage alerts based on getting a DNS record back or not (and also captures the response time).

Here is how our own internal monitoring is set up to check our own domain, which involves servers outside of our control: We've defined nodes with the model set statically to "PingOnly" for the external DNS servers in question, and activated service "opmantek-dns" for them, which looks like this:

Please note that model "PingOnly" by itself is not sufficient to disable SNMP (or WMI) accesses; you also have to change the node configuration option collect to false.

local, custom script

On a system that is under your control, and which runs NMIS you can execute arbitrary scripts to collect service statuses. The example script below checks that the local NMIS server itself has a running BIND DNS server process:

#!/bin/sh
# small script that tests that a local bind is up and communicating
if /sbin/pidof named >/dev/null 2>&1 && /usr/sbin/rndc status | grep -q 'up and running'; then
        exit 100
else
        exit 0
fi

To use this, save the script somewhere NMIS can access it (as /usr/local/bin/bindpresent for example), then configure NMIS with this service of type "program" and activate  the service  for the  NMIS server itself:


MySQL Database

remote, port only

To verify that your MySQL database server is reachable you could define a service to check TCP port 3306, similar to the examples above. Naturally that's not an end-to-end test.

remote, server process status

In addition to the port reachability you can define a service for checking the existence of the "mysqld" process, if you are polling the server in question with SNMP:

remote, custom script

The third, and most comprehensive end-to-end monitoring setup would require a small custom script that actually connects to the server and performs  a query on said server. Here is an example of such a script, which would have to be adjusted for your environment (or be changed to accept more command line arguments) and saved as /usr/local/bin/mysqltest:

#!/bin/sh
# a small wrapper around the mysql client, which connects to a test database
# and runs show tables; if successful (and there are tables) we call it good
NODE=$1                                                    # passed in, comes from node.host
DBUSER="mytest"
DBPASSWORD="something secret"
DBNAME="testdb"
OUTPUT=`mysql -u$DBUSER -p$DBPASSWORD -h$NODE $DBNAME -e "show tables;"`
if [ $? != 0 ]; then
        exit 0                                            # service bad
elif ! echo "$OUTPUT" | grep -q "Tables_in_"; then
        exit 50;                                        # service not fully ok
else
        exit 100;                                        # service good
fi

To use this service test, you'd define a service of Service Type "program", with an appropriate Program path, and with the Program Args being set to "node.host", which would be replaced by the address or hostname of the node in question:

UPS Status

custom scripts

Cheaper UPS systems that don't have builtin networking or SNMP capabilities can be monitored by NMIS as well, as long as there is some sort of management infrastructure that supports querying the UPS status. In this example we're checking two UPS systems that are connected to our NMIS server via USB cables, where the NUT (Network UPS Tools) suite takes care of the interfacing.

The upstest.pl script below uses the NUT tools to query the named UPS and reports whether it's working and at what charge level it is. (NMIS does not yet graph extra variables like the charge level here as of version 8.5.4G, but this feature will be added soon.)

#!/usr/bin/perl
# a tiny wrapper around upsc to integrate with nmis
# exits with 100 if ups online, charge otherwise
# this means the service is down only when the ups is dead, 
# NOT while its discharging.
# also reports battery charge as charge=NNN
use strict;
# args: name of the ups
my $upsname = $ARGV[0];
die "usage: $0 <upsname>\n" if (!@ARGV);
my @knownones = `upsc -L 2>/dev/null`;
die "unknown ups $upsname\n" if !grep (/^$upsname:/, @knownones);
my ($status,$charge);
for my $line (`upsc $upsname 2>/dev/null`)
{
                chomp $line;
                my ($varname,$value) = split(/\s*:\s*/, $line);
                if ($varname eq "ups.status")
                {
                                $status = $value;
                }
                elsif ($varname eq "battery.charge")
                {
                                $charge = $value;
                }
}
print "charge=$charge\n" if (defined $charge);
exit ($status =~ /^OL/? 100 : $charge);

For our UPS systems we first make use of NMIS' builtin SNMP-based process status monitoring, which checks that there is at least one active process with a given name (here 'upsd'), and then added the per-UPS status checks with the UPS names passed to the upstest script. This example setup requires that the UPSs are connected to the NMIS server itself, but NUT could of course be accessed over the network.

Here is our service definition: