Nagios: Sending SMS notifications through gsmsmsd

Critical: that’s a state you never want to reach. Monitoring is all, or almost all, about recovering a situation before it becomes a problem.

When it becomes a problem, you must move quickly and solve the issue.

So, when you have a problem, you do not need a genius: you need flat, plain procedures that help you understand that

  1. There’s a problem
  2. How fast you need to move
  3. How to solve it

To help you understand the severity of a problem, there’s a common practice: what is really critical to your business must be notified through a less used, probably more expensive channel, and SMS addresses this need. Reading and dealing with an SMS text message is far less comfortable than working with email, but it reaches you more pervasively than other channels. Sending email is virtually free; SMS costs money. So you use SMS only for important messages: your daddy who wants to know where you are, the one you love reminding you to buy milk on your way home, and your dear Nagios alerting you to a CRITICAL, really critical, problem in your IT infrastructure.

On the SMS server side

So, let’s start with the SMS server part. On the server you must make sure that gsmsmsd is always running, watching the right spool directory and with its hands (doh, does gsmsmsd have hands?) on the serial port connected to the SMS modem (for this post I used a Fastrack Supreme 10, serial).

If you issue a command like

ps ax | grep gsmsmsd

you should see something like:

/usr/bin/gsmsmsd -d /dev/ttyS0 -s /var/spool/sms/

How do we reach this goal? Simple: let’s have a look at root’s crontab:

crontab -u root -l

And here’s what we can see:

* * * * * /usr/local/sbin/check_gsmsmsd

Using the following chart, it’s easy to understand the entry we found in the crontab.

 # * * * * *  command to execute
 # │ │ │ │ │
 # │ │ │ │ │
 # │ │ │ │ └───── day of week (0 - 6) (0 to 6 are Sunday to Saturday, or use names; 7 is Sunday, the same as 0)
 # │ │ │ └────────── month (1 - 12)
 # │ │ └─────────────── day of month (1 - 31)
 # │ └──────────────────── hour (0 - 23)
 # └───────────────────────── min (0 - 59)

In plain words: “Execute /usr/local/sbin/check_gsmsmsd every minute”.

As you can tell from its name, this script checks whether gsmsmsd is running and relaunches it if necessary. Let’s look at it:

cat /usr/local/sbin/check_gsmsmsd

And here’s its content:

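#!/bin/sh
# Relaunch gsmsmsd if it is not running (run every minute by root's cron).
if pidof gsmsmsd > /dev/null
then
    logger gsmsmsd is running
    exit 0
else
    logger gsmsmsd is not running, restart forced
    # Empty the spool directory: any message still waiting is discarded.
    rm -fr /var/spool/sms/*
    # Restart the daemon as user gsmsend, modem on the first serial port,
    # in the background so that the script can exit.
    /usr/bin/sudo -u gsmsend /usr/bin/gsmsmsd -d /dev/ttyS0 -s /var/spool/sms/ &
fi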

Nothing special: the script checks whether a gsmsmsd process exists. If it does, it logs that the daemon is running and exits cleanly; otherwise it logs that the daemon is not running, removes any content from the SMS spool directory and launches gsmsmsd as user gsmsend, with the modem on /dev/ttyS0 and the SMS spool directory in /var/spool/sms. Note the “&”: the process is started in the background, so the script does not wait for it.
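If the crontab entry is not there yet, you can add it by hand with crontab -e. As an alternative, here is a minimal sketch, assuming a cron whose crontab command reads a new table from standard input when given - (Vixie cron and cronie do); merge_cron_line is a hypothetical helper name. It drops any existing check_gsmsmsd line and appends the every-minute entry exactly once, so running it twice does not duplicate the job.

```shell
#!/bin/sh
# Hypothetical helper: filter a crontab read on stdin, removing any existing
# check_gsmsmsd entry, then append the every-minute entry once.
CRON_LINE='* * * * * /usr/local/sbin/check_gsmsmsd'

merge_cron_line() {
    # -F: match the path as a fixed string; -v: keep only non-matching lines.
    # grep exits 1 when no line is left, which is fine here.
    grep -vF '/usr/local/sbin/check_gsmsmsd' || true
    printf '%s\n' "$CRON_LINE"
}

# Usage, as root:
#   crontab -l 2>/dev/null | merge_cron_line | crontab -
```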

Do we have the SMS spool directory? Probably not; in that case, let’s create it:

mkdir -p /var/spool/sms/

Now, let’s make sure that gsmsend, the user running the gsmsmsd daemon, can read and write that directory:

chown gsmsend: /var/spool/sms/

Nothing more to do on the SMS server side. Time to move to the Nagios server.

On the Nagios server side

Here is the actual Nagios notification plugin. Use it as the notification command for your SMS host and service contact.

Let’s have a look at the plugin:

#!/bin/bash

# Few lines coded by Giorgio Zarrelli zarrelli@linux.it- 
# 2014.
# This nagios plugin is free software, and comes with  
# ABSOLUTELY NO WARRANTY. It may be used, redistributed   
# and/or modified under the terms of the GNU General 
# Public Licence (see 
# http://www.fsf.org/licensing/licenses/gpl.txt).

CAT=$(which cat)
TR=$(which tr) 
FOLD=$(which fold)
HEAD=$(which head)

The first lines hold the shebang and a few command substitutions that auto-configure the script by finding the right paths to the system utilities we are going to use.

The following line creates a (pseudo) random string. On the SMS server we will create a file containing the mobile number and the SMS message, and each file must have a unique name to avoid “collisions”.

RANDOM_STRING=$($CAT /dev/urandom | $TR -dc 'a-zA-Z0-9' | $FOLD -w 32 | $HEAD -n 1)

/dev/urandom is a character special device which gives access to the kernel’s random number generator. The line above is a command substitution a bit more complex than usual:

  1. We read random bytes from /dev/urandom with cat /dev/urandom;
  2. We pipe (|) the result to tr, which deletes everything but letters and numbers, so we won’t get any strange characters. To be precise, -d tells tr to delete characters and -c complements the given set, so tr deletes every character that is not in 'a-zA-Z0-9';
  3. We pipe (|) the output of the previous step to fold, which wraps it into lines 32 characters wide;
  4. Finally, head -n 1 keeps only the first line: that is our random string.
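To try the pipeline on its own, here is a small sketch that wraps it in a shell function (random_string is a name made up for this example). Besides reading /dev/urandom through a redirection instead of cat, the only change is LC_ALL=C, an assumption on my part that helps tr handle raw bytes in multibyte locales.

```shell
#!/bin/sh
# Same technique as the plugin: keep only alphanumeric bytes from
# /dev/urandom, wrap them into 32-character lines, take the first one.
random_string() {
    # LC_ALL=C: treat input as plain bytes (assumption: avoids tr errors
    # on invalid multibyte sequences in some locales)
    LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 32 | head -n 1
}

name=$(random_string)
echo "spool file name: $name (${#name} characters)"
```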

Passwordless ssh connection

Well, what the plugin does is put a file on the SMS server. Easy, that’s it, but we want to do it in a secure way, and here is where ssh becomes handy: we will set up everything needed to let our plugin connect to the sms-server using ssh, without any user/password prompt.

How do you reach this goal?

First, keep in mind which user will execute this script: the “nagios” user.

So you have to set up a way to let the nagios user connect to the sms-server as user gsmsend, so that it will be able to write in the spool.

To do the magic you must create an RSA private key and its public counterpart, then transfer the public key to the SMS server, where it goes into the

~/.ssh/authorized_keys

file of the gsmsend user. From then on, when user nagios on the Nagios server connects, the SMS server asks the client to prove that it holds the private key matching that public key (the client signs data with it during authentication), so only the holder of that private key can log in this way. The traffic itself is encrypted with session keys negotiated when the connection is set up.

Now it’s time to create a new pair of keys for the user nagios, so let’s “su” to this user:

su - nagios

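As user nagios, you can finally issue the keygen command. Leave the passphrase empty (just press Enter twice): the plugin has to connect unattended, with nobody there to type it.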

ssh-keygen -t rsa -b 4096 -f key.to.connect.to.sms.server
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in key.to.connect.to.sms.server.
Your public key has been saved in key.to.connect.to.sms.server.pub.
The key fingerprint is:
7f:d7:6a:e2:bf:a0:38:11:7c:3f:46:38:c8:1b:4d:3a nagios@nagios-server
The key's randomart image is:
+---[RSA 4096]----+
|                 |
|          .      |
|       o = .     |
|        E = .    |
|        S* +     |
|        o.  +  . |
|         ...o.. .|
|        .. o.o.. |
|        .....++. |
+-----------------+

Time to create a safe directory to store the keys in:

mkdir ~/.ssh
chmod 700 ~/.ssh

And now let’s move the keys in:

mv key.to.connect.to.sms.server* .ssh/

Let’s restrict the permissions on the key files:

chmod 600 .ssh/key.to.connect.to.sms.server*

Let’s check the permissions on files and directories:

ls -lah .ssh/
totale 8,0K
drwx------ 2 nagios nagios   80 apr 17 21:49 .
drwxr-xr-x 3 nagios nagios   85 apr 17 21:39 ..
-rw------- 1 nagios nagios 3,2K apr 17 21:34 key.to.connect.to.sms.server
-rw------- 1 nagios nagios  741 apr 17 21:34 key.to.connect.to.sms.server.pub
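The key steps above can also be scripted. The following is a sketch, assuming OpenSSH’s ssh-keygen and that it runs as the nagios user: -N '' sets the empty passphrase non-interactively and -q silences the fingerprint and randomart output. KEY_DIR and make_sms_key are hypothetical names; KEY_DIR defaults to ~/.ssh and can be pointed at a scratch directory for a dry run. If the key file already exists, ssh-keygen asks before overwriting it.

```shell
#!/bin/sh
# Sketch: create the SMS key pair directly in ~/.ssh with safe permissions.
KEY_DIR=${KEY_DIR:-$HOME/.ssh}
KEY_NAME=key.to.connect.to.sms.server

make_sms_key() {
    mkdir -p "$KEY_DIR"
    chmod 700 "$KEY_DIR"
    # 4096-bit RSA key, empty passphrase (-N ''), quiet output (-q)
    ssh-keygen -q -t rsa -b 4096 -N '' -f "$KEY_DIR/$KEY_NAME"
    chmod 600 "$KEY_DIR/$KEY_NAME" "$KEY_DIR/$KEY_NAME.pub"
}

# Usage, as nagios:  make_sms_key && ls -la "$KEY_DIR"
```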

Once you have the key pair, you can upload the public key to the sms-server with the ssh-copy-id utility, using the following syntax:

ssh-copy-id -i public.key <username>@<host>

The username is the user you will connect as on the sms-server: in this example gsmsend, since we want to write in its SMS spool. The host is the sms-server, and so:

ssh-copy-id -i .ssh/key.to.connect.to.sms.server.pub gsmsend@sms-server
The authenticity of host 'sms-server (xxx.xxx.xxx.xxx)' can't be established.
ECDSA key fingerprint is d3:57:2c:e0:96:91:2e:7e:c2:ce:31:a0:ff:bf:06:79.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
gsmsend@sms-server's password: 

Number of key(s) added: 1

Now, try logging into the remote host using:

ssh gsmsend@sms-server

and check that only the key(s) you wanted were added.

Time to see if you can log in without being prompted for a password:

ssh -i .ssh/key.to.connect.to.sms.server gsmsend@sms-server
The authenticity of host 'sms-server (xxx.xxx.xxx.xxx)' can't be established.
ECDSA key fingerprint is d3:57:2c:e0:96:91:2e:7e:c2:ce:31:a0:ff:bf:06:79.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'sms-server' (ECDSA) to the list of known hosts.

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Fri Apr 17 22:11:29 2015 from nagios-server

Wow, you are in, and no password prompt in the meantime!

We can make things easier by writing a few lines of configuration, so that the ssh connection to the sms-server behaves exactly as we want.

Configuring an ssh connection to a host for a user is a straightforward task: go to the user’s home directory and check for the presence of a

.ssh

directory with the proper access rights. We already created it, so you should have it; if you don’t see it, look back in this post to find how to create it.

Move inside the .ssh directory:

cd .ssh

There you’ll find the private and public keys we already created. Ignore them: now it’s time to edit a new file:

vi config

Here you are going to write the following lines:

Host sms-server
AddressFamily inet
ConnectionAttempts 10
ForwardAgent no
ForwardX11 no
ForwardX11Trusted no
GatewayPorts yes
HostBasedAuthentication no
HostKeyAlias sms-server
HostName sms-server
IdentityFile ~/.ssh/key.to.connect.to.sms.server
PasswordAuthentication no
Port 22
Protocol 2
ServerAliveCountMax 3
ServerAliveInterval 15
TCPKeepAlive no
User gsmsend

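Keep the file private (chmod 600 ~/.ssh/config): ssh refuses to use a config file that other users can write to. Now let’s see what the lines mean, one by one: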

"Host sms-server"

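As the manual says

man ssh_config

the Host keyword restricts the following declarations (up to the next Host or Match keyword) to the hosts matching the pattern given after it. Take the word following Host as a label for a bunch of instructions and give it a proper name: it is the name you will type after ssh.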

"AddressFamily inet"

This sets the “kind” of address used to connect to the remote host: any, inet to use IPv4 only, or inet6 to use IPv6 only.

"ConnectionAttempts 10"

After 10 attempts, one per second, we give up. The default is 1.

"ForwardAgent no"

Do you want to forward the connection to the authentication agent to the remote host? No, trust me, you don’t.

"ForwardX11 no"

We do not need it and do not use it, so better switch it off: we do not want X11 connections to be automatically redirected over the secure channel with DISPLAY set.

"ForwardX11Trusted no"

We do not want remote X11 clients to fiddle with data owned by trusted X11 clients.

"GatewayPorts yes"

We allow remote hosts to connect to local forwarded ports. It can be useful.

"HostBasedAuthentication no"

We do not need to try rhosts-based authentication with public key authentication.

"HostKeyAlias sms-server"

It is just an alias used instead of the real host name when looking up or saving the host key in the host key database files. Useful when you have multiple servers running on the same host.

"HostName sms-server"

It specifies the real host name (or IP address) we want to log into. Here we simply reuse sms-server, assuming that name resolves, for example through DNS or /etc/hosts; otherwise write the server’s IP address, as in the example printed by the plugin’s -c option.

"IdentityFile ~/.ssh/key.to.connect.to.sms.server"

Easy to guess: here we point to the file containing the identity (the private key) we use to authenticate to the remote host.

"PasswordAuthentication no"

We authenticate with an identity file, so let’s make sure we never fall back to passwords.

"Port 22"

Here we define which remote port to connect to. Port 22 is the standard for the ssh service, but if you want to make brute-force attacks from automated tools a bit harder, change the port on the remote host and write the new value here.

"Protocol 2"

Really? You would use the old protocol version 1? Not at all!

"ServerAliveCountMax 3"

It sets the maximum number of server alive messages that may be sent without receiving any reply from the remote host. Take it as a timeout counter: if the host does not answer 3 server alive messages in a row, the ssh session is disconnected.

"ServerAliveInterval 15"

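This keyword sets the interval in seconds after which, if no data has been received from the remote server, ssh sends a server alive message through the encrypted channel. Combined with ServerAliveCountMax 3, an unresponsive server is disconnected after about 15 × 3 = 45 seconds.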

"TCPKeepAlive no"

TCP keepalives are not sent through the encrypted channel, so they can be spoofed. Better not to use them.

"User gsmsend"

This is the user on the remote host we log in as.
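Once the config file is in place, a quick way to check that the nagios user really gets in without any prompt is ssh’s BatchMode option, which makes ssh fail instead of asking for a password. A minimal sketch: check_passwordless is a hypothetical helper, and the SSH variable exists only so the check can be exercised without a real server.

```shell
#!/bin/sh
# Sketch: check non-interactive login to an ssh_config host alias.
SSH=${SSH:-ssh}

check_passwordless() {
    # BatchMode=yes: never prompt, fail instead; ConnectTimeout caps the wait.
    # "true" is the remote command: we only care whether the login succeeds.
    if $SSH -o BatchMode=yes -o ConnectTimeout=10 "$1" true; then
        echo "passwordless login to $1: OK"
    else
        echo "passwordless login to $1: FAILED" >&2
        return 1
    fi
}

# Usage, as nagios:  check_passwordless sms-server
```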

The dirty thingy string
# The message travels on ssh's stdin, so the remote shell never parses it.
printf '%s\n%s\n' "$MOBILE_NUMBER" "$NOTIFICATION_TEXT" | $SSH "$SMS_SERVER" "cat > $SMS_SPOOL_DIR/$RANDOM_STRING"

This line, which you’ll find at the end of the script, actually does the dirty job:

  1. It connects to the remote host;
  2. It writes the mobile number on the first line;
  3. A newline character follows;
  4. The notification text goes on the second line;
  5. Finally, the output is redirected to a file in the spool directory, named after the random string computed at the beginning of the script.
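The spool file is plain text: the number on the first line, the message on the second. Here is a sketch of building that content locally; sms_payload is a hypothetical name. Because printf '%s' copies its arguments verbatim, quotes, dollar signs or semicolons that may appear in a service output reach the file unchanged when the content is streamed to the server on ssh’s standard input, instead of being spliced into the remote command line.

```shell
#!/bin/sh
# Sketch: produce the gsmsmsd spool file content for one message.
sms_payload() {
    # $1: mobile number (line 1), $2: notification text (line 2)
    printf '%s\n%s\n' "$1" "$2"
}

# Usage on the Nagios server, with the ssh config described above
# (RANDOM_STRING computed as in the plugin):
#   sms_payload "+1234567890" "web01 DOWN" |
#       ssh sms-server "cat > /var/spool/sms/$RANDOM_STRING"
```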

That’s it. We won’t look at the rest of the code, it’s quite simple and should give us no problems.

Just have a look at a handy variable definition:

SMS_SERVER=${SMS_SERVER:="change.me"}

When a variable is defined with the

${SOMETHING:="SOMEVALUE"}

syntax, it is assigned the value after := whenever it is unset or empty (for instance, because you did not pass any value), so here it defaults to

SOMEVALUE

Call it a fallback value.
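A minimal demo of the fallback in plain POSIX shell; fallback_demo is made up for illustration. A side effect worth knowing: since := keeps any non-empty value already set, a value exported in the environment (say SMS_SERVER=sms-server) also wins over the default.

```shell
#!/bin/sh
# ${VAR:="value"} assigns "value" to VAR when VAR is unset or empty,
# and leaves any non-empty value untouched.
fallback_demo() {
    SMS_SERVER=$1                          # "" simulates "nothing passed"
    SMS_SERVER=${SMS_SERVER:="change.me"}
    echo "$SMS_SERVER"
}

fallback_demo ""            # prints change.me
fallback_demo "sms-server"  # prints sms-server
```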

Here you find the actual script:

#!/bin/bash

# Few lines coded by Giorgio Zarrelli zarrelli@linux.it- 
# 2014.
# This nagios plugin is free software, and comes with  
# ABSOLUTELY NO WARRANTY. It may be used, redistributed   
# and/or modified under the terms of the GNU General 
# Public Licence (see 
# http://www.fsf.org/licensing/licenses/gpl.txt).

CAT=$(which cat)
TR=$(which tr)
FOLD=$(which fold)
HEAD=$(which head)
RANDOM_STRING=$($CAT /dev/urandom | $TR -dc 'a-zA-Z0-9' | $FOLD -w 32 | $HEAD -n 1)
SMS_SPOOL_DIR="/var/spool/sms"
ECHO=$(which echo)
SSH=$(which ssh)
SMS_SERVER=${SMS_SERVER:="change.me"}
MOBILE_NUMBER=${MOBILE_NUMBER:="+1234567890"}
NOTIFICATION_TEXT=${NOTIFICATION_TEXT:="CHANGE ME"}
# Standard Nagios plugin return codes, used by the exit calls below.
STATE_OK=0
STATE_UNKNOWN=3


print_license() {

$ECHO ""
$ECHO "This nagios plugin is free software, and comes with ABSOLUTELY" 
$ECHO "NO WARRANTY. It may be used, redistributed and/or modified under" 
$ECHO "the terms of the GNU General Public Licence (see" 
$ECHO "http://www.fsf.org/licensing/licenses/gpl.txt)."
$ECHO ""
exit 0


}

print_help() {

$ECHO ""
$ECHO "This plugin allows you to send SMS notifications using a remote gsmsmsd server."
$ECHO ""
$ECHO "It requires three parameters:"
$ECHO -e "\n"
$ECHO " -s name of the sms server: it must be the same name you wrote as Host in the .ssh/config file of the nagios user"
$ECHO -e "\n"
$ECHO " -n +xxxxxxx is the mobile number of the contact receiving the notification"
$ECHO -e "\n"
$ECHO " -t 'Message' is the message to send."
$ECHO ""
$ECHO ""
$ECHO "Other parameters:"
$ECHO -e "\n"
$ECHO "-l Prints the license of this program"
$ECHO -e "\n"
$ECHO "-c Prints a .ssh/config example"
$ECHO ""
exit 0
}


print_ssh_config() {

$ECHO ""
$ECHO "Host sms-server"
$ECHO "AddressFamily inet"
$ECHO "ConnectionAttempts 10"
$ECHO "ForwardAgent no"
$ECHO "ForwardX11 no"
$ECHO "ForwardX11Trusted no"
$ECHO "GatewayPorts yes"
$ECHO "HostBasedAuthentication no"
$ECHO "HostKeyAlias sms-server"
$ECHO "HostName xxx.xxx.xxx.xxx"
$ECHO "IdentityFile ~/.ssh/my.private.key.to.gsmsend.server.key"
$ECHO "PasswordAuthentication no"
$ECHO "Port 22"
$ECHO "Protocol 2"
$ECHO "ServerAliveCountMax 3"
$ECHO "ServerAliveInterval 15"
$ECHO "TCPKeepAlive no"
$ECHO "User gsmsend"
$ECHO ""


}


# Check whether the script has at least one argument.

if [ $# -lt 1 ]

then
print_help
exit $STATE_UNKNOWN
fi

# Parse the arguments one option at a time until none is left.
while [ $# -gt 0 ]
do
case "$1" in
-h | --help)
print_help
exit $STATE_OK
;;
-l | --license)
print_license
exit $STATE_UNKNOWN
;;
-c | --config)
print_ssh_config;
exit $STATE_UNKNOWN
;; 
-n | --number)
shift
MOBILE_NUMBER=$1
;;
-t | --text)
shift
NOTIFICATION_TEXT=$1
;;
-s | --server)
shift
SMS_SERVER=$1
;;
*) $ECHO "Unknown argument: $1"
print_help
exit $STATE_UNKNOWN
;;
esac
shift
done

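# Send the spool file content (number on line 1, text on line 2) on ssh's
# standard input: the remote shell only runs cat, so the text is never parsed.
printf '%s\n%s\n' "$MOBILE_NUMBER" "$NOTIFICATION_TEXT" | $SSH "$SMS_SERVER" "cat > $SMS_SPOOL_DIR/$RANDOM_STRING"
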
Setting up Nagios to send SMS

Now we have all the bits required to send notifications by SMS: what we need now is to glue them together in Nagios.

We need to:

  1. Copy the code above and save it to a file;
  2. Upload the file to the Nagios plugin directory on the Nagios server;
  3. Make the file executable by the nagios user;
  4. Create the Nagios commands that call the script from within Nagios;
  5. Create a contact which uses these commands as notification commands;
  6. Assign the contact to the hosts and services you want to be notified about.

Let’s say we call the new script

send_sms.sh

Copy it into the Nagios plugin directory on the Nagios server and make the nagios user its owner:

chown nagios: send_sms.sh

and make it executable by Nagios, restricting the permissions a bit:

chmod 550 send_sms.sh
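Steps 1 to 3 boil down to one install(1) call, which copies the file and sets owner, group and mode together. A sketch, assuming GNU or BSD install; PLUGIN_DIR defaults to /usr/local/nagios/libexec, the usual location for a source-built Nagios (packaged versions often use /usr/lib/nagios/plugins), so check the $USER1$ value in your resource.cfg. install_plugin, PLUGIN_OWNER and PLUGIN_GROUP are hypothetical names.

```shell
#!/bin/sh
# Sketch: copy send_sms.sh into the plugin directory, owned by nagios,
# mode 550 (read and execute for owner and group, nothing for others).
PLUGIN_DIR=${PLUGIN_DIR:-/usr/local/nagios/libexec}
PLUGIN_OWNER=${PLUGIN_OWNER:-nagios}
PLUGIN_GROUP=${PLUGIN_GROUP:-nagios}

install_plugin() {
    install -o "$PLUGIN_OWNER" -g "$PLUGIN_GROUP" -m 550 "$1" "$PLUGIN_DIR/"
}

# Usage, as root:  install_plugin send_sms.sh
```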

Now we create two different notification commands, one for host notifications and another for service notifications:

define command {
command_name notify-host-by-sms
command_line $USER1$/send_sms.sh -s sms-server -n '$CONTACTPAGER$' -t "'$HOSTNAME$' '$HOSTSTATE$' '$DATE$' '$TIME$'"
register 1
}

define command {
command_name notify-service-by-sms
command_line $USER1$/send_sms.sh -s sms-server -n '$CONTACTPAGER$' -t "'$HOSTNAME$' '$SERVICEDESC$' '$SERVICESTATE$' '$SERVICEOUTPUT$' '$DATE$' '$TIME$'"
register 1
}

Notice that

$USER1$

points to the Nagios plugins directory and is defined in the file

resource.cfg

All the other variables are Nagios macros, expanded by Nagios before it runs the script. For more information on Nagios macros, follow this link:

http://nagios.sourceforge.net/docs/3_0/macros.html

Now it’s time to create a contact which will use the new command

define contact {
contact_name oncall-sms
alias SMS notifications to on call mobile
host_notifications_enabled 1
service_notifications_enabled 1
host_notification_period 24x7
service_notification_period 24x7
host_notification_options d,u,r,f,s
service_notification_options w,u,c,r,f,s
host_notification_commands notify-host-by-sms
service_notification_commands notify-service-by-sms
can_submit_commands 1
retain_status_information 0
retain_nonstatus_information 1
pager +XXXXXXXXXXXX
register 1
}

Just put after

pager

the mobile number to send SMS notifications to.

Let’s attach the SMS notification to a service

define service {
service_description On call service
host_name my_host
check_command my_command
is_volatile 0
max_check_attempts 3
check_interval 5
retry_interval 1
active_checks_enabled 1
passive_checks_enabled 1
check_period 24x7
parallelize_check 1
obsess_over_service 1
check_freshness 0
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_interval 1440
notification_period 24x7
notification_options w,u,c,r
notifications_enabled 1
contacts oncall-sms
failure_prediction_enabled 1
register 1
}

Now, let’s do the same for the host:

define host {
host_name my_host
check_command check-host-alive
max_check_attempts 4
check_interval 5
retry_interval 1
passive_checks_enabled 1
check_period 24x7
check_freshness 1
freshness_threshold 0
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
contacts oncall-sms
notification_interval 1440
notification_period 24x7
notification_options d,u,r,f,s
notifications_enabled 1
failure_prediction_enabled 1
register 1
}

Keep in mind that these are just examples: modify your Nagios objects accordingly.

Final step: check that the modified configuration is not broken:

nagios -v /path/to/nagios/main/config/file/nagios.cfg

If it’s all OK, reload Nagios and try to get a CRITICAL notification on a fake service or host which uses oncall-sms as contact. Don’t have one? Create a fake one just for testing; if you do not see any errors and everything works fine, attach the new contact to your most valuable hosts and services.
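The check and the reload can be chained so that a broken configuration is never loaded. A sketch, relying on nagios -v exiting with a non-zero status when it finds errors; NAGIOS_BIN, NAGIOS_CFG and RELOAD_CMD are assumptions (the config path is the source-install default, and the reload command depends on your init system, for example service nagios reload or systemctl reload nagios), and verify_and_reload is a hypothetical name.

```shell
#!/bin/sh
# Sketch: reload Nagios only if the configuration passes "nagios -v".
# All three defaults are assumptions; adapt them to your installation.
NAGIOS_BIN=${NAGIOS_BIN:-nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RELOAD_CMD=${RELOAD_CMD:-"service nagios reload"}

verify_and_reload() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" > /dev/null; then
        $RELOAD_CMD        # left unquoted on purpose: it holds a command line
    else
        echo "configuration check failed, Nagios not reloaded" >&2
        return 1
    fi
}
```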

A mod_gearman chart

[Figure: mod_gearman diagram]
Some thoughts on mod_gearman benefits.

  • The workers connect back to gearmand, so they can sit in a VLAN behind a firewall and you do not have to open any ports towards that VLAN;
  • You can have more than one worker monitoring a host or service. They share the load in a round-robin fashion, and if one faints, the others take over all the checks and execute them. So you get an easy way to achieve redundancy for the workers.

In this chart I “collapsed” the gearmand daemon under the “Monitor” box.