Background
I use MythTV quite frequently and have noticed that it is unstable when sasc-ng is used as a decoder to decrypt encrypted DVB-T channels. Approximately every third day the MythTV backend server stops and needs to be started again. I wrote an earlier article about how to monitor MythTV with Nagios or op5 Monitor, so I get notified when it has stopped, but I still have to start it again manually. This article describes how to make Nagios or op5 Monitor start a stopped MythTV backend. The same approach can be used to start almost any service.
I have used the examples provided by Ethan in the official Nagios documentation on event handlers.
Normally it is not recommended to let a tool like Nagios or op5 Monitor start a service that has stopped, because there is probably a reason why the service stopped, and the correct procedure is to fix the root cause of the problem, not the symptom.
The MythTV backend runs on a machine called lala (after a character in Teletubbies), which is not the same machine as the Nagios or op5 Monitor server. I use nrpe to run the start script, i.e.
/etc/init.d/mythtv-backend start
There are several ways to do this, but I had already set up the nrpe agent, and it is simple to make Nagios or op5 Monitor use nrpe to run a script.
Implementation
I used the script I found in the Nagios documentation about event handlers as a base and modified it slightly.
On my op5 Monitor machine
/opt/plugins/custom/restart-mythtv-lala.sh
    #!/bin/sh
    #
    # Event handler script for restarting the mythbackend server on lala
    #
    # Note: This script will only restart mythbackend if the service is
    #       retried 2 times (in a "soft" state) or if the service somehow
    #       manages to fall into a "hard" error state.
    #

    # What state is the mythbackend service in?
    case "$1" in
    OK)
        # The service just came back up, so don't do anything...
        ;;
    WARNING)
        # We don't really care about warning states, since the service is probably still running...
        ;;
    UNKNOWN)
        # We don't know what might be causing an unknown error, so don't do anything...
        ;;
    CRITICAL)
        # Aha! The mythbackend service appears to have a problem - perhaps we should restart it...
        # Is this a "soft" or a "hard" state?
        case "$2" in

        # We're in a "soft" state, meaning that Nagios is in the middle of retrying the
        # check before it turns into a "hard" state and contacts get notified...
        SOFT)
            # What check attempt are we on? We don't want to restart mythbackend on the first
            # check, because it may just be a fluke!
            case "$3" in

            # Wait until the check has been tried 2 times before restarting mythbackend.
            # If the check keeps failing after the restart until the maximum number of
            # check attempts is reached, the state type will turn to "hard" and contacts
            # will be notified of the problem.
            # Hopefully this will restart mythbackend successfully, so the next check will
            # result in a "soft" recovery. If that happens no one gets notified because we
            # fixed the problem!
            2)
                echo "$(date) Restarting mythtv service (2rd soft critical state)..." >> /tmp/mythtvstart
                # Ask nrpe on lala to run the init script that starts mythbackend
                /opt/plugins/check_nrpe -H lala -c start_mythtvbackend
                ;;
            esac
            ;;

        # The mythbackend service somehow managed to turn into a hard error without getting fixed.
        # It should have been restarted by the code above, but for some reason it didn't.
        # Let's give it one last try, shall we?
        # Note: Contacts have already been notified of a problem with the service at this
        # point (unless you disabled notifications for this service)
        HARD)
            echo "$(date) Restarting mythtv service (hard state)..." >> /tmp/mythtvstart
            # Ask nrpe on lala to run the init script that starts mythbackend
            /opt/plugins/check_nrpe -H lala -c start_mythtvbackend
            ;;
        esac
        ;;
    esac
    exit 0
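Before wiring the handler into the monitoring configuration, it can help to see which argument combinations would trigger a restart. The following is only a dry-run sketch of the decision logic in the script above: `decide_action` is a hypothetical helper that is not part of the original script, and it prints "restart" or "skip" instead of calling check_nrpe, so it is safe to run anywhere.

```shell
#!/bin/sh
# Dry-run sketch of the event handler's decision logic.
# decide_action is a hypothetical helper, not part of the original script.
# It prints "restart" where the real handler would call check_nrpe,
# and "skip" where the real handler does nothing.
decide_action() {
    state="$1"
    state_type="$2"
    attempt="$3"

    case "$state" in
    CRITICAL)
        case "$state_type" in
        SOFT)
            # Only the 2nd soft check attempt triggers a restart.
            if [ "$attempt" = "2" ]; then
                echo restart
            else
                echo skip
            fi
            ;;
        HARD)
            # Last try once the problem has become a hard state.
            echo restart
            ;;
        *)
            echo skip
            ;;
        esac
        ;;
    *)
        # OK, WARNING, UNKNOWN: leave the service alone.
        echo skip
        ;;
    esac
}

# Walk through a few combinations Nagios could pass as
# $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$.
for args in "OK HARD 1" "WARNING SOFT 1" "CRITICAL SOFT 1" "CRITICAL SOFT 2" "CRITICAL HARD 3"; do
    # Word splitting is intentional: $args holds three space-separated fields.
    printf '%-16s -> %s\n' "$args" "$(decide_action $args)"
done
```

The real handler can be exercised the same way by running it by hand with three arguments, for example `CRITICAL SOFT 2`, but note that this really does ask nrpe on lala to start the backend.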
/opt/monitor/misccomands.cfg
    # command 'restart-mythtv-lala'
    define command{
        command_name    restart-mythtv-lala
        command_line    /opt/plugins/custom/restart-mythtv-lala.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
    }
/opt/monitor/etc/services.cfg
    # service 'Mythbackend'
    define service{
        use                     default-service
        host_name               lala
        service_description     Mythbackend
        check_command           check_tcp!6543
        servicegroups           MythTV,it-slav
        event_handler           restart-mythtv-lala!$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
        contact_groups          it-slav_sms,it-slav_jabber,it_slav_mail
    }
On my mythbackend machine lala
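An event handler on a service only runs if event handlers are enabled globally, which Nagios controls with the `enable_event_handlers` directive in its main configuration file (a service can also switch them off for itself with `event_handler_enabled 0`). The sketch below checks the global setting. The file location `/opt/monitor/etc/nagios.cfg` is an assumption based on the paths used in this article, and `event_handlers_enabled` and `NAGIOS_CFG` are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch: check whether event handlers are enabled globally.
# event_handlers_enabled is a hypothetical helper name.
# Returns 0 if the given file contains "enable_event_handlers=1".
event_handlers_enabled() {
    grep -Eq '^[[:space:]]*enable_event_handlers[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Assumed location of the main configuration file; override it with the
# (hypothetical) NAGIOS_CFG variable if your installation differs.
CFG="${NAGIOS_CFG:-/opt/monitor/etc/nagios.cfg}"

if [ -r "$CFG" ]; then
    if event_handlers_enabled "$CFG"; then
        echo "event handlers are enabled in $CFG"
    else
        echo "enable_event_handlers=1 not found in $CFG"
    fi
else
    echo "cannot read $CFG (set NAGIOS_CFG to the right path)"
fi
```

If the setting turns out to be disabled, the handler script is never called, no matter how the service definition looks.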
/etc/nrpe.d/mycommands.cfg
    command[start_mythtvbackend]=/usr/bin/sudo /etc/init.d/mythtv-backend start
/etc/sudoers
    nobody ALL= (root) NOPASSWD:/etc/init.d/mythtv-backend start
Notice that my nrpe agent runs as user nobody.
Test
I stopped the mythtvbackend by running:
    peter@lala:/etc/nrpe.d$ date
    Mon Jun 15 20:40:55 CEST 2009
    peter@lala:/etc/nrpe.d$ sudo /etc/init.d/mythtv-backend stop
     * Stopping MythTV server: mythbackend
Then I watched the log file written by the event handler on the op5 Monitor machine:
    [root@op5 ~]# tail -f /tmp/mythtvstart
    Mon Jun 15 20:47:09 CEST 2009 Restarting mythtv service (2rd soft critical state)...
YES it works!
Links:
- op5 Monitor, a supported, Nagios-based enterprise monitoring software.
- MythTV, a free open source digital video recorder.
- Nagios, open source monitoring.
2 Responses to “Using Nagios or op5 Monitor eventhandler to start a service that has stopped”
December 8th, 2010 at 2:08 pm
My event_handler does not even start! What is cause?
December 8th, 2010 at 3:19 pm
How do you know that?