Nagios Network Monitor - Installation and configuration

11-24-2008 03:58 AM #1

peter

Administrator Advisor

by Wayne E Goodrich (Outlaw)
(Transferred from the wiki by Peter)

Introduction

If you manage a network of any size, you want to be notified of problems before your customers or your bosses find out, but you don’t want to be tied to a console checking for the availability of hosts and services. This is where Nagios shines. If you put in the time it takes to install and customize Nagios for your environment, you’ll be rewarded with a superb monitoring and notification solution that happens to be free. In this PET, I will guide you through the installation and configuration of Nagios, and I will provide examples of customizations you can add using plugins you can write yourself.
Gather up our packages

I will use Red Hat Enterprise Linux AS 4.0 in these examples, but they can be adapted for any Linux distribution. The following packages are required for the HTTPD services that drive Nagios’s web interface:

Apache
Code:
httpd 
httpd-suexec 
apr-util
Optional (for secure sockets layer, HTTPS interface)
Code:
mod_ssl
If you selected the default package set during installation, these are already installed. If you opted not to install Apache during the Red Hat install, you can grab the packages from RHN using up2date or by downloading them manually.
The following package provides basic Nagios functionality; really, it’s just the Nagios framework. Nagios’s checks are accomplished entirely through plugins, which ship in a separate package. From here on out, I will suggest getting prebuilt packages from Dag Wieers’s collection, and occasionally modules from CPAN. To make it easier on yourself, add Dag’s repository if you use YUM.

Nagios

nagios-2.2-1.el4.rf.i386.rpm http://dag.wieers.com/packages/nagios/
The following are needed for Nagios to actually perform checks:

Nagios Plugins

nagios-plugins-1.4.1-1.2.el4.rf.i386.rpm http://dag.wieers.com/packages/nagios-plugins/
fping-2.4-1.b2.2.el4.rf.i386.rpm http://dag.wieers.com/packages/fping/
perl-Crypt-DES-2.03-3.2.el4.rf.i386.rpm http://dag.wieers.com/packages/perl-Crypt-DES/
perl-Net-SNMP-5.0.1-1.2.el4.rf.noarch.rpm http://dag.wieers.com/packages/perl-Net-SNMP/
perl-IO-Socket-INET6-2.51-1.2.el4.rf.noarch.rpm http://dag.wieers.com/packages/perl-IO-Socket-INET6/
Digest-HMAC-1.01.tar.gz http://search.cpan.org/~gaas/Digest-HMAC-1.01/lib/Digest/HMAC.pm
Digest-SHA1-2.11.tar.gz http://search.cpan.org/~gaas/Digest-SHA1-2.11/SHA1.pm
Socket6-0.19.tar.gz (also from CPAN; it is built in the install steps below)
Install Necessary Packages

We can begin installation of the packages by first installing Nagios:
Code:
rpm -ivh nagios-2.2-1.el4.rf.i386.rpm
Now we begin satisfying nagios-plugins dependencies:
Code:
rpm -ivh fping-2.4-1.b2.2.el4.rf.i386.rpm
rpm -ivh perl-Crypt-DES-2.03-3.2.el4.rf.i386.rpm
mkdir /tmp/perltmp
cp *gz /tmp/perltmp
cd /tmp/perltmp
find . -name "*gz" -exec tar xvzf {} \;
cd Digest-SHA1-2.11
perl Makefile.PL
make test
make install
cd ../Digest-HMAC-1.01
perl Makefile.PL
make test
make install
cd ../Socket6-0.19
perl Makefile.PL
make test
make install
These next two Dag perl packages expect SHA1, HMAC and Socket6 to be available as rpms, but since they were not, we have to tell rpm not to check dependencies.
Code:
rpm -ivh --nodeps perl-Net-SNMP-5.0.1-1.2.el4.rf.noarch.rpm
rpm -ivh --nodeps perl-IO-Socket-INET6-2.51-1.2.el4.rf.noarch.rpm
rpm -ivh nagios-plugins-1.4.1-1.2.el4.rf.i386.rpm
Begin Configuration

Nagios has two methods for arranging its configuration files. One way relies on a single file where you specify hosts, groups, services etc. The other allows you to split these files up by purpose for ease of administration. The single file method can become unwieldy as you add machines and services to monitor. Here, we’ll assume the multiple definition file method.
Configure The Nagios Service

Let’s become familiar with the file locations that the Dag provided packages use as defaults:

Main Nagios Configs
Code:
/etc/nagios
Plugins and CGIs
Code:
/usr/lib/nagios
Nagios Web Files
Code:
/usr/share/nagios
Here, we see the example config files in /etc/nagios:
Code:
[radar@test2 ~]$ ls -lh /etc/nagios
total 160K
-rw-rw-r--  1 root root  30K Apr  8 08:28 bigger.cfg
-rw-rw-r--  1 root root 9.4K Apr  8 08:28 cgi.cfg
-rw-rw-r--  1 root root 4.8K Apr  8 08:28 checkcommands.cfg
-rw-r--r--  1 root root  16K Aug  5  2005 command-plugins.cfg
-rw-rw-r--  1 root root  14K Apr  8 08:28 minimal.cfg
-rw-rw-r--  1 root root 4.2K Apr  8 08:28 misccommands.cfg
-rw-rw-r--  1 root root  30K Apr  8 08:28 nagios.cfg
-rw-rw----  1 root root 1.3K Apr  8 08:28 resource.cfg
The first file we’re interested in is nagios.cfg, the main config file. This file specifies, among other things, the object config (definition) files. Those are what we are most interested in at this point. We want to open /etc/nagios/nagios.cfg in an editor and comment out the line that contains minimal.cfg. Then we’ll uncomment the lines containing the object config files that we’ll need to create, and populate with our definitions. Let’s go ahead and do that, then.
Code:
# You can split other types of object definitions across several
# config files if you wish (as done here), or keep them all in a
# single config file.
#cfg_file=/etc/nagios/minimal.cfg
Here, I have commented out minimal.cfg
Code:
cfg_file=/etc/nagios/contactgroups.cfg
cfg_file=/etc/nagios/contacts.cfg
#cfg_file=/etc/nagios/dependencies.cfg
#cfg_file=/etc/nagios/escalations.cfg
cfg_file=/etc/nagios/hostgroups.cfg
cfg_file=/etc/nagios/hosts.cfg
cfg_file=/etc/nagios/services.cfg
cfg_file=/etc/nagios/timeperiods.cfg
And here I have uncommented the object config files we will work with first, to get basic functionality. We will now create these and populate them with some hosts, services, groups, etc.
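
If you would rather script these edits than make them by hand, the following is a minimal sketch using sed. It assumes the stock nagios.cfg layout shown above, where each object file sits on its own cfg_file= line. The function name enable_object_files is my own invention for illustration, and it prints the edited file to stdout instead of changing it in place so you can review the result first.

```shell
#!/bin/sh
# Sketch: comment out minimal.cfg and uncomment the six split object files.
# $1 is a nagios.cfg-style file; the edited text goes to stdout and the
# input file itself is left untouched.
enable_object_files() {
    sed -e 's,^cfg_file=\(.*/minimal\.cfg\)$,#cfg_file=\1,' \
        -e 's,^#cfg_file=\(.*/contactgroups\.cfg\)$,cfg_file=\1,' \
        -e 's,^#cfg_file=\(.*/contacts\.cfg\)$,cfg_file=\1,' \
        -e 's,^#cfg_file=\(.*/hostgroups\.cfg\)$,cfg_file=\1,' \
        -e 's,^#cfg_file=\(.*/hosts\.cfg\)$,cfg_file=\1,' \
        -e 's,^#cfg_file=\(.*/services\.cfg\)$,cfg_file=\1,' \
        -e 's,^#cfg_file=\(.*/timeperiods\.cfg\)$,cfg_file=\1,' \
        "$1"
}

# Example (as root), working from a backup copy:
#   cp /etc/nagios/nagios.cfg /etc/nagios/nagios.cfg.orig
#   enable_object_files /etc/nagios/nagios.cfg.orig > /etc/nagios/nagios.cfg
```

Lines for dependencies.cfg and escalations.cfg are deliberately not matched, so they stay commented out as in the listing above.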

While we’re at it we want to enable service commands in the CGIs, and enable flap detection:

My sites: Linux Home Networking – Linux Quick Fix Notebook


11-24-2008 03:59 AM #2

peter

Administrator Advisor

Still in nagios.cfg, change:

Code:

check_external_commands=0

to:

Code:

check_external_commands=1

and change:

Code:

enable_flap_detection=0

to:

Code:

enable_flap_detection=1
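
A quick way to confirm that both options ended up with the value you intended is to read them back from the file. The helper below is only a sketch: cfg_value is a hypothetical name, it assumes plain key=value lines as in nagios.cfg, and it ignores commented-out lines.

```shell
#!/bin/sh
# Sketch: print the value assigned to an option in a key=value config file.
# $1 = config file, $2 = option name. Lines starting with '#' do not match.
# If the option appears more than once, the last assignment is printed;
# which occurrence Nagios itself honours is not something this sketch knows.
cfg_value() {
    sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1
}

# Example:
#   cfg_value /etc/nagios/nagios.cfg check_external_commands
#   cfg_value /etc/nagios/nagios.cfg enable_flap_detection
```

Both commands should print 1 once the two edits are in place.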

Open minimal.cfg, copy the timeperiod definition, paste it into a new file called timeperiods.cfg, and save it.

Code:

define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }

Do the same for the contact definition and contact group definition. For hosts, copy the generic-host definition, along with the localhost definition and paste into hosts.cfg.

Code:

define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1       ; Host notifications are enabled
        event_handler_enabled           1       ; Host event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }
# Since this is a simple configuration file, we only monitor one host - the
# local host (this machine).
define host{
        use                     generic-host            ; Name of host template to use
        host_name               localhost
        alias                   localhost
        address                 127.0.0.1
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,r
        contact_groups  admins
        }
define host{
        use                     generic-host            ; Name of host template to use
        host_name               testbox
        alias                   Testbox
        address                 192.168.0.4
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,r
        contact_groups  admins
        }

I have added a networked host to check. Copy the hostgroup definition from minimal.cfg and paste into the new hostgroups.cfg.

Code:

define hostgroup{
        hostgroup_name  test
        alias           Test Servers
        members         localhost,testbox
        }

I added our testbox to this group. We will need to copy the services definitions from minimal.cfg and paste them all into the new services.cfg file. Now we verify our work using nagios:


11-24-2008 04:00 AM #3

peter

Administrator Advisor

Code:

[radar@test2 nagios]$ sudo nagios -v /etc/nagios/nagios.cfg

Nagios 2.2
Copyright (c) 1999-2006 Ethan Galstad (http://www.nagios.org)
Last Modified: 04-07-2006
License: GPL

Reading configuration data...

Running pre-flight check on configuration data...

Checking services...
        Checked 5 services.
Checking hosts...
Warning: Host 'testbox' has no services associated with it!
        Checked 2 hosts.
Checking host groups...
        Checked 1 host groups.
Checking service groups...
        Checked 0 service groups.
Checking contacts...
        Checked 1 contacts.
Checking contact groups...
        Checked 1 contact groups.
Checking service escalations...
        Checked 0 service escalations.
Checking service dependencies...
        Checked 0 service dependencies.
Checking host escalations...
        Checked 0 host escalations.
Checking host dependencies...
        Checked 0 host dependencies.
Checking commands...
        Checked 22 commands.
Checking time periods...
        Checked 1 time periods.
Checking extended host info definitions...
        Checked 0 extended host info definitions.
Checking extended service info definitions...
        Checked 0 extended service info definitions.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 1
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

If we had made a mistake, nagios would do its best to hint toward the problem. So all looks good for us to have a basic functioning setup. I will address the warning about no services set up for the testbox in a bit. We will now set up apache for authentication.
Configure HTTPD authentication and CGI accesses

Look at /etc/httpd/conf.d/nagios.conf to see how authentication files are set:
Code:
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /etc/nagios/htpasswd.users
Require valid-user
So we need to add nagiosadmin, who’s defined as a contact, in htpasswd.users:
Code:
sudo /usr/bin/htpasswd -c /etc/nagios/htpasswd.users nagiosadmin
Make sure this file is readable by the apache user, if not already:
Code:
sudo chmod 644 /etc/nagios/htpasswd.users
Now edit cgi.cfg, uncommenting the lines containing allowed actions for the nagiosadmin user.

Configure Nagios and Apache Services for Start
Code:
[radar@test2 ~]$ sudo /sbin/chkconfig --level 35 httpd on
[radar@test2 ~]$ sudo /sbin/chkconfig --level 35 nagios on
Unfortunately, before we proceed, we have to disable SELinux. There is no policy (that I know of) that allows nagios to function with SELinux-enabled apache. If anyone knows the solution, please see the contact info at the end of this PET, and discuss. The easiest way to disable SELinux is to go to Applications, System Settings, Security Level and select the SELinux tab. Uncheck "Enabled (Modification Requires Reboot)", then click OK and reboot.
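
On a box without a desktop, the same change can be made from a shell. The sketch below assumes the standard Red Hat layout, where the boot-time mode is the SELINUX= line in /etc/selinux/config. The function name is hypothetical, and it prints the edited file instead of changing it in place so you can check the result before committing to it.

```shell
#!/bin/sh
# Sketch: rewrite the SELINUX= line of an SELinux config file to "disabled".
# $1 = path to the config file; the edited text goes to stdout.
# The pattern is anchored on "SELINUX=" so SELINUXTYPE= is left alone.
disable_selinux_in_config() {
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Example (as root), keeping a backup:
#   cp /etc/selinux/config /etc/selinux/config.bak
#   disable_selinux_in_config /etc/selinux/config.bak > /etc/selinux/config
#   reboot
```

As with the GUI route, the change only takes effect after a reboot.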

When the machine is up, we can point the browser to https://machine/nagios. We’ll see right away in the control panel that there’s an issue with the total processes check. By looking at /etc/nagios/services.cfg for check_local_procs we see the check definition:
Code:
check_local_procs!250!400
So let’s look at our checkcommands.cfg file to see how that’s defined:
Code:
$USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
Right away, we see there’s a mismatch. The default service definition supplies only 2 arguments (delimited by the ‘!’), yet the command definition is looking for 3. Let’s see what that -s is for:
Code:
cd /usr/lib/nagios/plugins
./check_procs -h | less
The help tells us that the -s is optional:
Code:
Optional Filters:
-s, --state=STATUSFLAGS
So we’ll remove that from the command definition for now. The original definition:
Code:
define command{
        command_name    check_local_procs
        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
        }
becomes:
Code:
define command{
        command_name    check_local_procs
        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$
        }
We’ve removed the optional ps status flag.
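
To see why the mismatch matters, here is a rough emulation of how the ‘!’-separated pieces of a check_command fill the $ARGn$ macros. This is not Nagios code: expand_args is a hypothetical helper, and it assumes that a macro with no matching argument expands to nothing, which is what the behaviour above suggests.

```shell
#!/bin/sh
# Sketch: substitute "!"-separated arguments into $ARG1$..$ARG4$ macros.
# $1 = check_command value as written in a service definition
# $2 = command_line template as written in a command definition
# Other macros such as $USER1$ are left as they are.
expand_args() {
    awk -v cc="$1" -v tmpl="$2" 'BEGIN {
        # part[1] is the command name; part[2] onward are the arguments
        n = split(cc, part, "!")
        for (i = 1; i <= 4; i++) {
            val = (i + 1 <= n) ? part[i + 1] : ""
            gsub("\\$ARG" i "\\$", val, tmpl)
        }
        print tmpl
    }'
}

# Example:
#   expand_args 'check_local_procs!250!400' \
#       '$USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$'
```

With the original three-macro definition the result ends in a bare -s with nothing after it; with the revised definition it ends cleanly at -c 400.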
Restart nagios:
Code:
[radar@test2 plugins]$ sudo /sbin/service nagios restart
Running configuration check...done
Stopping network monitor: nagios
Waiting for nagios to exit . done.
Starting network monitor: nagios
Now all is green! We have basic Nagios functionality and can start adding our customizations.

Adding Services To Nagios

Remember that when we verified nagios’s configuration, we got a warning about our testbox host not having any services associated with it. Besides the obvious, this means that nagios will not do any host alive checks against it. Nagios tries to spread out the checks in an efficient manner and will normally only check a host’s alive state when a service is failing. Once we establish a service for testbox, nagios will count the host as alive if that service succeeds. You can set up a service just to ping the box, but we’ll set up a custom command using one of the provided plugins.

Using a Supplied Plugin

I have started apache on our testbox, and will use the check_http plugin to define a command, and then from that, define a service to run against testbox. We can test the plugin directly so we know what to expect:
Code:
/usr/lib/nagios/plugins/check_http -h
This prints the usage. Running the plugin against testbox:

Code:

[radar@test2 www]$ /usr/lib/nagios/plugins/check_http -H testbox -u /error/noindex.html
HTTP OK HTTP/1.1 200 OK - 4177 bytes in 0.007 seconds |time=0.006624s;;;0.000000 size=4177B;;;0

This fetches the default new-install page. We can use that to set up a service to test whether apache is up on testbox. Create a new config file in /etc/nagios called custom_cmds.cfg and place the following in it:

Code:

define command{
        command_name    check_apache
        command_line    $USER1$/check_http -H $ARG1$ -S -u $ARG2$
        }

Now open services.cfg in an editor and define a service to use this command definition:

Code:

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       testbox
        service_description             Check Apache
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              4
        normal_check_interval           5
        retry_check_interval            1
        contact_groups                  admins
        notification_options            w,u,c,r
        notification_interval           960
        notification_period             24x7
        check_command                   check_apache!testbox!/error/noindex.html
        }

We have to tell nagios that this new command file exists by adding its path to nagios.cfg:
Code:
cfg_file=/etc/nagios/custom_cmds.cfg
I added that under the existing command definition. Now we can use this file to add custom command definitions. We need to verify that we didn’t make any mistakes:
Code:
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
Good. We can restart nagios:
Code:
sudo /sbin/service nagios restart
We see that the new service is there, but it’s pending. We can force it by rescheduling the next check and accepting the default time, which is immediate. We now can see that the service is working.

Pretty easy, but we may also want to write our own plugin and make a service check from that. Let’s emulate the functionality of the check_http plugin, for illustration purposes, using available tools and wrap it up in a bash script.

Create Custom Plugin

To use this example, curl needs to be installed. It is by default on RHEL.
Nagios expects plugins to return a code telling what the status of the check is. The following details what the codes are:
Code:
0 = OK
1 = WARNING
2 = CRITICAL
3 = UNKNOWN
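
To make the codes concrete, here is a minimal sketch of a function that maps a numeric reading to one of them. The function name threshold_status and the "value is at or above the threshold" rule are assumptions chosen for illustration; a real plugin would measure something first and may compare differently.

```shell
#!/bin/sh
# Sketch: map an integer reading to a Nagios plugin exit code.
# $1 = measured value, $2 = warning threshold, $3 = critical threshold
#   value <  warn          -> 0 (OK)
#   warn  <= value < crit  -> 1 (WARNING)
#   value >= crit          -> 2 (CRITICAL)
#   missing/non-numeric    -> 3 (UNKNOWN)
threshold_status() {
    for arg in "$1" "$2" "$3"; do
        case "$arg" in
            ''|*[!0-9]*)
                echo "UNKNOWN - expected three non-negative integers"
                return 3
                ;;
        esac
    done
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL - value is $1"
        return 2
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - value is $1"
        return 1
    fi
    echo "OK - value is $1"
    return 0
}

# Example, using the thresholds from the process check:
#   threshold_status 300 250 400; echo "exit code: $?"
```

The one-line message on stdout is what Nagios shows next to the service, and the return code is what decides its colour.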
The warning and critical exit codes are ideal for setting thresholds, such as CPU usage and load averages. But since our service is either on or off, we can use critical, ok, and unknown (for bad parameters passed).
This script takes arguments and passes them to the curl command. We’ll use it to get functionality similar to the check_http plugin.


11-24-2008 04:00 AM #4

peter

Administrator Advisor

Code:

#!/bin/bash
#
# testweb.sh
#
# Nagios plugin: fetch the HTTP(S) response headers with curl and report status.
#
BADCALL="Wrong combination of parameters: $*"
printuse ()
{
cat <<End-of-usage
Usage:   ./testweb.sh -h [hostname] [-H|-S]
         ./testweb.sh -h [hostname] [-H|-S] -p [port]
Example: ./testweb.sh -h www.redhat.com -S
         ./testweb.sh -h 192.168.0.10 -H -p 7778
End-of-usage
}
# Rudimentary check for proper number and combination of parameters:
# exactly 3 or 5 arguments, -h first, then -H (http) or -S (https)
if { [ "$#" -ne 3 ] && [ "$#" -ne 5 ]; } || [ "$1" != "-h" ] || \
   { [ "$3" != "-S" ] && [ "$3" != "-H" ]; }
then
    echo "$BADCALL"
    printuse
    exit 3
elif [ "$#" -eq 5 ] && { [ "$4" != "-p" ] || [ -z "$5" ] || echo "$5" | grep -q '[^0-9]'; }
then
    echo "$BADCALL"
    printuse
    exit 3
fi
# Set the URL prefix based on parameter 3
if [ "$3" = "-S" ]
then
    PRE=https://
else
    PRE=http://
fi
# Build URL
HOST="$2"
if [ "$#" -eq 5 ]
then
    PORT=":$5"
    URL="$PRE$HOST$PORT"
else
    URL="$PRE$HOST"
fi
HDR="/tmp/$HOST.header.txt"
# -k: skip certificate verification, -s: silent, -I: headers only
curl -k -s -I -w "%{size_header} bytes in %{time_total} seconds\n\n" "$URL" >"$HDR"
RC="$?"
case "$RC" in
    "0")
    STAT=`grep seconds "$HDR"`
    # Strip the trailing carriage return that HTTP header lines carry
    SRV=`grep -i '^Server:' "$HDR" | tr -d '\r' | awk '{print $2}'`
    rm -f "$HDR"
    echo "OK - $SRV => $STAT"
    exit 0
    ;;
    "7")
    MSG=`cat "$HDR"`
    rm -f "$HDR"
    echo "CRITICAL - Failed to connect => $MSG"
    exit 2
    ;;
    *)
    # Any other curl failure must not fall through and exit 0 (OK)
    rm -f "$HDR"
    echo "CRITICAL - curl exited with code $RC for $URL"
    exit 2
    ;;
esac

And we save this in /usr/lib/nagios/plugins as testweb.sh and make it executable:

Code:

chmod 755 /usr/lib/nagios/plugins/testweb.sh

Let’s see how to use the plugin:

Code:

[radar@test2 nagios]$ /usr/lib/nagios/plugins/testweb.sh -h testbox -S
OK - Apache/2.0.52 => 199 bytes in 0.354 seconds
[radar@test2 nagios]$ /usr/lib/nagios/plugins/testweb.sh -h testbox -H
OK - Apache/2.0.52 => 199 bytes in 0.008 seconds

SSL seems considerably slower, as can be expected.
We can use this now to define a new service. Let’s edit /etc/nagios/custom_cmds.cfg and add a command.

Code:

define command{
            command_name    check_apache_also
            command_line    $USER1$/testweb.sh -h $ARG1$ -S
            }

Now we edit services.cfg and define the service:

Code:

define service{
             use                             generic-service         ; Name of service template to use
             host_name                       testbox
             service_description             Check Apache Also
             is_volatile                     0
             check_period                    24x7
             max_check_attempts              4
             normal_check_interval           5
             retry_check_interval            1
             contact_groups                  admins
             notification_options            w,u,c,r
             notification_interval           960
             notification_period             24x7
             check_command                   check_apache_also!testbox
             }

And we verify our changes with nagios:

Code:

[radar@test2 nagios]$ sudo nagios -v /etc/nagios/nagios.cfg
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
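
Since every change ends with the same verify-then-restart pair, it can help to gate the restart on the summary lines. The helper below is a sketch under assumptions: preflight_ok is a hypothetical name, and it relies only on the "Total Warnings" and "Total Errors" lines having the form shown in this tutorial's output.

```shell
#!/bin/sh
# Sketch: read "nagios -v" output on stdin and succeed only when the
# summary reports zero errors. Warnings are reported but do not fail.
# Returns 0 = no errors, 1 = errors reported, 2 = summary line not found.
preflight_ok() {
    pf_output=$(cat)
    pf_warnings=$(printf '%s\n' "$pf_output" |
        sed -n 's/^[[:space:]]*Total Warnings:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    pf_errors=$(printf '%s\n' "$pf_output" |
        sed -n 's/^[[:space:]]*Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    if [ -z "$pf_errors" ]; then
        echo "preflight: no 'Total Errors' line found"
        return 2
    fi
    echo "preflight: ${pf_warnings:-?} warning(s), $pf_errors error(s)"
    [ "$pf_errors" -eq 0 ]
}

# Example (as root): restart only when the pre-flight check is clean
#   nagios -v /etc/nagios/nagios.cfg | preflight_ok && /sbin/service nagios restart
```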

Now restart nagios:

Code:

[radar@test2 nagios]$ sudo /sbin/service nagios restart

The service will show pending, so force its schedule as before. And we see it works!

Conclusion

It took a little configuration, but it’s quite easy to have a functioning Nagios install, with reliable checks. There is quite a bit more to nagios, all of which you’ll want to get working. Things like service groups, notifications, dependencies and escalations will further refine the way Nagios works for you. Nagios is well documented – you can view the help files right from within a working install, or go over to Nagios’s project site.

Links

Nagios Project Site
Nagios Exchange

Next

Coming soon: Nagios Remote Process Executor (NRPE) and a custom remote plugin example


12-29-2010 03:27 AM #5

dayakar

Newbie dayakar

Sir can u please help me how to install nagios in linux and from there how to monitor the windows Desktop and Server machines
