The Eye Of Horus 0.2

The Eye Of Horus is a monitoring and alerting tool for computers. It's mainly useful for monitoring network services (eg, HTTP or SMTP servers) and the internal status of Unix servers (eg, load, disk usage, process counts).

In that respect, it's a lot like Nagios, but in my opinion it's better. It lacks a few features Nagios has, but it has a very simple architecture to which they can easily be added.

It's a flexible thing made from independent modules with well-defined interfaces, which makes it easy to customise and extend. Out of the box, it'll monitor your servers and produce a nice HTML summary of their status (OK, the looks need a bit of work, but that will come soon). It can optionally integrate with the excellent (and I mean excellent) RRDTool to store logs of statistics (response times, number of packages with known security holes, etc) and link from the status page to nice graphs of the historical behaviour of these statistics.

HOW IT WORKS

The core of the system is horus-check.py, a Python script which reads a configuration file (specified on the command line). The configuration file specifies a list of services - either network services, in which case the host to run the check from and the host to run the check 'at' are specified, or local services, in which case only the host to run the check from need be specified. In either case, if the host to run the check from is not specified, then it defaults to the local host.

Each service has a type, and the types are defined in a separate file which is referenced from the configuration file. For each type, the service definitions file gives a shell command to check the service; this command must output the service status in a defined format, as a single-line YAML list. The list must contain, at least, a single-word status (OK, WARNING, FAILURE, or UNKNOWN), then optionally a mapping of numeric statistics, then optionally a status message. For example:

[OK]
[UNKNOWN]
[OK, { load: 0.5, users: 3 }]
[WARNING, { load: 3, users: 30 }]
[FAILURE, { load: 95, users: 300 }]
[UNKNOWN, { }, Could not find AWK executable]
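
If you write a check command in Python, a small helper along the following lines could produce output in the format shown above. This is only a sketch: format_status is a hypothetical name, not part of Horus, and it assumes statistic names and the message contain no characters that would need YAML quoting.

```python
# Sketch of a helper that a custom check script could use to print its result.
# format_status() is a hypothetical name, not something shipped with Horus.

VALID_STATUSES = ("OK", "WARNING", "FAILURE", "UNKNOWN")


def format_status(status, stats=None, message=None):
    """Return a single-line YAML list such as '[OK, { load: 0.5 }]'.

    Assumes keys, values and the message need no YAML quoting.
    """
    if status not in VALID_STATUSES:
        raise ValueError("unknown status: %r" % (status,))
    parts = [status]
    if stats:
        body = ", ".join("%s: %s" % (key, value) for key, value in stats.items())
        parts.append("{ %s }" % body)
    elif message is not None:
        # The message comes after the statistics, so emit an empty
        # mapping as a placeholder when there are no statistics.
        parts.append("{ }")
    if message is not None:
        parts.append(message)
    return "[%s]" % ", ".join(parts)
```

A check script would then finish with something like print(format_status("WARNING", {"load": 3, "users": 30})).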

When a check is to be performed from a remote host, Horus opens an ssh connection to that host. It is assumed that the user Horus runs as has an ssh key set up that lets it ssh to all such hosts without being asked for a password.
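
To illustrate the idea, running a check command either locally or over ssh might look roughly like this. It is a sketch under assumptions, not the code in horus-check.py: the function names and the timeout are made up for the example, and BatchMode=yes is used so that ssh fails rather than prompting if the key is not set up.

```python
import subprocess


def build_check_argv(command, from_host=None):
    """Build the argument list that runs a check command.

    Hypothetical helper: with no from_host the command runs in a local
    shell, otherwise it is handed to ssh to run on that host.
    """
    if from_host is None:
        return ["sh", "-c", command]
    # BatchMode=yes makes ssh give up instead of asking for a password.
    return ["ssh", "-o", "BatchMode=yes", from_host, command]


def run_check(command, from_host=None, timeout=60):
    """Run the check and return its output with surrounding whitespace removed."""
    result = subprocess.run(
        build_check_argv(command, from_host),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.stdout.strip()
```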

Having performed the checks, horus-check.py then:

Reads in the status database named in the configuration file
Updates the status database with the new status of hosts
Computes an overall system status (the worst non-unknown status of any checked service)
Examines the service dependencies, and automatically marks as 'quiet' any service whose state is no worse than might be expected (eg, no worse than the worst state of a service it depends upon)
Computes a list of differences between the old and new status (services added, services removed, services whose status has improved, services whose status has worsened)
If there are any differences, invokes a notification script (named in the configuration file) with them, along with the overall status
Invokes a logging script (named in the configuration file) with the new value of every statistic reported by the service checks; I will soon provide a sample logging script that uses RRDTool to generate nice graphs.
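
The overall status, 'quiet' marking and difference steps above can be sketched as follows. This is an illustration only, not the code in horus-check.py: it assumes statuses are held in plain dictionaries mapping a service name to a status word, that an all-unknown system reports UNKNOWN, and that UNKNOWN ranks between OK and WARNING when deciding whether a service improved or worsened. All function names are hypothetical.

```python
# Ranking used to compare two states. Where UNKNOWN sits is an assumption.
RANK = {"OK": 0, "UNKNOWN": 1, "WARNING": 2, "FAILURE": 3}


def overall_status(statuses):
    """Worst non-unknown status of any checked service."""
    known = [s for s in statuses.values() if s != "UNKNOWN"]
    if not known:
        return "UNKNOWN"  # assumption: nothing known means UNKNOWN overall
    return max(known, key=RANK.get)


def is_quiet(name, statuses, depends_on):
    """True if a service is no worse than the worst service it depends upon."""
    deps = [d for d in depends_on.get(name, []) if d in statuses]
    if not deps:
        return False
    worst_dep = max(RANK[statuses[d]] for d in deps)
    return RANK[statuses[name]] <= worst_dep


def diff_statuses(old, new):
    """List services added, removed, improved and worsened between two runs."""
    changes = {"added": [], "removed": [], "improved": [], "worsened": []}
    for name in sorted(set(old) | set(new)):
        if name not in old:
            changes["added"].append(name)
        elif name not in new:
            changes["removed"].append(name)
        elif RANK[new[name]] < RANK[old[name]]:
            changes["improved"].append(name)
        elif RANK[new[name]] > RANK[old[name]]:
            changes["worsened"].append(name)
    return changes
```

With this layout, the notification script would only be invoked when at least one of the four lists returned by diff_statuses is non-empty.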

The status database (which is written in YAML, so easily accessible to user scripts) can then be used to generate an HTML status report (see status.cgi).

Requirements:

PyYAML

Installation:

Copy and edit example.conf to suit your setup. Perhaps edit types.conf to add extra service types, if required, or change the commands to work on your systems.

Write your own change notification script(s), that accept a human-readable summary of the changes on stdin, and do something useful like email or SMS them on, then reference them in the notify-commands field of the configuration file.
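
As a starting point, a change notification script that emails the summary might look something like this. It is a sketch rather than a supplied script: the addresses, the subject line and the use of an SMTP server on localhost are all assumptions, and it only relies on the summary arriving on stdin.

```python
import smtplib
import sys
from email.message import EmailMessage


def build_notification(summary, sender, recipient):
    """Wrap the human-readable change summary in an email message."""
    msg = EmailMessage()
    msg["Subject"] = "Horus status change"  # hypothetical subject line
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(summary)
    return msg


def main():
    # Horus supplies the summary of changes on stdin.
    summary = sys.stdin.read()
    # Placeholder addresses; replace them with your own.
    message = build_notification(summary, "horus@example.org", "admin@example.org")
    # Assumes a mail server is listening on localhost.
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(message)

# Call main() at the bottom of the file when installing this as a script.
```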

Write your own parameter change notification script(s), that accept command line arguments like the supplied sample log.sh, and do something useful like update an RRDTool log, then reference them in the param-log-commands field of the configuration file.

Write your own scripts that parse the file specified in the status-database field of the configuration and produce funky system status displays. Try status.cgi as a starting point.

Run python horus-check.py at regular intervals, perhaps every five minutes from cron.

Set up status.cgi somewhere Apache will find it (edit it to point to the correct location of your status.db file) and you'll have a status report accessible via the Web. You can give GET parameters on the URL to filter the results:

host=hostname (to only show services on that host)
type=type (to only show services of that type)
status=OWUF (to only show services in a given set of statuses, eg WUF to only show warning, unknown, or failed services)
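
The filtering these parameters describe could be written roughly as below. This is not taken from status.cgi: the shape of the service records (dictionaries with host, type and status keys) and the function name are assumptions made for the example.

```python
from urllib.parse import parse_qs

# One letter per status, as used in the status= parameter.
STATUS_LETTERS = {"O": "OK", "W": "WARNING", "U": "UNKNOWN", "F": "FAILURE"}


def filter_services(services, query_string):
    """Keep only the services matching the host, type and status parameters.

    services is assumed to be a list of dicts with 'host', 'type' and
    'status' keys; a missing parameter means no filtering on that field.
    """
    params = parse_qs(query_string)
    host = params.get("host", [None])[0]
    type_ = params.get("type", [None])[0]
    letters = params.get("status", ["OWUF"])[0]
    wanted = {STATUS_LETTERS[c] for c in letters.upper() if c in STATUS_LETTERS}
    return [
        s for s in services
        if (host is None or s["host"] == host)
        and (type_ is None or s["type"] == type_)
        and s["status"] in wanted
    ]
```

For example, the query string host=web1&status=WUF would keep only the services on web1 that are not OK.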

All the files are in YAML format, and have fairly self-explanatory structures, although I shall document them when they stabilise...

Last updated: November 24th, 2006, 17:10 GMT
Price: free
Developed by: KittenTech
Homepage: www.kitten-technologies.co.uk
License: GPL (GNU General Public License)