I’ve been setting up a Puppet system at work so we can easily set up virtual servers, and also so that all the configuration is in one place.
Yesterday, someone suggested a way to have our existing Nagios system monitor the state of the Puppet configuration. This lets me be notified if something is causing Puppet to fail on any of the monitored nodes.
There is a Ruby script out there that is a Nagios plugin; however, it requires extra Ruby libraries, and I don’t know how to handle them nicely on a Debian system. I do, however, know how to handle Perl libs, so I wrote a Perl plugin that does the same task. It’s got a lot of hard-coded paths and times and such, so you’ll want to make completely sure that they work in your configuration. It’s also not well documented, but it is quite basic: check_puppet.pl
The main tricky dependency it has is on Nagios::Plugin, but it’s in CPAN, so some dh-make-perl should get you a Debian package for it easily.
I am using a simple file check for this:
/usr/lib/nagios/plugins/check_file_age -f /var/lib/puppet/state/state.yaml -w 5400 -c 7200
It will warn you if puppet hasn’t done anything for 5400 seconds (90 minutes), and go critical after 7200 seconds (2 hours).
It picks up daemons locking up, compile errors, crashes, etc.
The good thing, of course, is that it doesn’t have any dependencies!
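For anyone curious what that threshold logic amounts to, here is a minimal Python sketch of the same idea. It is not the check_file_age plugin itself: the function names are made up for illustration, and only the path and the two thresholds come from the command above. It assumes the usual Nagios convention of status 0 for OK, 1 for WARNING and 2 for CRITICAL, and it treats a missing file as critical, which is my assumption rather than documented plugin behaviour.

```python
import os
import time

# Conventional Nagios plugin status codes.
OK, WARNING, CRITICAL = 0, 1, 2

def classify_age(age_seconds, warn=5400, crit=7200):
    """Map a file age in seconds to a status code (hypothetical helper)."""
    if age_seconds > crit:
        return CRITICAL
    if age_seconds > warn:
        return WARNING
    return OK

def check_state_file(path="/var/lib/puppet/state/state.yaml", now=None):
    """Return (status, message) for path; a missing file counts as CRITICAL."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        # Assumption: no state file at all is as bad as a very stale one.
        return CRITICAL, "%s not found" % path
    return classify_age(age), "%s is %d seconds old" % (path, age)
```

A real plugin would print the message and exit with the status code, since that exit code is what Nagios reads.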
That’s not a bad idea, but my Perl script does that, and it also checks that the daemon is running, so a dead daemon shows up immediately. A minor difference in practice, I guess, since the file-age check will tell you eventually.
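To make the difference concrete, here is a rough Python sketch of pairing a daemon liveness check with the file-age result. It is not a translation of check_puppet.pl: the pid-file approach, the function names and the rule for combining the two results are all assumptions for illustration, and the pid-file location would need to match your own setup.

```python
import os

# Conventional Nagios plugin status codes.
OK, WARNING, CRITICAL = 0, 1, 2

def daemon_running(pidfile):
    """Return True if pidfile names a live process (assumed pid-file layout)."""
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # no pid file, or contents that are not a number
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def combined_status(daemon_ok, age_status):
    """A dead daemon is CRITICAL at once; otherwise the file-age status stands."""
    return age_status if daemon_ok else CRITICAL
```

The point of the combination is timing: the file-age check alone needs 5400 seconds to notice a dead daemon, while the liveness check reports it on the next Nagios poll.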
We’re using the state.yaml monitor. The problem is that it doesn’t catch bad manifests. (The client will happily run off the cache in that case, and state.yaml gets a new timestamp.)
Through a random Google search trying to solve this exact problem, I went “oh, that guy’s blog has the same name as Robin’s”
And you were right! Moreover, the guy who *writes* that blog has exactly the same name as the guy who writes Robin’s! Will the coincidences never stop?!