Purpose

nanny is a tiny-but-helpful script, typically invoked periodically from cron, used to ensure that various critical daemons are restarted if they happen to die.

The best approach is usually to just fix the bug in the daemon; nanny is sometimes an appropriate interim measure, however. Particularly overloaded system administrators may find nanny very helpful very often.

Invoking nanny

To get nanny run from cron, we usually place a line like the following in a system's crontab:
20 * * * * /usr/bin/rand_sleep 600 && /dcslib/allsys/bin/nanny

On most systems you can enter this line with crontab -e, or by editing /var/spool/cron/crontabs/root. On truly old Unix variants like Ultrix, you must edit /etc/crontab; there's no crontab -e on such systems.

The rand_sleep keeps the load on the dcslib servers from skyrocketing at 20 minutes after each hour.
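
The idea behind rand_sleep can be sketched in a few lines of Python. This is not the actual rand_sleep program, only an illustration under the assumption that its argument is an upper bound, in seconds, on a randomly chosen delay; the function name and its sleep and rng parameters are hypothetical.

```python
import random
import time


def rand_sleep(max_seconds, sleep=time.sleep, rng=random):
    """Sleep for a random whole number of seconds in [0, max_seconds].

    Assumption: the real rand_sleep treats its argument as an upper
    bound in seconds. `sleep` and `rng` are injectable only so the
    sketch can be exercised without actually waiting.
    """
    delay = rng.randint(0, max_seconds)  # randint is inclusive at both ends
    sleep(delay)
    return delay
```

With a bound of 600, machines that all fire their cron job at 20 past the hour would end up contacting the servers at scattered points over the following ten minutes rather than all at once.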

Configuring which daemons nanny is to monitor

For each daemon you would like nanny to watch over, place one file under /var/adm/nanny. For instance, to ensure that sendmail is restarted as needed on a machine with a flaky sendmail daemon (if that machine's sendmail daemon does not rewrite its argv[0]) one might place the following in /var/adm/nanny/sendmail:

^.*sendmail.*$
/usr/lib/sendmail -bd -q1h

These files under /var/adm/nanny are always two lines long. The first is a regular expression applied to the output of process-list (which generates output much like ps), while the second is a command used to re-invoke the daemon.
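
To make the two-line format concrete, here is a minimal Python sketch of how such a file could be read. It is not nanny's own code; the function name parse_nanny_config is hypothetical, and using Python's re module is an assumption, since the regular expression dialect nanny actually uses may differ.

```python
import re


def parse_nanny_config(text):
    """Split a two-line nanny file into (compiled pattern, restart command).

    Assumption: Python's `re` syntax stands in for whatever regular
    expression dialect nanny really uses.
    """
    lines = text.splitlines()
    if len(lines) != 2:
        raise ValueError("expected exactly 2 lines, got %d" % len(lines))
    pattern, command = lines
    return re.compile(pattern), command


# The sendmail example from above, as it would appear in the file.
pattern, command = parse_nanny_config(
    "^.*sendmail.*$\n/usr/lib/sendmail -bd -q1h\n"
)
```

Rejecting anything other than exactly two lines follows from the statement that these files are always two lines long.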

The average system's ps will not always give a perfect report of a system's process table - the process table is not locked by the ps command, and the kernel may well update the table sometime after ps begins and before ps finishes. For this reason nanny invokes process-list twice, and only runs the re-invoking command (in our example: /usr/lib/sendmail -bd -q1h) if the pattern is absent from both listings. A single listing that happens to miss the daemon is therefore not enough to trigger a restart.
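
The double check can be sketched as follows. This is an illustration of the rule described above, not nanny's actual implementation; needs_restart is a hypothetical name, and list_processes stands in for process-list as any callable that returns the listing as text, one process per line.

```python
import re


def needs_restart(pattern, list_processes):
    """Return True only if `pattern` is absent from two successive listings.

    `list_processes` is a stand-in for process-list: a callable that
    returns the process listing as a string, one process per line.
    """
    regex = re.compile(pattern)

    def seen(listing):
        return any(regex.search(line) for line in listing.splitlines())

    first = seen(list_processes())
    second = seen(list_processes())
    # One sighting in either listing is enough to presume the daemon alive.
    return not first and not second
```

A caller would run the restart command from the second line of the daemon's file only when needs_restart returns True.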