Geek Foibles

amavisd not starting after restarting Mac OS X Server
February 6, 2009, 4:31 am
Filed under: Uncategorized | Tags: , , , ,

I had a recent spot of trouble with Mac OS X Server for a client who has mail services running.  Due to a hardware issue I had to force off the server, but when I started it back up I noticed that while Postfix would accept mail from senders, it wouldn’t actually deliver it to the destination mailbox.  Instead, I’d get this message in mail.log:

postfix/error[7063]: 56118A83885: to=<>, relay=none, delay=4776, delays=4776/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to[]: Connection refused)

The complaint that it wasn’t delivering mail due to an inability to connect to the local host suggested some sort of mail filter not running, of which there are two on Mac OS X Server: clamav (used for virus scanning) and amavisd (used for spam filtering).  In my case it was easy to narrow it down, since clamav wasn’t enabled.  I checked amavis.log and found that it hadn’t been updated since I’d restarted, so it seemed clear it wasn’t launching.  By looking in system.log I could see launchd was trying to start it repeatedly since it was enabled in Server Admin, but it kept failing:

Feb  5 23:59:36 hq org.amavis.amavisd[5404]: Can't send SIG 0 to process [68]: Operation not permitted at /usr/bin/amavisd line 11203.
Feb  5 23:59:36 hq[1] (org.amavis.amavisd[5404]): Exited with exit code: 1
Feb  5 23:59:36 hq[1] (org.amavis.amavisd): Throttling respawn: Will start in 10 seconds

It was trying to kill process 68, which in my case was kadmind.  This didn’t make sense since kadmind deals with Kerberos, something spam filtering has nothing to do with.  Fortunately amavisd is a Perl script and it was giving me the line number, so I was able to open it and hunt down the offending code.  I found that when amavisd starts, it looks to see if there’s already a PID file (located at /var/amavis/, gets the PID out of that file and then tries to kill said PID.  So, if another copy of amavisd is already running when you start it, it kills the old one so that they don’t conflict.  Problem is, in my case the PID file was stale: it hadn’t been cleaned up after amavisd’s last run since I’d had to force off the system, and now the PID was in use by another, unrelated process that it doesn’t have permission to kill.

amavisd does check to see if the PID file was created before the last system startup, which would allow it to ignore stale PID files, but it only does this after trying to kill the old process.  This bug was fixed in December in version 2.6.2 of amavis, but unfortunately for Mac OS X Server users, Apple uses version 2.5.1 from May of 2007.

Removing the stale PID file resolved the issue.  Its removal allowed launchd’s next automatic attempt to start amavisd to succeed, then a few minutes later Postfix’s mail queue manager started being able to deliver all the mail queued during the downtime to the appropriate mailboxes.


2 Comments so far
Leave a comment

Saved my bacon… thanks for the clue!

Comment by Bill Eccles

Thanks a bunch !!!!

Comment by Franck Horlaville

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: