I'm running Apache 2.2.3 on Debian Etch. It generally runs happily and I haven't seen any issues since I upgraded the hosting several months ago, but at the weekend it died and I can't work out why.
I have two things I was hoping people might be able to help with.
/var/log/apache2/error.log includes:
[Sun Oct 21 06:26:54 2007] [warn] child process 11300 still did not exit, sending a SIGTERM
[Sun Oct 21 06:26:56 2007] [warn] child process 11300 still did not exit, sending a SIGTERM
[Sun Oct 21 06:26:58 2007] [warn] child process 11300 still did not exit, sending a SIGTERM
[Sun Oct 21 06:27:00 2007] [error] child process 11300 still did not exit, sending a SIGKILL
[Sun Oct 21 06:27:01 2007] [notice] caught SIGTERM, shutting down
The last row before that was from a week before, and shouldn't be part of the same incident. 6am is definitely not a peak time for this box. I couldn't find anything in any other logs that seemed relevant.
Anyone got any idea what that was, or where I can look for more detail?
Secondly, I really ought to have some mechanism to check on Apache (and MySQL) periodically and make sure they're running, either restarting them or alerting me (although I can be difficult to get hold of). What do people recommend for this? Is it as simple as having cron run a script to start the daemon if it's not already on the stored PID, or are there issues I should be wary of?
Thanks, Matthew
On Wed, Oct 24, 2007 at 01:04:13PM +0100, Matthew wrote:
I'm running Apache 2.2.3 on Debian Etch. It generally runs happily and I haven't seen any issues since I upgraded the hosting several months ago, but at the weekend it died and I can't work out why.
I have two things I was hoping people might be able to help with.
/var/log/apache2/error.log includes:
[Sun Oct 21 06:26:54 2007] [warn] child process 11300 still did not exit, sending a SIGTERM
[Sun Oct 21 06:26:56 2007] [warn] child process 11300 still did not exit, sending a SIGTERM
[Sun Oct 21 06:26:58 2007] [warn] child process 11300 still did not exit, sending a SIGTERM
[Sun Oct 21 06:27:00 2007] [error] child process 11300 still did not exit, sending a SIGKILL
[Sun Oct 21 06:27:01 2007] [notice] caught SIGTERM, shutting down
The last row before that was from a week before, and shouldn't be part of the same incident. 6am is definitely not a peak time for this box. I couldn't find anything in any other logs that seemed relevant.
Anyone got any idea what that was, or where I can look for more detail?
Hmm, that looks like about the right time for a logrotate run, which sends Apache a graceful restart... I'd guess that something was making Apache very, very unhappy, but it's hard to tell from that!
Secondly, I really ought to have some mechanism to check on Apache (and MySQL) periodically and make sure they're running, either restarting them or alerting me (although I can be difficult to get hold of). What do people recommend for this? Is it as simple as having cron run a script to start the daemon if it's not already on the stored PID, or are there issues I should be wary of?
I'd test the services as opposed to the PIDs that you think they should have: put a test page up on Apache with some known content and then test against that every once in a while. I'm also not one for automatically restarting services; it masks potential issues, which may then go unnoticed for a long time. We use Nagios + Clickatell (an SMS service) to monitor our systems, with a quiet period between about 11.30pm and 6.30am when SMSes aren't sent (thank god!).
Hope that helps,
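To make the "test page with known content" idea concrete, here is a minimal Python sketch. It is only an illustration: the URL, the marker string and the function name are made up for the example and are not from this thread, and a real setup such as the Nagios one mentioned above would do rather more.

```python
import http.client
import urllib.error
import urllib.request


def page_has_expected_content(url, expected, timeout=10.0):
    """Return True if fetching url succeeds and the body contains expected."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # Covers refused connections, timeouts, DNS failures,
        # HTTP error statuses and malformed URLs.
        return False
    return expected in body


if __name__ == "__main__":
    # Hypothetical test page and marker string; substitute your own.
    url = "http://localhost/monitor-test.html"
    if page_has_expected_content(url, "apache-is-alive"):
        print("OK")
    else:
        print("ALERT: test page check failed for " + url)
```

Run from cron on a different machine, a check like this notices a wedged or unreachable Apache as well as a dead one, which a PID check alone would not.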
I've battled with exactly this on a virtual server I look after. There are numerous bug reports, fixes and workarounds out there for the taking. The only certain thing is that YMMV.
My fix was probably related to the fact that apache is running in a virtual server, but here it is FWIW:
/tmp was defined as a tmpfs in /etc/fstab. After commenting this out (bearing in mind the performance impact) and rebooting, I've not had any problems (yet?).
Here's where I started my journey: http://www.bytemark.co.uk/page/Live/support/tech/inside/apachereload
HTH
Safe
Safe Hammad wrote:
I've battled with exactly this on a virtual server I look after. There are numerous bug reports, fixes and workarounds out there for the taking. The only certain thing is that YMMV.
We've also had this on an Ubuntu server.
Our issue appears to have been related to the time taken to restart Apache. For us the fix was simple: in /etc/logrotate.d/apache2 change
    /etc/init.d/apache2 restart > /dev/null
to
    /etc/init.d/apache2 reload > /dev/null
I'll regret saying this, but our server which was dying on about 1 in every 2 log rotates hasn't died since we made this change. I hope saying that doesn't jinx us!
You'll find a bit more info here: https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/111709 including some comments from me from when I was having, and then fixing, this problem. See also: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=400455
"Matthew" matthew@somewhatunlikely.com wrote: [...]
Secondly, I really ought to have some mechanism to check on Apache (and MySQL) periodically and make sure they're running, either restarting them or alerting me (although I can be difficult to get hold of). What do people recommend for this? Is it as simple as having cron run a script to start the daemon if it's not already on the stored PID, or are there issues I should be wary of?
You can do it that way, but there are many pitfalls to beware of, and such auto-starters can make mild problems much, much worse in some situations. Nevertheless, I often use monit for such tasks.
Hope that helps,
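For reference, the stored-PID check being asked about might look roughly like the Python sketch below. It assumes a Unix system; the PID file path shown is an assumption about a typical Debian layout and the function names are invented for the example. Two limits worth keeping in mind: a live PID says nothing about whether the service is answering requests, and PID numbers can be reused, so a stale PID file may point at an unrelated process.

```python
import os


def pid_from_file(pid_path):
    """Return the PID stored in pid_path, or None if missing or unreadable."""
    try:
        with open(pid_path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def process_exists(pid):
    """Return True if some process currently has this PID (Unix only)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Assumed PID file location; check your own init script or config.
    pid_path = "/var/run/apache2.pid"
    if process_exists(pid_from_file(pid_path)):
        print("a process with the stored PID exists")
    else:
        print("no process found for " + pid_path)
```

This sketch deliberately only reports; whether to restart automatically is the judgment call discussed in this thread.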
On Wed, 2007-10-24 at 13:04 +0100, Matthew wrote:
Secondly, I really ought to have some mechanism to check on Apache (and MySQL) periodically and make sure they're running, either restarting them or alerting me (although I can be difficult to get hold of). What do people recommend for this? Is it as simple as having cron run a script to start the daemon if it's not already on the stored PID, or are there issues I should be wary of?
Seconding what Brett said: back around 2000-2001, when I was responsible for a subscription-based online game server and some e-commerce running at a co-lo, I had a script running from another location that connected to various ports to gather known content or responses. If it didn't like what it saw, it emailed a one-liner (mysql on host x down) to an SMS gateway we ran.
Of course I thought this was really clever until the first time the damn thing went off at 3AM.
The advantage, of course, is that there are several ways for the service to be down while the daemon is still running, connectivity or routing issues included, and an external check catches those. Also, if you monitor from the same box, what happens if the whole box goes down?
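A rough Python sketch of that kind of remote port check follows. It is not the original script: the host names, ports, expected bytes and function names are all placeholders, and it only builds the one-line messages, leaving delivery (email, SMS gateway) to whatever you already run.

```python
import socket


def port_responds(host, port, expected=None, timeout=5.0):
    """Return True if a TCP connection succeeds and, when expected is given,
    the first bytes the server sends contain it."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            if expected is None:
                return True
            conn.settimeout(timeout)
            banner = conn.recv(1024)
    except OSError:
        # Refused, unreachable, timed out or reset.
        return False
    return expected in banner


def failed_checks(checks):
    """Return a one-line alert for every (name, host, port, expected) that fails."""
    return [
        "%s on %s down" % (name, host)
        for name, host, port, expected in checks
        if not port_responds(host, port, expected)
    ]


if __name__ == "__main__":
    # Placeholder targets; run this from a machine other than the one monitored.
    checks = [
        ("apache", "localhost", 80, None),
        ("mysql", "localhost", 3306, None),
    ]
    for message in failed_checks(checks):
        print(message)
```

Passing `expected` is only useful for services that speak first on connect; for HTTP you would need to send a request and inspect the reply instead.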