At Sun, 29 Jul 2012 11:04:26 +0100, Jenny Hopkins wrote:
On 11 July 2012 23:08, Adam Bower <adam@thebowery.co.uk> wrote:
On Wed, Jul 11, 2012 at 10:11:03PM +0100, Richard Lewis wrote:
I'm not (so far) seeing any other processes using any significant amount of RAM, apart from MySQL. But that seems fairly static at 1.7%.
That suggests it may be a single event that causes something to eat memory all of a sudden. I'm afraid you'll just have to keep waiting in this case :)
Part of the point of this exercise is simply to see whether memory usage stays constant over time or suddenly starts growing linearly or exponentially, which might help you work out what happened after it has gone wrong again.
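A crude way to collect that sort of history is just to log memory from cron and look at the trend afterwards; the interval and log path here are only illustrative:

    # append a timestamped memory snapshot every five minutes
    */5 * * * * (date; free -m) >> /var/log/memory-usage.log 2>&1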
I had an OOM-killer problem on one of my VMs hosted at Bytemark for weeks before I managed to trace the cause: bots trawling the Trac directory of an Apache site. Banning them with robots.txt fixed it.

I had a script running every five minutes that checked memory usage and, if it was above a certain amount, dumped all sorts of memory-usage and process data to an output file. From this I could see it was always Apache (even though, as Adam says, the OOM killer was randomly killing anything it could to reclaim memory), and from there I started monitoring the Apache connections until I found it was always stuck listening to googlebots. Memory usage would balloon from a few hundred MB to over 800 MB within minutes, at which point the OOM killer kicked in, making it very hard to pinpoint the problem.
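The robots.txt change itself was nothing exotic, just the usual disallow rule; the path below is only illustrative since it depends on where Trac is mounted:

    # robots.txt at the site root: keep well-behaved crawlers out of Trac
    User-agent: *
    Disallow: /trac/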
I've still got scriptage if that helps.
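It was roughly this shape, run from cron every five minutes (the threshold, paths and fields here are illustrative rather than the original script):

    #!/bin/sh
    # If memory use crosses a threshold, dump memory and process state
    # to a log file for later post-mortem.
    THRESHOLD_MB=800
    LOGFILE=/var/log/memory-alerts.log

    used_mb=$(free -m | awk '/^Mem:/ {print $3}')
    if [ "$used_mb" -gt "$THRESHOLD_MB" ]; then
        {
            date
            echo "memory used: ${used_mb} MB"
            free -m
            ps aux --sort=-rss | head -n 20
        } >> "$LOGFILE"
    fi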
Thanks for sharing your experiences. I'll consider your reply evidence of interest in the thread and so provide a brief update.
The VM in question did eventually go on to misbehave in exactly the same way as before. I restarted it and checked my log file, which reported complete memory saturation (RAM and swap) by lots and lots of Apache processes. As a result, I had a look at the Apache configuration and decided to do some performance tuning, especially of the Keep-Alive settings, all of which were at their default values. The Keep-Alive timeout is possibly the most significant: I changed it from 15s to 3s, which will hopefully get idle Apache processes out of the way more quickly in future.
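For anyone wanting to do the same, the directives in question look roughly like this; the KeepAliveTimeout value is the change described above, while the prefork limits are only illustrative figures for a small VM, not my exact configuration:

    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 3

    <IfModule mpm_prefork_module>
        StartServers          5
        MinSpareServers       5
        MaxSpareServers      10
        MaxClients           40
        MaxRequestsPerChild  2000
    </IfModule>

Capping MaxClients matters as much as the timeout on a small VM: it bounds the worst-case memory footprint of the prefork pool instead of letting it grow until the OOM killer steps in.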
This particular VM has now been running problem-free for around two weeks. I suppose the take-home message is: don't try to blame your virtualisation hypervisor before you've actually tuned your pre-fork Web server sensibly. I've effectively re-discovered something that has been common knowledge since about September 1993.
Richard