Hi ALUG,
I've been managing a Xen hypervisor for about nine months now, and
occasionally I get problems with DomUs running out of memory. The
clients are all students, most of them running WordPress blogs. The
OOM errors I've seen all seem to be related to Apache, usually when
students are serving up large image files. I don't think this should
be happening: serving a few ~1MB files shouldn't cause Apache to
drive the OS into exhausting *all* available memory, should it, even
if it were servicing several requests simultaneously?
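We haven't tuned Apache on the guests at all; as far as I know it's
the stock Debian prefork setup serving PHP. I've been wondering
whether capping prefork along these lines would at least stop a
guest eating all of its memory. The numbers below are untested
guesses for a 512 MB guest, not something we currently run:

  # /etc/apache2/apache2.conf (Debian's Apache 2.2, prefork MPM)
  <IfModule mpm_prefork_module>
      StartServers          2
      MinSpareServers       2
      MaxSpareServers       5
      # With mod_php each child can easily grow to 30-50 MB resident,
      # so ~10 children is about all a 512 MB guest can stand.
      ServerLimit          10
      MaxClients           10
      # Recycle children periodically so leaked memory is returned.
      MaxRequestsPerChild 500
  </IfModule>

Does that sound sane, or am I misunderstanding how prefork uses
memory?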
Some details:
* The host is a Dell PowerEdge x86_64 system with 32 GB RAM
* The host OS is Debian 6.0
* We're running Xen 4.0.1 from Debian
* The guests all run Debian 6.0
* Each guest has 15 GB of storage, 512 MB RAM, and 1 GB of swap (a rough guest config is sketched after this list)
* We currently have ~40 guests running
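For concreteness, each guest's Xen config looks roughly like this.
It's paraphrased from memory, and the name, devices, and paths are
illustrative rather than copied from a real file:

  # /etc/xen/student-blog.cfg -- illustrative, not the real file
  name   = 'student-blog'
  memory = 512    # MB; we don't set maxmem, so no ballooning headroom
  vcpus  = 1
  disk   = ['phy:/dev/vg0/student-blog-disk,xvda1,w',  # 15 GB root
            'phy:/dev/vg0/student-blog-swap,xvda2,w']  # 1 GB swap
  vif    = ['bridge=eth0']
  bootloader = 'pygrub'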
Below is some console output from a DomU that suffered this problem
earlier today. You can see that the OOM killer killed Apache, and I'm
guessing it took out sshd too, as I couldn't connect to the guest
afterwards. I couldn't find any errors in Xen's logs.
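For the record, by "Xen's logs" I mean roughly the following, checked
from the Dom0 (we're on the xm toolstack that ships with Xen 4.0);
none of them showed anything around the time of the OOM:

  xm list                            # the guest's state and memory
  xm dmesg                           # the hypervisor's own ring buffer
  tail -n 100 /var/log/xen/xend.log  # the xend daemon log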
Any thoughts on what might be going on here?
Also, come September we'll have a short window in which we could
alter our setup. Any suggestions for better ways of providing virtual
machines? Perhaps alternative hypervisors, or some mechanism other
than hypervisors?
Cheers,
Richard
(Working from home: http://pic.twitter.com/NetsgOS2)
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Richard Lewis
ISMS, Computing
Goldsmiths, University of London
t: +44 (0)20 7078 5134
j: ironchicken@jabber.earth.li
@: lewisrichard
s: richardjlewis
http://www.richardlewis.me.uk/
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
[ 1656.306197] rs:main Q:Reg invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 1656.306215] rs:main Q:Reg cpuset=/ mems_allowed=0
[ 1656.306222] Pid: 1135, comm: rs:main Q:Reg Not tainted 2.6.32-5-xen-amd64 #1
[ 1656.306230] Call Trace:
[ 1656.306242] [<ffffffff810b7318>] ? oom_kill_process+0x7f/0x23f
[ 1656.306251] [<ffffffff810b783c>] ? __out_of_memory+0x12a/0x141
[ 1656.306259] [<ffffffff810b7993>] ? out_of_memory+0x140/0x172
[ 1656.306268] [<ffffffff810bb742>] ? __alloc_pages_nodemask+0x4ec/0x5fe
[ 1656.306278] [<ffffffff810bcca9>] ? __do_page_cache_readahead+0x9b/0x1b4
[ 1656.306286] [<ffffffff810bcdde>] ? ra_submit+0x1c/0x20
[ 1656.306294] [<ffffffff810b5a66>] ? filemap_fault+0x17d/0x2f6
[ 1656.306302] [<ffffffff810cba22>] ? __do_fault+0x54/0x3c3
[ 1656.306313] [<ffffffff8130c7d1>] ? __wait_on_bit_lock+0x76/0x84
[ 1656.306323] [<ffffffff8100c3a5>] ? __raw_callee_save_xen_pud_val+0x11/0x1e
[ 1656.306333] [<ffffffff810cdda8>] ? handle_mm_fault+0x3b8/0x80f
[ 1656.306342] [<ffffffff8100ecf2>] ? check_events+0x12/0x20
[ 1656.306351] [<ffffffff8130fb26>] ? do_page_fault+0x2e0/0x2fc
[ 1656.306360] [<ffffffff8130d9c5>] ? page_fault+0x25/0x30
[ 1656.306366] Mem-Info:
[ 1656.306370] Node 0 DMA per-cpu:
[ 1656.306376] CPU 0: hi: 0, btch: 1 usd: 0
[ 1656.306381] Node 0 DMA32 per-cpu:
[ 1656.306388] CPU 0: hi: 186, btch: 31 usd: 75
[ 1656.306397] active_anon:55210 inactive_anon:55287 isolated_anon:1350
[ 1656.306398] active_file:10 inactive_file:11 isolated_file:26
[ 1656.306400] unevictable:0 dirty:0 writeback:171 unstable:0
[ 1656.306401] free:1180 slab_reclaimable:922 slab_unreclaimable:2187
[ 1656.306402] mapped:16 shmem:7 pagetables:7768 bounce:0
[ 1656.306422] Node 0 DMA free:2032kB min:80kB low:100kB high:120kB active_anon:6168kB inactive_anon:6312kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14868kB mlocked:0kB dirty:0kB writeback:8kB mapped:20kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:76kB kernel_stack:16kB pagetables:292kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:3 all_unreclaimable? yes
[ 1656.306456] lowmem_reserve[]: 0 489 489 489
[ 1656.306467] Node 0 DMA32 free:2688kB min:2788kB low:3484kB high:4180kB active_anon:214672kB inactive_anon:214836kB active_file:40kB inactive_file:44kB unevictable:0kB isolated(anon):5400kB isolated(file):104kB present:500960kB mlocked:0kB dirty:0kB writeback:676kB mapped:44kB shmem:28kB slab_reclaimable:3688kB slab_unreclaimable:8672kB kernel_stack:1232kB pagetables:30780kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:328 all_unreclaimable? yes
[ 1656.306501] lowmem_reserve[]: 0 0 0 0
[ 1656.306512] Node 0 DMA: 4*4kB 1*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2040kB
[ 1656.306537] Node 0 DMA32: 232*4kB 2*8kB 1*16kB 0*32kB 1*64kB 3*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2688kB
[ 1656.306561] 8628 total pagecache pages
[ 1656.306566] 8586 pages in swap cache
[ 1656.306571] Swap cache stats: add 1078026, delete 1069440, find 120149/196407
[ 1656.306579] Free swap = 4kB
[ 1656.306583] Total swap = 1048568kB
[ 1656.308293] 131072 pages RAM
[ 1656.308302] 4044 pages reserved
[ 1656.308307] 22558 pages shared
[ 1656.308311] 123779 pages non-shared
[ 1656.308317] Out of memory: kill process 489 (apache2) score 125474 or a child
[ 1656.308324] Killed process 964 (apache2)