Hi Guys
I am having a lot of trouble with one of my VPSs - a new one I bought for threepence halfpenny from ThrustVPS.
I won't bore you with all the details of the problems I am having (I think the systems are HUGELY oversold) but I have one oddity which I am not sure is the ISP's fault or mine.
The server is a XEN VM running Ubuntu 10.04 (server edition). I am seeing ridiculously high loads (regularly 2.5 upwards) when the server is actually supposed to be doing nothing.
The only process I can see which may be responsible is a kernel process called events/1 which is chewing silly amounts of cpu - see top display below.
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
   16 root      20   0     0    0    0 R   85  0.0 569:39.41 events/1
 3196 root      20   0 19224 1420 1072 R    0  0.3   0:00.16 top
    1 root      20   0 23688 1828 1244 S    0  0.4   0:00.90 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.01 kthreadd
    3 root      RT   0     0    0    0 S    0  0.0   0:00.06 migration/0
    4 root      20   0     0    0    0 S    0  0.0  76:43.23 ksoftirqd/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
I confess I have /no clue/ what this means and I can't find anything useful through searches.
Has anyone any idea where I should start looking?
(uname -a Linux beam 2.6.32-28-server #55-Ubuntu SMP Mon Jan 10 23:57:16 UTC 2011 x86_64 GNU/Linux)
Mick
---------------------------------------------------------------------
The text file for RFC 854 contains exactly 854 lines. Do you think there is any cosmic significance in this?
Douglas E Comer - Internetworking with TCP/IP Volume 1
http://www.ietf.org/rfc/rfc854.txt
---------------------------------------------------------------------
mick wrote:
> I am having a lot of trouble with one of my VPSs - a new one I bought for threepence halfpenny from ThrustVPS.
It's a shame, but if you pay peanuts, you usually get monkeys. ThrustVPS appear to be externalising their support costs onto users and so onto LUGs?
[...]
> The only process I can see which may be responsible is a kernel process called events/1 which is chewing silly amounts of cpu - see top display below.
I have a vague memory of a server at a client site that was doing something similar because of a race between something like hald and the kernel, but I can't lay my hand on the right logbook to say how it was cured or if it was a software or hardware fault. I'd be slightly surprised if a hardware fault affected a VPS, though.
If I were you, I'd be checking all logs, trying to turn up kernel logging verbosity and maybe reading the fine source to see what appears in ps as events/1.
Good luck and please let ALUG know the answer to your riddle when you find it!
On Thu, 3 Mar 2011 15:48:03 +0000 (GMT) MJ Ray mjr@phonecoop.coop allegedly wrote:
> mick wrote:
>> I am having a lot of trouble with one of my VPSs - a new one I bought for threepence halfpenny from ThrustVPS.
> It's a shame, but if you pay peanuts, you usually get monkeys. ThrustVPS appear to be externalising their support costs onto users and so onto LUGs?
I agree. I run three other VPSs,
- Bytemark - fantastic service, but it's not needed because I never have a problem. I'd use them for my tor nodes, but I can't get the bandwidth I want at a price I can afford simply to run tor.
- daily - equally good, but 750 GB pcm for my tor server
- thrust - crap OpenVZ, but cheap and 1000 GB pcm for second tor node
The new thrust VPS was supposed to be on Xen and I could (supposedly) load any of a huge range of distros. I wanted debian 6, but that wouldn't load, nor would debian 5. I ended up with ubuntu as hobson's choice. Nothing but trouble since. Massive packet loss. Hugely unresponsive. I'm only persevering because they won't give money back.
Stay away from them.
> [...]
>> The only process I can see which may be responsible is a kernel process called events/1 which is chewing silly amounts of cpu - see top display below.
> I have a vague memory of a server at a client site that was doing something similar because of a race between something like hald and the kernel, but I can't lay my hand on the right logbook to say how it was cured or if it was a software or hardware fault. I'd be slightly surprised if a hardware fault affected a VPS, though.
> If I were you, I'd be checking all logs, trying to turn up kernel logging verbosity and maybe reading the fine source to see what appears in ps as events/1.
> Good luck and please let ALUG know the answer to your riddle when you find it!
Interestingly, I notice that none of my logs have been updated since about two and a half hours after the time I last rebooted in (yet another) attempt to get a clean load and complained to "support". Some further furtling (during which time I found that root couldn't write any files) showed me that the whole system was mounted read-only. Now that may cause a problem.
A reboot cured the RO problem, but unfortunately not the load problem. But at least I now have some up to date logs.....
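For anyone wanting to spot the read-only condition without waiting for writes to fail, here is a minimal sketch. It assumes the root filesystem's options can be read from the fourth field of /proc/mounts; the function names are made up for illustration and are not from any package.

```shell
#!/bin/sh
# Sketch only: detect a read-only root filesystem.
# Assumption: /proc/mounts lists "/" with its options in field 4.

# option_list_has_ro OPTIONS
# Succeeds when the comma-separated option list contains "ro" as an
# option in its own right (so "errors=remount-ro" does not count).
option_list_has_ro() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# root_mount_options: print the option field for "/" (last entry wins,
# since "/" can appear more than once in /proc/mounts).
root_mount_options() {
    [ -r /proc/mounts ] &&
        awk '$2 == "/" {opts = $4} END {print opts}' /proc/mounts
}

# Example usage:
#   option_list_has_ro "$(root_mount_options)" && echo "root is read-only"
```

If the options show errors=remount-ro, a read-only root may mean the kernel hit a filesystem or block device error and remounted it, in which case dmesg is worth a look after the next occurrence.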
Mick
On Thu, 3 Mar 2011 15:48:03 +0000 (GMT) MJ Ray mjr@phonecoop.coop allegedly wrote:
> mick wrote:
> [...]
>> The only process I can see which may be responsible is a kernel process called events/1 which is chewing silly amounts of cpu - see top display below.
> <snip>
> If I were you, I'd be checking all logs, trying to turn up kernel logging verbosity and maybe reading the fine source to see what appears in ps as events/1.
> Good luck and please let ALUG know the answer to your riddle when you find it!
OK, some more info. But I'm still stuck.
The problem seems to be caused by logging my iptables drops. My iptables file contains the following two lines (which I use on all my VPSs without problem)
# now log and (policy) drop start of all other incoming TCP packets
-A INPUT -p tcp -m state --state NEW -j LOG --log-level emerg --log-prefix "firewall "
# and log (policy) drop of all UDP packets
-A INPUT -p udp -m state --state NEW -j LOG --log-level emerg --log-prefix "firewall "
#
and my /etc/rsyslog.d/50-default.conf (the equivalent of the old syslog.conf file) contains the following:
--------
# First some standard log files. Log by facility.
#
auth,authpriv.* /var/log/auth.log
# *.*;auth,authpriv.none -/var/log/syslog
#cron.* /var/log/cron.log
daemon.* -/var/log/daemon.log
kern.!=emerg -/var/log/kern.log
lpr.* -/var/log/lpr.log
mail.* -/var/log/mail.log
user.* -/var/log/user.log
----------
(note the exception for emerg)
and
----------
#
# Emergencies are sent to everybody logged in.
#
# *.emerg *
-----------
(commenting this out should stop emerg messages going to ttys and the console)
and
-----------
# log iptables connections to separate file
#
kern.=emerg -/var/log/firewall
#
# end
------------
(so my iptables logs should, and do, go to /var/log/firewall)
But, the (very noisy) logging also goes to the console, when it shouldn't.
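To put a number on "very noisy", a rough sketch like the following can summarise which source addresses account for most of the logged drops. It assumes each line in /var/log/firewall carries the "firewall " prefix followed by the usual SRC=a.b.c.d field written by the LOG target; top_sources is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: summarise logged drops by source address.
# Assumption: lines look like "... firewall IN=eth0 ... SRC=1.2.3.4 ...".

# top_sources: read firewall log lines on stdin and print
# "count address" per IPv4 source, busiest first.
top_sources() {
    sed -n 's/.*firewall .*SRC=\([0-9.]*\).*/\1/p' |
        sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Example usage:
#   top_sources < /var/log/firewall | head
```

If one or two addresses dominate, the volume may be down to a scan or flood aimed at this particular IP rather than anything in the configuration.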
I'm not sure why this should cause so much difficulty, but it is clearly the cause of the high cpu usage by the kernel "events" process, because if I stop logging the drops the problem goes away. And I don't understand why the logging is being echoed to the console when this doesn't happen on any of my other VPSs (this config is common to all).
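One possible explanation for the console echo, offered as an assumption rather than a diagnosis: the kernel itself prints a message to the console when the message's numeric level is below console_loglevel, independently of anything rsyslog is told to do. emerg is level 0, so on a default setup it would always qualify. The sketch below only illustrates that comparison; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: would a kernel message at a given level reach the console?
# Assumption: the first field of /proc/sys/kernel/printk is
# console_loglevel, and messages with a lower numeric level are printed.

# reaches_console LEVEL CONSOLE_LOGLEVEL
# Succeeds when a message at LEVEL would be printed to the console.
reaches_console() {
    [ "$1" -lt "$2" ]
}

# current_console_loglevel: print the running kernel's console_loglevel.
current_console_loglevel() {
    [ -r /proc/sys/kernel/printk ] &&
        awk '{print $1}' /proc/sys/kernel/printk
}

# Example usage (emerg is level 0, debug is level 7):
#   lvl=$(current_console_loglevel)
#   reaches_console 0 "$lvl" && echo "emerg messages go to the console"
#   reaches_console 7 "$lvl" || echo "debug messages stay off the console"
```

If that turns out to be what is happening here, logging the drops at a less severe level (for example --log-level debug with a matching kern.=debug rule) might keep them off the console, though that is untested on this box and would not by itself explain why the other VPSs behave differently.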
I'm still investigating, but if anyone has any bright ideas I'd be grateful to hear 'em.
Mick