Dear All,
My Acer Aspire One, with the kernels known in Gentoo as 2.6.35-gentoo-r4 or 2.6.35-gentoo-r12, hangs on resume from suspend to disc - after counting up to 100% and printing the message "Suspending consoles (use no_console_suspend to debug)".
Suspend/resume works fine with kernel 2.6.34-gentoo-r6.
The 2.6.35 kernel configs were generated from the 2.6.34 kernel config using "make oldconfig".
Firstly: does anyone immediately know what the problem is, please?
If not: how do I persuade suspend/resume to produce an informative log file to help with diagnosis, given that the whole process happens while klogd and syslogd are stopped?
On Mon, 6 Dec 2010, Dan wrote:
> My Acer Aspire One, with the kernels known in Gentoo as 2.6.35-gentoo-r4 or 2.6.35-gentoo-r12, hangs on resume from suspend to disc - after counting up to 100% and printing the message "Suspending consoles (use no_console_suspend to debug)".
> Suspend/resume works fine with kernel 2.6.34-gentoo-r6.
> The 2.6.35 kernel configs were generated from the 2.6.34 kernel config using "make oldconfig".
Dear All,
We can add 2.6.36-gentoo-r5 to the list of affected kernels. What's more, the problem has become more urgent, because the combination of a recent xorg upgrade and kernel bug #13811 https://bugzilla.kernel.org/show_bug.cgi?id=13811 is forcing me to upgrade from the unaffected 2.6.34 kernel.
On the plus side, I've made some progress with diagnosis and mitigation. There are at least three distinct causes of the problem, two of which are the known kernel bugs
https://bugzilla.kernel.org/show_bug.cgi?id=20132
https://bugzilla.kernel.org/show_bug.cgi?id=21952
I know these two are implicated because, in each case, I observe that a workaround announced in the bugzilla thread reduces the failure probability on resuming from suspend to disc. Similarly, I'm fairly sure that this kernel bug:
https://bugzilla.kernel.org/show_bug.cgi?id=24032
is not implicated, because its published workaround does not further reduce the failure probability (although I've kept that workaround in place just in case).
Altogether, I've now got the failure probability down from 100% to about 25%. I'd welcome any help diagnosing and fixing the remaining failures, please. The following diagnostic information might come in handy:
On one occasion, I got a detailed kernel oops message out of the failure. The oops message begins:
BUG: unable to handle kernel paging request at fffffecc
IP: [<c105e0fb>] thaw_tasks+0x56/0x90
*pde 01677067 *pte 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/sound/timer/uevent
Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device fbcon tileblit font bitblit softcursor i915 ath5k drm_kms_helper cfbcopyarea cfbimgblt cfbfillrect [last unloaded microcode]
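For anyone reading the oops, the IP line and the faulting address can be unpacked with a little hex arithmetic. The Python below is only a rough sketch: the helper names decode_ip and as_signed32 are invented for this illustration, and it assumes a 32-bit kernel (the addresses above are 8 hex digits).

```python
import re

# Matches the "[<address>] symbol+0xoffset/0xsize" part of an oops IP line.
IP_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>[\w.]+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def decode_ip(text):
    """Split an oops IP line into symbol, offset, size and function start.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    m = IP_RE.search(text)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    return {
        "symbol": m.group("sym"),
        "address": addr,
        "offset": off,                 # bytes into the function
        "size": int(m.group("size"), 16),  # total length of the function
        "function_start": addr - off,
    }


def as_signed32(value):
    """Interpret a 32-bit address as a signed offset from zero."""
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    info = decode_ip("IP: [<c105e0fb>] thaw_tasks+0x56/0x90")
    print(info["symbol"], hex(info["function_start"]),
          info["offset"], "of", info["size"], "bytes")
    print(as_signed32(0xFFFFFECC))
```

So the fault is 0x56 bytes into thaw_tasks, which is 0x90 bytes long, and the faulting address fffffecc is a small negative number when read as a signed 32-bit value. That may hint at a structure member being read through a NULL or nearly NULL pointer, but that is a guess on my part, not a diagnosis.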
On several other occasions, the post-failure behaviour has consisted of a _lot_ of evbug messages being printed to all VTs, including the one where X was supposed to be running, apparently reporting keypresses and mouse movements. On one of these occasions, I noticed that the cursor in the VT where X was supposed to be running was indeed an X cursor, and that it changed its appearance with position in such a way as to suggest that the system still knew where on the screen various X windows were supposed to be. On this one occasion (although not on all of the "evbug" occasions), I was still able to type commands on a non-X VT, although "shutdown -r now" hung before it could unmount the hard discs.
On a couple of occasions, I've managed to capture PM_TRACE output. It looked like this:
1988-11-09T04:13:36.042861+00:00 hydrographer kernel: Magic number: 0:86:306
1988-11-09T04:13:36.042871+00:00 hydrographer kernel: hash matches drivers/base/power/main.c:461
1988-11-09T04:13:36.042882+00:00 hydrographer kernel: tty tty35: hash matches
or like this:
1984-08-07T19:24:41.421381+01:00 hydrographer kernel: Magic number: 0:86:329
1984-08-07T19:24:41.421391+01:00 hydrographer kernel: hash matches drivers/base/power/main.c:461
1984-08-07T19:24:41.421401+01:00 hydrographer kernel: tty tty50: hash matches
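In case it helps anyone compare several captures, here is a rough Python sketch that pulls the three pieces of information out of log lines shaped like the ones above. The function name parse_pm_trace and the regular expressions are my own assumptions based only on these two samples, so other syslog layouts may need adjusting.

```python
import re

# Patterns guessed from the sample lines above; not an official format spec.
MAGIC_RE = re.compile(r"kernel:\s+Magic number: (?P<magic>[\d:]+)")
SOURCE_RE = re.compile(r"kernel:\s+hash matches (?P<file>\S+):(?P<line>\d+)\s*$")
DEVICE_RE = re.compile(r"kernel:\s+(?P<bus>\S+) (?P<dev>[^:\s]+): hash matches\s*$")


def parse_pm_trace(log_text):
    """Collect magic numbers, matching source locations and matching devices.

    Hypothetical helper for illustration only.
    """
    result = {"magic": [], "sources": [], "devices": []}
    for line in log_text.splitlines():
        m = MAGIC_RE.search(line)
        if m:
            result["magic"].append(m.group("magic"))
            continue
        m = SOURCE_RE.search(line)
        if m:
            result["sources"].append((m.group("file"), int(m.group("line"))))
            continue
        m = DEVICE_RE.search(line)
        if m:
            result["devices"].append((m.group("bus"), m.group("dev")))
    return result


if __name__ == "__main__":
    sample = (
        "1988-11-09T04:13:36.042861+00:00 hydrographer kernel: Magic number: 0:86:306\n"
        "1988-11-09T04:13:36.042871+00:00 hydrographer kernel: hash matches drivers/base/power/main.c:461\n"
        "1988-11-09T04:13:36.042882+00:00 hydrographer kernel: tty tty35: hash matches\n"
    )
    print(parse_pm_trace(sample))
```

Both captures point at the same source line (drivers/base/power/main.c:461) but at different tty devices (tty35 and tty50), so the device match may be a hash collision and not a real culprit.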
On Tue, 18 Jan 2011, Dan wrote:
> On Mon, 6 Dec 2010, Dan wrote:
>> My Acer Aspire One, with the kernels known in Gentoo as 2.6.35-gentoo-r4 or 2.6.35-gentoo-r12, hangs on resume from suspend to disc - after counting up to 100% and printing the message "Suspending consoles (use no_console_suspend to debug)".
> We can add 2.6.36-gentoo-r5 to the list of affected kernels. What's more, the problem has become more urgent, because the combination of a recent xorg upgrade and kernel bug #13811 https://bugzilla.kernel.org/show_bug.cgi?id=13811 is forcing me to upgrade from the unaffected 2.6.34 kernel.
I've investigated a little more.
The good news is that I can get reliable resuming by using the "nomodeset" boot parameter to disable kernel mode setting (so perhaps kernel bug #13811 is not as "fixed" as we'd like to think?)
The bad news is that this is not a useful workaround, because the version of the xf86 Intel video driver that has recently been marked stable in Gentoo refuses to work without kernel mode setting. And I can't just downgrade the video driver, because it's so thoroughly integrated with the rest of Xorg. Any ideas, please?