[ALUG] Hang on resume from suspend to disc in 2.6.35

Dan vi5u0-alug at yahoo.co.uk
Tue Jan 18 11:33:51 GMT 2011


On Mon, 6 Dec 2010, Dan wrote:

> My Acer Aspire One, with the kernels known in Gentoo as
> 2.6.35-gentoo-r4 or 2.6.35-gentoo-r12, hangs on resume from suspend to
> disc - after counting up to 100% and printing the message "Suspending
> consoles (use no_console_suspend to debug)".

> Suspend/resume works fine with kernel 2.6.34-gentoo-r6.

> The 2.6.35 kernel configs were generated from the 2.6.34 kernel config
> using "make oldconfig".

Dear All,

We can add 2.6.36-gentoo-r5 to the list of affected kernels.  What's
more, the problem has become more urgent, because the combination of a
recent xorg upgrade and kernel bug #13811
<https://bugzilla.kernel.org/show_bug.cgi?id=13811> is forcing me to
upgrade from the unaffected 2.6.34 kernel.

On the plus side, I've made some progress with diagnosis and
mitigation.  There are at least three distinct causes of the problem,
two of which are the known kernel bugs

<https://bugzilla.kernel.org/show_bug.cgi?id=20132>
<https://bugzilla.kernel.org/show_bug.cgi?id=21952>

I know these two are implicated because, in each case, I observe that
a workaround announced in the bugzilla thread reduces the failure
probability on resuming from suspend to disc.  Similarly, I'm fairly
sure that this kernel bug:

<https://bugzilla.kernel.org/show_bug.cgi?id=24032>

is not implicated, because its published workaround does not further
reduce the failure probability (although I've kept that workaround in
place just in case).
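Because the failure is intermittent, judging whether a workaround "reduces the failure probability" takes a fair number of trial resumes. The following is a minimal sketch, not part of the original report, of one way to estimate how many consecutive successful resumes would be needed before a run of successes is unlikely to be luck. The function name and the 5% threshold are assumptions chosen for illustration, and the model assumes each resume fails independently with the same probability.

```python
import math


def successes_needed(failure_rate, significance=0.05):
    """Smallest n such that n successful resumes in a row would occur
    with probability <= significance if the failure rate were unchanged.

    Assumes independent trials (a simplification; real resume failures
    may well be correlated with uptime, load, or attached devices).
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < significance < 1.0:
        raise ValueError("significance must be strictly between 0 and 1")
    success_rate = 1.0 - failure_rate
    # Solve success_rate ** n <= significance for the smallest integer n.
    return math.ceil(math.log(significance) / math.log(success_rate))


if __name__ == "__main__":
    # With a failure rate of about 25%, as reported in this message:
    print(successes_needed(0.25))
```

With a failure rate of about 25%, this gives 11 consecutive clean resumes, since 0.75^10 is roughly 0.056 and 0.75^11 is roughly 0.042.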

Altogether, I've now got the failure probability down from 100% to
about 25%.  I'd welcome any help diagnosing and fixing the remaining
failures, please.  The following diagnostic information might come in
handy:

On one occasion, I got a detailed kernel oops message out of the
failure.  The oops message begins:

BUG: unable to handle kernel paging request at fffffecc
IP: [<c105e0fb>] thaw_tasks+0x56/0x90
*pde  01677067 *pte  00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/sound/timer/uevent
Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device fbcon tileblit font bitblit softcursor i915 ath5k drm_kms_helper cfbcopyarea cfbimgblt cfbfillrect [last unloaded microcode]
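
For anyone wanting to look up the faulting instruction, the first two lines of the oops carry the useful numbers: the address that could not be paged in, and the instruction pointer expressed as function+offset/size. The sketch below, which is not part of the original report, pulls those fields out and derives the start address of the function. The names parse_oops, FAULT_RE and IP_RE are hypothetical, and the signed interpretation assumes a 32-bit kernel, which the 8-digit addresses suggest.

```python
import re

# The first two lines of the oops quoted above.
OOPS = """\
BUG: unable to handle kernel paging request at fffffecc
IP: [<c105e0fb>] thaw_tasks+0x56/0x90
"""

FAULT_RE = re.compile(r"paging request at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_oops(text):
    """Extract the fault address and instruction-pointer details."""
    fault = FAULT_RE.search(text)
    ip = IP_RE.search(text)
    if not (fault and ip):
        raise ValueError("not a recognised oops header")
    fault_addr = int(fault.group(1), 16)
    ip_addr = int(ip.group(1), 16)
    offset = int(ip.group(3), 16)
    # Assumption: 32-bit addresses, so values >= 2**31 wrap to negative.
    signed = fault_addr - (1 << 32) if fault_addr >= (1 << 31) else fault_addr
    return {
        "fault_addr": fault_addr,
        "fault_addr_signed32": signed,
        "function": ip.group(2),
        "offset": offset,
        "size": int(ip.group(4), 16),
        "function_start": ip_addr - offset,
    }


if __name__ == "__main__":
    info = parse_oops(OOPS)
    print(info["function"], hex(info["function_start"]),
          info["fault_addr_signed32"])
```

For the oops above this reports thaw_tasks starting at 0xc105e0a5, with the fault 0x56 bytes into a 0x90-byte function, and a fault address of fffffecc, which is -308 when read as a signed 32-bit value.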

On several other occasions, the post-failure behaviour has consisted
of a _lot_ of evbug messages being printed to all VTs, including the
one where X was supposed to be running, apparently reporting
keypresses and mouse movements.  On one of these occasions, I noticed
that the cursor in the VT where X was supposed to be running was
indeed an X cursor, and that it changed its appearance with position
in such a way as to suggest that the system still knew where on the
screen various X windows were supposed to be.  On this one occasion
(although not on all of the "evbug" occasions), I was still able to
type commands on a non-X VT, although "shutdown -r now" hung before it
could unmount the hard discs.

On a couple of occasions, I've managed to capture PM_TRACE output.  It
looked like this:

1988-11-09T04:13:36.042861+00:00 hydrographer kernel:  Magic number: 0:86:306
1988-11-09T04:13:36.042871+00:00 hydrographer kernel:  hash matches drivers/base/power/main.c:461
1988-11-09T04:13:36.042882+00:00 hydrographer kernel: tty tty35: hash matches

or like this

1984-08-07T19:24:41.421381+01:00 hydrographer kernel:  Magic number: 0:86:329
1984-08-07T19:24:41.421391+01:00 hydrographer kernel:  hash matches drivers/base/power/main.c:461
1984-08-07T19:24:41.421401+01:00 hydrographer kernel: tty tty50: hash matches
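
To compare several PM_TRACE captures side by side, the three kinds of line shown above (magic number, source location, matching device) can be picked out of the syslog mechanically. This is a minimal sketch, not part of the original report; parse_pm_trace and the regular expressions are hypothetical names, and the patterns assume only the line layout seen in the two excerpts above.

```python
import re

# One of the captures quoted above, used as sample input.
SAMPLE = [
    "1988-11-09T04:13:36.042861+00:00 hydrographer kernel:  Magic number: 0:86:306",
    "1988-11-09T04:13:36.042871+00:00 hydrographer kernel:  hash matches drivers/base/power/main.c:461",
    "1988-11-09T04:13:36.042882+00:00 hydrographer kernel: tty tty35: hash matches",
]

MAGIC_RE = re.compile(r"Magic number:\s*(\d+):(\d+):(\d+)")
FILE_RE = re.compile(r"hash matches\s+(\S+):(\d+)")
DEVICE_RE = re.compile(r"kernel:\s+(.+?):\s+hash matches\s*$")


def parse_pm_trace(lines):
    """Collect the magic number, source locations and device names."""
    result = {"magic": None, "locations": [], "devices": []}
    for line in lines:
        m = MAGIC_RE.search(line)
        if m:
            result["magic"] = tuple(int(x) for x in m.groups())
            continue
        m = FILE_RE.search(line)
        if m:
            # "hash matches <file>:<line>" names a trace point in the source.
            result["locations"].append((m.group(1), int(m.group(2))))
            continue
        m = DEVICE_RE.search(line)
        if m:
            # "<device>: hash matches" names a device whose hash matched.
            result["devices"].append(m.group(1))
    return result


if __name__ == "__main__":
    print(parse_pm_trace(SAMPLE))
```

Run on both captures, this makes it easy to see that the source location is the same each time (drivers/base/power/main.c:461) while the matching device differs (tty35 in one, tty50 in the other).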

-- 

Thanks,

Regards,

Dan


