Ever since I built my latest box I've had *very* occasional total
crashes. By "very occasional" I mean once a week or even less. They
only ever occurred when I was actively using the system; it runs all
the time as a disk server for our home LAN and never died in the
middle of the night or while we were away.
Just recently (yesterday and the day before) it crashed more
frequently, two or three times, and I decided to try and work out
what the problem was. A quick Google showed that I was getting kernel
panics: when the system froze, the Caps Lock and Scroll Lock lights
flashed, which apparently indicates a kernel panic. There was
absolutely no clue in the logs anywhere, not a murmur of anything
wrong before the burst of messages for the reboot.
A while ago (when I first built the system) it seemed as if it might
be an Intel video driver issue, but there have been a few updates
since then, so I decided to see if there might be a hardware problem.
First I looked at the temperatures. Nothing untoward there: the CPU
was at 32 degrees and the motherboard at 20 degrees.
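
For anyone wanting to do the same temperature check from a running
Linux system, here is a minimal sketch. It assumes the kernel exposes
sensors through the hwmon sysfs tree (/sys/class/hwmon), where each
temp*_input file holds millidegrees Celsius. The function name
read_hwmon_temps is made up for illustration, and which sensor is the
CPU and which is the motherboard depends on the board and driver.

```python
from pathlib import Path


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {"chip/label": degrees Celsius} for every readable sensor.

    Assumes the standard hwmon sysfs layout: one hwmonN directory per
    chip, with tempM_input files holding millidegrees Celsius.
    """
    temps = {}
    for chip in sorted(Path(base).glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name if the chip has no name file.
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for temp_file in sorted(chip.glob("temp*_input")):
            try:
                millidegrees = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor present but not readable right now
            stem = temp_file.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip / (stem + "_label")
            label = label_file.read_text().strip() if label_file.exists() else stem
            temps[f"{chip_name}/{label}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for sensor, celsius in read_hwmon_temps().items():
        print(f"{sensor}: {celsius:.1f} C")
```

If nothing is printed, the sensor driver for the board is probably not
loaded, and the BIOS hardware monitor page is the fallback.
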
Then I tried running the memtest you can get to from the Grub menu.
Aha! Test 5 (the block copy test) produced some errors. I tried
moving the DIMMs around to see if that helped, but it just moved the
errors around and didn't fix them, so it seemed there was a real
error.
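
To give a rough idea of what a block copy test does, here is a toy
sketch in Python. It is not memtest's code and cannot test physical
RAM the way memtest does from outside the operating system; it only
illustrates the principle of filling memory with a known pattern,
shuffling it with block copies, and checking that the pattern
survived. All the names and sizes here (make_pattern, swap_halves,
find_mismatches, block_move_test) are made up for illustration.

```python
def make_pattern(size, seed=0x5A):
    """Build a deterministic byte pattern that varies with the offset."""
    return bytearray((seed + i * 31) & 0xFF for i in range(size))


def swap_halves(buf, block):
    """Swap the lower and upper halves of buf, one block at a time."""
    half = len(buf) // 2
    for off in range(0, half, block):
        scratch = bytes(buf[off:off + block])
        buf[off:off + block] = buf[half + off:half + off + block]
        buf[half + off:half + off + block] = scratch


def find_mismatches(buf, expected):
    """Return (offset, expected_byte, actual_byte) for every bad byte."""
    return [(i, e, b) for i, (e, b) in enumerate(zip(expected, buf)) if e != b]


def block_move_test(size=1 << 16, block=4096, moves=64):
    """Swap the halves an even number of times, then verify the pattern.

    An even number of swaps puts every block back where it started, so
    any mismatch means data was corrupted in transit or at rest.
    """
    if size % (2 * block) != 0:
        raise ValueError("size must be a multiple of 2 * block")
    if moves % 2 != 0:
        raise ValueError("moves must be even so the layout is restored")
    buf = make_pattern(size)
    for _ in range(moves):
        swap_halves(buf, block)
    return find_mismatches(buf, make_pattern(size))


if __name__ == "__main__":
    errors = block_move_test()
    print("no errors" if not errors else f"{len(errors)} bad bytes")
```

The real test reports the failing address, which is why moving the
DIMMs between slots moves the reported errors with them.
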
Then I looked at the Asus web site (and my motherboard manual) to see
if there were any clues there. The notes for the latest BIOS upgrade
say "1. Improve the compatibility with some memory." Hmm, I wonder if
that means me?
So a BIOS upgrade seemed a good idea. At first it looked as if it
might be a bit difficult because I don't have a floppy disk drive in
the system. However, I was pleasantly surprised to find that the BIOS
upgrade utility built into the system BIOS, which you get to by
hitting ALT/F2 at boot, can read the BIOS file from CD and USB as
well as from a floppy (this wasn't very well documented anywhere), so
I needed neither a floppy disk nor an MS operating system. I just
wrote the updated BIOS file to a CD and the system did the rest.
... and now I appear to have error-free memory according to memtest! :-)
It remains to be seen whether it *was* the memory problem causing my
kernel panics, but it does seem likely.
So, thank you Asus for finding the bug and fixing it and for providing
utilities etc. that will work on Linux. They also have Linux drivers
on their web site (not that I needed them, Ubuntu detected everything
without problems).
--
Chris Green