On Tuesday 26 October 2004 6:24 pm, Mark Rogers wrote:
Yes, I've removed the new RAM now (and I gave all the cables a good shove while I was in there).
I get exactly the same problem.
Well, it's looking like the good ol' lightbulb thing.
Lightbulbs most often fail at the moment you switch them on. Thermal cycling can sometimes put more strain on hardware than continuous running does.
It may well be that hdb is failing and the power off/fit RAM/power on cycle was the last straw. I'd install smartmontools (smartctl), read the man page and tell the drive to run an offline test (confusingly enough, you can run an offline test while the drive is mounted; it just slows access down a lot). smartctl will give you an estimate of how long the test will take, so check back once it has finished. This all assumes, of course, that your drives are SMART capable (most are). SMART does not have to be enabled in the BIOS; that setting just turns on a basic SMART check during POST.
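The "is the drive SMART capable?" check can be scripted before starting a test. This is a minimal sketch, assuming smartmontools is installed and that `smartctl -i` prints its usual "SMART support is:" lines; `smart_available` and `smart_enabled` are hypothetical helper names, not part of smartmontools.

```sh
# Hypothetical helpers: both read `smartctl -i` output on stdin.

# Succeeds (exit 0) if smartctl reports the drive has SMART capability.
smart_available() {
    grep -Eq 'SMART support is:[[:space:]]+Available'
}

# Succeeds if smartctl reports SMART as currently enabled on the drive.
smart_enabled() {
    grep -Eq 'SMART support is:[[:space:]]+Enabled'
}
```

For example, `smartctl -i /dev/hdb | smart_available && smartctl --test=offline /dev/hdb`. If the drive is capable but SMART is switched off, `smartctl -s on /dev/hdb` turns it on.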
But before you do any of that, and before you power-cycle that server again, I would carefully check that the contents of /home are safely backed up somewhere.
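One simple way to take that backup is a tarball written to a different disk. This is a minimal sketch, not the only approach; `backup_tree` is a hypothetical helper, and the archive location you pass it should live on a drive other than hdb.

```sh
# Hypothetical helper: archive a directory tree (e.g. /home) to a gzipped
# tarball, then read the archive back to check it is not truncated.
# Usage: backup_tree SOURCE_DIR ARCHIVE_PATH
backup_tree() {
    src=$1
    archive=$2
    # -C stores paths relative to the parent of SOURCE_DIR (e.g. "home/...").
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Listing the archive fails if the file is corrupt or incomplete.
    tar -tzf "$archive" > /dev/null
}
```

For example, `backup_tree /home /mnt/backup/home.tar.gz`, where `/mnt/backup` stands in for wherever your second disk is mounted.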
Useful SMART commands are as follows:
smartctl --test=offline /dev/hdb
    runs an offline data-collection test of hdb; its results show up in the
    SMART attributes and error log (see -a below), not in the self-test log
smartctl --test=long /dev/hdb
    runs the extended self-test, the most thorough test of hdb
smartctl --test=conveyance /dev/hdb
    runs a special (IDE/ATA only) test that checks for transit damage
smartctl -l selftest /dev/hdb
    reports the results of the self-tests (some of them can take over an
    hour to complete)
smartctl -H /dev/hdb
    reports a basic health status (but not all drives test themselves at
    boot unless the BIOS asks them to)
smartctl -a /dev/hdb
    reports all SMART information
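If you want to check the results from a script rather than by eye, the output of `-H` and `-l selftest` can be filtered. This is a minimal sketch, assuming an ATA drive and smartctl's usual wording ("SMART overall-health self-assessment test result:" for `-H`, and self-test log entries starting with "# 1", "# 2", ...); `health_verdict` and `selftest_failures` are hypothetical helper names.

```sh
# Hypothetical helpers: both read smartctl output on stdin.

# Print the verdict from `smartctl -H` output (e.g. PASSED or FAILED!),
# or nothing if that line is absent.
health_verdict() {
    sed -n 's/^SMART overall-health self-assessment test result:[[:space:]]*//p'
}

# Print the number of `smartctl -l selftest` entries whose status mentions
# "failure" (e.g. "Completed: read failure"); prints 0 if there are none.
selftest_failures() {
    grep -Ec '^# *[0-9]+ .*failure' || true
}
```

For example, `smartctl -H /dev/hdb | health_verdict` and `smartctl -l selftest /dev/hdb | selftest_failures`. A non-zero failure count is a strong hint to get the data off that drive.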