Hard disk error

Richard Lewis

26 Oct 2004

3:26 p.m.

Hello ALUG,

I've just added a GB of RAM to my server. The BIOS found it all right, but something strange happened when I booted up (it's Debian sarge): it took about 2 or 3 minutes to start syslogd (no disk activity or anything). Then, once it had booted, things got even worse! $ free -m indicates that the new RAM is working, but it seems that hdb2 (which is my /var mount point) will not mount! In my /etc/fstab I have:

/dev/hda2 none swap sw 0 0
/dev/hda1 / ext3 defaults,errors=remount-ro 0 1
/dev/hdb1 /home ext3 defaults 0 2
/dev/hdb2 /var ext3 defaults 0 2

and when I issue

# mount /dev/hdb2

I get:

hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=39067264, sector=39067264
end_request: I/O error, dev hdb, sector 39067264
JBD: IO error reading journal superblock
EXT3-fs: error loading journal.
mount: wrong fs type, bad option, bad superblock on /dev/hdb2, or too many mounted file systems

and I can't seem to do anything about it because the machine isn't connecting to the network (maybe it needs something from /var...), so I can't install hdparm. (It can't do anything which requires stuff from /var, like run its web server, database server, apt, dhcp-client, ...)

Any ideas what's wrong with it?

Cheers,
Richard
--
Richard Lewis
richardlewis@fastmail.co.uk
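The two hex values in those kernel messages are raw ATA register contents. Below is a minimal Python sketch of how they decode into the labels the kernel printed, assuming the standard ATA Status/Error register bit layout and 512-byte sectors. The helper names (`decode`, `sector_to_byte_offset`) are hypothetical, and the bit labels other than the ones that appear in the log are descriptive rather than the kernel's exact spelling. Whether the failing sector actually falls inside hdb2 depends on the partition table, which isn't shown here.

```python
# Bit layout of the ATA Status register (high bit first).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Bit layout of the ATA Error register, meaningful when Status has "Error" set.
ERROR_BITS = [
    (0x80, "BadSector"),
    (0x40, "UncorrectableError"),
    (0x20, "MediaChanged"),
    (0x10, "SectorIdNotFound"),
    (0x08, "MediaChangeRequested"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]

SECTOR_SIZE = 512  # assumption: classic 512-byte IDE sectors


def decode(value, table):
    """Return the names of every bit set in `value`, in table order."""
    return [name for mask, name in table if value & mask]


def sector_to_byte_offset(lba):
    """Byte offset of an LBA sector from the start of the whole disk (not the partition)."""
    return lba * SECTOR_SIZE


print(decode(0x51, STATUS_BITS))   # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x40, ERROR_BITS))    # ['UncorrectableError']
offset = sector_to_byte_offset(39067264)
print(offset, round(offset / 10**9, 2), "GB")  # 20002439168 20.0 GB
```

Read this way, the drive itself is reporting that it could not read sector 39067264 (roughly 20 GB from the start of hdb) even after its own error correction.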


Tim Green

26 Oct

3:42 p.m.

On Tue, 26 Oct 2004 06:25:47 -0700, Richard Lewis <richardlewis@fastmail.co.uk> wrote:

...

I've just added a GB of RAM to my server.
hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=39067264, sector=39067264
end_request: I/O error, dev hdb, sector 39067264
JBD: IO error reading journal superblock
EXT3-fs: error loading journal.
mount: wrong fs type, bad option, bad superblock on /dev/hdb2, or too many mounted file systems

Any ideas what's wrong with it?

Daft question: have you tried taking the RAM out again?

Richard Lewis

3:52 p.m.

On Tue, 26 Oct 2004 14:41:04 +0100, "Tim Green" <timothy.j.green@gmail.com> said:

...

On Tue, 26 Oct 2004 06:25:47 -0700, Richard Lewis <richardlewis@fastmail.co.uk> wrote:

...
I've just added a GB of RAM to my server.
hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=39067264, sector=39067264
end_request: I/O error, dev hdb, sector 39067264
JBD: IO error reading journal superblock
EXT3-fs: error loading journal.
mount: wrong fs type, bad option, bad superblock on /dev/hdb2, or too many mounted file systems

Any ideas what's wrong with it?

Daft question: have you tried taking the RAM out again?

Yes, it's sort of somewhere near the bottom of my list of things to try. I've just copied hdparm_5.7-1_i386.deb to the server (using a quaint little invention called a 'floppy disk' ;-) but I haven't been able to install it: 'dpkg: unable to access dpkg status area: No such file or directory'. I might have to try copying all the package's files by hand...

Richard
--
Richard Lewis
richardlewis@fastmail.co.uk

Daniel Silverstone

3:54 p.m.

On Tue, 2004-10-26 at 06:51 -0700, Richard Lewis wrote:

...

...
Daft question: have you tried taking the RAM out again?
Yes, it's sort of somewhere near the bottom of my list of things to try.

So...

1. pc works
2. insert ram
3. pc no longer works
4. <do anything but try removing ram>

Seems daft to me.

D.
--
Daniel Silverstone http://www.digital-scurf.org/
PGP mail accepted and encouraged. Key Id: 2BC8 4016 2068 7895

Matt Parker

4:19 p.m.

On 26/10/2004, "Daniel Silverstone" <dsilvers@digital-scurf.org> wrote:

...

On Tue, 2004-10-26 at 06:51 -0700, Richard Lewis wrote:

...
...
Daft question: have you tried taking the RAM out again?
Yes, it's sort of somewhere near the bottom of my list of things to try.

So...

1. pc works
2. insert ram
3. pc no longer works
4. <do anything but try removing ram>
5. Profit!!???

(OK, I spend too much time on Slashdot...)

Matt

Richard Lewis

4:55 p.m.

On Tue, 26 Oct 2004 14:53:43 +0100, "Daniel Silverstone" <dsilvers@digital-scurf.org> said:

...

On Tue, 2004-10-26 at 06:51 -0700, Richard Lewis wrote:

...
...
Daft question: have you tried taking the RAM out again?
Yes, it's sort of somewhere near the bottom of my list of things to try.

So...

1. pc works
2. insert ram
3. pc no longer works
4. <do anything but try removing ram>

Seems daft to me.

Yes, you're all right of course. I'll do it now... (it's just that the problem is with one partition of a drive, so it seems completely unrelated, but logic will out I suppose... ;-)

Richard
--
Richard Lewis
richardlewis@fastmail.co.uk

Tim Green

5:05 p.m.

On Tue, 26 Oct 2004 07:54:01 -0700, Richard Lewis <richardlewis@fastmail.co.uk> wrote:

...

...
...
...
Daft question: have you tried taking the RAM out again?
I'll do it now... (it's just that the problem is with one partition of a drive, so it seems completely unrelated, but logic will out I suppose... ;-)

If you are unlucky, the drive went bad when you powered the computer down and back up again. Do you have any drive testing software not on that drive?

Good luck!
Tim.

Richard Lewis

5:36 p.m.

On Tue, 26 Oct 2004 16:04:40 +0100, "Tim Green" <timothy.j.green@gmail.com> said:

...

If you are unlucky, the drive went bad when you powered the computer down and back up again. Do you have any drive testing software not on that drive?

Well, fortunately all my applications are installed in /usr (on /dev/hda1) in the normal manner. I was able to do a manual install of hdparm, which reported that /dev/hdb is using UDMA5 mode (whatever that means?!).

On Tue, 26 Oct 2004 16:09:20 +0100, "Mark Rogers" <mark@quarella.co.uk> said:

...

Richard Lewis typed:

...
Yes, you're all right of course. I'll do it now... (it's just that the problem is with one partition of a drive, so it seems completely unrelated, but logic will out I suppose... ;-)

It's entirely possible that the memory is a red herring. Maybe the IDE cable got dislodged when you were inserting the memory module, for example. Ruling that out is a useful diagnostic step.

Yes, I've removed the new RAM now (and I gave all the cables a good shove while I was in there). I get exactly the same problem. I didn't think it was likely to be the cables because it's only hdb2 which is affected (hdb1 is my /home mount point and that's working fine).

...

Another thought: Swap size is usually related to physical memory size.

Good thought, but my swap partition is on hda, which (like hdb1) is completely unaffected. I'll keep fiddling...

Cheers,
Richard
--
Richard Lewis
richardlewis@fastmail.co.uk

Mark Rogers

7:21 p.m.

Richard Lewis typed:

...

Yes, I've removed the new RAM now (and I gave all the cables a good shove while I was in there).

I get exactly the same problem.

Try resetting to BIOS defaults (first try "fail safe", then "optimised" if the options are there). Adding new memory may have caused the BIOS to alter the memory timings or something like that.

I'd strongly recommend running memtest86 (www.memtest86.com; the easiest option is to download the (tiny) ISO image and boot from it to test). Run it both without the new memory and with it, just to be sure the memory is OK, and make sure you run the tests at your desired memory clock speed (i.e. don't run the tests with BIOS "fail safe" defaults and assume they're relevant when using "optimised" settings).

It's odd that the problem remains with the new memory removed, but it's conceivable that a fault in the new memory has led to something incorrect being written to disk, which is still there after the memory is removed.

Strange behaviour like this can also be caused by an underpowered PSU (e.g. I've had files corrupted when transferred across a network, which was fixed by upgrading the PSU; no other ill effects were noticed). It seems odd that the problem has just started, though.

--
Mark Rogers, More Solutions Ltd :: Tel: 0845 45 89 555

Wayne Stallwood

11:14 p.m.

On Tuesday 26 October 2004 6:24 pm, Mark Rogers wrote:

...

...
Yes, I've removed the new RAM now (and I gave all the cables a good shove while I was in there).

I get exactly the same problem.

Well, it's looking like the good 'ole lightbulb thing. Lightbulbs fail most frequently just at the point that you turn them on. Thermal cycles sometimes present a greater strain on hardware than continuous running. It may well be that hdb is failing and the power off/fit RAM/power on cycle was the last straw.

I'd get smartmontools (smartctl) on there, read the man page and tell the drive to do an offline test (confusingly enough, you can do an offline test while the drive is mounted; it just slows access down a lot). It will give you an estimate of the test run time; check back when the test is finished. This all of course assumes that your drives are SMART capable (most are). SMART does not have to be enabled in the BIOS; that just turns on a basic SMART test during POST.

But before you do any of that, and before you unpower/repower that server any more, I would carefully check that you have the contents of /home backed up safely somewhere.

Useful SMART commands are as follows:

smartctl --test=offline /dev/hdb     runs an offline test of hdb (smartctl --test=long runs the extended self-test, which scans the whole surface)
smartctl --test=conveyance /dev/hdb  runs a special (IDE only) test that checks for transit damage
smartctl -l selftest /dev/hdb        reports back the results of the tests (some of them can take over an hour to complete)
smartctl -H /dev/hdb                 reports a basic health status (but not all drives test themselves at boot unless the BIOS asks them to)
smartctl -a /dev/hdb                 reports back all SMART information
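If you want to script the basic health check from that list, here is a minimal Python sketch. It assumes smartmontools is installed, the script runs as root, and the overall-health line reads "SMART overall-health self-assessment test result: ..." as it does for ATA drives; `smart_health` and `parse_health` are hypothetical helper names, not part of smartmontools.

```python
import subprocess


def parse_health(output):
    """Extract the verdict (e.g. 'PASSED') from `smartctl -H` output, or None if absent."""
    marker = "overall-health self-assessment test result:"
    for line in output.splitlines():
        if marker in line:
            return line.split(marker, 1)[1].strip()
    return None


def smart_health(device):
    """Run `smartctl -H` on `device` (e.g. /dev/hdb) and return the verdict."""
    # smartctl's exit status is a bit mask that can be non-zero even when it
    # printed useful output, so a non-zero status is not treated as fatal here.
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_health(result.stdout)
```

Calling smart_health("/dev/hdb") needs root; anything other than "PASSED" (or None, if the line is missing) is a cue to look at the full smartctl -a output.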

Mark Rogers

5:06 p.m.

Richard Lewis typed:

...

Yes, you're all right of course. I'll do it now... (it's just that the problem is with one partition of a drive, so it seems completely unrelated, but logic will out I suppose... ;-)

It's entirely possible that the memory is a red herring. Maybe the IDE cable got dislodged when you were inserting the memory module, for example. Ruling that out is a useful diagnostic step.

Another thought: swap size is usually related to physical memory size. I can't see how that would be dynamic and cause problems when the memory is increased, but I throw it into the pot to see if anyone with more brain than me can make something of it. Certainly it is one connection between memory and disk.

--
Mark Rogers, More Solutions Ltd :: Tel: 0845 45 89 555

Ted.Harding＠nessie.mcc.ac.uk

5:54 p.m.

On Tue, 26 Oct 2004 06:25:47 -0700, Richard Lewis <richardlewis@fastmail.co.uk> wrote:

...

I've just added a GB of RAM to my server.
hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=39067264, sector=39067264
end_request: I/O error, dev hdb, sector 39067264
JBD: IO error reading journal superblock
EXT3-fs: error loading journal.
mount: wrong fs type, bad option, bad superblock on /dev/hdb2, or too many mounted file systems

Any ideas what's wrong with it?

I've seen very similar messages when an HDD had bad blocks on it. Not that it's necessarily the case with your drive; however, it does suggest a hardware problem on the disk front.

Mark Rogers' suggestion that the cable may have got nudged during the RAM installation is a plausible possibility. Another might be that there's a problem with one of the new RAM chips. If there's a data error on one of the "bits" then signals between hardware can get corrupted. I've also known this to lead to bad data getting written to disk when buffers are flushed.

(The most amusing case in my experience was when I noticed that filenames from 'ls' were coming out with some false characters; a bit of detective work showed that these corresponded to one bit in the ASCII code being stuck on "0", e.g.

"c" = ASCII 99 = 01100011 -> 01100001 = ASCII 97 = "a"

So I took out the disk drive, looked at the circuit board with a magnifying glass, detected the smallest piece of fluff I've ever seen across two pins, went "puff" at it, and cured the problem; I had to rename a few files after that...)

I think the suggestion to reinstate your previous setup and check everything out is very wise. Make sure all cables are well seated. Do this first, then if you still have problems report the details back to us.

Good luck,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 26-Oct-04 Time: 16:46:25
------------------------------ XFMail ------------------------------
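Ted's fluff story is a stuck-at-zero fault on one data line. A small Python sketch that simulates it, assuming (as his "c" -> "a" example implies) that the stuck line is bit 1, the bit worth 2; `stuck_at_zero` is a hypothetical helper, and a real fault could hit any bit or be stuck at 1 instead.

```python
def stuck_at_zero(text, bit):
    """Return ASCII `text` as it would read if data bit `bit` were stuck at 0."""
    mask = 0xFF & ~(1 << bit)  # every bit set except the stuck one
    return bytes(b & mask for b in text.encode("ascii")).decode("ascii")


print(format(ord("c"), "08b"))  # 01100011
print(stuck_at_zero("c", 1))    # a
print(stuck_at_zero("cat", 1))  # aat
```

Only characters with that bit set change ("c" becomes "a", while "a" and "t" are untouched), which fits the damage showing up as occasional wrong letters in filenames rather than complete garbage.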
