On 08/10/13 16:52, Mark Rogers wrote:
I have 4x2TB disks configured for RAID5.
Initially they were in a USB3 external caddy but this never worked correctly - the raid kept dropping offline before it completed building the array.
I then switched to eSATA (same caddy) and that improved things but I still failed to build the array completely. So they're now in a new HP microserver.
Until now I assumed the issues were connectivity but since I still have the same problem there must be a disk issue of some kind. However SMART is reporting healthy even after longer self tests.
Do I just write this off as a duff disk or can I investigate this further?
syslog reports thus:
Oct 4 01:30:09 backup kernel: [49309.671201] sd 4:0:0:0: [sdd] Unhandled error code
Oct 4 01:30:09 backup kernel: [49309.671217] sd 4:0:0:0: [sdd]
Oct 4 01:30:09 backup kernel: [49309.671222] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Oct 4 01:30:09 backup kernel: [49309.671229] sd 4:0:0:0: [sdd] CDB:
Oct 4 01:30:09 backup kernel: [49309.671233] Read(10): 28 00 d9 2e b1 f0 00 04 00 00
Oct 4 01:30:09 backup kernel: [49309.671253] end_request: I/O error, dev sdd, sector 3643716080
Oct 4 01:30:09 backup kernel: [49309.671264] md/raid:md0: read error not correctable (sector 3643714032 on sdd1).
Oct 4 01:30:09 backup kernel: [49309.671274] md/raid:md0: Disk failure on sdd1, disabling device.
{SNIP}
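As an aside, the numbers in that log can be cross-checked by hand. The sketch below is my own illustration in Python, not something taken from the kernel or the md tools. It decodes the Read(10) CDB bytes and compares the two sector numbers. The function name decode_read10 is made up for this example, and it assumes the usual READ(10) layout: opcode 0x28, a 4-byte big-endian LBA in bytes 2-5 and a 2-byte transfer length in bytes 7-8.

```python
def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, block_count) from a READ(10) CDB given as a hex string.

    Hypothetical helper for illustration; assumes the standard 10-byte layout.
    """
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


# CDB bytes copied from the syslog excerpt.
lba, blocks = decode_read10("28 00 d9 2e b1 f0 00 04 00 00")
print(lba, blocks)  # 3643716080 1024

# Sector numbers copied from the end_request and md/raid lines.
sector_on_sdd = 3643716080
sector_on_sdd1 = 3643714032
partition_start = sector_on_sdd - sector_on_sdd1
print(partition_start)  # 2048
```

If I've read it right, the command that timed out was a 1024-block read starting at exactly the sector the kernel reported, and the difference of 2048 between the sdd and sdd1 sector numbers is presumably just where the sdd1 partition starts on the disk.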
http://www.ultimatebootcd.com/
http://www.sysresccd.org/SystemRescueCd_Homepage
I'd be tempted to download one of these rescue CDs. Ultimate Boot CD, I think, has loads of HDD diagnostic tests on it, including (I think) some manufacturer-specific utilities that you can run to test and low-level format drives.
I'd be tempted to boot to the CD with only one drive connected, then run the manufacturer diags on the disk and do destructive testing and/or a low-level format if available. Make sure the whole disk is written to. It will take a while. Once tested, if it passes, reformat and restore data.
Then try the other disk.
Destructive testing will of course destroy your data, so back up first.
If you can't find any drive-specific tests, you could try one of the disk wiping utilities and get it to wipe the whole disk. This will write to the whole disk multiple times, and will show you if there are any errors. My favourite is Darik's Boot and Nuke (DBAN). Wipe the disk with a comprehensive wipe, which will take several hours. If no errors, format & restore data.
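To illustrate the idea behind a full write pass, here is a rough Python sketch of "write a pattern everywhere, then read it back". It is not what DBAN or the manufacturer tools actually do, and the function name write_and_verify and its parameters are invented for this example. The size is passed in explicitly, and the demo only touches a temporary file. Pointing it at a real device would DESTROY everything on it, and on a real disk the read-back may be served from the OS cache rather than the platters, so treat it as an outline only.

```python
import os
import tempfile


def write_and_verify(path, size, chunk=1024 * 1024, pattern=b"\xaa"):
    """Overwrite `size` bytes at `path` with a one-byte pattern, then read back.

    Hypothetical helper for illustration. DESTRUCTIVE: existing content is lost.
    Returns the list of chunk offsets that could not be read or did not match.
    A failed write raises OSError to the caller.
    """
    if len(pattern) != 1:
        raise ValueError("pattern must be a single byte")
    bad_offsets = []
    with open(path, "r+b") as f:
        # Pass 1: write the pattern over the whole range.
        offset = 0
        while offset < size:
            n = min(chunk, size - offset)
            f.write(pattern * n)
            offset += n
        f.flush()
        os.fsync(f.fileno())  # push the data out of the OS buffers

        # Pass 2: read everything back and compare.
        f.seek(0)
        offset = 0
        while offset < size:
            n = min(chunk, size - offset)
            try:
                data = f.read(n)
            except OSError:
                data = b""  # treat an I/O error as a failed chunk
            if data != pattern * n:
                bad_offsets.append(offset)
                f.seek(offset + n)  # skip past the bad chunk
            offset += n
    return bad_offsets


# Demo on a scratch file only, never on a real disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name
demo_size = 3 * 1024 * 1024 + 123
demo_bad = write_and_verify(demo_path, demo_size)
print("bad chunks:", demo_bad)
os.remove(demo_path)
```

An empty list means every chunk was written and read back intact. Anything else gives you the offsets to look at more closely.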
Alternatively, if finances allow, dump the disks and get new ones.
HTH
Steve