On 28/05/13 15:16, Mark Rogers wrote:
On 28 May 2013 14:40, steve-ALUG@hst.me.uk wrote:
Could you be a bit more specific as to what the errors are?
At a cursory glance, as per http://paste.ubuntu.com/5709668/, although with "ata4" replaced by "ata3"
OK
Could be a controller, or possibly both disks have failed somehow. Could it be software config somehow? I guess it could perhaps be PSU, if it wasn't supplying correct power to the drives, but I'd think that was unlikely.
I've just put one of the disks into my USB caddy again and have successfully mounted the raid partition (as read-only). Trying to copy files off I'm getting errors:
May 28 15:03:22 localhost kernel: [1123077.252040] sd 12:0:0:0: [sdc] Unhandled sense code
May 28 15:03:22 localhost kernel: [1123077.252052] sd 12:0:0:0: [sdc]
May 28 15:03:22 localhost kernel: [1123077.252058] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 28 15:03:22 localhost kernel: [1123077.252064] sd 12:0:0:0: [sdc]
May 28 15:03:22 localhost kernel: [1123077.252068] Sense Key : Medium Error [current]
May 28 15:03:22 localhost kernel: [1123077.252076] sd 12:0:0:0: [sdc]
May 28 15:03:22 localhost kernel: [1123077.252081] Add. Sense: Unrecovered read error
May 28 15:03:22 localhost kernel: [1123077.252087] sd 12:0:0:0: [sdc] CDB:
May 28 15:03:22 localhost kernel: [1123077.252090] Read(10): 28 00 03 2b dc 00 00 00 f0 00
May 28 15:03:22 localhost kernel: [1123077.252108] end_request: critical target error, dev sdc, sector 53206016
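As a cross-check on that log, the Read(10) bytes can be decoded by hand. This is only a sketch: it assumes the standard READ(10) layout (opcode, flags, a 4-byte big-endian block address, group, a 2-byte transfer length, control), and the variable names are made up for illustration.

```shell
# CDB bytes copied from the "Read(10):" line in the kernel log
cdb="28 00 03 2b dc 00 00 00 f0 00"

# Split the ten bytes into named fields (names are illustrative only)
read -r opcode flags lba3 lba2 lba1 lba0 group len1 len0 control <<EOF
$cdb
EOF

# Bytes 2-5 are the starting logical block address, big-endian
lba=$(( 0x$lba3$lba2$lba1$lba0 ))
# Bytes 7-8 are the number of blocks requested
blocks=$(( 0x$len1$len0 ))

echo "first block: $lba, blocks requested: $blocks"
```

The decoded start address is 53206016, the same number the end_request line reports, so the drive seems to have failed on the very first block of a 240-block read. That fits a bad sector on the disk itself rather than anything in the host server.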
That seems to suggest that a hardware problem in the host server isn't the issue.
Googling some of those errors took me to this: http://www.linuxquestions.org/questions/linux-general-1/problem-mounting-che...
The error there was a corrupted superblock, and various fscks didn't fix it, but TestDisk (http://www.cgsecurity.org/wiki/TestDisk) did.
I guess a corrupted superblock would make sense - both disks would look wrong, and it could have been caused by a power loss. Worth a look??
Both disks failing in similar ways at around the same time seems unlikely too, unless there was a power surge or something (this box doesn't go through a UPS).
First things first - can you successfully copy any/all info off sda? If so, that solves your data preservation issue.
Working on it. Copy: yes; successfully: not sure.
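One way to settle the "successfully: not sure" part might be to compare the copy against the source afterwards. A minimal sketch, assuming both trees are mounted and readable; verify_copy is a hypothetical helper name, not an existing tool, and note that it re-reads the source, which puts more load on a disk that is already throwing read errors.

```shell
# Hypothetical helper: succeeds only if every file under SRC is present
# with identical content under DST.
verify_copy() {
    src=$1
    dst=$2
    # diff -r walks both trees; -q prints only the names of files that
    # differ or exist on one side only.
    if diff -rq "$src" "$dst"; then
        echo "copy verified"
    else
        echo "copy incomplete or differs" >&2
        return 1
    fi
}
```

Usage would be something like `verify_copy /mnt/raid /srv/rescue`, where both paths are placeholders for wherever the array and the rescue copy are mounted. Files the failing disk cannot read should show up as diff errors, which at least tells you which ones to chase.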
Good luck!
How are you adding the new drive to the array and triggering a rebuild?
mdadm --manage /dev/md0 --add /dev/sda1 (or sdb1, depending which drive I have swapped out).
Seems fair enough, but you may need to remove sda1 or sdb1 (depending) and then use --assemble to force the new drive to be active rather than a spare.
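For reference, the two routes discussed here could be written out roughly as follows. This is a sketch under assumptions: /dev/md0, /dev/sda1 and /dev/sdb1 are taken from this thread, the function names are made up, and by default it only prints the commands (set RUN=1 to actually execute them). Whether forcing an assemble is the right move depends on the state of the array, so treat that part as a suggestion to investigate rather than a recipe.

```shell
# Device names are assumptions from the thread; override as needed.
ARRAY=${ARRAY:-/dev/md0}
NEW=${NEW:-/dev/sda1}      # partition on the replacement drive
OTHER=${OTHER:-/dev/sdb1}  # partition on the surviving drive

# Print each command unless RUN=1 is set, in which case execute it.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Route 1: remove the member and add it back, then check the result.
readd_member() {
    run mdadm --manage "$ARRAY" --remove "$NEW"
    run mdadm --manage "$ARRAY" --add "$NEW"
    run cat /proc/mdstat           # a rebuild shows up here as a recovery line
    run mdadm --detail "$ARRAY"    # lists each member as active, spare or faulty
}

# Route 2: stop the array and force-assemble it from both partitions.
force_assemble() {
    run mdadm --stop "$ARRAY"
    run mdadm --assemble --force "$ARRAY" "$NEW" "$OTHER"
}
```

Running `readd_member` with RUN unset just lists what would be done, which is a cheap way to double-check the device names before touching the array.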
Good luck! Steve