Re: [ALUG] Software RAID error

28 May 2013


On 28/05/13 15:16, Mark Rogers wrote:
...
On 28 May 2013 14:40, steve-ALUG@hst.me.uk wrote:
...
Could you be a bit more specific as to what the errors are?
At a cursory glance, they are as per http://paste.ubuntu.com/5709668/, although
with "ata4" replaced by "ata3".
OK
...
...
It could be the controller, or possibly both disks have failed somehow. Could
it be the software configuration? I guess it could perhaps be the PSU, if it
wasn't supplying the correct power to the drives, but I'd think that unlikely.
I've just put one of the disks into my USB caddy again and have
successfully mounted the RAID partition (read-only). When I try to copy
files off it, I get errors:
May 28 15:03:22 localhost kernel: [1123077.252040] sd 12:0:0:0: [sdc] Unhandled sense code
May 28 15:03:22 localhost kernel: [1123077.252052] sd 12:0:0:0: [sdc]
May 28 15:03:22 localhost kernel: [1123077.252058] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 28 15:03:22 localhost kernel: [1123077.252064] sd 12:0:0:0: [sdc]
May 28 15:03:22 localhost kernel: [1123077.252068] Sense Key : Medium Error [current]
May 28 15:03:22 localhost kernel: [1123077.252076] sd 12:0:0:0: [sdc]
May 28 15:03:22 localhost kernel: [1123077.252081] Add. Sense: Unrecovered read error
May 28 15:03:22 localhost kernel: [1123077.252087] sd 12:0:0:0: [sdc] CDB:
May 28 15:03:22 localhost kernel: [1123077.252090] Read(10): 28 00 03 2b dc 00 00 00 f0 00
May 28 15:03:22 localhost kernel: [1123077.252108] end_request: critical target error, dev sdc, sector 53206016
Since the disk gives read errors in the USB caddy too, that seems to
suggest that a hardware problem in the host server isn't the issue.
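For what it's worth, the CDB line in that log can be decoded by hand to check it against the end_request line. A minimal Python sketch, assuming the standard SCSI READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, a 2-byte big-endian transfer length in bytes 7-8); the function name decode_read10 is just for illustration:

```python
# Decode a SCSI READ(10) CDB as printed in the kernel log.
# Assumed layout (SCSI Block Commands): byte 0 = opcode 0x28,
# bytes 2-5 = starting LBA (big-endian), bytes 7-8 = block count (big-endian).

def decode_read10(cdb_hex: str) -> dict:
    """Return the opcode, starting LBA and block count of a READ(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between byte pairs
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return {"opcode": cdb[0], "lba": lba, "blocks": blocks}


if __name__ == "__main__":
    info = decode_read10("28 00 03 2b dc 00 00 00 f0 00")
    print(info)  # {'opcode': 40, 'lba': 53206016, 'blocks': 240}
```

Assuming the drive uses 512-byte logical blocks, the decoded LBA is the same sector 53206016 that end_request reports, and the failed request was for 240 blocks starting there.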
Googling some of those errors took me to this:
http://www.linuxquestions.org/questions/linux-general-1/problem-mounting-che...
The error there was a corrupted superblock, and various fscks didn't fix
it, but TestDisk (http://www.cgsecurity.org/wiki/TestDisk) did.
I guess a corrupted superblock would make sense: both disks would look
wrong, and it could have been caused by a power loss. Worth a look?
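A quick first test of the superblock theory is whether the primary superblock's magic number is still there. A minimal sketch, assuming the filesystem is ext2/3/4 (the thread doesn't say which) and relying on the on-disk layout: the primary superblock starts 1024 bytes in, and s_magic is a little-endian 16-bit field at offset 56 within it, with value 0xEF53. The function names are hypothetical:

```python
import struct

EXT_SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1 KiB into the filesystem
EXT_SUPERBLOCK_SIZE = 1024    # the superblock itself is 1 KiB long
EXT_MAGIC_OFFSET = 56         # offset of s_magic within the superblock
EXT_MAGIC = 0xEF53


def ext_magic_in(buf: bytes) -> bool:
    """True if buf (the start of a filesystem) carries the ext2/3/4 magic.

    Only the magic number is checked; a valid magic does not prove the
    rest of the superblock is intact.
    """
    off = EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET
    if len(buf) < off + 2:
        return False
    (magic,) = struct.unpack_from("<H", buf, off)  # little-endian u16
    return magic == EXT_MAGIC


def has_ext_magic(path: str) -> bool:
    """Read just the first 2 KiB of a device or image and check the magic."""
    with open(path, "rb") as f:
        return ext_magic_in(f.read(EXT_SUPERBLOCK_OFFSET + EXT_SUPERBLOCK_SIZE))
```

For example has_ext_magic("/dev/md0") as root (it only reads). If the md metadata sits at the start of the partition (v1.1 or v1.2), the filesystem begins further in, so point it at the assembled array or an image rather than the raw partition. e2fsck remains the real check.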
...
Both disks failing in similar ways at around the same time seems
unlikely too, unless there was a power surge or something (this box
doesn't go through a UPS).
...
First things first: can you successfully copy any or all of the data off
sda? If so, that solves your data-preservation problem.
Working on it. Copy: yes; successfully: not sure.
Good luck!
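On "successfully: not sure": one way to find out is to hash every file on both sides and compare. A minimal Python sketch under that assumption; the function names are hypothetical, and it assumes a file that hits an unrecoverable sector fails with an OSError (typically EIO) rather than returning bad data:

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(top):
    """Map each file's path (relative to top) to its digest, or None if unreadable."""
    digests = {}
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, top)
            try:
                digests[rel] = file_digest(path)
            except OSError:  # e.g. EIO when a read hits an unrecoverable sector
                digests[rel] = None
    return digests


def compare_digests(src, dst):
    """Return {relative path: problem} for files not safely present in the copy."""
    problems = {}
    for rel, digest in src.items():
        if digest is None:
            problems[rel] = "unreadable in source"
        elif rel not in dst:
            problems[rel] = "missing from copy"
        elif dst[rel] != digest:  # also catches files unreadable in the copy (None)
            problems[rel] = "differs"
    return problems
```

Usage would be something like compare_digests(tree_digests("/mnt/raid"), tree_digests("/mnt/copy")), with both mount points hypothetical. Note that it reads the failing disk a second time, which is a trade-off on a disk already throwing medium errors.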
...
...
How are you adding the new drive to the array and triggering a rebuild?
mdadm --manage /dev/md0 --add /dev/sda1
(or sdb1, depending on which drive I have swapped out).
Seems fair enough, but you may need to remove sda1 or sdb1 (depending on
which drive you swapped) and then use --assemble to force the new drive
to be active rather than a spare.
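To see what the --add actually did, /proc/mdstat shows whether the new partition went in as a spare (flagged (S)) or is rebuilding, and mdadm --detail /dev/md0 gives the full picture. A best-effort Python sketch of reading mdstat, assuming its usual layout (member tokens like sda1[2](S), a status line such as [2/1] [U_], and a "recovery = N%" progress line); parse_mdstat is a hypothetical name and this is not a complete parser:

```python
import re

# One member token on an mdX line, e.g. "sda1[2]" or "sdb1[0](S)".
# (S) marks a spare and (F) a faulty device; other flags exist.
MEMBER_RE = re.compile(r"^([\w-]+)\[(\d+)\]((?:\([A-Z]\))*)$")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")
PROGRESS_RE = re.compile(r"(recovery|resync|reshape|check) =\s*([\d.]+)%")


def parse_mdstat(text):
    """Parse /proc/mdstat text into {array name: details}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        if line.startswith("md") and " : " in line:
            name, rest = line.split(" : ", 1)
            tokens = rest.split()
            members = {}
            for tok in tokens:
                m = MEMBER_RE.match(tok)
                if m:
                    dev, _slot, flags = m.groups()
                    if "(F)" in flags:
                        members[dev] = "faulty"
                    elif "(S)" in flags:
                        members[dev] = "spare"
                    else:
                        # Unflagged: in the array but possibly still rebuilding;
                        # the [U_] map and sync fields say whether it is in sync.
                        members[dev] = "member"
            current = {"state": tokens[0] if tokens else "", "members": members}
            arrays[name.strip()] = current
        elif current is not None:
            status = STATUS_RE.search(line)
            if status:
                current["raid_disks"] = int(status.group(1))
                current["working"] = int(status.group(2))
                current["map"] = status.group(3)
            progress = PROGRESS_RE.search(line)
            if progress:
                current["sync_action"] = progress.group(1)
                current["sync_percent"] = float(progress.group(2))
    return arrays
```

Usage would be along the lines of parse_mdstat(open("/proc/mdstat").read())["md0"]; a new partition reported as "spare" while the map still shows [U_] is the case the --assemble suggestion above is aimed at.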
Good luck!
Steve
