Re: [ALUG] Software RAID error

28 May 2013


On 28/05/13 11:44, Mark Rogers wrote:
...
OK, well I may have taken the wrong path here then, but...
I hadn't seen any replies when I decided to replace sda, given that
sda was clearly showing errors in syslog (that looked pretty fatal to
me: http://paste.ubuntu.com/5709656/ - although with the benefit of
hindsight these may be errors on sdb which don't reference sdb by
name, mixed with errors on sda which do reference it by name - advice
welcomed on that one!)
Don't know - sorry.
...
I have sda now in a USB caddy where it doesn't even appear to exist as
far as my desktop is concerned.
Are you using Ubuntu, or something else?  If Ubuntu, does the disk show
up in the Disks utility, or in fdisk?  It may be present but not
automatically mounted, as the raid software may be confused by it now
being external - that's a bit of a guess, though!
...
I separately have a 2TB disk pulled from somewhere it wasn't needed,
onto which I have created a new 1TB partition to match that on
sda/sdb, and installed it alongside sdb and included it in the array.
The rebuild started fine but I then started to get more errors:
http://paste.ubuntu.com/5709668/
Yes, those give a definite impression of hardware errors on your
original sdb.
...
/proc/mdstat now reports:
md0 : active raid1 sda1[2](S) sdb1[1]
       976629568 blocks super 1.2 [2/1] [_U]
I think the (S) means that what is now showing up as sda1 is a spare -
it can be part of the raid array, but isn't currently.  I suspect, but
I'm not sure, that sdb1 is the same sdb1 that you had before, and that
it's now the only active element of the raid array (the [2/1] [_U] says
one of the two slots is empty).  sda1 is the new disk you added.  It is
NOT being used yet.
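
If it helps, here is a rough Python sketch of how I'd read the member
list on that mdstat line.  It is only an illustration: the function
name parse_members and the dictionary layout are made up by me, and the
input is just the two lines you quoted, not a general /proc/mdstat
parser.

```python
import re

# The two lines quoted above from /proc/mdstat.
MDSTAT = """md0 : active raid1 sda1[2](S) sdb1[1]
       976629568 blocks super 1.2 [2/1] [_U]"""


def parse_members(mdstat_text):
    """Return {device: {"number": int, "flag": str}} for the first line.

    Hypothetical helper for illustration only. Each member is written as
    name[number], optionally followed by a one-letter flag in
    parentheses, e.g. (S) for spare or (F) for faulty.
    """
    first_line = mdstat_text.splitlines()[0]
    members = {}
    pattern = r"(\w+)\[(\d+)\](?:\((\w)\))?"
    for name, number, flag in re.findall(pattern, first_line):
        # flag is an empty string when no "(X)" suffix is present
        members[name] = {"number": int(number), "flag": flag}
    return members


members = parse_members(MDSTAT)
spares = [name for name, info in members.items() if info["flag"] == "S"]
print(spares)  # ['sda1']
```

So on the quoted output only sda1 carries the (S) flag, and sdb1 has no
flag at all, which is why I read sdb1 as the one member actually in use.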
...
My take on all of this is that (the old) sda is dead and has gone
unnoticed, and now sdb has a problem.
My take is sdb very probably has a problem.  I don't know for sure about 
the old sda.
...
The RAID array houses several virtual machines. It isn't backed up as
such, although critical data on the individual VMs is backed up
separately. I'd really like to get as much back of this as I can
because otherwise I'm going to have to recreate about a dozen VMs,
although I'm realistic about my chances. As things stand the array is
mounted but giving errors in places, so I'm copying off what I can get
access to before I go any further.
Indeed - carry on backing up and copying off whatever you can.
...
All the comments appreciated, even if I did press ahead without
reading them - I have pretty much confirmed now that sda is dead so
any hope of data recovery lies on sdb. If only I had logs going back
further to see what the sequence of events was (or, for that matter, I
was receiving mdadm notifications, something to investigate once I get
this back up and running).
I'd suggest that you continue backing up everything you can.  Then, I'd
suggest you disconnect both sdb (the original one) and the new 2TB
disk.  Put the original sda back in its original place (i.e. not in
the caddy).  Reboot and see if the raid array restarts, but in
degraded mode (i.e. it knows it's missing a disk).
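
To tell whether the array has come up degraded, the thing to look at is
the "[2/1] [_U]" part of the status line in /proc/mdstat.  A minimal
Python sketch of that check is below.  The function name array_state
and its return format are my own invention, and the sample line is the
one from your earlier output, so treat it as an assumption-laden
example rather than a proper tool.

```python
import re

# Status line as quoted earlier in this thread from /proc/mdstat.
STATUS_LINE = "       976629568 blocks super 1.2 [2/1] [_U]"


def array_state(status_line):
    """Summarise the "[n/m] [UU]" part of an mdstat status line.

    Hypothetical helper for illustration. n is the number of devices
    the array should have, m the number currently active; in the second
    bracket "U" marks a slot that is up and "_" one that is missing.
    """
    match = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", status_line)
    if match is None:
        raise ValueError("no [n/m] [U_] status found")
    expected = int(match.group(1))
    active = int(match.group(2))
    return {
        "expected": expected,
        "active": active,
        "slots": match.group(3),
        "degraded": active < expected,
    }


state = array_state(STATUS_LINE)
print(state["degraded"])  # True
```

A healthy two-disk raid1 would show [2/2] [UU] there instead.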
I hope/suspect it's sdb that's been causing the problems.  IF you find
that sda works by itself, then (assuming you have everything copied off
the old sdb), I'd suggest that you reformat & repartition the 2TB disk
and add it to the raid array alongside your original sda.  I suggest
using mdadm and making sure that it's an active part of the array, not
a spare - a spare is no use in a 2 disk raid.
HTH
Steve
