On 28/05/13 11:44, Mark Rogers wrote:
OK, well I may have taken the wrong path here then, but...
I hadn't seen any replies when I decided to replace sda, given that sda was clearly showing errors in syslog (that looked pretty fatal to me: http://paste.ubuntu.com/5709656/ - although with the benefit of hindsight these may be errors on sdb which don't reference sdb by name, mixed with errors on sda which do reference it by name - advice welcomed on that one!)
Don't know - sorry.
I have sda now in a USB caddy where it doesn't even appear to exist as far as my desktop is concerned.
Are you using Ubuntu, or something else? If Ubuntu, does it show in Disks utility, or fdisk? It may be present but not automatically mounted, as raid software may be confused by it now being external - bit of a guess there!
I separately have a 2TB disk pulled from somewhere it wasn't needed, onto which I have created a new 1TB partition to match that on sda/sdb, and installed it alongside sdb and included it in the array. The rebuild started fine but I then started to get more errors: http://paste.ubuntu.com/5709668/
Yes, definite impression of hardware errors on your original sdb.
/proc/mdstat now reports:
md0 : active raid1 sda1[2](S) sdb1[1]
      976629568 blocks super 1.2 [2/1] [_U]
I think that means that what is now showing up as sda1 is a spare - it can be part of the raid array, but isn't currently. I suspect, but I'm not sure, that sdb1 is the same sdb1 that you had before, and that it's now the main/1st element of the raid array. sda1 is the new disk you added. It is NOT being used yet.
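If it helps to check that reading mechanically, here is a minimal Python sketch that picks the spare flag and the device counts out of mdstat-style text. It assumes the layout shown in the output quoted above; the function name `parse_mdstat` and the dictionary keys are made up for illustration, and it is not a complete parser for everything /proc/mdstat can contain.

```python
import re

# A member looks like "sda1[2]" or "sda1[2](S)"; the letter in parentheses is a flag.
MEMBER = re.compile(r"(\w+)\[(\d+)\](?:\((\w)\))?")
# The status looks like "[2/1] [_U]": devices expected / devices active, then one mark per slot.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return a dict of array name -> details, for mdstat-style text (illustrative only)."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"(md\d+)\s*:\s*(\S+)", line)
        if header:
            current = {
                "state": header.group(2),
                "members": [
                    {"device": m.group(1), "index": int(m.group(2)), "flag": m.group(3) or ""}
                    for m in MEMBER.finditer(line)
                ],
            }
            arrays[header.group(1)] = current
        if current is not None:
            # The status may be on the header line or on a following line.
            status = STATUS.search(line)
            if status:
                current["expected"] = int(status.group(1))
                current["active"] = int(status.group(2))
                current["slots"] = status.group(3)
    return arrays


if __name__ == "__main__":
    sample = (
        "md0 : active raid1 sda1[2](S) sdb1[1]\n"
        "      976629568 blocks super 1.2 [2/1] [_U]\n"
    )
    md0 = parse_mdstat(sample)["md0"]
    spares = [m["device"] for m in md0["members"] if m["flag"] == "S"]
    print("spares:", spares)
    print("active:", md0["active"], "of", md0["expected"], "slots:", md0["slots"])
```

On a live system the same function could be fed the contents of /proc/mdstat instead of the sample string. A device flagged S together with an underscore in the slot string is what I mean by the new disk being present but not in use.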
My take on all of this is that (the old) sda is dead and has gone unnoticed, and now sdb has a problem.
My take is sdb very probably has a problem. I don't know for sure about the old sda.
The RAID array houses several virtual machines. It isn't backed up as such, although critical data on the individual VMs is backed up separately. I'd really like to get as much of this back as I can because otherwise I'm going to have to recreate about a dozen VMs, although I'm realistic about my chances. As things stand the array is mounted but giving errors in places, so I'm copying off what I can get access to before I go any further.
Indeed - carry on backing up/copying from it.
All the comments appreciated, even if I did press ahead without reading them - I have pretty much confirmed now that sda is dead so any hope of data recovery lies on sdb. If only I had logs going back further to see what the sequence of events was (or, for that matter, I was receiving mdadm notifications, something to investigate once I get this back up and running).
I'd suggest that you continue backing up everything you can. Then, I'd suggest you disconnect both sdb (the original one) and the new 2TB disk. Put the original sda back in its original place (i.e. not in the caddy). Reboot and see if the raid array restarts, but in degraded mode (i.e. it knows it's missing a disk).
I hope/suspect it's sdb that's been causing the problems. IF you find that sda works by itself, then (assuming you have everything copied off the old sdb), I'd suggest that you reformat & repartition the 2TB disk and add it alongside your original sda as part of the raid array. I suggest using mdadm and making sure that it's an active part of the array, not a spare - a spare is no use in a 2-disk raid.
HTH
Steve