I have just moved two mdadm RAID1 arrays from an old server to a new one.
Initially mdadm automatically created two arrays (md126 & md127), read-only and marked resync=pending. I mounted these and they were fine.
I then stopped those arrays and rebuilt them manually as md0 & md1, and the resync started.
Neither will now mount; mdadm seems happy but there isn't a recognisable filesystem on either array. I have stopped both arrays while I try to work out what went wrong and whether I can fix it.
Suggestions?
All the important data is backed up elsewhere but there's a lot of unimportant stuff that I'd nevertheless prefer to retrieve if I can. (Several TB of applications, distros, etc which can all be retrieved from the Internet if needed, which is why I don't waste space backing them up, but I'd still rather not have to throw them away and start again if I can avoid it.)
One thought, although I'm fairly sure this isn't what happened, is that because I have two raid arrays I might have mixed my disks up when I created my new arrays (one of each in each). I would have thought that mdadm would have tried very hard to stop me doing that though?
On 12/09/18 11:17, Mark Rogers wrote:
> I have just moved two mdadm RAID1 arrays from an old server to a new one.
> Initially mdadm automatically created two arrays (md126 & md127), read-only and marked resync=pending. I mounted these and they were fine.
> I then stopped those arrays and rebuilt them manually as md0 & md1, and the resync started.
> Neither will now mount; mdadm seems happy but there isn't a recognisable filesystem on either array. I have stopped both arrays while I try to work out what went wrong and whether I can fix it.
> Suggestions?
Caveat: I am not an mdadm expert. Anything you do could destroy all the data permanently.
Read https://raid.wiki.kernel.org/index.php/RAID_Recovery and https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn and search the web for mdadm recovery.
If nothing works, you're probably in a situation where you just want to recover the data. As a last resort, if you have nothing else to lose (and accepting that it's not my fault if something goes wrong, and that I suggested you look elsewhere for help first), you could try assembling each disk, one at a time, into a degraded RAID1 array, i.e. one that expects two (or more) disks but knows that one (or more) has failed. If you can mount a drive like this (preferably read-only) and see its contents, you can copy them elsewhere and then make a new RAID array. You can try this for each drive individually.
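A minimal sketch of what "one disk at a time, degraded and read-only" might look like. It only builds and prints the mdadm commands rather than running them. /dev/sdb1 is the member named later in this thread; /dev/sdc1 and /dev/md9 are hypothetical names chosen for illustration. It assumes each member still carries a usable md superblock, which may not be the case after the arrays were re-created.

```python
import shlex

def degraded_assemble_command(md_device, member):
    """Build (but do not run) an mdadm command for a single member.

    --run asks mdadm to start the array even with fewer devices than expected;
    --readonly starts it read-only so nothing is resynced or written.
    """
    return ["mdadm", "--assemble", "--run", "--readonly", md_device, member]

def stop_command(md_device):
    """Build the command that stops the array again before trying the next disk."""
    return ["mdadm", "--stop", md_device]

# Hypothetical device names: adjust to the real members before using any of this.
for member in ["/dev/sdb1", "/dev/sdc1"]:
    print(shlex.join(degraded_assemble_command("/dev/md9", member)))
    print(shlex.join(stop_command("/dev/md9")))
```

Between the assemble and stop steps you would try a read-only mount of the array and copy off whatever is readable.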
> All the important data is backed up elsewhere but there's a lot of unimportant stuff that I'd nevertheless prefer to retrieve if I can. (Several TB of applications, distros, etc which can all be retrieved from the Internet if needed, which is why I don't waste space backing them up, but I'd still rather not have to throw them away and start again if I can avoid it.)
> One thought, although I'm fairly sure this isn't what happened, is that because I have two raid arrays I might have mixed my disks up when I created my new arrays (one of each in each). I would have thought that mdadm would have tried very hard to stop me doing that though?
mdadm is quite powerful but I personally don't think it's very user friendly (probably because I don't understand it enough). I think it's probably quite easy to create a new raid array erasing an old one, rather than mounting & resyncing an old one. I guess that that is the most likely possibility. I'm intrigued why the first time the raid arrays mounted read-only and resyncing. That suggests to me there was already some sort of failure.
Anyhoo - good luck!
Steve
On 12 September 2018 at 16:50, steve-ALUG@hst.me.uk wrote:
> Caveat: I am not a MDADM expert. Anything you do could destroy all the data permanently.
Understood!
> mdadm is quite powerful but I personally don't think it's very user friendly (probably because I don't understand it enough). I think it's probably quite easy to create a new raid array erasing an old one, rather than mounting & resyncing an old one. I guess that that is the most likely possibility.
Indeed, I think that's what I've done. It did throw up some caution that I clearly misunderstood or didn't read properly.
> I'm intrigued why the first time the raid arrays mounted read-only and resyncing.
That's just how it handled seeing the drives for the first time. This was a clean install of the OS (on a different disk), into which I then connected the two RAID pairs from another server. It auto-detected them and auto-created two temporary arrays, marked read-only, which seems a fair precaution (rather than just starting to sync them without intervention).
> Anyhoo - good luck!
Hmmm....
I'm pretty sure my only option is to attempt recovery as if the filesystem were corrupt. Which tools would people recommend for this? I've not done it for a while. I can see whether testdisk finds anything, although from what I recall it would need to see a partition table?
Even though most of the stuff I need is backed up I still try to recover data because I find it a useful skill, and maybe I'm making progress.
I'm currently using testdisk to search for filesystems. That is:
- testdisk /dev/sdb1
- Partition type = None -> Analyse -> Quick Search
- MD Raid partition is found. Select then deeper search
testdisk then gives me several candidates for ext4 filesystems. Selecting one and using the "P: List Files" option shows me a lot of my data.
BUT then I'm stuck with my limited understanding. At this point I'd like to mount the discovered filesystem read only and see what files I can really access, and later maybe fsck it. But I can't work out what information to take from testdisk and how to pass it to mount to do that.
This is what testdisk is telling me, any suggestions how to mount from there? (It's the second ext4 entry that shows me data when I use the P option.) At this point I do not want to write to the disk in any way.
Disk /dev/sdb1 - 3000 GB / 2794 GiB - CHS 364801 255 63
  Partition               Start          End      Size in sectors
P Linux md 1.x RAID       0   0  1  364784 189 21  5860266888 [fileserver:0]
P ext4                   16  80 63  364801  47 52  5860268936
P ext4                   16  81  2  364801  47 54  5860268936

Structure: Ok.

Keys T: change type, P: list files, Enter: to continue
ext4 blocksize=4096 Large_file Sparse_SB, 3000 GB / 2794 GiB
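One possible way to turn the testdisk output into something mount can use is to convert the CHS start of the candidate filesystem into a byte offset and pass that as a loop-device offset. The sketch below does the arithmetic for the second ext4 entry (start 16 81 2) and prints, but does not run, a read-only mount command. It assumes 512-byte sectors and the 255-head, 63-sector geometry shown on testdisk's "Disk" line; /mnt/recover is a hypothetical mount point. The "noload" option is included on the assumption that you want ext4 to skip journal replay so nothing is written.

```python
# Geometry taken from testdisk's "CHS 364801 255 63" line; 512-byte sectors are an assumption.
HEADS = 255
SECTORS_PER_TRACK = 63
SECTOR_BYTES = 512

def chs_to_lba(cylinder, head, sector, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Convert a CHS address to a 0-based sector number (CHS sectors count from 1)."""
    return (cylinder * heads + head) * spt + (sector - 1)

def byte_offset(cylinder, head, sector):
    """Byte offset of a CHS address from the start of the device testdisk scanned."""
    return chs_to_lba(cylinder, head, sector) * SECTOR_BYTES

def mount_command(device, mountpoint, offset):
    """Build (but do not run) a read-only loop mount at the given byte offset."""
    options = f"ro,noload,loop,offset={offset}"
    return ["mount", "-o", options, device, mountpoint]

# Second ext4 candidate from the testdisk listing: start CHS 16 81 2.
offset = byte_offset(16, 81, 2)
print(chs_to_lba(16, 81, 2), "sectors =", offset, "bytes")
print(" ".join(mount_command("/dev/sdb1", "/mnt/recover", offset)))
```

With these assumptions the second entry starts at sector 262144, i.e. exactly 128 MiB into /dev/sdb1, while the first entry (16 80 63) works out two sectors earlier. The offset is relative to /dev/sdb1 because that is the device testdisk was pointed at. Whether the mount then succeeds depends on how much of the filesystem survived.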
Update:
I'd still like to know how to "fix" this, either by repairing the array or mounting the filesystem based on the testdisk info.
But I have now located a spare drive and installed it, and I'm using the testdisk "list files" function and its "copy" capability to copy the files from the corrupted disks to the new disk. Thus far it's working pretty well as long as the files aren't themselves corrupt - the handful I've checked have been fine but see separate post to the list about automating wider checks.
I find testdisk can be a bit counterintuitive to use but it is a fantastic data recovery tool that everyone should be aware of (along with its companion "photorec" program for recovering photos and other files from a corrupt disk, which I have used more than once to recover photos from "dead" disks for friends and family).
Mark
On 13/09/18 16:06, Mark Rogers wrote:
> I find testdisk can be a bit counterintuitive to use but it is a fantastic data recovery tool that everyone should be aware of (along with it's companion "photorec" program for recovering photos and other files from a corrupt disk, which I have used more than once to recover photos from "dead" disks for friends and family).
> Mark
I always keep a fairly recent copy of Ultimate Boot CD and System Rescue CD around for emergencies. If your main system goes down, you may not be able to download the tools to fix it, so get some before it fails!
Steve
> I always keep a fairly recent copy of Ultimate Boot CD and System Rescue CD around for emergencies. If your main system goes down, you may not be able to download the tools to fix it, so get some before it fails!
I always have enough systems around that I can download one if needed, but yes, this is a good idea.
Testdisk has succeeded in recovering all my data, by the way. I'll spend some time diff'ing it against what is backed up first before I decide if I can trust it though.
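For the comparison against the backup, something along these lines might do: a rough sketch that walks the recovered tree and compares each file's SHA-256 against the file at the same relative path in the backup. The function names and the two directory arguments are hypothetical, and it assumes the backup keeps the same directory layout as the recovered data.

```python
import hashlib
import os
import sys

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_trees(recovered_root, backup_root):
    """Return (mismatched, missing_from_backup) as sorted lists of relative paths."""
    mismatched, missing = [], []
    for dirpath, _dirnames, filenames in os.walk(recovered_root):
        for name in filenames:
            recovered = os.path.join(dirpath, name)
            relative = os.path.relpath(recovered, recovered_root)
            backup = os.path.join(backup_root, relative)
            if not os.path.isfile(backup):
                missing.append(relative)      # nothing to compare against
            elif sha256_of(recovered) != sha256_of(backup):
                mismatched.append(relative)   # contents differ
    return sorted(mismatched), sorted(missing)

if __name__ == "__main__" and len(sys.argv) == 3:
    bad, absent = compare_trees(sys.argv[1], sys.argv[2])
    for path in bad:
        print("DIFFERS", path)
    for path in absent:
        print("NOT IN BACKUP", path)
```

Files reported as "NOT IN BACKUP" are the ones that cannot be verified this way and would need checking by other means.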