Having seen that others are suffering RAID problems, perhaps now is an opportune time to ask about mine.
I have an Edimax enclosure with 2 x 1TB discs in it. I got a notification from it yesterday to say that sdb had either failed or would fail within the next 24 hours.
The drives were bought at the same time as the enclosure so given that one is failing or has failed, is it wise for me to replace both?
The discs in there at the moment are Seagate drives and I don't think I specified anything special for them, i.e. I just bought 2 discs. This time I'm going to go for something like the WD Red drives, which are specified for NAS use.
So 2 questions. Does anybody have any recommendations for discs or are the Western Digital discs ok? And second, should I replace both or is one ok for the time being?
On 28/05/13 12:16, Chris Walker wrote:
Having seen that others are suffering RAID problems, perhaps now is an opportune time to ask about mine.
I have an Edimax enclosure with 2 x 1TB discs in it. I got a notification from it yesterday to say that sdb had either failed or would fail within the next 24 hours.
The drives were bought at the same time as the enclosure so given that one is failing or has failed, is it wise for me to replace both?
You could replace just one if necessary. If you are able to, and can afford to, I would replace both. Why? If you just replace one, the remaining older drive will be the most likely to fail next. Newer drives will likely be faster, higher capacity and/or use less power. IMO it's also easier if you have two of the same drive - less chance of hardware incompatibilities, and more likely to work well together.
On the flip side though, it'll take longer and be harder to change 2 drives, and you may run into problems if the partitions on the new disks are not the same size as on the old ones - depending on how your RAID works and the capabilities of the Edimax enclosure. You might find that you buy a 2TB disk but can only use 1TB because you can't expand the partitions on the RAID.
The discs in there at the moment are Seagate drives and I don't think I specified anything special for them, i.e. I just bought 2 discs. This time I'm going to go for something like the WD Red drives, which are specified for NAS use.
So 2 questions. Does anybody have any recommendations for discs or are the Western Digital discs ok? And second, should I replace both or is one ok for the time being?
I seem to remember reading that WD didn't have a good reputation, but I may be wrong about that. I've favoured Seagate and Maxtor in the past, though I think they're the same company now.
HTH Steve
On Tue, 28 May 2013 13:07:37 +0100 steve-ALUG@hst.me.uk wrote:
On 28/05/13 12:16, Chris Walker wrote:
Having seen that others are suffering RAID problems, perhaps now is an opportune time to ask about mine.
I have an Edimax enclosure with 2 x 1TB discs in it. I got a notification from it yesterday to say that sdb had either failed or would fail within the next 24 hours.
The drives were bought at the same time as the enclosure so given that one is failing or has failed, is it wise for me to replace both?
You could replace just one if necessary. If you are able to, and can afford to, I would replace both.
I've now ordered 2 WD Red 1TB discs from Scan.
[...] depending on how your RAID works and the capabilities of the Edimax enclosure. You might find that you buy a 2TB disk but can only use 1TB because you can't expand the partitions on the RAID.
The enclosure will only handle 1TB discs in each bay.
On 28/05/13 16:42, Chris Walker wrote:
I've now ordered 2 WD Red 1TB discs from Scan.
The enclosure will only handle 1TB discs in each bay.
I'd suggest you replace the one the enclosure's complained about first and wait for the resync process to complete. Then swap out the other drive. Back up before you start too, if feasible!
Good luck Steve
On 28/05/13 12:16, Chris Walker wrote:
So 2 questions. Does anybody have any recommendations for discs or are the Western Digital discs ok? And second, should I replace both or is one ok for the time being?
Depends on the hours on the drives. I find non-NAS-grade HDDs in an always-on setup tend to bail (on average) after 3 years.
Then again, I've had an array where all 3 drives failed within 1000 hours of each other. Worse still, the array rebuild can uncover problems with the remaining drives that you didn't know you had.
If the data is/was important to you and you don't already have a good backup, I'd take one now.
Recovery issues with RAID have gone up as the member drives have grown in size, because the unrecoverable read error rate hasn't scaled by the same factor as storage. So, for example, an 8x2TB RAID 5 array built from regular SATA drives is statistically unlikely to be able to rebuild itself if a member fails.
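To put rough numbers on that: consumer SATA drives are typically rated at one unrecoverable read error (URE) per 10^14 bits read. Rebuilding a degraded 8x2TB RAID 5 means reading the seven surviving drives in full - about 14TB, or roughly 1.1 x 10^14 bits - so on the quoted rating you'd expect around one URE somewhere during the rebuild, and a single URE can be enough to abort it.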
On Wed, 29 May 2013 00:21:48 +0100 Wayne Stallwood ALUGlist@digimatic.co.uk wrote:
On 28/05/13 12:16, Chris Walker wrote:
So 2 questions. Does anybody have any recommendations for discs or are the Western Digital discs ok? And second, should I replace both or is one ok for the time being?
Depends on the hours on the drives. I find non-NAS-grade HDDs in an always-on setup tend to bail (on average) after 3 years.
The enclosure and drives were bought in November 2009. I don't always have it on and tend to start it up each morning. I'm not sure if running 24x7 is better or worse than running 'as needed' though.
If the data is/was important to you and you don't already have a good backup, I'd take one now.
I had a backup as I recently pulled everything off the NAS, but I've now used that space for something else! I think I know where I can find another drive to handle *that* data though, thus releasing a 1TB drive to use as backup.
On Wed, 29 May 2013 00:21:48 +0100 Wayne Stallwood ALUGlist@digimatic.co.uk allegedly wrote:
On 28/05/13 12:16, Chris Walker wrote:
So 2 questions. Does anybody have any recommendations for discs or are the Western Digital discs ok? And second, should I replace both or is one ok for the time being?
Depends on the hours on the drives. I find non-NAS-grade HDDs in an always-on setup tend to bail (on average) after 3 years.
Then again, I've had an array where all 3 drives failed within 1000 hours of each other. Worse still, the array rebuild can uncover problems with the remaining drives that you didn't know you had.
I have been following this discussion with some interest (and concern) because just a week or so ago I built my first Debian-based software RAID server. (Actually, my second, but the first one doesn't count because it was a series of trial runs on a retired box).
I built it to replace a disparate set of NAS boxes (D-Link DNS-320 and DNS-313, and three NSLU2s) which between them probably consumed as much power as the single new "server" (which idles at 46-48 watts). It also centralises my network filestore in one box running a stock wheezy, which I think is more in keeping with my needs.
I'm running two WD 2TB 3.5" SATA-III "Caviar Green" disks in RAID 1. I chose them for their power characteristics and because I had been very impressed with how quiet the disks are when I installed one in my desktop recently. I confess, however, that I have no idea of their reliability or longevity. I was hoping that RAID 1 would give me the security I need for my data - now I'm not so sure.
I am now investigating how best to monitor the disks using smartctl!
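(In case it helps anyone else doing the same, here's a minimal sketch, assuming the Debian smartmontools package; the device names and email address are placeholders.)

  # one-off checks
  smartctl -H /dev/sda          # overall health verdict
  smartctl -t short /dev/sda    # kick off a short self-test
  smartctl -a /dev/sda          # full attribute and test-log dump afterwards

  # ongoing monitoring via smartd - in /etc/smartd.conf:
  # -a monitors all attributes, -m mails on trouble,
  # -s (S/../.././02) runs a short self-test daily at 2am
  /dev/sda -a -m you@example.com -s (S/../.././02)
  /dev/sdb -a -m you@example.com -s (S/../.././02)

  # on Debian, also set start_smartd=yes in /etc/default/smartmontools, then:
  sudo service smartmontools restart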
Mick
On 29 May 2013 12:01, mick mbm@rlogin.net wrote:
I have been following this discussion with some interest (and concern) because just a week or so ago I built my first Debian-based software RAID server.
Well, the first thing to say on this would be: think about how different the RAID threads would have been had Chris or I not used RAID. In Chris' case, his one lost disk would have meant the loss of his data; in my case the RAID appears to have mirrored some filesystem corruption between two disks. So for Chris, RAID has saved his bacon, and for me it's either helped or made no difference - it certainly hasn't made things worse.
At the end of the day your data is safe until (a) the hardware fails or (b) it gets corrupted. RAID helps with (a) as long as you monitor it; otherwise, when one disk fails, you won't know to deal with it. RAID doesn't help with (b) (nor does it cause it), which is why backups are important. After all, having two perfectly mirrored copies of a virus isn't all that helpful!
I am now investigating how best to monitor the disks using smartctl!
First thing is to make sure you're getting notifications from RAID. This should just be a case of making sure your system can send emails and then putting your email address in /etc/mdadm/mdadm.conf (set the MAILADDR parameter).
Mine was left set to the default, "root", a mailbox which isn't checked. Make sure you can send emails from the command line:

  mail <insert your chosen address here>

... and follow the prompts, making sure you're able to send (and receive!) the email. (Apologies for the sucking-eggs lesson, but there'll be someone reading this who might need it.)
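For what it's worth, here's a sketch of testing the whole chain in one go (Debian paths assumed; the address is a placeholder):

  # in /etc/mdadm/mdadm.conf:
  MAILADDR you@example.com

  # ask mdadm to send a one-off test alert for every array, then exit
  sudo mdadm --monitor --scan --oneshot --test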
As an aside: I always use software RAID (mdadm) these days. When the host hardware fails, taking the disk out and putting it into any other Linux box is easy (e.g. via a USB caddy), which is not the case when using either hardware RAID or the software RAID that comes on some motherboards and SATA cards. Genuine hardware RAID has its performance advantages, but I don't generally work in environments where that's necessary.
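For example, on the rescue box it's roughly this (the device name is an assumption - check dmesg for what the caddy actually comes up as):

  # confirm the partition is an md member and inspect its metadata
  sudo mdadm --examine /dev/sdc1

  # assemble a degraded array from the one surviving mirror half, and mount it
  sudo mdadm --assemble --run /dev/md0 /dev/sdc1
  sudo mount /dev/md0 /mnt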
Also, contrary to what many people suggest, I prefer to use non-identical drives in my RAID arrays. Two identical drives from manufacturer A may be very compatible but will also be at risk of failing at very similar times due to their identical manufacturing processes. A drive each from mfrs A and B makes more sense to me. Just be aware that two disks of the same "size" often aren't identical sizes, so make sure the partition size you create fits on both disks.
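If you do want two disks partitioned identically, one way is to copy the layout across (a sketch assuming MBR partition tables, and that /dev/sdb is at least as large as the layout on /dev/sda):

  # dump the partition table from sda and apply it to sdb
  sfdisk -d /dev/sda | sudo sfdisk /dev/sdb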
Mark
On Wed, 29 May 2013 14:13:13 +0100 Mark Rogers mark@quarella.co.uk allegedly wrote:
On 29 May 2013 12:01, mick mbm@rlogin.net wrote:
I have been following this discussion with some interest (and concern) because just a week or so ago I built my first Debian-based software RAID server.
Well, the first thing to say on this would be: think about how different the RAID threads would have been had Chris or I not used RAID. In Chris' case, his one lost disk would have meant the loss of his data; in my case the RAID appears to have mirrored some filesystem corruption between two disks. So for Chris, RAID has saved his bacon, and for me it's either helped or made no difference - it certainly hasn't made things worse.
At the end of the day your data is safe until (a) the hardware fails or (b) it gets corrupted. RAID helps with (a) as long as you monitor it; otherwise, when one disk fails, you won't know to deal with it. RAID doesn't help with (b) (nor does it cause it), which is why backups are important. After all, having two perfectly mirrored copies of a virus isn't all that helpful!
Good point. My previous NAS setup had two separate backups for my crucial data: one to the D-Link NAS (a RAID 1 setup) and a second to one of the NSLU2s (non-RAID). Less important data (e.g. MP3s and MP4s where I have the CD or DVD) was backed up to other devices. (See previous discussions about rsync...) Now that I have centralised my storage I think I had better keep another backup device going.
I am now investigating how best to monitor the disks using smartctl!
First thing is to make sure you're getting notifications from RAID. This should just be a case of making sure your system can send emails and then putting your email address in /etc/mdadm/mdadm.conf (set the MAILADDR parameter).
Mine was left set to the default, "root", a mailbox which isn't checked. Make sure you can send emails from the command line: mail <insert your chosen address here> ... and follow the prompts, making sure you're able to send (and receive!) the email. (Apologies for the sucking-eggs lesson, but there'll be someone reading this who might need it.)
No apology necessary. It prompted me to check that mail was operational on the new box (errm it wasn't, but it is now).
As an aside: I always use software RAID (mdadm) these days. When the host hardware fails, taking the disk out and putting it into any other Linux box is easy (e.g. via a USB caddy), which is not the case when using either hardware RAID or the software RAID that comes on some motherboards and SATA cards. Genuine hardware RAID has its performance advantages, but I don't generally work in environments where that's necessary.
The new box has a motherboard which offers "fake RAID". I decided (partly based on previous discussions on ALUG) that mdadm looked a better-supported option. Having played with it on the aforementioned retired box I decided to take the plunge and build a new server. That turned out to take longer than it should have because the new motherboard has UEFI rather than good old-fashioned BIOS. This was my first build with UEFI.
Also, contrary to what many people suggest, I prefer to use non-identical drives in my RAID arrays. Two identical drives from manufacturer A may be very compatible but will also be at risk of failing at very similar times due to their identical manufacturing processes. A drive each from mfrs A and B makes more sense to me. Just be aware that two disks of the same "size" often aren't identical sizes, so make sure the partition size you create fits on both disks.
Interesting (and plausible) theory.
Many thanks for the feedback.
Mick
On Wed, 29 May 2013 14:13:13 +0100 Mark Rogers mark@quarella.co.uk allegedly wrote:
Well, the first thing to say on this would be: think about how different the RAID threads would have been had Chris or I not used RAID. In Chris' case, his one lost disk would have meant the loss of his data; in my case the RAID appears to have mirrored some filesystem corruption between two disks. So for Chris, RAID has saved his bacon, and for me it's either helped or made no difference - it certainly hasn't made things worse.
Supplementary question. The mdadm FAQ says:
"When I do mdadm --query --detail /dev/md0 there's this line that shows up somewhere in the listing: Events : 0.xyz . I've crawled the net for documentation about it, but I have no clue what the number represents."
But it gives no answer. My queries show various "events" numbers. Should I be concerned?
Cheers
Mick
On 29 May 2013 18:12, mick mbm@rlogin.net wrote:
But it gives no answer. My queries show various "events" numbers. Should I be concerned?
As I understand it, and based on https://raid.wiki.kernel.org/index.php/RAID_Recovery, the events value is a record of writes to the disks.
If I look at:

  mdadm --examine /dev/sd[a-z]1 | egrep 'Event|/dev/sd'

on one of my machines, I see each of my 4 disks has an event count of 9760, which is good as it means all four disks are "in sync"; if one of them was lower than this it would have missed something. This is what mdadm uses to determine whether to automatically assemble an array.
If I look at:

  mdadm --query --detail /dev/md0

the value I see for Events is 0.9760, i.e. the same value (with a "0." prefix). Why there's a "0." prefix I have no idea, but the events value should be non-zero and ever-increasing, by the look of things.
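Related, and possibly stating the obvious: /proc/mdstat gives a quick at-a-glance view of the same thing:

  cat /proc/mdstat
  # for a two-disk mirror, [UU] means both halves are present and in sync;
  # [U_] means one member is missing or failed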
Mark
On Thu, 30 May 2013 14:29:17 +0100 Mark Rogers mark@quarella.co.uk allegedly wrote:
As I understand it, and based on https://raid.wiki.kernel.org/index.php/RAID_Recovery, events is a record of writes to the disks.
If I look at: mdadm --examine /dev/sd[a-z]1 | egrep 'Event|/dev/sd' on one of my machines, I see each of my 4 disks has an event count of 9760, which is good as it means all four disks are "in sync"; if one of them was lower than this it would have missed something. This is what mdadm uses to determine whether to automatically assemble an array.
If I look at: mdadm --query --detail /dev/md0 the value I see for Events is 0.9760, i.e. the same value (with a "0." prefix). Why there's a "0." prefix I have no idea, but the events value should be non-zero and ever-increasing, by the look of things.
Mark
Many thanks for the continued replies. I'm learning. My results are different though. The number of "events" given for each disk seems ok (at 31 for each). I assume the very low number is simply a symptom of the short run time compared to yours. But the results I get from "mdadm --query --detail /dev/mdX" give three different answers: md0 is 31, md1 is 19 and md2 is 38 (I have three partitions: /boot, swap and /) and, as you can see, no decimals.
I'll keep reading the manual.
Cheers
Mick
On 29/05/13 14:13, Mark Rogers wrote:
First thing is to make sure you're getting notifications from RAID. This should just be a case of making sure your system can send emails and then putting your email address in /etc/mdadm/mdadm.conf (set the MAILADDR parameter).
Mine was left set to the default, "root", a mailbox which isn't checked. Make sure you can send emails from the command line: mail <insert your chosen address here> ... and follow the prompts, making sure you're able to send (and receive!) the email. (Apologies for the sucking-eggs lesson, but there'll be someone reading this who might need it.)
Re email address, my server uses a file called /etc/aliases
It has various entries like
postmaster: root
I've added an entry like

root: user_who_should_receive_emails

where user_who_should_receive_emails is replaced with the user name (or full email address) of the person who should receive root's (and any other system) emails. Once you've edited this file, run
sudo newaliases
to inform the email system that aliases have been modified.
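You can then test the alias end to end with something like:

  echo "alias test" | mail -s "alias test" root

The message should land in user_who_should_receive_emails' mailbox.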
Just pointing out the obvious - you won't get any email notifications from your RAID server if it doesn't have some sort of email-sending program (an MTA) on it.
Cheers Steve
On 30/05/13 17:31, steve-ALUG@hst.me.uk wrote:
Just pointing out the obvious - you won't get any email notifications from your RAID server if it doesn't have some sort of email-sending program (an MTA) on it.
Oops - Mark already pointed this out - sorry.