On Sat, Oct 13, 2007 at 01:29:02PM +0100, Chris G wrote:
On Sat, Oct 13, 2007 at 12:50:16PM +0100, Wayne Stallwood wrote:
Yes, and that is fine as long as you are monitoring such things. But in my experience it is far better to have systems with a degree of tolerance to failures than to rely on something or someone noticing an impending failure. (In a perfect world you have both.)
The trouble then is that you don't know that half your fault-tolerant RAID array is dead until the other half dies as well; the fault tolerance may mask the underlying failures.
I've found Linux MD support quite good at correctly kicking a dead disk out of an array, whether it has failed due to a cable fault, an electrical disk issue or a mechanical fault. mdadm then also emails me to let me know the array is degraded. For bonus points it also runs an array check on a monthly basis, just in case a failure /has/ been masked. I've never actually seen this report a problem, though.
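For anyone who wants a belt-and-braces check alongside the mdadm emails, here is a rough sketch of spotting a degraded array by hand. It is not what mdadm does internally. It relies only on the kernel's /proc/mdstat listing, where each array's status line ends in something like "[2/2] [UU]" and a "_" marks a missing member. The function name degraded_arrays and the sample text are made up for illustration; on a real box you would feed it the contents of /proc/mdstat instead.

```python
import re

# Hypothetical sample in the style of /proc/mdstat: md0 is healthy,
# md1 has lost one of its two members.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1](F) sda2[0]
      979840 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md0 : active raid1 ...".
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line carries "[total/active] [UU_...]"; "_" is a missing disk.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current and "_" in status.group(3):
            degraded.append(current)
    return degraded


print(degraded_arrays(SAMPLE))  # prints ['md1']
```

Run from cron, something along these lines would only ever be a second opinion; the mdadm monitor mail remains the primary alert.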
J.