Hey List,
I have an Ubuntu 9.10 box on which I recently created a software RAID6; you can see what it looks like in rather good detail, if I may say so myself ;), here: http://i48.tinypic.com/30jrbdl.jpg
After the setup was all done I shut the machine down for the night.
Now when I boot the machine up to start using the array it wants to run the file system check utility, fsck I believe, like it normally does every N days or N mounts, "whichever comes first". This is fine; I assume it wants to do this because it's a new volume, and I like to know everything is working, so I would always let this run on any of my machines. But on this machine it gets to the desktop, I can see the process running via top or the process monitor, and within a minute of being on the desktop the machine locks up and is unrecoverable, so I am forced to power it off. This happens every time, so ever since I first made this RAID array a week ago, I haven't actually used the damn thing yet!
I can't understand why this would be. I thought it might have been the power supply, as it is only 250 watts, but I have the machine connected to a watt meter and I added the drives one at a time watching the watts increase, and we are only at 140 watts. I then thought that maybe there were too many devices on one single power cable coming out of the power supply, as I have had to use splitters on the power cables, so I temporarily swapped the power supply out for one from another machine that is rated for 500 watts and has more power cables coming out of it, but no. I also stood another machine next to it, turned them on at the same time and used the 2nd machine's power connectors to power some of the drives so the load was shared between the two PSUs, but no, it still locked up.
Next I thought it was the south bridge being overrun, but surely it wouldn't have been able to sit there for 20 hours building the RAID and formatting it if that were the case? Is it possible that building the RAID wasn't very intense, and now that it's all set up, booting to the desktop and running fsck is making the drives work hard, so only now is the system's weakness showing? (I did try increasing the south bridge voltage, but the options in the CMOS are very limited. I think there was a difference, it stayed on the desktop for 2 minutes before locking up instead of 1, but I can't remember for sure; this would need validating.)
I would love to hear anyone's thoughts on how I can get this working.
P.S. Sorry for the length!
On 22/05/10 22:28, James Bensley wrote:
Hey List,
I have an Ubuntu 9.10 box on which I recently created a software RAID6; you can see what it looks like in rather good detail, if I may say so myself ;), here: http://i48.tinypic.com/30jrbdl.jpg
After the setup was all done I shut the machine down for the night.
Now when I boot the machine up to start using the array it wants to run the file system check utility, fsck I believe, like it normally does every N days or N mounts, "whichever comes first". This is fine; I assume it wants to do this because it's a new volume, and I like to know everything is working, so I would always let this run on any of my machines. But on this machine it gets to the desktop, I can see the process running via top or the process monitor, and within a minute of being on the desktop the machine locks up and is unrecoverable, so I am forced to power it off. This happens every time, so ever since I first made this RAID array a week ago, I haven't actually used the damn thing yet!
There is a member of the list who has had lockups on a RAID5 desktop since 10.04 but is stable on 9.10, so I don't think this is related. In any case I have had no problems whatsoever with RAID5 here on either 9.10 or 10.04.
Out of interest, what is the status of the array reported via cat /proc/mdstat before the lockup (or if you skip the fsck)? If the fsck is aborted, or avoided by changing the autocheck options in fstab, is the box stable? If the array status says it is rebuilding, then maybe try letting it complete that first and then let it do the fsck.
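For example (just a rough sketch; I'm assuming the array is /dev/md0 and the filesystem on it is ext3/ext4, so adjust to suit):

  cat /proc/mdstat
  sudo tune2fs -l /dev/md0 | grep -iE 'mount count|check'
  sudo tune2fs -c 0 -i 0 /dev/md0    # turn off the periodic mount-count and time-based checks while testing

You can re-enable the periodic checks later with tune2fs -c <N> -i <interval>, or alternatively set the last (pass) field for that volume to 0 in /etc/fstab so the boot-time fsck is skipped.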
Are the drives known to be good? Can you install smartmontools and run smartctl -a /dev/devicename for each member of the array, and do the results look good? Look to the smartmontools website for help interpreting the results, or post links to them here.
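Something along these lines would grab the SMART data for all the members in one go (the sd[b-h] range is only a guess at your device names, substitute whatever the array actually uses):

  sudo apt-get install smartmontools
  for d in /dev/sd[b-h]; do sudo smartctl -a $d | tee smart-$(basename $d).txt; done

Pay particular attention to attributes like Reallocated_Sector_Ct, Current_Pending_Sector and UDMA_CRC_Error_Count, and to the self-test log.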
What storage controller chipset(s) are the array members on? Have you googled or checked ubuntuforums for issues relating to your storage controller(s)?
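A quick way to see what the controllers actually are (rough sketch):

  lspci -nn | grep -iE 'sata|ide|raid'

Post the output here if you are unsure what you are looking at.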
Assuming all the drives are SATA, is your mainboard BIOS set to AHCI or legacy mode for those ports? In legacy mode it is likely that some of the SATA ports are sharing bandwidth, which can cause unpredictable results with an array.
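A rough way to check which mode the kernel has ended up in, assuming an Intel-ish chipset (ata_piix is the legacy IDE-mode driver there; other chipsets will have their own equivalents):

  dmesg | grep -iE 'ahci|ata_piix'
  lsmod | grep -iE 'ahci|ata_piix'

If ahci isn't loaded and everything is hanging off the legacy driver, it may be worth flipping the BIOS setting to AHCI and retesting.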
Have you run a memory test recently using memtest86+? Even if the machine appears stable otherwise, the softraid code may be using a faulty area of memory as disk cache and hitting a bad address.
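On Ubuntu memtest86+ should already be installed and show up as an entry in the GRUB menu at boot; if it isn't there, something like this should pull it in (going from memory of the package name):

  sudo apt-get install memtest86+

Let it run at least a couple of full passes before calling the RAM good.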
I can't understand why this would be. I thought it might have been the power supply, as it is only 250 watts, but I have the machine connected to a watt meter and I added the drives one at a time watching the watts increase, and we are only at 140 watts. I then thought that maybe there were too many devices on one single power cable coming out of the power supply, as I have had to use splitters on the power cables, so I temporarily swapped the power supply out for one from another machine that is rated for 500 watts and has more power cables coming out of it, but no. I also stood another machine next to it, turned them on at the same time and used the 2nd machine's power connectors to power some of the drives so the load was shared between the two PSUs, but no, it still locked up.
Yeah, the PSU would be a favorite if you have added a good number of drives to an existing and presumably previously healthy machine. You can't really take overall power consumption as a definitive guide to used PSU capacity; it is more down to how the load is distributed across the rails (5V, 12V etc.), and it is very possible to hit the capacity of a particular output of the PSU before you hit its overall nominal capacity. But given you have substituted the PSU, I think we can rule that one out. That said, a 250W PSU (even a good one) is really pushing your luck for that number of drives. Was the 500W PSU a good one, or a cheap-and-nasty free-with-the-case job?
For future reference, you need to thoroughly check the earthing of both machines and of the components you are connecting the second PSU to when you cross-couple DC supplies like that. Failing to do so can cause strange symptoms and in extreme cases even damage to the devices themselves or their respective bus.