On Mon, 2006-08-21 at 19:03 +0100, Adam Bower wrote:
On Mon, Aug 21, 2006 at 06:45:18PM +0100, Barry Samuels wrote:
smartd[6119]: Device: /dev/hda, 1 Currently unreadable (pending) sectors
smartd[6119]: Device: /dev/hda, 1 Offline uncorrectable sectors
Do the above error messages indicate any serious (potential) problems?
My gut instinct is that the error message you've got is telling you "disk is fubar". If I were you I'd be backing up right about now, "just in case" (if you can back up, that is), before trying to fix the problem.
I'll second Adam's advice and add a bit more info.
What has happened is that the drive has detected a bad sector and is unable to remap it to the area reserved for replacing bad sectors without losing the contents of that sector (normally because the ECC data is unreadable).
If the drive manages to read the data even once, the sector will then move from the pending count to the reallocated count and all data will be intact.
If the drive never gets to read the data, you can manually force it to remap the sector anyway (losing the contents, naturally), either by using manufacturer-specific tools or by following these (slightly scary) instructions: http://smartmontools.sourceforge.net/BadBlockHowTo.txt
But in my opinion leaving it as an offline sector is the safest thing to do; with the manufacturer tools (or the above instructions) it is very easy to kill the whole file-system.
This doesn't automatically mean the disk is dead or even dying. A small number of reallocated sectors over the lifetime of a disk is almost expected and usually transparent: unless, as in your case, you are running the monitoring daemon and the ECC data is damaged, you would never know.
As Adam suggested, backing up your data is a good first step. After that I'd keep an eye out for more problems in the logs and perhaps schedule regular on-line tests. If you see more of the same error, or notice the reallocated sector count steadily increasing, then it is definitely time to change the disk.
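If you want to track those counts yourself rather than wait for smartd to complain, something along these lines might do as a starting point. It is only a rough sketch: it assumes smartmontools is installed, that `smartctl -A` prints the usual ten-column attribute table, and that your drive reports the attribute names Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable (not every drive does). The function names and the SAMPLE text are made up for illustration and are not output from your disk.

```python
import subprocess

# Attribute names as commonly printed by `smartctl -A`; assumed, since
# some drives use different names or omit these attributes entirely.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            try:
                counts[fields[1]] = int(fields[9])
            except ValueError:
                pass  # raw value in a format this sketch does not handle
    return counts


def read_smart_attributes(device="/dev/hda"):
    """Run smartctl on the device (usually needs root) and parse its output."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)


def increased(previous, current):
    """Names of watched attributes whose raw count went up between two readings."""
    return sorted(name for name, value in current.items()
                  if value > previous.get(name, 0))


# Illustrative table only, shaped like typical smartctl output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

if __name__ == "__main__":
    print(parse_smart_attributes(SAMPLE))
```

Saving each day's reading and passing yesterday's and today's dictionaries to increased() would show whether any count is creeping upwards, which is the trend to worry about.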
What I wouldn't do, before verifying that you have a good backup, is run any of the extended tests. If there is a mechanical or thermal problem with the drive then the extra stress of running the extended tests may push it over the edge and it could fail completely.