Is this drive slowly dying?

Haole Boy

I'm back with yet another question about possible disk failure. Customer called saying the machine was running slow, so I set up an appointment and went over, and the machine seemed normal to me. Did my regular cleanup / update routine and found the following in the SMART output (full gSmartCtl output available here):

ID#  ATTRIBUTE_NAME           FLAGS   VALUE  WORST  THRESH  FAIL  RAW_VALUE
  5  Reallocated_Sector_Ct    PO--CK  200    200    140     -     0
194  Temperature_Celsius      -O---K  094    084    000     -     53
196  Reallocated_Event_Count  -O--CK  200    200    000     -     0
197  Current_Pending_Sector   -O--CK  200    200    000     -     4
198  Offline_Uncorrectable    ----CK  200    200    000     -     0

But the system seemed to run normally at that point. One month later I got another call saying it was taking 10 minutes to log on. When I got there, the system was running normally. I'm *assuming* that this was the result of Windows trying to read a bad sector and it eventually got remapped to a good sector. Here's the SMART output from that visit:

ID#  ATTRIBUTE_NAME           FLAGS   VALUE  WORST  THRESH  FAIL  RAW_VALUE
  5  Reallocated_Sector_Ct    PO--CK  200    200    140     -     0
194  Temperature_Celsius      -O---K  090    084    000     -     57
196  Reallocated_Event_Count  -O--CK  200    200    000     -     0
197  Current_Pending_Sector   -O--CK  200    200    000     -     4
198  Offline_Uncorrectable    ----CK  200    200    000     -     4


So... the Current_Pending_Sector count remained the same, but Offline_Uncorrectable now shows 4 bad sectors. Does this mean that there are now 8 bad sectors, i.e., the drive is deteriorating and should be replaced? Or are these the same 4 sectors showing up in both categories? And if this really was a bad sector being remapped, why is the Reallocated_Sector_Ct still 0?
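
For anyone who wants to compare two SMART snapshots like these without eyeballing them, here is a minimal Python sketch. It is not part of gSmartCtl or smartctl; the function names `parse_smart` and `diff_smart` are made up for illustration, and it assumes each attribute row is whitespace-separated with the raw value in the last column, as in the output pasted in this post. It only reports which raw values changed; it does not say whether the counts overlap.

```python
def parse_smart(text):
    """Parse pasted attribute rows into {attribute_name: raw_value}."""
    attrs = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip the header row and anything that does not start with a numeric ID.
        if len(parts) < 8 or not parts[0].isdigit():
            continue
        attrs[parts[1]] = int(parts[-1])
    return attrs


def diff_smart(before, after):
    """Return {name: (old, new)} for attributes whose raw value changed."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# The two snapshots from this post, copied as plain text.
VISIT_1 = """
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
194 Temperature_Celsius -O---K 094 084 000 - 53
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 4
198 Offline_Uncorrectable ----CK 200 200 000 - 0
"""

VISIT_2 = """
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
194 Temperature_Celsius -O---K 090 084 000 - 57
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 4
198 Offline_Uncorrectable ----CK 200 200 000 - 4
"""

changes = diff_smart(parse_smart(VISIT_1), parse_smart(VISIT_2))
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

Run against the two visits, only Temperature_Celsius and Offline_Uncorrectable come back as changed.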

Also, should I be concerned about the temperature (57 Celsius)? This is an HP all-in-one.

Mahalo for your assistance,

Harry Z
 
This could have been taken care of in one service call instead of multiple ones. You should have brought a bootable Linux CD (Linux Mint MATE) and booted from it, then fired up "Disks" and checked the drive. As soon as you see anything wrong with it, it's time to back up the data and replace the drive.
 
I can't remember if I booted my Linux USB drive to run gSmartCtl or just ran it from Windows. From reading this Forum for many years, there does not seem to be universal agreement on how many bad sectors there have to be before you replace the drive. Some folks do it with 1 bad sector, others have a tolerance of 10 or 20 bad sectors. Considering who the customer is (elderly, money would be a challenge, occasional use of the PC), I decided to monitor the situation. I realize other folks would have replaced the drive at the first service call.

In this instance, the SMART indicators have changed and I want to understand more about this before recommending replacing the drive.

Mahalo for your feedback,

Harry Z
 
How many bad blocks? 1 in my book. It's not like drives are rare or extremely expensive. Where there is smoke there's certainly going to be some fire. Just because you find 1 or 5 does not mean that is all there is. Personally I don't even need SMART, just a poorly performing computer is enough.
 
197 Current_Pending_Sector -O--CK 200 200 000 - 4
If it were my own drive, I'd run a badblocks write test to try to get the drive to reallocate those sectors. If it does, the drive can stay in service; or, if it's a failing drive, you'll end up with a lot more unreadable sectors. But for a customer who's paying by the hour, it's not worth doing that; the testing takes time, and the OS has to be reinstalled anyway, so better to just go to a new drive.

198 Offline_Uncorrectable ----CK 200 200 000 - 4
Usually nothing to worry about; the unreadable sectors are the real problem.

And if this really was a bad sector being remapped, why is the Reallocated_Sector_Ct still 0?
The drive only remaps sectors on a write operation, which will never happen for sectors occupied by stuff that doesn't change. Seems like a bad choice not to have repeated read failures trigger a remap, but what do I know?

Also, should I be concerned about the temperature? (57 Celsius). This is an HP all in one.
That's higher than I like to see, but there may not be anything you can do about it in an AIO.
 
And if this really was a bad sector being remapped, why is the Reallocated_Sector_Ct still 0?

The sector is not being remapped. It is pending a rewrite attempt. The data cannot be read, so there is no point in remapping the sector. There is nothing to put into a spare sector anyway. The write, if it ever happens, will be verified. If the write succeeds (clearing a transient error) there will be no remap and Current Pending Sector Count will decrease. If the write fails, the sector will be remapped, Current Pending Sector Count will decrease, and Reallocated Sector Count will increase. Until there is a write attempt to this sector, nothing is going to change.
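
The counter behavior described above can be sketched as a toy model in Python. This is only an illustration of the two outcomes of a write to a pending sector; it does not talk to a real drive, and `SmartCounters` and `write_pending_sector` are hypothetical names, not anything from smartctl or drive firmware.

```python
from dataclasses import dataclass


@dataclass
class SmartCounters:
    """Toy stand-in for two SMART raw values; not a real drive interface."""
    current_pending: int = 0
    reallocated: int = 0


def write_pending_sector(counters: SmartCounters, write_verifies: bool) -> None:
    """Model one write attempt to a sector that is currently pending."""
    if counters.current_pending == 0:
        raise ValueError("no pending sectors to rewrite")
    # Either way the sector stops being pending once a write is attempted.
    counters.current_pending -= 1
    if not write_verifies:
        # The write could not be verified, so the sector is remapped to a spare.
        counters.reallocated += 1


drive = SmartCounters(current_pending=4)
write_pending_sector(drive, write_verifies=True)   # transient error cleared, no remap
write_pending_sector(drive, write_verifies=False)  # write failed, sector remapped
print(drive)
```

With no write attempts at all, both counters stay where they are, which matches the two snapshots in the first post.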
 
Seems like a bad choice not to have repeated read failures trigger a remap

This is kind of a difficult question. Let's say the sector does not read. After three attempts we remap it. What data do we put into the remapped sector? The original can't be read (or there wouldn't be a problem). Do we put in zeros? Let's say we do, and now whatever software requests the sector is going to get a bunch of zeros instead of the original data, without any notification that there was data loss. The new sector reads OK; it's just that the data is gone. We can work around that and put some tag onto the replacement sector which says "if that sector is requested, return a read error" and then clear this flag after the first write, but that kind of negates the whole point of remapping.
 
This is kind of a difficult question. Let's say the sector does not read. After three attempts we remap it. What data do we put into the remapped sector? The original can't be read (or there wouldn't be a problem). Do we put in zeros? Let's say we do, and now whatever software requests the sector is going to get a bunch of zeros instead of the original data, without any notification that there was data loss. The new sector reads OK; it's just that the data is gone. We can work around that and put some tag onto the replacement sector which says "if that sector is requested, return a read error" and then clear this flag after the first write, but that kind of negates the whole point of remapping.
Makes sense when you explain it like that. I was just thinking from the standpoint of having repeated time-consuming read failures on a known-bad sector.
 
If this were a server RAID set reporting bad sectors on two drives, would you rush out and replace the drives, or would you just go "Ah, it's only a few bad sectors"?

Do your customer a favor and replace the drive and be done with it.
 
Did my regular cleanup / update routine and found the following in the SMART output
I use CrystalDiskInfo and that program interprets the SMART statistics for you, and does it very well. If it says CAUTION or BAD then I replace the drive. If it says GOOD but I still suspect drive issues I do a full surface scan diagnostic (e.g. SeaTools long test).
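
Written out as a rule, that triage looks roughly like the Python sketch below. It does not call CrystalDiskInfo or any other tool; the status string is typed in by hand, and the function name `triage` and the returned action strings are invented for illustration.

```python
def triage(health_status: str, drive_suspected: bool = False) -> str:
    """Map a reported health status to the next step described above."""
    status = health_status.strip().upper()
    if status in {"CAUTION", "BAD"}:
        return "replace drive"
    if status == "GOOD":
        # GOOD alone is not trusted if there are other signs of drive trouble.
        return "run full surface scan" if drive_suspected else "no action"
    raise ValueError(f"unrecognized health status: {health_status!r}")


print(triage("Caution"))
print(triage("Good", drive_suspected=True))
```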
 
On a "slightly old" to "older" rig, if it's running slow and it has a spindle drive (especially a WD Blue spindle), I don't even waste time or brain cells on it. I simply clone it to an SSD, which is quick and easy.

Before SSDs, I just cloned to a new, better spindle.

Rotating hard drives are the one single part of a computer that is guaranteed to degrade in performance over time. They "slow down and lose performance" as they accumulate hours of use.

Don't waste time thinking about it or questioning it. Just clone to new..and make that new one an SSD. Your client will LOVE you for it, they'll love the 12 second bootups and fast program launches....and long life span.
 
Customer called saying the machine was running slow, so I set up an appointment and went over, and the machine seemed normal to me. Did my regular cleanup / update routine
One of the most common causes of a slow PC is a failing hard drive, so checking SMART is one of the first things I do. Tuning up the PC might just drive it over the edge and ruin any chance to back up the user data, so it would come last, not first.
 
Rotating hard drives are the one single part of a computer that is guaranteed to degrade in performance over time. They "slow down and lose performance" as they accumulate hours of use.

That's the number one rule of troubleshooting failures. The parts that move the most fail the most. I do work for several companies that have interactive kiosks such as building directories, information displays, deposit takers, coupon dispensers, etc. Those types mostly run off of what's in RAM, so the HD doesn't get much activity. In those situations motherboard/power supply failures outnumber HD failures something like 5 to 1. With the drop in SSD prices many have been moving over to SSDs. For static environments like that, 32 GB is fine since most use the Embedded versions of Windoze.
 
From reading this Forum for many years, there does not seem to be universal agreement on how many bad sectors there has to be before you replace the drive.

That's easy - it's one, and now that hard drives are practically free and SSDs are affordable for most people there's a pretty good case for zero reported defects if you have any other reason to suspect the drive.

Hardware is cheap, data loss is very expensive. Why take the chance?
 