HDD Throughput Slow

HCHTech

These days, anything that hits the bench with an HDD gets a pitch to replace it with an SSD. I've had two almost identical cases in the past week: hardware diagnostics pass, including no flags in gsmartcontrol, BUT the computer is very slow. When the drive is pulled for backup, copying data is very slow, and if it's cloned to an SSD, the cloning process is very slow. The computer runs like new once the SSD is mounted (and transfer rates are close to spec on the SSD benchmark), and there are no infections found or concerning errors in the logs.

This means the main cause of the extreme slowness before was likely lower-than-normal throughput to/from the HDD. To the drive experts here, I'm curious what part is failing that doesn't show up in the SMART logs or cause short or long test failures, but results in depressed throughput? I've run into this before, but usually by the time you reach that conclusion, it doesn't make any sense to try to document the actual throughput - it's just replace the thing and get on with it. Since we didn't replace the cable, and did a straight clone, it can't be the cable or the storage drivers - I would think, anyway. Mostly, I'm just looking for a name to give this symptom so I can say something more intelligent than "I guess it was broken somehow" - haha.
 
This is why I've never trusted SMART. In my experience the device is simply failing, and the one leading indicator of a platter drive going bad that you can measure accurately is its performance. Once it slows down, it's telling you it's tired and needs replacing.

So I just tell people it's just an old drive, because that's what platters do at the end of their lives but before they actually fail. If a customer refuses the SSD replacement in these cases I won't actually service the machine, simply return it with a note that the drive will fail soon.

From the technical side, I think the cache is what's failing in these cases. Platter drives without a working cache revert to ancient PIO-style access, which drops the drive to 1990s-era performance levels.
 
With WD drives, it's usually firmware (slow-responding) and with Seagates, it's background processes -- none of which show up in SMART. Reallocated and pending-reallocation sector counts are another culprit, but at least they show up in SMART. For throughput, Victoria's speed test gives a quick assessment of how slow the drive is: on a slow drive, the number of sectors read per time band spreads out into longer and longer read-time bands instead of bunching at the top in the short access-time bands. I often benchmark the average time to read the first 100GB, as that's where most of the action is.
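For anyone who wants to see the time-band idea without Victoria, here is a rough Python sketch. It is not Victoria's algorithm: the band edges, the 1 MiB block size, and the function names are all my own assumptions for illustration. It times each block of a sequential read and counts how many blocks land in each time band. Reading a raw device normally needs admin rights, and reading an ordinary file can be served from the OS cache, so treat the numbers as indicative only.

```python
import time
from bisect import bisect_left

# Upper edges of the read-time bands in milliseconds. These values are an
# assumption for illustration, not the bands Victoria actually uses.
BAND_EDGES_MS = [5, 20, 50, 200, 600]


def band_counts(latencies_ms, edges=BAND_EDGES_MS):
    """Count block reads per time band.

    Slot i holds reads that took at most edges[i] ms (and more than the
    previous edge); the final slot holds reads slower than every edge.
    """
    counts = [0] * (len(edges) + 1)
    for ms in latencies_ms:
        counts[bisect_left(edges, ms)] += 1
    return counts


def time_reads(path, block_size=1024 * 1024, max_bytes=100 * 1000**3):
    """Read `path` sequentially and return each block's read time in ms.

    Stops after `max_bytes` (default: the first 100 GB) or at end of data.
    """
    latencies = []
    done = 0
    with open(path, "rb", buffering=0) as f:
        while done < max_bytes:
            start = time.perf_counter()
            chunk = f.read(block_size)
            elapsed = time.perf_counter() - start
            if not chunk:
                break
            latencies.append(elapsed * 1000.0)
            done += len(chunk)
    return latencies


# Example with made-up latencies: a healthy drive bunches in the first
# slots, a struggling one spreads toward the last ones.
print(band_counts([1, 5, 6, 30, 700, 700]))  # [2, 1, 1, 0, 0, 2]
```

To use it on real hardware you would pass the device or a large file to `time_reads` and feed the result to `band_counts`; the sum of the latencies also gives the total time for the first 100GB.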
 
Does the Windows event log show anything? In the old days (parallel ATA and early SATA), if there were several errors on the hard drive, Windows would step down the DMA modes, eventually ending up in PIO. This produced exactly the symptoms you describe. Back then, it was often caused by a power supply deficiency. I remember being quite perplexed, actually. Replacing the PSU and re-enabling DMA in the controller settings in Device Manager (or just deleting the controller so it would be re-detected?) recovered the performance completely.

Cache failure may (or probably may not) show up in SMART as an increasing End-to-End Error Count.
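If you want to pull just those counters out of the SMART table rather than eyeball it, something like the sketch below might do. It assumes the text comes from `smartctl -A` (the tool gsmartcontrol sits on top of) in its usual ten-column layout. The attribute IDs 5, 184 and 197 are the ones commonly used for reallocated sectors, End-to-End errors and pending sectors, but not every vendor implements all of them. The sample table is made up, not from a real drive.

```python
# SMART attribute IDs mentioned in this thread. Assumption: the drive
# reports them under these common IDs; some vendors omit or renumber them.
WATCHED = {
    5: "Reallocated sectors",
    184: "End-to-End errors",
    197: "Pending sectors",
}


def parse_smart_attributes(text):
    """Map attribute ID -> raw value (int) from `smartctl -A` style text."""
    raw_by_id = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header and any other lines are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        if raw.isdigit():
            raw_by_id[int(fields[0])] = int(raw)
    return raw_by_id


def nonzero_watched(text):
    """Return {label: raw} for watched attributes with a raw count above zero."""
    raw_by_id = parse_smart_attributes(text)
    return {
        label: raw_by_id[attr_id]
        for attr_id, label in WATCHED.items()
        if raw_by_id.get(attr_id, 0) > 0
    }


# Made-up sample in the usual column layout, not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

print(nonzero_watched(SAMPLE))  # {'Pending sectors': 8}
```

An empty result only means these particular counters are at zero, which is exactly the situation in this thread: the drive can still be slow.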
 
SMART is just one indicator, not an absolute. I've had quite a few drives show good in SMART while Windows complains or the drive fails to image or copy.
 
This, on the DMA step-down point. Windows will do a lot of remediation without the end user ever noticing.
 
Kind of "as expected", thanks everyone. There really wasn't much in the logs at all - no red flags, certainly. I've never spent time on benchmarking since it doesn't mean much unless you have a result from when the drive was working well to compare it to. There are so many variables that it's hard to say "X MB/sec" is "bad". The DMA/PIO thing was IDE only as I recall - I definitely remember that as a problem when the incorrect mode (too slow) was being used.
 
I wasn't sure if I wanted to make a new thread or add to this but......

I just had a customer SSD (500 GB SanDisk Ultra installed by me several years ago) that exhibited the same symptoms as @HCHTech's hard drive above. It would transfer data at full speed for the first few milliseconds, then drop to ~75mbs after that. Given a rest, it would again burst at full speed briefly and then drop back to almost nothing for sustained reads. The customer complained his system backups went from 40 minutes or so to 6 hours or longer. It took me 8+ hours just to get an image from it on a SATA bus. The image appears to be flawless and complete, and if the SSD is going to fail I'm glad it did it this way and allowed me to image it. I have no idea what in the controller would make it behave this way. This is the second SanDisk Ultra I've seen with problems, but that's over several years.
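To put rough numbers on that, here is a back-of-the-envelope calculation in Python. It assumes the image covered the whole 500 GB (decimal) device and took exactly 8 hours, which is only an approximation of "8+ hours", so the real average was at or below this figure. The function names are just for illustration.

```python
def avg_throughput_mb_s(total_bytes, seconds):
    """Average throughput in MB/s (decimal megabytes)."""
    return total_bytes / seconds / 1_000_000


def slowdown(before_minutes, after_minutes):
    """How many times longer the job takes now than it used to."""
    return after_minutes / before_minutes


# Assumption: full 500 GB device imaged in exactly 8 hours.
image_rate = avg_throughput_mb_s(500 * 1000**3, 8 * 3600)
print(f"{image_rate:.1f} MB/s")         # 17.4 MB/s

# Backups went from about 40 minutes to about 6 hours.
print(f"{slowdown(40, 6 * 60):.0f}x")   # 9x
```

So the sustained average works out to well under 20 MB/s, far below what a SATA SSD normally manages, and the backups are running roughly nine times slower than before.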

I see that Samsung Evo SSDs don't have the price premium they once had over most other drives and I'm starting to use them again. I do have some customers that just won't pay for an SSD upgrade and in turn I'll use a small inexpensive PNY drive to upgrade them on-the-house. I hope that doesn't come back to bite me......
 
Just for kicks, after you get it all backed up and you are 100% sure it is safe, use secure erase on the SSD (should take less than 10 seconds to complete) and then test its performance.
 
Ahhh! Good thought. I mentioned the other SanDisk failure I had (1TB Ultra). I used SanDisk software then to wipe the drive before sending it in for warranty and it tested and worked fine after I did that. Still - I'd never trust it again.
 
...at the end of the day, most SSDs are made with very low quality NAND chips that require super complex ECC routines and sector remapping to keep up the illusion that they are stable. But, just like hard drives, the firmware is stored on the same media as the data and is just as vulnerable to failures.
 
If we're working on a computer, and it's one of those remaining dinosaurs with a spindle, and we intend to fix/repair/tune it up to give back to the client, it gets an SSD right off the bat. Multiple reasons, but two main ones: 1) The computer will likely last a lot longer for the client now, since SSDs live as long as the computer could, versus spindles that slowly die and usually die before any other component in the computer would. Plus the client WILL notice the computer running MUCH better. 2) It saves us AT LEAST an hour's time in doing cleanups/tuneups on the computer. Thus the price of the SSD is more than covered by the labor/bench time it saves the client.

End result: a better-running computer at lower cost than if we had left the spindle in there and suffered through hours more bench time.

Anyways, the name for the symptom..."Spindle drives slow down over time, as they age and get wear and tear". Yup...totally expected behavior for a spindle.
 