Technical Overview of Popular Software Data Recovery Procedures - Technibble

In this article, DeepSpar has written an advanced guide aimed at educating technicians on the pros and cons of popular software data recovery methods. DeepSpar’s primary line of business is the manufacture and resale of professional data recovery equipment, and the company has been doing applied research on hard disk drive data recovery techniques since 2001.

In this article, you can learn about:

  • Common symptoms and causes of read instability problems.
  • The risks involved when connecting drives directly to Windows or OS X, and a way to mitigate them.
  • Different operating modes, advantages, and disadvantages of logical recovery software.
  • In-depth benefits and limitations of GNU ddrescue.
  • Bad sector repair tools and why they should not be used for data recovery purposes.
  • The importance of customer communication.


 
In this guide we will outline the most important technical aspects that should be taken into account when recovering data from unstable hard disk drives with software tools. This is an advanced guide written for technicians who are already familiar with basic software data recovery procedures. We will explain the in-depth pros and cons of common software recovery methods to help technicians make informed choices with their drives. There are exceptions to virtually everything in data recovery, so we will be talking about average cases involving average drives.

First, we must explain what hard drive instabilities are. A drive is unstable or degraded when it is no longer capable of processing standard commands in a predictable manner. Every drive will present a unique mix of symptoms: it could have bad sectors; it could go offline and stop responding randomly; it could stop responding upon hitting a bad sector or receiving a particular command; it could respond many times slower than normal, or any combination of the above.

Bad sectors are the most common symptom shown by unstable drives. Intuitively, the name “bad sectors” implies that there are simply some bad areas on the drive’s platters; however, that is not necessarily the case. Bad sectors can be caused by a combination of different problems, such as:

  • degraded read/write heads
  • electronic instabilities (PCB)
  • firmware exceptions
  • platter degradation

Physical degradation of the drive (heads and/or platters) is by far the most common reason for the appearance of bad sectors on modern drives.

Unfortunately there isn’t any way to determine the underlying problem with software tools. Bad sectors will all be processed by software tools exactly the same way, regardless of what’s actually causing them to appear. It can sometimes be impossible to diagnose the cause even with a full lab of professional equipment.

No matter the underlying reason, once a drive develops bad sectors, it will only get worse the more it is used. Our main goal in working with these drives is to save the data to a healthy drive before the original drive suffers a more serious failure, such as a complete crash of the read/write heads (most common), severe platter damage, heavy firmware corruption, or electronic failure of the PCB.

Looking at SMART logs or listening for odd noises coming from the drive is a poor predictor of drive instability. There isn’t any reliable way to determine whether a particular drive has read instability problems prior to reading its entire surface, so if we haven’t done that yet then we have to assume that the drive we are working with could be unstable.

Now we are ready to talk about commonly used software data recovery procedures.

One of the most widely used recovery procedures: Connecting the drive to Windows or OS X through a USB drive bay and trying to copy files.

 

Pros:
  • Simplicity
  • Low time expenditure for the technician performing the recovery

Cons:
  • Lowest success rate for data recovery
  • Carries many risks for the drive and customer’s data

 
Any experienced technician will know not to do this; however, countless IT service providers still perform recoveries this way on a daily basis. As soon as the drive is connected to Windows or OS X, its file system will be automatically mounted by the operating system (OS). This means that the OS will read a lot of highly fragmented file system information from this drive, which results in intense drive processing, quickly causing further drive degradation and exacerbating read instability problems. Worst of all, mounting processes write to the drive, causing permanent loss of some customer data.

What happens during the drive mounting process?

Last year we did a hardware trace of the exact file system mounting processes used by Windows 7 and OS X Yosemite. Here is how Windows 7 does it:

  1. Windows 7 begins its mounting process by reading the Master Boot Record (MBR) of the drive 9 times in a row.
  2. Then it will read the boot sector and write it back to the drive with one small change: it flips the ‘dirty’ bit, indicating that the mounting process has started, but did not finish yet.
  3. If successful up to this point, it will begin reading the Master File Table (MFT) section of the file system in blocks of 128 sectors, while simultaneously sending occasional write commands to update minor logs.
    • If a drive fails to read a block within the MFT, Windows will automatically try to read the same block again, and again, and again, up to 9 times in a row.
    • If all of those attempts do not work either, Windows will break down the problematic 128-sector block into smaller blocks equivalent to the cluster size being used by the file system (typically 8 sectors or 4KB). This same 128-sector block, which has already failed to read 9 times, will then be attempted 8 sectors at a time.
    • When any of those smaller 8-sector blocks fail to read, Windows will also try them 9 times each.
  4. If all of those attempts in Step 3 fail yet again, Windows will simply give up, reset the drive, and automatically restart the whole mounting process from scratch. (If allowed, Windows will restart the mounting process as many times as it takes for the drive to stop identifying altogether, so it should never be allowed to do this!)
  5. If Windows never manages to mount the drive, the ‘dirty’ bit in the boot sector will remain in place, which indicates to Windows that this volume has serious issues, prompting it to ask for a “check disk” run. (Do not allow Windows to run check disk on a drive that was sent in for data recovery. Check disk is not a data recovery tool. It was designed to repair the file system to get it into a state where it is usable by Windows. There are many situations where a check disk run will cause substantial harm by erasing partially corrupted file system entries. These entries could be parsed with specialized logical recovery software like R-Studio for better data recovery results, so losing them to check disk is not a good idea.)
  6. If Windows does successfully mount the drive, it will read the boot sector and write it back to the drive to flip the ‘dirty’ bit back. Every time the boot sector is altered, there is a risk it will become corrupted, which would complicate the recovery, since it’s an important part of file system metadata.
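The retry-and-split behavior in Step 3 can be modeled in a few lines of code. This is an illustrative simplification only, not Windows source: the block sizes and the 9-attempt limit come from the trace described above, while the `drive` dictionary standing in for real hardware is our own invention.

```python
# Simplified model of the read/retry/split behavior described above:
# a 128-sector block is retried up to 9 times, then broken into
# 8-sector sub-blocks, each of which is also retried up to 9 times.

RETRIES = 9
BLOCK = 128        # sectors per initial read
SUB_BLOCK = 8      # cluster size in sectors (4 KB with 512-byte sectors)

def read_sectors(drive, start, count):
    """Return True only if every sector in [start, start+count) reads OK."""
    return all(drive.get(s, True) for s in range(start, start + count))

def mount_style_read(drive, start):
    """Attempt one 128-sector block the way the trace describes.
    Returns the starting LBAs of sub-blocks that could not be read."""
    for _ in range(RETRIES):
        if read_sectors(drive, start, BLOCK):
            return []                      # whole block read fine
    failed = []
    for sub in range(start, start + BLOCK, SUB_BLOCK):
        for _ in range(RETRIES):
            if read_sectors(drive, sub, SUB_BLOCK):
                break
        else:
            failed.append(sub)
    return failed

# A drive with a single bad sector at LBA 70: only one 8-sector
# sub-block is ultimately lost, but at the cost of many retries.
bad_drive = {70: False}
print(mount_style_read(bad_drive, 0))   # → [64]
```

Note how a single bad sector triggers 9 failed full-block reads plus 9 more failed sub-block reads: exactly the kind of repeated hammering that degrades an unstable drive further.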

Even after the mounting process is complete, Windows will continue writing to the drive to update various logs. For example, every time a file is modified, Windows will update its attributes, which involves writing to the MFT. Naturally, all of these write commands are overwriting old data, causing permanent data loss and additional unnecessary drive degradation.

A little-known feature of Windows is that it automatically ‘cleans up’ any file system entries that it does not understand on mounted devices. In other words, partially corrupted MFT entries that Windows encounters on the mounted patient drive could very well be deleted without a single user prompt or notification, once again causing permanent data loss and unnecessary drive degradation. Even partially corrupted MFT entries can still be very useful when parsed with software that is designed for the task.

What about OS X Yosemite?

OS X Yosemite doesn’t try quite as hard as Windows 7 to achieve a successful mount, but it performs very similar reading and writing operations during the mounting process.


If OS X finds a block that it cannot read within a critical section of the file system (for example, within the root directory structure), it will try to read it 5 times, at which point it will give up and stop trying to mount the drive altogether. Such a drive will remain invisible to the OS, but it will still be powered on and idling, which is a cause for concern because all modern drives begin potentially damaging self-scan procedures after only 20 to 30 seconds of idle time.

If the critical structures all read successfully and the less critical ones (such as catalog entries for specific user files) do not, OS X will try the less critical blocks up to 10 times before forgetting about them and mounting the drive without the data in those blocks. In this case, OS X will see and be able to work with the drive, but some files will be missing. The user will not be notified of this issue.

All mounting procedures in both Windows and OS X happen as soon as we connect the drive, and before we have a chance to even try to recover any data! The file system mounting process can easily fail due to a single bad sector, or slight corruption of critical file system elements, leaving the drive unrecoverable with this method. Ideally, Windows or OS X should never be allowed to mount drives sent in for data recovery. Given that the only upside of this approach is its simplicity, we would strongly urge against using it on a customer’s drive.

Let’s look at another recovery method: Connecting the drive to Windows or OS X and running data recovery software.

 

Pros:
  • Can still try to recover the drive if its file system fails to mount in the OS

Cons:
  • Carries the aforementioned risks of data loss and drive degradation associated with file system mounting processes
  • File system scan results are not saved anywhere besides RAM, so if the recovery is not an immediate success then we have wasted our time and caused unnecessary drive degradation

 

As soon as the drive is connected to Windows or OS X, its file system will be mounted, which carries all the risks we previously talked about. If you ever find yourself in a situation where you must connect a customer’s drive directly to Windows or OS X, there is a way to prevent the OS from mounting the drive, reducing the risk of this data recovery approach. As we mentioned previously, both Windows 7 and OS X begin their mounting processes by reading the Master Boot Record (MBR) located on the first sector of the drive (LBA 0). If the MBR appears to be corrupted, neither OS will try to mount the drive. In OS X, a message saying that the drive is not readable will come up, and Windows will ask for such a drive to be formatted. (Do not, under any circumstances, format a drive sent in for data recovery!)

Thankfully, with one minor change, we can make the MBR appear to be corrupted to a standard OS, without actually harming its contents:

  1. Connect the drive to an OS which does not automatically mount the file system on every drive, like many Linux/Unix variants.
  2. Use a hex editor to change the last byte of the MBR (sector 0) from “AA” to anything else, for example “BB”.

Alternatively, our free, bootable drive testing tool can also make this change to deactivate the MBR. It’s an image file which will have to be restored to a USB stick. This USB stick will then become bootable and the MBR can be deactivated by pressing “d” from within the software.

This change involves writing one sector to the drive, which is not ideal, but it’s many times less harmful than allowing the drive to be mounted by a standard OS. This change won’t have any impact on the results of data recovery software – the outcome will be exactly the same either way because nothing of value is lost by corrupting the last byte of the MBR.
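As a sketch of what that one-byte change looks like in practice, the snippet below toggles the final signature byte of sector 0 between 0xAA and 0xBB. The demo operates on a temporary image file it creates itself; on a real patient drive under Linux you would open the block device node instead, after triple-checking the device name, since writing to the wrong device destroys data.

```python
# Toggle the last byte of the MBR (sector 0) between 0xAA (valid
# signature) and 0xBB, making the MBR look corrupted to a standard OS
# without harming any other contents.
import os
import tempfile

SECTOR_SIZE = 512

def toggle_mbr_signature(path):
    """Flip the final byte of sector 0; returns the new byte value."""
    with open(path, "r+b") as f:
        f.seek(SECTOR_SIZE - 1)
        last = f.read(1)[0]
        new = 0xBB if last == 0xAA else 0xAA
        f.seek(SECTOR_SIZE - 1)
        f.write(bytes([new]))
        return new

# Demo on a throwaway image: a fake MBR ending in the 0x55 0xAA signature.
fd, img = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 510 + b"\x55\xaa")
print(hex(toggle_mbr_signature(img)))   # → 0xbb (OS will now refuse to mount)
print(hex(toggle_mbr_signature(img)))   # → 0xaa (signature restored)
os.remove(img)
```

Because the function simply toggles the byte, running it a second time restores the original signature once the recovery is finished.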

How does data recovery software work?

All common Windows and OS X data recovery software is essentially built for solving logical corruption issues, which is why it’s often called “logical recovery software”. Such tools use their own proprietary algorithms to parse file system metadata, instead of relying on the results of the standard OS. These tools will:

  • assemble missing file system metadata from copies,
  • make educated guesses on partially corrupted entries,
  • show deleted or orphaned files,
  • find files without file system records by looking for the hex signatures of their headers.

It is important to note that this kind of software is not at all designed to deal with hardware read instabilities, so if the drive has bad sectors, it will be quite inefficient. Such tools will typically read the data using large read block sizes consisting of thousands of sectors. This is a problem because a hard drive will only successfully process a read command if it can read every single sector that was requested.

In other words, if the software asks for 4096 sectors, and there is just 1 bad sector within that block, the entire read command will fail, and we will effectively lose 4096 sectors of data, instead of just the 1 bad sector. Naturally this means that fewer integral files will be recoverable. Large read block sizes are used because logical recovery software tools have a lengthy overhead for processing individual commands, so their speed would suffer greatly if they were to switch to a small block size.
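The arithmetic above is easy to model. The sketch below (a simplified model of our own, not any specific tool’s behavior) counts how many sectors become unrecoverable when every block containing at least one bad sector fails as a whole:

```python
# One bad sector fails the entire read block, so the data lost per bad
# area equals the block size, not the number of bad sectors.

def sectors_lost(bad_sectors, block_size):
    """Sectors made unreadable when any block holding a bad sector
    fails as a whole."""
    bad_blocks = {s // block_size for s in bad_sectors}
    return len(bad_blocks) * block_size

bad = [12345]                     # a single bad sector on the drive
print(sectors_lost(bad, 4096))    # → 4096 sectors lost (2 MB at 512 B/sector)
print(sectors_lost(bad, 8))       # → 8 sectors lost (4 KB)
print(sectors_lost(bad, 1))       # → 1 sector lost (512 bytes)
```

The trade-off is clear: smaller read blocks lose far less data around each bad sector, but issuing thousands of times more commands makes high-overhead logical recovery tools unbearably slow.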

How do logical recovery software tools work?

Different logical recovery software tools have different capabilities, but in general, there are three distinct modes of operation:

  • Filesystem repair: Recovery by patching metadata.
    The software looks for corrupted file system metadata and then overwrites it from a copy to make the drive mountable by an OS. This method should not be used for data recovery purposes because it involves writing to the original patient drive, which can easily cause further drive degradation and data loss. If the software makes an inappropriate choice, it will permanently erase some corrupted file system elements that could have been useful if they were parsed using a different method.
  • Quick scan: Virtual rebuild of the file system by parsing only recognizable file system elements.
    The software reads only recognizable file system elements and tries to rebuild the file system in RAM so that it can display a file tree and allow saving the files to another location. In this mode, the software does not make any changes to the patient drive and it only reads a small section of the drive.
  • Full scan: Virtual rebuild of file system and raw recovery by parsing the entire drive.
    The software reads the entire drive to look for anything it can find. It will find all file system metadata on every sector that it can read and then use it to build a more accurate file tree. It will also do raw recovery, which means it will look for files without file system entries by recognizing particular hex signatures in their headers to determine their start, and making an educated guess on where they end. All file names are located within file system metadata, so files recovered in this manner will not have proper names, and will often be corrupted, since raw recovery involves a lot of guessing.

In short, file system repair should not be done on drives sent in for data recovery. A full scan will take much longer to process and cause a lot more drive degradation as a result, but it will find more files. A quick scan will be much faster and lighter on the drive, but it could leave files behind.
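The raw recovery pass described above can be illustrated with a toy signature scanner. The JPEG header signature used here (FF D8 FF) is a real, well-known one, but this is only a sketch: production tools recognize hundreds of formats and must also guess where each file ends.

```python
# Toy 'raw recovery' pass: scan an image for a known file-header
# signature and report the byte offsets where candidate files begin.

JPEG_SIG = b"\xff\xd8\xff"

def find_signatures(image, sig=JPEG_SIG):
    """Return the byte offset of every occurrence of sig in the image."""
    offsets, pos = [], image.find(sig)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(sig, pos + 1)
    return offsets

# Demo: two fake JPEG headers embedded in otherwise blank 'disk' data.
disk = b"\x00" * 100 + JPEG_SIG + b"fake jpeg" + b"\x00" * 50 + JPEG_SIG
print(find_signatures(disk))   # → [100, 162]
```

Since nothing here consults file system metadata, the scanner has no way to learn the files’ original names, which is exactly why raw-recovered files come back with generic names and frequent corruption.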

The biggest problem with running logical recovery scans directly on a patient drive is that they don’t save what they read anywhere besides RAM. As we mentioned previously, there isn’t any reliable way to determine whether a drive is unstable prior to reading the entire surface. When a drive first comes in for data recovery, its condition is unknown. It could be just fine, or it could be riddled with bad sectors. If it’s the latter and we start to run logical recovery scans directly on the patient drive, there is a good chance that the attempt will be a failure, which means that we’ve wasted time, caused additional drive degradation, and we have nothing to show for it.

A better recovery method: Connecting the drive to a Unix/Linux OS, imaging it with GNU ddrescue, and then scanning the image with logical recovery software.

 

Pros:
  • Doesn’t require the drive to go through the file system mounting processes of Windows/OS X
  • Features designed to jump past bad areas leave the most damaging recovery processes until the end
  • After obtaining the image, many logical recovery scans can be executed as needed without further damage to the customer’s drive
  • Improved success rates

Cons:
  • Lack of targeted imaging means a lot of time spent recovering unnecessary data
  • Features designed to jump past bad areas sometimes leave the most critical data behind
  • Every imaging phase after the first is extremely harsh on the drive and can easily cause further damage

 

Ddrescue is a Unix/Linux-based software imaging tool designed to work with unstable drives. Unix/Linux OSs generally will not automatically mount the file system of every connected drive, which is a major improvement over previous methods.

The biggest advantage of this method is that it allows us to back up everything we recover to an image file on a healthy drive. Once we have an image file, we can use different logical recovery tools to do full scans on the image to get the best logical recovery result without subjecting the patient drive to additional stress with every scan. In contrast, if we ran logical recovery scans directly on the patient drive, we would not be saving the results anywhere besides RAM.
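For reference, a typical two-pass ddrescue run along these lines might look like the following. The device and file names are placeholders; always verify the patient device name (for example with lsblk) before running anything, because imaging in the wrong direction destroys data.

```shell
# Pass 1: grab everything that reads easily, postponing bad areas (-n
# skips the slow scraping of problem regions on this first pass).
ddrescue -n /dev/sdX patient.img patient.mapfile

# Pass 2: return to the skipped/bad areas, retrying each up to 3 times.
ddrescue -r3 /dev/sdX patient.img patient.mapfile
```

The mapfile records exactly which areas have been recovered, which is what lets the second pass target only the problem regions and lets an interrupted run resume where it left off.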

While ddrescue is designed for data recovery purposes, it’s still a software tool, and like any other software, it must work through many layers of generic hardware and system software (BIOS/OS). All of these layers are not designed for data recovery and do not allow software tools the necessary level of control over the drive to perform any kind of read instability handling procedures. At the end of the day, ddrescue will still send only standard read commands to the drive and wait indefinitely for a response, like any other software tool.
