Do you store data recoveries on a server?

wolfson

Currently we run data recoveries onto a bunch of individual bare drives, then just make a note on the job recording where the recovery is stored. This approach works pretty well for us, except for a few issues:
  • Drives fill up, and we can't tell they're full until we connect them to a booted OS (at which point the technician has to hunt for a drive with sufficient space, slowing everything down)
  • Drives get old and begin to subtly fail, and nobody notices
  • Naming conventions for drives are hard to maintain
  • We end up with many stacks of drives set up in a somewhat haphazard manner
So we're considering storing our recovered data on a server instead -- probably in a JBOD arrangement, to avoid the potential dangers of RAID and drive failures. But we're not sure how to approach using a server for this, as we'd have to consider the cost of the increased bandwidth across our network, and the risk of network issues compromising the integrity of recoveries.
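One idea we've had for the integrity side (just a sketch, assuming standard GNU coreutils on the workstations; "case1234" is a made-up case number) is to generate a checksum manifest before a recovery leaves the workstation, then verify it after the copy lands on the server:

Code:
# On the workstation, from inside the recovered data's directory:
find . -type f -exec sha256sum {} + > /tmp/case1234.sha256

# On the server, from inside the copied directory:
sha256sum -c /tmp/case1234.sha256

That would at least tell us if a flaky network transfer silently mangled anything.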

Do you use a server to store your data recoveries? If so, how do you implement it? And are you satisfied with the arrangement?
 
Why a server? Why not a NAS? Why are you keeping them? There are legal risks to keeping end user data, especially if medical or financial data is stored.

I have one machine I use for data recovery. It could be connected to an isolated network branch, avoiding bandwidth concerns and virus issues.
 
One word - FreeNAS.

Not sure why you think that JBOD is more reliable than RAID. In a JBOD, if a drive goes down, that drive's data is gone. If you have a striped setup with parity, like RAID 5, you have much better protection.

Back to FreeNAS: you can build a nice NAS in a good-sized desktop case. You don't need real RAID cards, just regular SATA ports, and the drives don't even need to be the same size.
 
Why a server? Why not a NAS? Why are you keeping them? There are legal risks to keeping end user data, especially if medical or financial data is stored.
Sorry, to clarify, I guess I meant more of a NAS-type setup: a server that only deals with data storage and is accessible only from the local network.

We only keep data temporarily -- just long enough to put it on a target drive for clients and to cover ourselves for a few weeks in case they need warranty work.

One word - FreeNAS.

Not sure why you think that JBOD is more reliable than RAID. In a JBOD, if a drive goes down, that drive's data is gone. If you have a striped setup with parity, like RAID 5, you have much better protection.

Thanks, I'll definitely check out FreeNAS. My only concern with RAID is that one drive in the array might fail slowly and silently corrupt the parity. I don't have much experience with it, so I'm just trying to be cautious.
 
What nline and Mark said. If you are going to image/clone to individual drives, you should be zeroing them beforehand to prevent mixing residual data with recovered data in the LBAs that were unreadable. And if you are doing recoveries, you should be doing sector-by-sector imaging/cloning (not just grabbing the readable files), so you can do the data recovery from the clone.
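To make the zeroing step concrete, here's a minimal sketch assuming a Linux bench machine with reasonably recent GNU coreutils (the device name is a placeholder -- triple-check it before running):

Code:
# Overwrite the entire destination drive with zeros so stale data
# can't masquerade as recovered data in the unreadable LBAs
dd if=/dev/zero of=/dev/sdY bs=1M status=progress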
 
Thanks, I'll definitely check out FreeNAS. My only concern with RAID is that one drive in the array might fail slowly and silently corrupt the parity. I don't have much experience with it, so I'm just trying to be cautious.

That's not a concern, so to speak, as long as you pay attention to the box. Real NAS servers will have indicator lights if a drive starts having problems. On my FreeNAS box, in addition to emailed notifications, the system starts beeping and flashes a light on the drive itself when bad blocks start popping up.

[screenshot of the FreeNAS alert]

But it does take some work to run it properly. While setup and day-to-day running are simple, some things are not -- replacing a drive, for example, is a bit more than a mouse click or two. But it's an awesome system. iXsystems.com, the owner and operator of FreeNAS, also sells commercial solutions.
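For example, a drive swap from the command line goes roughly like this (a sketch only -- the GUI has equivalents, and the pool name "tank" and the device names are placeholders):

Code:
# Report any pools that aren't healthy
zpool status -x

# Take the failing disk offline, physically swap it,
# then resilver onto the replacement
zpool offline tank ada2
zpool replace tank ada2 ada5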
 
With a few exceptions, I always clone drive to drive and hold onto the clone for two weeks. After the project is officially closed, the clone is erased and tested before being added back into circulation.

It would need to be a seriously huge NAS or server to hold even a week's worth of recoveries, let alone anything more than that.

Once you start offering data recovery, you'll realize you can't be cheap about it -- and that there's a reason it isn't as profitable as you first thought.
 
My only concern with RAID is that one drive in the array might fail slowly and silently corrupt the parity. I don't have much experience with it, so I'm just trying to be cautious.

You've seen how often hard drives fail. Say you have three drives: in a JBOD you are 3x more likely to have a disk randomly fail and lose whatever is stored on it. I'm going to assume we're talking about RAID 5. With RAID 5 the risk is in the rebuild: one disk can fail and you're still going, but you have to make sure you have a verified backup of the server before you stick in a new disk and let it rebuild. What commonly happens (and what Luke probably sees all the time) is that the array is degraded, a technician just pops in a new disk and lets it rebuild, the rebuild fails, and it turns out there is no current, working backup.
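To put rough numbers on it: if each drive independently has, say, a 3% chance of failing in a given year, then with three drives in a JBOD the chance of losing at least one of them (and its data) is 1 - 0.97^3 ≈ 8.7%, close to triple the single-drive risk.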
 
Typically, yes, but you're probably right about the possibility of bit rot or corruption in the parity data. That's where the next generation of file systems comes in (ReFS, Btrfs, ZFS).

For example, FreeNAS uses ZFS and can detect and potentially repair bit rot. FYI, best practice would be to build your own FreeBSD fileserver rather than use FreeNAS.

If you would rather use Linux, you should look at Btrfs instead.
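A quick sketch of what that looks like in practice (the pool name and mount point are placeholders):

Code:
# ZFS: read every block, verify checksums, repair from redundancy
zpool scrub tank
zpool status tank

# Btrfs equivalent
btrfs scrub start /mnt/storage
btrfs scrub status /mnt/storage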
 
Any particular reason to use FreeBSD over FreeNAS or some Linux distro?

FreeNAS is based on FreeBSD. It would be best practice to build your own ZFS fileserver using FreeBSD because you would actually have to have some knowledge of how it works in order to use it. FreeNAS is designed to make setup easier, but the problem with that is that you end up not really knowing much about how it works, so you have no troubleshooting foundation when something goes wrong. Also, when there is a security update to FreeBSD, it will take longer before it is available in FreeNAS.

But just because it's best practice doesn't mean that's the way you should do it in your situation. For example, there's probably no sysadmin to manage the server, so no one would be updating it before FreeNAS rolled out the update anyway. And when you are no longer working for the company, no one is going to know how the fileserver works either. (But you should make sure you document everything you set up.)

As for FreeBSD vs Linux, FreeBSD has better support for ZFS than Linux (Ubuntu has added support for ZFS, but it will probably never be added to the Linux kernel). ZFS has been around longer than Btrfs. However, Btrfs has been stable for a while now, and I think it will become preferred over ZFS.
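If you do roll your own on FreeBSD, the core of it is only a few commands -- a sketch, with the pool name, disks, and dataset names being whatever you choose:

Code:
# Three-disk raidz pool (single parity, conceptually like RAID 5)
zpool create tank raidz ada1 ada2 ada3

# A dataset for recoveries, with cheap on-the-fly compression
zfs create tank/recoveries
zfs set compression=lz4 tank/recoveries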
 
Here's what I do, as far as the OP's question is concerned. Mind you, this is for full-time data recovery, and only that.

In my PC-3000 workstation I have a RAID 6 made up of 8x 4TB SAS HGST drives (so 24 effective TB). For smaller drives (500GB or less) I just image the data to image files on the RAID, which is convenient because when we are done with the project it's just a matter of erasing the file -- no drive wiping required. As the RAID gets full, we just go in and delete the oldest cases. Generally anything older than a month gets wiped; if we are busy and have a lot of cases, we may delete anything closed out more than two weeks ago.

For larger drives, we generally still use individual drives so as not to overstuff the RAID, but we have to wipe those after each case so as not to risk accidentally recovering leftover data from a previous case. We use two systems to keep track of which drive belongs to which case. First, we write the case number on the destination drive with a wet-erase marker; then we keep it in a bin together with the client drive when it's not being actively worked on. As soon as the case is closed out, it goes into a wipe bin. Every week we start a new bin of drives to wipe, and we only wipe them after a minimum of two weeks has passed since the case closed. That way the customer has time to review and back up the recovered data before we wipe our working copy.

For very small recoveries, we will sometimes dump the data onto a Synology NAS share at the end and give the customer login details to download it. For all other recoveries we always put the data onto an external hard drive and give/mail it to the customer. Sometimes customers complain that we won't put 250GB of data onto our NAS or server for them to download, but the fact is we deal with huge amounts of data and it's simply not feasible to store it for any prolonged period. Even just moving it across a gigabit network can take forever. Just last year we added conditions to our paperwork requiring customers to pay data storage charges if they don't collect their recovered data within 30 days. As it is, I've easily got 10TB of uncollected cases just sitting here tying up my drives.
 
Thanks for all the info! I'm going to play around with FreeBSD and FreeNAS and see how that works for us.

@DataMedics we end up with a lot of drives that aren't failing too badly, where we can do file recovery without imaging the drive first, so we run recoveries on multiple workstations at once as long as the job doesn't need specialized hardware. The idea is to have a system that many stations can access simultaneously. Thanks for the info on your setup though, I'll take it into account.
 
Thanks for all the info! I'm going to play around with FreeBSD and FreeNAS and see how that works for us.

@DataMedics we end up with a lot of drives that aren't failing too badly, where we can do file recovery without imaging the drive first, so we run recoveries on multiple workstations at once as long as the job doesn't need specialized hardware. The idea is to have a system that many stations can access simultaneously. Thanks for the info on your setup though, I'll take it into account.
This is very bad practice and a huge risk to your clients' data. You should never attempt file recovery without first imaging the drive, preferably with a proper data recovery hardware imager or, at the very least, ddrescue.
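For the software route, a typical two-pass ddrescue run looks something like this (device names are placeholders; the map file lets you stop and resume safely):

Code:
# Pass 1: grab everything that reads cleanly, skip the bad areas
ddrescue -f -n /dev/sdX /dev/sdY rescue.map

# Pass 2: go back and retry the bad areas a few times
ddrescue -f -r3 /dev/sdX /dev/sdY rescue.map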
 
Thanks for all the info! I'm going to play around with FreeBSD and FreeNAS and see how that works for us.

@DataMedics we end up with a lot of drives that aren't failing too badly, where we can do file recovery without imaging the drive first, so we run recoveries on multiple workstations at once as long as the job doesn't need specialized hardware. The idea is to have a system that many stations can access simultaneously. Thanks for the info on your setup though, I'll take it into account.

Definitely a bad idea. Should always work from a copy, even if you think the drive is fine.
 