WAN Failover musings

HCHTech

Most of my clients are small enough that this topic almost never comes up. I do have a handful of larger clients, though, so I've got 3 or 4 setups with dual WAN connections. In the past there was always the "good" one, i.e. Cable or Fiber, and the "bad" one, i.e. slow DSL. In this situation I've always set up just basic failover. Connection 2 just sits there doing nothing until Connection 1 goes down. Then the failover happens and workers get slow internet instead of being down.

Now that faster connections are the norm, it is more feasible for the backup WAN to be actually usable. I've got one client next week that is upgrading from Comcast business (150/25 if I recall) to FIOS business at 1Gb. They are keeping the Comcast connection to use as a backup WAN.

Because the Comcast connection is "fast" (at least compared to DSL), I don't think letting the connection sit idle is the best use of that expenditure.

In the Sonicwall universe where I live, the load balancing options are:

Basic Failover - this is what I've always used when the 2nd internet is slow and "emergency only"
Round Robin - Connections go out both WANs, either on a random basis or an alternating basis
Spillover - After a bandwidth threshold you set is used up on the primary, connections go out the secondary
Ratio - You choose the ratio. e.g. 75% of connections go to primary WAN, 25% go to secondary.

So this has me wondering about real-world pros and cons of the different choices. Which choice gives the best overall experience to everyone?
  • Round Robin seems wrong unless you had two identical-speed WANs. The speed any user gets for a task is based on luck of the draw, not need
  • Spillover would seem to give the best chance of the fastest connection available, but what threshold to choose? We all know that a 1Gb connection doesn't always mean 1Gb. So the spillover threshold should really be "% of available", but you don't get that option. You have to choose a bandwidth number. Somehow.
  • Ratio sounds like the solution to the Spillover problem, but it places the speed of any individual connection at "luck of the draw", just like Round Robin.
What are your thoughts about this choice?
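
To make the three balancing modes concrete, here is a minimal Python sketch of how each one might pick a WAN for a new connection. It is not SonicWall's actual algorithm; the WAN labels, the 75% share, and the 800 Mbps threshold are made-up illustration values.

```python
import itertools
import random

WANS = ["primary", "secondary"]  # hypothetical labels, not SonicWall names


def round_robin():
    """Alternate new connections between the WANs."""
    cycle = itertools.cycle(WANS)
    return lambda: next(cycle)


def ratio(primary_share, rng):
    """Send roughly primary_share of new connections to the primary."""
    return lambda: "primary" if rng.random() < primary_share else "secondary"


def spillover(threshold_mbps):
    """Use the primary until its current load reaches the threshold."""
    return lambda primary_load_mbps: (
        "primary" if primary_load_mbps < threshold_mbps else "secondary"
    )


rr = round_robin()
print([rr() for _ in range(4)])

pick = ratio(0.75, random.Random(1))
picks = [pick() for _ in range(10_000)]
print(picks.count("primary") / len(picks))

spill = spillover(threshold_mbps=800)
print(spill(300), spill(850))
```

Round robin and ratio never look at how busy either link is, which is the "luck of the draw" concern above. Spillover is the only one of the three that reacts to load, but only relative to the fixed number you give it.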

Finally, what about external services that need a fixed IP on your network? How do you configure them to still work when you are in failover (and have a different public IP)? In the old days, say you had an Exchange server. Would you just create an additional lower-priority MX record for the static IP of the failover connection? These days, I'm thinking about VoIP phone systems or VPN tunnels. I would guess you would need to use a dynamic DNS entry in the firewall and have the service use that instead of a static IP, but how long does it take for a DDNS address to update after the failover? Is this practical? Am I missing something?
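
On the DDNS timing question, a rough way to reason about it is to add up the delays in the chain: how long the firewall takes to declare the primary down, how long the DDNS provider takes to publish the new address, and how long remote resolvers may keep the old answer cached (the record's TTL). The sketch below is only that arithmetic; every number in it is an assumed example, not a measured value from any firewall or DDNS provider.

```python
def worst_case_cutover_seconds(probe_interval, failed_probes, ddns_update, dns_ttl):
    """Rough upper bound on how long a remote service may keep using the old IP."""
    detection = probe_interval * failed_probes  # time to declare the WAN down
    return detection + ddns_update + dns_ttl


# Assumed example: 5 s probes, 3 misses to fail, 30 s DDNS update, 300 s TTL
print(worst_case_cutover_seconds(5, 3, 30, 300))
```

With numbers like these the TTL dominates, so whether DDNS is practical depends mostly on how short a TTL the provider allows and whether the remote service re-resolves the name at all.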
 
In the Sonicwall universe where I live

Without going too far off topic - I'm frustrated with Sonicwall. I had a commercial customer's Internet go down this week because the DNS server specified as primary in their Sonicwall went down. It did not fail over to the secondary DNS. (I'm not a Sonicwall guru.) I changed DNS servers in the menu tab for DNS servers on the Sonicwall and no joy. Then changed DNS in WAN area of networking. No joy. I had to get a hold of the company that set it up to find out there was a third buried setting to change the DNS before things finally worked again. Left me with a bad taste for Sonicwall. Why wouldn't the DNS settings that it uses be under the DNS tab? Hours of wasted time as after changing the Sonicwall DNS I thought Charter's modem was the issue instead of the Sonicwall.
 
This is a feature I think Untangle excels in. Untangle has two modules that deal with this: WAN Failover and WAN Balancer.

WAN failover lets you take your WAN connections...and as is probably pretty obvious...tell Untangle to shift all traffic over to one, if the other is down. And you can put in your rules for how it defines when a WAN link is down. Typically pings to IP addresses that you designate (such as 8.8.8.8 or whatever).
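
As a rough illustration of that kind of up/down rule, here is a small Python sketch of a link monitor that flips state only after several consecutive probe results. It is not Untangle's implementation; the class name and both thresholds are assumptions, and the actual pinging is left out (probe results are passed in as True/False).

```python
from dataclasses import dataclass


@dataclass
class LinkMonitor:
    fail_threshold: int = 3     # assumed: consecutive failed probes before "down"
    recover_threshold: int = 2  # assumed: consecutive good probes before "up"
    up: bool = True
    _fails: int = 0
    _oks: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Feed in one probe result and return the current link state."""
        if probe_ok:
            self._oks += 1
            self._fails = 0
            if not self.up and self._oks >= self.recover_threshold:
                self.up = True
        else:
            self._fails += 1
            self._oks = 0
            if self.up and self._fails >= self.fail_threshold:
                self.up = False
        return self.up


mon = LinkMonitor()
states = [mon.record(ok) for ok in [True, False, False, False, True, True]]
print(states)
```

Requiring several misses in a row keeps one dropped ping from triggering a failover, at the cost of a slightly slower reaction to a real outage.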

WAN balancer can complement the failover module, and let you get granular with how Untangle spreads the load. 95% of this one. 5% on that one. Or 50/50. Or..heck..create whatever percentage you want. You can also have Untangle always take specific traffic (based on many different criteria that you fancy)...and put it out one WAN or the other (like...in case you have an on-prem Exchange server and you really want outbound to always come from a specific pubby IP...instead of bouncing around).

The failover is very reliable and fast.
 
...and just for fun, they change the interface once in a while and move things around to keep you on your toes. This is why I appreciate having standardized on Sonicwall. Once you've set up a few dozen, you know where things are and pretty much how they work. I still call support once in a while if I get a thorny issue.

Put me in front of a Cisco, though, and I'm useless. :confused:
 
Untangle's WAN Balancer is a ratio based system. It allocates IP ADDRESSES NOT SESSIONS. That's important... because everything from a given internal IP will go to a single WAN.

WAN Failover just takes care of figuring out if a WAN is up or down. A WAN configured in balancer to be used 0% of the time, will get 100% of the traffic if it's the only WAN left online. Unless NAT Policies say otherwise... and for mail servers they really need to.
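
Here is a minimal Python sketch of that behaviour: each internal IP is hashed to one WAN according to the weights, and a 0% WAN still takes everything when it is the only one online. The hashing scheme, function name, and WAN labels are assumptions for illustration, not Untangle's actual algorithm.

```python
import hashlib


def pick_wan(src_ip, weights, online):
    """Map an internal IP to one WAN, so all of its sessions share that WAN.

    weights: integer percentages per WAN; online: set of WANs currently up.
    """
    candidates = {wan: weights[wan] for wan in online}
    if not candidates:
        raise RuntimeError("no WAN online")
    total = sum(candidates.values())
    if total == 0:
        # Only zero-weight WANs are left, so they get the traffic anyway.
        return sorted(candidates)[0]
    # Stable hash of the IP, reduced to a point in [0, total)
    digest = hashlib.sha256(src_ip.encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total
    running = 0
    for wan in sorted(candidates):
        running += candidates[wan]
        if point < running:
            return wan
    raise AssertionError("unreachable: point is always below total")


weights = {"fios": 100, "comcast": 0}  # hypothetical WAN labels
print(pick_wan("192.168.1.20", weights, {"fios", "comcast"}))
print(pick_wan("192.168.1.20", weights, {"comcast"}))
```

Because the choice depends only on the source IP and the set of online WANs, a given workstation stays on the same WAN until a link changes state.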

Untangle's UI has evolved... but only once drastically enough to warrant the above concerns... but it's also substantially more expensive than Sonicwall as a solution, so there's that. I for one can't tolerate Sonicwall because they literally go up in smoke in my presence... no joke, Rob walks in, Sonicwall lets out its magic smoke... all I have to do is look at the things! It's a gift!

Sonicwall has changed ownership twice in that same interval, and I've lost count of the UI changes.

But the above probably tells you what you want to know in terms of your configuration. You want to use the Ratio mode, and spread things out as you can.

Oh, and as for external services, don't think so hard. Each router has a static address, each of those addresses needs to be in DNS for the world to know about it, and each router forwards stuff to the service. How the service actually manages that public shifting is up to the service in question. For Exchange you have dedicated connectors for each WAN. I like having dedicated internal IP addresses too... makes the routing rules and such easier to read.
 
Untangle's WAN Balancer is a ratio based system. It allocates IP ADDRESSES NOT SESSIONS. That's important... because everything from a given internal IP will go to a single WAN.

But which WAN a given IP's traffic uses is chosen by the firewall based on an algorithm that you do not control, which is why I'm hesitant to choose this model. If someone is downloading ISOs for some reason, I'd like them on the faster of the two connections.

I may be overthinking things, that's a general fault of mine, I'll admit. The answer is probably just to choose one and get on with it. Measure performance and try something else if there are problems.

Oh, and as for external services, don't think so hard. Each router has a static address, each of those addresses needs to be in DNS for the world to know about it, and each router forwards stuff to the service. How the service actually manages that public shifting is up to the service in question.

I've spoken to the VoIP vendor since my post, and in this case they are saying they need a single static IP from their end - no DNS involved. Their solution for WAN problems is to send calls to cell phones if the client's internet goes down. I didn't accept that as a resolution, so I'm waiting on a callback from level 2 or engineering or whatever department knows the real answer on that.

In the interim, I can easily put a NAT policy in place to force traffic from their phone box to a specific WAN, which would override any load-balancing or failover. Which would also mean phones go down if that WAN goes down. It's the same situation they have been living with for years with a single WAN so that will be acceptable until we find a better solution.
 
If you want to use multi-WAN you're going to have to deal with the fact that some systems will get slower results than others based on which WAN is utilized. There's no magic way to separate out the HTTP/HTTPS traffic that's going for a download from the rest.

If all you want is the 2nd WAN to act in an emergency then configure it for spillover, and run speed tests to determine the WAN speed. All of these systems make use of QoS to work, and QoS requires a bandwidth threshold to work.
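
One way to turn speed tests into a threshold number is to take a typical measured value and back it off by some headroom. The Python sketch below does that with the median; the sample readings and the 80% headroom factor are assumptions for illustration, not vendor guidance.

```python
import statistics


def spillover_threshold_mbps(samples, headroom=0.8):
    """Suggest a spillover threshold from repeated speed-test results (Mbps)."""
    return statistics.median(samples) * headroom


# Assumed example readings from a nominal 1Gb line, taken at different times
readings = [910, 870, 940, 780, 895]
print(round(spillover_threshold_mbps(readings)))
```

Using the median rather than the best reading keeps a single lucky test from setting a threshold the line rarely reaches.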
 
If you want to use multi-WAN you're going to have to deal with the fact that some systems will get slower results than others based on which WAN is utilized. There's no magic way to separate out the HTTP/HTTPS traffic that's going for a download from the rest.

Thanks - The client will follow my lead, it's just my own discomfort with leaving that 150Mb pipe unutilized that has me arguing against a pure failover setup. With the upgrade to a 1Gb symmetric line they are getting a way-fatter connection than before anyway, so I think where I'm landing on this is to separate out the phone traffic only to use the Comcast connection, and let everything else share the FIOS and be done. Set up a pure failover and get on with things. I don't want employees running speed tests and opening tickets because the needle isn't where they think it should be.
 
...and just for fun, they change the interface once in a while and move things around to keep you on your toes. This is why I appreciate having standardized on Sonicwall. Once you've set up a few dozen, you know where things are and pretty much how they work. I still call support once in a while if I get a thorny issue.

TBH...I think Untangle has kept things pretty similar over the...14 or so years we've been doing it. The main dashboard changes...new features get added, but...when it comes time to roll up your sleeves and get to some certain core settings...those are all pretty much in the same places they were over a decade ago.

And yeah...key to any UTM is knowing it well, and you usually get most of that through volume over time.
 
TBH...I think Untangle has kept things pretty similar over the...14 or so years we've been doing it. The main dashboard changes...new features get added, but...when it comes time to roll up your sleeves and get to some certain core settings...those are all pretty much in the same places they were over a decade ago.

And yeah...key to any UTM is knowing it well, and you usually get most of that through volume over time.

There was a "modernization" of the interface on Sonicwalls with the v6 firmwares, but they retained (and still retain) a link to "classic" mode so you can go back to the old interface. It's been over a year now, but when you call in for tech support and they are looking at your settings, the very first thing they do is switch back to classic mode - haha. You would think there would be a company mandate to use the new interface when providing support, but I guess not. :p
 