If you’ve ever worked on a network, you know how difficult it can be to make traffic flow the way you want it to. Policies can impact your routes. Traffic doesn’t always enter and exit where you expect. Little changes can have big impacts elsewhere, or in systems outside of your control. Which typically leads to the magical solution of using a network tunnel to deliver traffic to the right destination.
Tunnels are what I have called “the duct tape of networking”. They are usually the first answer that people come up with when trying to solve complicated routing problems. They’re also usually the worst solution for just about any non-VPN traffic problem. Why’s that, you might ask? Well, thanks to some of the bright folks from 128 Technology at Networking Field Day earlier this year, you can see some of the reasons that tunnels aren’t optimal traffic vehicles.
Problematic Pipelines
Tunnels have a myriad of issues when you look into the way they behave, but they generally break down into two big categories: homogenous traffic and overhead. Either problem on its own would usually be enough to warrant not using them. But you almost always get both issues when you choose to use them.
Homogenous traffic refers to the way that all traffic in a tunnel looks the same no matter what. For the endpoints, that can mean some big problems. For one thing, traffic being routed over a tunnel from Point A to Point B all traverses the same tunnel. Which means all traffic flows into one endpoint, takes the same path, and arrives at the other endpoint. There’s no option to send packets down multiple paths, and no possibility of utilizing other links to increase efficiency.
In addition, traffic that enters the tunnel can’t have policies applied to it on a hop-by-hop basis. This is great if you have traffic you want to ensure doesn’t get touched in transit. But it also means that traffic in a tunnel is immune to congestion management like QoS. Whatever goes in one end is going to come out the other end exactly the same way. And if delay-sensitive traffic like voice or video is loaded into the same tunnel as everything else, it’s going to get the same priority no matter what. Which means all the carefully constructed traffic management policies you’ve spent years building are gone in a flash.
So Over The Overhead
Overhead is another big part of tunnel troubles. In this case, overhead refers to the management of the tunnel infrastructure itself, separate from the content of the packets flowing down the wires.
For example, the amount of bandwidth consumed by the overhead of a typical tunnel is about 30%. That means when you start sending traffic down the tunnel, you’re going to lose roughly 30% of the connection bandwidth to things like keepalive packets, leaving less room for the data you actually want to move. In addition, each packet being sent down the wire is slightly smaller to account for the overhead of the tunnel headers. Most routers require you to shrink the MTU of the packets being sent to ensure the tunnel header fits within the packet size limitations. That leads to more packets being transmitted at smaller sizes, and to more data packets being fragmented.
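To put rough numbers on the MTU shrinkage, here’s a quick sketch. The byte counts below are common ballpark figures for GRE and IPsec ESP tunnel-mode encapsulation, not exact values for every configuration; the header tax shown here is only part of the picture, since the ~30% figure above also covers keepalives and other tunnel maintenance traffic.

```python
# Rough illustration of how tunnel headers shrink the usable MTU.
# Overhead byte counts are representative assumptions, not exact for
# every cipher, mode, or platform.

PHYSICAL_MTU = 1500          # typical Ethernet MTU
GRE_OVERHEAD = 24            # outer IP header (20) + basic GRE header (4)
IPSEC_ESP_OVERHEAD = 73      # outer IP + ESP header/IV/padding/ICV (assumed worst case)

def effective_mtu(mtu: int, overhead: int) -> int:
    """Bytes left for the inner packet after tunnel encapsulation."""
    return mtu - overhead

def efficiency(mtu: int, overhead: int) -> float:
    """Fraction of each wire-size packet that carries inner-packet data."""
    return effective_mtu(mtu, overhead) / mtu

for name, oh in [("GRE", GRE_OVERHEAD), ("IPsec ESP", IPSEC_ESP_OVERHEAD)]:
    print(f"{name}: inner MTU {effective_mtu(PHYSICAL_MTU, oh)} bytes, "
          f"wire efficiency {efficiency(PHYSICAL_MTU, oh):.1%}")
```

Any inner packet larger than that effective MTU either gets fragmented or dropped (if the DF bit is set), which is exactly the fragmentation penalty described above.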
Add in the fact that tunnels often need to be “nailed up” by sending initial traffic, and you pay a penalty for retransmitting packets that are lost during tunnel instantiation. Also, if you’re using IPSec tunnels, you’re paying a tax to have that traffic encrypted as it enters the tunnel, which leads to even more issues.
Trashing The Tunnels with 128 Technology
How does 128 Technology do SD-WAN without tunnels? As explained in the above video, they are taking an innovative approach to their services. It still utilizes SD-WAN concepts, such as a centralized control system to instantiate policy, while also combining more traditional routing methods. Things like payload encryption instead of header encryption and utilizing NAT and PAT to help identify flows are just some of the ways that 128 Technology avoids using tunnels.
Basing their technology advantage on a simple concept – “no tunnels” – means they have the freedom to figure out how to use existing technology to meet their service needs. There’s no sense in reinventing the wheel for every new technology that comes along. Rather, by finding a method that works with existing technology and applying it across all of the traffic pushed across their mesh networks, 128 Technology can ensure that their users have peak performance and simplicity without tunnels.
Bringing It All Together
I’ve never been a fan of tunnels. They’re messy and complicated and cause way more problems than they fix. But, unfortunately, they’re also an all-too-convenient tool that people reach for to fix very specific problems. And SD-WAN is a prime place where they get used for far more than they were intended to solve. Instead, I like seeing how 128 Technology is doing more with other protocols and ideas to create tunnel-free SD-WAN. It’s a great way to tackle the problem and provide a novel solution before you get locked into a tunnel mess.
For more information about 128 Technology, make sure you visit their website at http://128Technology.com. If you’re interested in seeing more of 128 Technology and their unique SD-WAN solution, make sure you check out the upcoming Networking Field Day Exclusive with 128 Technology on July 23, 2019.
Looking forward to seeing some of the content from the Tech Field Day event! Great article, and an interesting approach. Also, generally the DSCP values get copied into the IPsec header, which means that tunneled traffic can get the same preferential treatment for QoS. Sure, you can’t classify and mark encrypted traffic mid-way, but you can classify and mark prior to it hitting the tunnel and then queue, police, etc. end to end for each hop the tunneled traffic hits.
In order to utilize multiple circuits, you just build a tunnel per circuit and MP-BGP load balance or per-packet load balance across them. (Standard SD-WAN behavior.)
Agree on the tunnel overhead, even though bandwidth is less of a constraint nowadays. The real constraint is the vendors’ lack of interest/ability to utilize FPGA/hardware-based offloading that would let us push encryption speeds beyond 10 Gbps. This especially rings true in the virtual router space. Why can’t I launch my virtual router on an F-series EC2 instance?
I hope that someday providers will bump up the MTU sizes and enable jumbo over the internet.
Keep writing. Love your stuff!