Do you have an Amazon Day set up? It’s a way for Amazon to queue your packages to arrive on a single day each week. There are some obvious benefits to you, such as your packages all arriving at a time when you’re home. But there are also huge advantages for Amazon.
Since Amazon started running its own delivery drivers and vehicles, it has needed to optimize its routes and patterns. Having a driver running everywhere all day long is inefficient. But if you know that you have to include a house on a certain day, you can write software that groups those houses or businesses together to provide cost savings and a more consistent driver delivery rate.
Consistency is critical for the delivery of things, whether they’re toaster ovens or network packets. The more consistent your delivery of data, the more consistently your application can perform for your users. But packets aren’t consistent. By their very nature, they like to arrive in bursts. Delay is bad when it comes to packet delivery, but variation in that delay is even worse. The term for the variation in delivery delay is jitter.
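To make the difference between delay and jitter concrete, here is a minimal Python sketch. The delay values are made up for illustration, and the function names are my own. The first function measures jitter as the average change in delay between consecutive packets. The second applies the smoothed interarrival jitter estimator from RFC 3550 (the RTP specification), which moves the running estimate one sixteenth of the way toward each new sample.

```python
from statistics import mean

def delay_stats(delays_ms):
    """Return (mean delay, jitter) for a list of per-packet delays.

    Jitter here is the mean absolute difference between the delays
    of consecutive packets.
    """
    diffs = [abs(cur - prev) for prev, cur in zip(delays_ms, delays_ms[1:])]
    return mean(delays_ms), mean(diffs)

def rfc3550_jitter(delays_ms):
    """Smoothed jitter estimate: J += (|D| - J) / 16 for each new packet."""
    jitter = 0.0
    for prev, cur in zip(delays_ms, delays_ms[1:]):
        jitter += (abs(cur - prev) - jitter) / 16
    return jitter

# Hypothetical one-way delays in milliseconds.
steady = [50, 50, 50, 50, 50, 50]   # slow but predictable
bursty = [20, 80, 20, 80, 20, 80]   # same average, wildly variable

if __name__ == "__main__":
    print(delay_stats(steady))
    print(delay_stats(bursty))
    print(rfc3550_jitter(bursty))
```

Both streams have the same average delay, but only the second one has any jitter. A receiver can buffer for a steady 50 ms; it can’t easily plan for a delay that swings by 60 ms from one packet to the next.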
In the world of unified communications, jitter is the enemy. Delay can be dealt with, as you might see or hear anytime someone is interviewed on the TV or radio via satellite link. Consistent delay can be planned for and dealt with. Jitter causes variations in the delay that make delivery unpredictable. For applications that rely on a seamless experience, such as voice and video, that variation causes massive problems.
Removing jitter from a network is a primary goal, but sometimes the methods for doing so are counterintuitive. The one I’m most familiar with is the QoS mechanism called Weighted Random Early Detection (WRED). WRED watches the transmit queue and applies some logic to it. It looks for packets that tagging policies have marked with a higher drop probability. When congestion builds to the point that it could affect packet delivery, the WRED algorithm starts dropping those packets to bring the queue back under the congestion threshold. For TCP traffic, the dropped packets are retransmitted by TCP’s reliable delivery mechanisms and arrive at their destination a few milliseconds later than they would have normally.
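The drop decision can be sketched in a few lines of Python. This follows the classic RED approach of a linear ramp between a minimum and a maximum threshold on the average queue depth, with a separate profile per traffic class providing the "weighted" part. The class names, thresholds, and averaging weight below are hypothetical values chosen for illustration. Real implementations differ by vendor and platform.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class WredProfile:
    min_th: float   # average depth at which random drops begin
    max_th: float   # average depth at which every packet is dropped
    max_p: float    # drop probability reached just below max_th

def drop_probability(avg_depth, profile):
    """Linear ramp from 0 at min_th up to max_p at max_th."""
    if avg_depth < profile.min_th:
        return 0.0
    if avg_depth >= profile.max_th:
        return 1.0
    span = profile.max_th - profile.min_th
    return profile.max_p * (avg_depth - profile.min_th) / span

def update_average(avg_depth, current_depth, weight=0.002):
    """Exponentially weighted moving average of the queue depth.

    The weight is an assumed value; a small weight keeps short
    bursts from triggering drops.
    """
    return (1 - weight) * avg_depth + weight * current_depth

def should_drop(avg_depth, profile, rng=random):
    """Randomly decide whether to drop one arriving packet."""
    return rng.random() < drop_probability(avg_depth, profile)

# Hypothetical profiles: low-priority traffic starts dropping earlier
# and more aggressively than high-priority traffic.
PROFILES = {
    "low_priority": WredProfile(min_th=20, max_th=40, max_p=0.10),
    "high_priority": WredProfile(min_th=30, max_th=40, max_p=0.05),
}

if __name__ == "__main__":
    for depth in (10, 25, 35, 45):
        for name, profile in PROFILES.items():
            print(depth, name, drop_probability(depth, profile))
```

At an average depth of 25 packets, only the low-priority class sees any drops. That is the point of the weighting: the traffic you care about least is the first to be slowed down.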
How does WRED enhance delivery if it’s dropping packets? The key comes from keeping the transmit queue almost full for every interval without causing it to back up. Think about having someone in front of you who is walking more slowly than you are. If you can’t walk around them, you really have two options. You can walk at your normal speed until you almost run them over and then stop until you can walk again. Or you can slow down just enough to stay behind them without running them over. Both solutions get you to your destination at the same time, but the second solution is much more consistent. WRED increases consistency because it slows things down just enough to be more efficient. A side effect of this mechanism is that it avoids global synchronization, the start-stop behavior in which many TCP flows back off and ramp up at the same time, which can really slow things down over time.
Managing Performance At The Source
WRED is a great workaround from a networking perspective. But it does require a lot of configuration and heavy lifting from your networking team. It also doesn’t work well in places where you don’t have control over your networking gear or where your provider doesn’t honor your QoS configuration. Public cloud is one place where this immediately starts coming up. You need a mechanism that works much better closer to the source of the traffic in order to ensure the kind of consistency that you want.
Intel is a huge name in the Ethernet market, and it is doing a lot to enhance the way that servers and applications perform. As we create applications that are more and more reliant on the network to do their jobs, we need to ensure that their performance isn’t predicated on how the network handles their behavior. To jump back to the analogy above, we need the walker to decide when to slow their pace rather than having someone tell them to slow down when they’re about to run over the person in front of them. They need to be aware of what’s going on.
Intel has been doing a lot of work on its 800 Series Ethernet adapters to bring networking up to speed with the way IT and applications work in the modern world. One of those enhancements is Application Device Queues (ADQ).
ADQ is a great way to introduce WRED-style QoS consistency long before the packets get jammed up at the transmit queue. Remember how I said that Amazon can use consistent delivery schedules to build better driver routes? The same thing goes for consistent packet streams and data transfers. If I’m building an edge router to handle packet transmits, I have to plan for the peak amount of traffic that can go through the interface. If the peak only happens once a year, I’m wasting capacity the rest of the time. If I plan for less than the peak with no other mechanisms in place, the packets can queue up and slow down application performance. Users don’t care about QoS or packet shaping. If their application isn’t working, they’re going to be mad!
Intel ADQ, on the other hand, allows us to configure things so that packets are delivered in an orderly and consistent manner. Much like WRED, that consistency allows us to properly size the transmit equipment and keep it running at peak utilization at all times instead of having it running at 100% with queues or at 50% here and there. That kind of consistency means that we can plan for expansion and have an idea of the impact of our business growing over time instead of waking up one morning and realizing our data transfer bill has tripled because we didn’t plan for how much we would be transferring.
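The sizing trade-off is easy to see in a toy simulation. This Python sketch is not a model of ADQ itself. It simply compares a bursty arrival pattern with a paced one carrying the same total traffic through a link that drains a fixed number of packets per tick. The traffic patterns and capacity figures are invented for illustration.

```python
def max_backlog(arrivals, capacity_per_tick):
    """Return the worst queue depth seen while draining the arrivals.

    Each tick, `arriving` packets join the queue and the link sends up
    to `capacity_per_tick` of them.
    """
    backlog = 0
    worst = 0
    for arriving in arrivals:
        backlog = max(0, backlog + arriving - capacity_per_tick)
        worst = max(worst, backlog)
    return worst

# Hypothetical traffic: 200 packets over 20 ticks in both cases.
bursty = [40, 0, 0, 0] * 5   # everything arrives in clumps
paced = [10] * 20            # the same load, delivered evenly

if __name__ == "__main__":
    for capacity in (12, 40):
        print(capacity, max_backlog(bursty, capacity), max_backlog(paced, capacity))
```

With a link sized just above the average load (12 packets per tick), the paced stream never queues while the bursty stream backs up by 28 packets. The only way to clear the bursty stream without queuing is to size the link for the 40-packet peak, which then sits mostly idle.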
Intel’s ADQ is a great tool for developers to build around when they’re writing new applications. Rather than relying on the networking team to put arcane QoS in place to make applications perform more consistently, developers can take matters into their own hands and make their software perform better for users. The side effect is that a network with less logic built into it can operate more reliably for all other applications too. Not having QoS policies in place on the edge means there is nothing to get in the way or cause more interactions. Cleaner networks mean faster, more consistent applications.
Bringing It All Together
The end goal of scheduling, whether it’s for packages or packets, is consistency. The more consistent you can make the delivery mechanism, the better the overall experience for your users or consumers. Amazon is learning this just like UPS and FedEx had to learn it. Intel is giving developers the power to improve networking consistency when they write their applications, so they can make things more reliable themselves instead of waiting for the network team to do it for them. That kind of agency is a prime reason why companies like Intel are going to be successful with data center innovations for years to come.