Network operating systems (NOS) are about the most unexciting things ever. We’ve been working with them for decades now. Whether it’s the venerable Cisco IOS or upstarts like Arista or IP Infusion, there’s a certain amount of “been there, done that” in the NOS.
That’s because the basic level of what we consider a NOS hasn’t changed much. It has to move packets across interfaces. It has to move packets off the box if those interfaces are not directly attached. It has to provide some kind of command interface. And it has to have some kind of way to monitor what it’s doing. For years, we’ve tried to optimize the first two things on the list. The CLI for a box is either Cisco-like or something different. And the last point was left up to SNMP, and heaven help you if you want to know more about it than that. That’s because, just like a racing car, the performance is all that matters. Who cares how comfortable the seats are as long as it goes fast?
Cloud is changing that. Performance is no longer the magical unicorn we are chasing through the forest. Whitebox hardware vendors are pushing performance as hard and fast as it will go. The switch won’t top out right away. Instead, the way we interact with the systems is becoming more and more important. Rather than just tossing the device into a corner and hoping that someone enabled telnet, we’re instead finding that things like automation and programmability require us to have more control over the hardware. And that control is exerted through the NOS.
Snap To It
During Networking Field Day 20, we got a chance to check out a relatively new NOS from SnapRoute. SnapRoute had previously launched a bit early, but they decided to come back to get the opinions of some very smart networking folks.
SnapRoute is taking a different approach to the way a NOS should be built. That’s because the founders of the company, Adam Casella and Glenn Sullivan, didn’t take a standard approach to networking. They came to it through Apple and Cisco TAC. So they bring the perspective of a client-side network consumer as well as the dark side of supporting these archaic constructs.
Their approach is actually really simple: bust up the monolith that is the NOS. It’s an approach that is growing in popularity. Cisco IOS, up to the IOS-XE release, was a huge monolith. How huge and how archaic? Well, if you’ve ever heard of DynaMIPS, you’ll know that the reason why it worked is because Cisco still had code in IOS to work on the 7200-series routers, which had MIPS processors. It took them a while to get that code out of the release trains, but IOS 15 still works on the DynaMIPS and GNS3 emulation platforms.
SnapRoute took a look at what companies like Arista have done with making the monolith less dependent on every process running in the kernel and took it to the logical extension of today’s environment. Rather than threading the operating system or making everything a process running on the kernel, they decided to make everything a Kubernetes container. Radical idea, right? If you’d asked two years ago, it might have been crazy. But the number of things that are being containerized today makes it sound a lot less far-fetched.
And yet, it is a radical shift, because it’s no longer about retrofitting the NOS to have instrumentation and management capabilities. That kind of retrofitting is something we’ve seen at Facebook over recent years. They’ve been trying to shoehorn management and monitoring capabilities into their platforms for years. They’ve built everything around a Linux core, so trying to adapt that core philosophy and ensure that all the NOS components play well with it has taken a lot of engineering cycles.
On the other hand, SnapRoute picked Kubernetes specifically because it has the kinds of management and monitoring capabilities that they want in a NOS. Rather than reinventing the wheel, SnapRoute is changing the definition of what the car does. In this case, it’s just a Kubernetes cluster that moves packets really fast. Aside from that, you manage it just like any other cluster. You can get reports on it just like any other cluster. Suddenly, your switch isn’t on an equipment island all by itself. Instead, it’s something that looks just like any other endpoint in your network with a different tag on it.
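To make "you can get reports on it just like any other cluster" a little more concrete, here is a minimal Python sketch. It assumes you have captured JSON shaped like the output of `kubectl get pods -o json`, where each item carries `metadata.name` and `status.phase`. The pod names and labels in the sample are invented for illustration and are not SnapRoute's actual component names.

```python
import json
from collections import Counter

# Hypothetical sample shaped like `kubectl get pods -o json` output.
# Pod names and the "app" label values are made up for this sketch.
SAMPLE_POD_LIST = """
{
  "kind": "List",
  "items": [
    {"metadata": {"name": "bgp-0", "labels": {"app": "routing"}},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "lldp-0", "labels": {"app": "discovery"}},
     "status": {"phase": "Pending"}},
    {"metadata": {"name": "telemetry-0", "labels": {"app": "monitoring"}},
     "status": {"phase": "Running"}}
  ]
}
"""


def summarize_pods(pod_list_json):
    """Count pods by phase and list the names of pods that are not Running."""
    pods = json.loads(pod_list_json).get("items", [])
    phases = Counter(
        pod.get("status", {}).get("phase", "Unknown") for pod in pods
    )
    not_running = [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") != "Running"
    ]
    return dict(phases), not_running


if __name__ == "__main__":
    phases, not_running = summarize_pods(SAMPLE_POD_LIST)
    print("Pods by phase:", phases)
    print("Needs attention:", not_running)
```

The point of the sketch is that nothing in it is switch-specific. If the NOS components really are just pods, the same report you already run against a server cluster would cover the network device too.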
The future of having Kubernetes as your go-to resource for the NOS has a lot of exciting potential. For one thing, it’s standard. That means you don’t have to invest a lot of extra time and effort into making things and then remaking things for your unicorn network devices. If it runs on your Kubernetes cluster for production servers, it’ll work on your production network too. It also means that you can have one team developing things for all your infrastructure instead of multiple silos working on things independently of each other. You can create standards for the data you want and get it where you need it when you want it.
Bringing It All Together
The only enemy of SnapRoute right now is time. They need to get their NOS working on multiple platforms from other vendors. EdgeCore is a good start, but expansion is needed. The Kubernetes idea is sound for sure. But SnapRoute needs to push it to the forefront to get around the other networking OS vendors that are starting to enter the market. Running on whitebox switches is the new table stakes. You need to differentiate yourself in a big way. For SnapRoute, Kubernetes is that differentiation. It’s time to tell the world about it and get moving.