I’ve just about recovered from my visit earlier this month to New York City’s Columbia University and the ONUG (Open Networking User Group) Spring 2015 conference. Looking at my pedometer results, one thing is clear: I don’t get nearly enough exercise when I’m not at ONUG. Was the ONUG conference everything I thought it would be? Well, it’s a step in the right direction but it’s clear there is a long way to go before we reach the open fields of unicorns and rainbows.
I was disappointed that, for various reasons, I could not attend the more in-depth sessions at Tuesday’s ONUG Academy, not least because there were some great presenters sharing knowledge. However, the Wednesday and Thursday sessions offered some highlights.
On Wednesday morning, Adrian Cockcroft of Battery Ventures shared his thoughts on “Creating Business Value with Cloud Infrastructure,” a title which totally undersells what an interesting session this was. Based on his experience while working at Netflix, Adrian made a rather amusing observation, in the form of an adoption rates chart, about Enterprise IT adoption of new technologies compared to Netflix and the Rest of the World.
Adrian stated that everything – even at the microservice level – should be API driven because it means that product developers can make changes to tiny parts of their product and so long as the API remains the same to the neighboring systems, those changes are transparent to the rest of the system. He points out that when more features are added in a given software release it means an exponentially greater risk of bugs, so his suggestion is to release more often with fewer features in each release. In Adrian’s opinion, containerized microservices with APIs make upgrades much simpler and more reliable. As he points out, Docker has gone from “nothing” to “why isn’t it on your roadmap” in about a year, and the advantages are potentially huge.
Is this Open Networking? Yes, in the guise of DevOps; these same principles can be applied to pretty much any system that’s being developed. This raises some interesting thoughts about how applying APIs at a microservice level might change how we build systems in comparison to having a single API for the system at a macro level. I’m thinking about this from the perspective of a modularized open system where it could be possible to plug in your favorite components so long as they can support a common API. Whether you like the topic or not, Adrian is a great speaker and well worth seeking out if you get the chance.
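To make the stable-API idea concrete, here is a minimal Python sketch. It is not anything Adrian presented; the `PricingService` contract, the two implementations, and the `checkout` caller are all hypothetical names invented for illustration. The point is only that a neighboring system written against the API keeps working when the component behind it is swapped or upgraded.

```python
from __future__ import annotations

from typing import Protocol


class PricingService(Protocol):
    """Hypothetical API contract that neighboring services depend on."""

    def quote(self, sku: str, quantity: int) -> float: ...


class PricingV1:
    """First release: flat per-unit price."""

    def __init__(self, prices: dict[str, float]) -> None:
        self._prices = prices

    def quote(self, sku: str, quantity: int) -> float:
        return self._prices[sku] * quantity


class PricingV2:
    """Later release: same API, different internals (adds a bulk discount)."""

    def __init__(self, prices: dict[str, float], bulk_threshold: int = 10) -> None:
        self._prices = prices
        self._bulk_threshold = bulk_threshold

    def quote(self, sku: str, quantity: int) -> float:
        total = self._prices[sku] * quantity
        if quantity >= self._bulk_threshold:
            total *= 0.5  # illustrative discount, not a real business rule
        return total


def checkout(pricing: PricingService, cart: dict[str, int]) -> float:
    """A neighboring system: it knows only the API, not the implementation."""
    return sum(pricing.quote(sku, qty) for sku, qty in cart.items())
```

Because `checkout` depends only on `quote(sku, quantity)`, either implementation can be plugged in without touching the caller, which is the property that makes small, frequent releases less risky.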
I confess that Facebook’s Wedge switch doesn’t have me jumping up and down with excitement. However, Facebook’s Director of Technical Operations, Najam Ahmad, got my attention when he asked how many people we thought were supporting Facebook’s server and network infrastructure worldwide. The answer? One.
While Facebook’s solution might not be entirely practical for other (smaller) companies, much of the underlying thought process can be utilized. Facebook set out with the goal of having just a single on-call engineer. To do that, first they needed to automate the process of wading through the millions of alarms that were being generated every week and filter them down to just the “important” alarms. Then they needed to recognize known issues automatically and then deal with them without human input.
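The two steps described above (filter the flood down to the important alarms, then handle known issues without a human) can be sketched roughly as follows. This is an assumption-laden illustration, not Facebook's tooling: the `Alarm` fields, the severity scale, the `KNOWN_ISSUES` table, and the `restart_service` handler are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Alarm:
    host: str
    kind: str
    severity: int  # hypothetical scale: higher is worse


IMPORTANT_SEVERITY = 3  # assumed cut-off for an "important" alarm


def restart_service(alarm: Alarm) -> str:
    """Stand-in for an automated fix; a real one would act on the host."""
    return f"restarted service on {alarm.host}"


# Hypothetical table mapping known issue types to automated handlers.
KNOWN_ISSUES: dict[str, Callable[[Alarm], str]] = {
    "service_down": restart_service,
}


def triage(alarms: list[Alarm]) -> tuple[list[str], list[Alarm]]:
    """Return (actions taken automatically, alarms escalated to on-call)."""
    handled: list[str] = []
    escalated: list[Alarm] = []
    seen: set[Alarm] = set()
    for alarm in alarms:
        # Drop low-severity noise and exact duplicates.
        if alarm.severity < IMPORTANT_SEVERITY or alarm in seen:
            continue
        seen.add(alarm)
        handler = KNOWN_ISSUES.get(alarm.kind)
        if handler is not None:
            handled.append(handler(alarm))  # known issue: no human needed
        else:
            escalated.append(alarm)  # unknown issue: page the engineer
    return handled, escalated
```

Only the alarms that survive both filters reach the single on-call engineer; everything else is either noise or something the tooling already knows how to fix.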
As an example, Najam described the issue of a memory leak in some of the Facebook code. As a highly distributed service, you can bet that if there’s a memory leak on one server, that leak will crop up on an awful lot of other servers sooner or later and cause a massive issue. The solution was to monitor available RAM on the servers; when a threshold is hit, an automated tool migrates traffic away from that processing node, shifts control off the processor with the issue, and reboots it. Then, when the node is ready again, the tool shifts traffic back on to it. As Najam put it, Facebook can pretty much write code with memory leaks and (almost) not care – at least from a support perspective. There is of course still a feedback loop to the developers, who can then work on figuring out the problem, but until then the issue can be handled with no downtime and no human resource investment.
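The memory-leak workflow above boils down to a drain, reboot, restore sequence triggered by a threshold. A minimal simulated sketch, assuming a hypothetical `Node` model and made-up threshold values (nothing here reflects Facebook's actual tool or its interfaces):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Simulated processing node; a real tool would talk to real hosts."""

    name: str
    free_ram_mb: int
    in_rotation: bool = True
    log: list[str] = field(default_factory=list)

    def drain(self) -> None:
        self.in_rotation = False  # migrate traffic away from the node
        self.log.append("drain")

    def reboot(self, ram_after_boot_mb: int) -> None:
        self.free_ram_mb = ram_after_boot_mb  # the leak is cleared by the reboot
        self.log.append("reboot")

    def restore(self) -> None:
        self.in_rotation = True  # shift traffic back on to the node
        self.log.append("restore")


def remediate_low_memory(node: Node, threshold_mb: int, ram_after_boot_mb: int) -> bool:
    """Drain, reboot, and restore a node whose free RAM is below the threshold.

    Returns True if remediation ran, so the caller can feed the event back
    to the developers who own the leaking code.
    """
    if node.free_ram_mb >= threshold_mb:
        return False
    node.drain()
    node.reboot(ram_after_boot_mb)
    node.restore()
    return True
```

The return value is the hook for the feedback loop: the node recovers on its own, but each remediation can still be reported so the underlying leak gets fixed eventually.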
Applying similar logic to other areas has allowed Facebook to almost do away with on-call support and have a mostly self-healing infrastructure. Does that sound good to you too?
The SD WAN Workgroup – one of six ONUG working groups – reported back on their testing with Ixia. Truthfully, I’m not surprised that pretty much everybody aced the tests, but what got my attention more than anything else was that there were no fewer than eight vendors sitting on that stage. I could think of at least two more companies that weren’t represented there, which is pretty impressive for a technology that didn’t have broad recognition a year ago.
As I’ve said elsewhere, SD WAN for me is what SDN needs to be. Software Defined Networking in and of itself is meaningless; it gives users the ability to do things, but it’s just a set of really neat capabilities, none of which the average person can put together into something they can support (or would want to support). It has been said that SDN is a solution looking for a problem, and what SDN has needed for a while is a solid use case where the product can be sold on the benefits and not on the feature checklist. In my opinion, SD WAN fits in that spot very neatly; it’s a clever solution to WAN problems that just happens to use SDN to accomplish its goals. Now, if we can only find a similarly exciting use case for IPv6…
The Long Path
So with all these exciting things going on, why do I say that there’s a long way to go before the fields and rainbows?
Adrian Cockcroft’s adoption rates chart was amusing, but it relates to a more general issue: different verticals will adopt these technologies at different rates. It should be evident from the list of people on the ONUG Board and the panelists in the Fireside Chat on Thursday morning (e.g. JPMC, Credit Suisse, Fidelity Investments, Morgan Stanley, Citigroup, Bank of America) that the financial services industry is hugely engaged in these cutting-edge technologies. One attendee told us that “the products you’re seeing here are already two years behind where we are today,” and it’s clear that this is one vertical that is actively engaged in driving the vendors in the direction they think is the right one. Then there are the hyperscale hosting companies (the Facebooks, Amazons, Googles of the world) who are probably in a position to just code whatever they want to do anyway. The rest of us back here on the planet surface are in a kind of no-man’s land where we want the flexibility offered by SDN at a price offered by open networking solutions, but we may not yet have the organizational, operational and technical skills to make it a reality. That’s where the vendors come in, packaging products neatly for us so we can reap some of the benefits without having to get too in-depth with the technology.
Software Defined Networking, it is said, is networking finally catching up with the server and storage worlds in terms of automation and orchestration. However, I question whether we can succeed while we remain isolated from other disciplines. Looking at software development, one of the tenets of the Scrum methodology is that you must build cross-functional teams to execute sprints. Similarly, ONUG attendees heard repeatedly that to create the kind of networking environment where unicorns will graze happily, we have to get out of the silo mindset and build an infrastructure team that works as a truly cross-functional group.
There’s a joke out there that whenever something goes wrong with an application “it’s the network.” In the network team of course, “it’s the firewalls.” Funny perhaps, but it’s indicative of a mindset and organizational paradigm that needs to be taken out back and shot. Why are we finger-pointing? From the application’s perspective, infrastructure (server, network, storage) needs to be a reliable service upon which unreliable code (see? I can dig at coders too) can be deployed. If there’s an infrastructure problem we should be working together to fix it. When we look at open solutions, we shouldn’t be looking at what’s best for networks or what’s best for the server team, but instead for a solution that might benefit both down the line. If we are supporting Linux on the servers, perhaps we can look at supporting Linux on our switches too. More importantly, as Ethan Banks discusses in the context of operationalizing open networking, networks are complex and require specialists, and the idea of letting some DevOps team try to automate them is terrifying. Perhaps, however, we need to look at how some of the new products and technologies might allow us to resimplify the network and let it be automated. Either way, none of this will happen unless we break down the silos and start working together to support a unified infrastructure. The move away from proprietary solutions is a large part of enabling this transformation.
If programming the network seems too large a step, how about just monitoring it? How many of us have an Infrastructure Management System that unifies our view of the service we provide? My experience is that most companies have a Network Management System (NMS) and then the server team has their own monitoring (maybe Splunk, maybe something vendor-sourced), and storage … well, nobody really understands them, so I’m not sure, but I imagine they monitor something or other.
Adrian Cockcroft nailed it when he said that DevOps is a re-org, but the changes need to go way beyond how we write code.