“Carrier-grade” is a term that is often used and sometimes misused in the world of IT to highlight a mindset of 24/7/365 operations. In the world of telecom, lack of uptime can have serious consequences in the realm of public safety and emergency services. Accordingly, service provider equipment is designed to be extremely resilient to physical and logical failures as well as single points of failure.
In enterprise IT, carrier-grade is an idea we often strive to reach but find difficult to attain. In early August, I attended NFDx along with other delegates and got some unique insight into Aruba/HPE’s vision of a “carrier-grade” campus core, which debuted with the announcement of the Aruba 8400 campus core switch.
I’ll admit, coming from a service provider and telco background, I’m often a bit skeptical when I hear enterprise network equipment described as being in the same league with service provider equipment. After all, we put ISP networks outdoors to endure the torture of mother nature, people and bored squirrels.
However, as I got into the nuts and bolts of the 8400 and spoke with the people who helped to design and bring it to market, it was very clear they took the label of “Carrier-Grade” seriously and meeting that goal was an integral part of the core design and development philosophy.
The 8400 system architecture: A “No Excuses” platform
Mike Frey with Aruba shared the system architecture goals with us. A few that stood out to me are:
- “Carrier Class Availability – Non stop operation after any single failure, every component hot field replaceable, 99.999% system uptime goal, every system with redundancy”
- Longevity – designed to last for 3 generations worth of hardware
- A new switch operating system that’s as resilient as the hardware
- No Excuses Platform
The part about “No Excuses” intrigued me, so I asked Mike to clarify a bit. He responded with:
“We designed this box around a set of ASICs and then threw the design away because we found that we didn’t like the capabilities…we chucked them and started over.”
He goes on to explain that Aruba wanted to design a platform that wouldn’t require a mountain of excuses and caveats to a potential customer as to what the switch wouldn’t or couldn’t do in the context of campus networking. No excuses.
Hardware designed for high availability
Digging further into the hardware reveals a switch that’s had some serious thought put into it, from power supplies that can be replaced without disturbing power cords to fabric modules that can be replaced if there’s ever a dataplane issue, all without having to unrack the entire chassis and take down the network.
Paying attention to the details in hardware is critically important to meeting the stated 99.999% availability goal.
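To put that goal in perspective, here is a quick back-of-the-envelope sketch of what an availability percentage allows in downtime per year. The function name and the 365.25-day year are my own choices for illustration, not anything from Aruba's material.

```python
# Rough downtime budget for a given availability target.
# Assumption: a year is averaged to 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) allowed by an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% availability -> {minutes:.2f} minutes of downtime per year")
```

Five nines works out to a little over five minutes of downtime per year, which leaves very little room for a maintenance window that takes the chassis offline.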
Some of the features that really stand out on the 8400 are:
- Separation of PSU and power connector
- 2+2 power redundancy – half of the power supplies can be lost and still fully power the chassis
- Independent fabric modules
- Linecards that are covered to protect the card’s board and components
- Chassis design that allows for hot swap of virtually every component without impact to the service level
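The 2+2 power claim is easy to sanity-check with a small sketch. This assumes four installed supplies of which two are enough to carry the full chassis load, which is how I read "2+2"; the supply names and the helper function are hypothetical.

```python
from itertools import combinations

# Assumption: four supply bays, any two of which can power the chassis.
INSTALLED_SUPPLIES = ("PSU1", "PSU2", "PSU3", "PSU4")
REQUIRED_WORKING = 2


def survives_all_failures(supplies, required, max_failures):
    """Return True if every combination of up to max_failures failed
    supplies still leaves at least `required` supplies working."""
    for failed_count in range(max_failures + 1):
        for failed in combinations(supplies, failed_count):
            working = [psu for psu in supplies if psu not in failed]
            if len(working) < required:
                return False
    return True


# Losing any two supplies still leaves two working.
print(survives_all_failures(INSTALLED_SUPPLIES, REQUIRED_WORKING, max_failures=2))  # True
# Losing three does not.
print(survives_all_failures(INSTALLED_SUPPLIES, REQUIRED_WORKING, max_failures=3))  # False
```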
Mean Time Between Failures (MTBF) is another measure of hardware reliability, but it is often overlooked and, worse, sometimes not even published.
Aruba’s lowest MTBF number on all of the components of the switch still exceeds 30 years of life expectancy. This is an area that network vendors have been making tradeoffs on for too long and so it’s refreshing to see Aruba display MTBF so prominently in the data sheets for the 8400.
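For a feel of what a 30-year MTBF implies, here is a minimal sketch under the common simplifying assumption of a constant failure rate (an exponential model). The 4-hour repair time is a made-up figure for illustration only; it does not come from the data sheet.

```python
import math

HOURS_PER_YEAR = 365.25 * 24  # assumption: averaged year length


def annual_failure_probability(mtbf_years: float) -> float:
    """Probability that a component fails within one year,
    assuming a constant failure rate of 1 / MTBF."""
    return 1.0 - math.exp(-1.0 / mtbf_years)


def steady_state_availability(mtbf_years: float, mttr_hours: float) -> float:
    """Classic availability ratio: MTBF / (MTBF + MTTR)."""
    mtbf_hours = mtbf_years * HOURS_PER_YEAR
    return mtbf_hours / (mtbf_hours + mttr_hours)


mtbf = 30.0          # years, the floor quoted for the 8400's components
repair_hours = 4.0   # hypothetical mean time to repair

print(f"Chance of failure in a given year: {annual_failure_probability(mtbf):.2%}")
print(f"Availability of a single, non-redundant part: "
      f"{steady_state_availability(mtbf, repair_hours):.6f}")
```

Under those assumptions, a single part with a 30-year MTBF has roughly a 3% chance of failing in any given year, which is why redundancy and hot-swap still matter even with good MTBF numbers.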
A switch designed around software
Hardware is only part of the equation to designing a box that’s bulletproof. Aruba took a very clear and definitive stance here on how software development would affect the design of the 8400.
Mike Frey elaborates on what made this switch different from others he’s been involved with in his career:
“I’ve been developing switches for us for a long time and I’ll tell you this development has changed much more from ‘here’s hardware, make the software run’ to ‘here’s our software vision…enable that software’.”
And that’s absolutely the right approach. As practitioners of network engineering, we get so excited about speeds, feeds and ASICs that we often forget the hardware is useless without a software platform that is purpose-built to complement it rather than bolted on after the fact.
ArubaOS-CX – A shift in network operating system design based on a single source of truth about current network state
In one of the sessions, we were discussing what would happen inside the OS during a typical convergence event such as an OSPF graceful restart.
Because the OS is designed around a central database rather than individual daemons each maintaining their own databases, a restart of a process such as OSPF has far less impact on the network.
Tom Black with Aruba made it a point to stop and address why this was an important and conscious design choice. It is also a clear departure from the model of previous network operating systems in the industry.
- ArubaOS-CX is based on distributed systems and cloud programming fundamentals
- Everything is in the database – process restart is just an artifact of the architecture – it’s not net new code
- High availability and resiliency is an outcome of the architecture
The current state database is the central component that makes the software such a shift in design: it holds the current state of everything in the system. As a result, any agent that fails can resynchronize from the database when it restarts.
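The idea of an agent recovering its state from the database can be illustrated with a toy model. To be clear, this is not ArubaOS-CX code; the class names, the table name and the in-memory dictionary are all hypothetical stand-ins for the pattern described in the session.

```python
class StateDatabase:
    """Toy in-memory stand-in for a central current-state database."""

    def __init__(self):
        self._tables = {}

    def write(self, table, key, value):
        self._tables.setdefault(table, {})[key] = value

    def read_table(self, table):
        # Return a copy so callers cannot mutate the database by accident.
        return dict(self._tables.get(table, {}))


class RoutingAgent:
    """Hypothetical agent that treats the database as the source of truth."""

    def __init__(self, db):
        self.db = db
        # On every start, including a restart after a crash,
        # rebuild the working state from the database.
        self.neighbors = db.read_table("ospf_neighbors")

    def learn_neighbor(self, router_id, state):
        # Write to the database first, then update the local view.
        self.db.write("ospf_neighbors", router_id, state)
        self.neighbors[router_id] = state


db = StateDatabase()
agent = RoutingAgent(db)
agent.learn_neighbor("10.0.0.1", "Full")
agent.learn_neighbor("10.0.0.2", "Full")

del agent                     # simulate the agent process dying
restarted = RoutingAgent(db)  # a fresh process resyncs from the database
print(restarted.neighbors)
```

The restarted agent comes back with both neighbors without any dedicated recovery code, which is the sense in which process restart is "just an artifact of the architecture."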
During the NFDx presentation, I specifically asked what “Carrier Class” means as it relates to the 8400. Michael Dickman answered and said:
“The short answer would be that you have a lot of redundancy built into every layer and it’s done in a way that’s more robust than some of the older approaches…in the campus. The reason we think that’s important to customers in the enterprise is the mission criticality of the network. It’s been high for a long time but we really want to emphasize that we paid a lot of attention to that table stakes piece around resiliency and reliability…it cannot go down.”
Impressions and closing thoughts
There is a recognition by Aruba that enterprise customers want a box that really just works and not a sales pitch. Every facet of the 8400 design reflects a commitment by Aruba to understand what causes network outages and develop a switch that solves those issues.
It’s clear that Aruba’s development team for the 8400 switch really took the idea “it cannot go down” to heart. High availability and resiliency surround every component of the design, from hardware to software, and I think we might even see a new standard by which enterprise network equipment is judged.
This is a definite thought shift in how we should approach building a platform for campus connectivity and core. I’m excited to see if the work done on the 8400 makes its way into the access layer with different form factors offered in the future.
I for one would be happy to see an entire product line of “no excuses” switches for enterprise networking.