This blog describes a presentation and demo from Pluribus Networks on their approaches to building data center overlay fabrics, given at a recent Tech Field Day Showcase where I participated as a panelist.
In this blog, I’m going to briefly summarize what Pluribus presented and demonstrated, share some follow-up links, give some background on Pluribus Networks, and then dig into the Pluribus approaches a bit more.
What was Presented/Reviewed?
Pluribus Networks presented on their Automated BGP EVPN VXLAN capabilities that became available in Netvisor ONE OS R6.1 in April 2021.
- Pluribus Networks Blog on BGP EVPN
- Pluribus Networks Showcase Presentation 1
- Pluribus Networks Showcase Presentation 2
I was first introduced to Pluribus Networks at a Networking Field Day event a couple of years ago. Between the long days and short sleep, I was a bit slow at the time to catch on to what Pluribus does. Perhaps because of that, I think it is useful to start with an overview of what Pluribus does.
Pluribus provides a unique take on SDN Automation for datacenter underlay and VXLAN overlay fabrics. First, you build a physical leaf/spine “fabric” of lower-cost white box switches running the Pluribus Netvisor ONE network operating system (NOS).
The “Adaptive Cloud Fabric” can then be activated, which federates all of the switches in the fabric into a peer-to-peer mesh – every switch has a TCP connection to every other switch. Think of this as a management fabric in which each switch holds a state database of the entire fabric and is aware of the state of every other switch. Next, the NetOps team can configure the underlay from the CLI (or UNUM) by connecting to one member switch (the seed switch) and issuing the topology commands – typically the underlay is BGP. This SDN control plane intelligence is embedded in the network OS and runs on the switches – no external controllers needed. The overlay comes next: similarly, the entire VXLAN overlay fabric can be deployed from a single switch or a single SSH session. The result is L2/L3 underlay connectivity and overlay services across the fabric of participating switches.
Alternatively, you can use the most recent Pluribus Netvisor ONE R6.1 release to build a standards-based BGP EVPN fabric by issuing CLI commands to each switch. This was a key part of their presentation and demo. Pluribus does automate BGP EVPN, reducing the number of lines of config by a factor of 6 or so, but it is still deployed box-by-box. External automation tools like Python scripting and Ansible can be used to automate it further.
The unique aspect of Pluribus’ SDN approach is that you can enter configuration commands on one switch and have them reliably applied to groups of (or all) switches in the fabric, with rollback and roll-forward. If one switch in the fabric can’t deploy the new config, the change is automatically rolled back across all switches to maintain consistency until the switch in question can be remedied.
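To make the rollback behavior concrete, here is a minimal Python sketch – not Pluribus code, and all class and function names are hypothetical – of the transaction semantics described above: a fabric-wide command is applied to every switch, and if any switch fails, the change is undone everywhere.

```python
# Illustrative sketch (not Pluribus code) of all-or-nothing fabric config:
# apply a command to every switch, roll back everywhere on any failure.

class Switch:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.config = []  # config lines applied so far

    def apply(self, command):
        if not self.reachable:
            raise ConnectionError(f"{self.name} unreachable")
        self.config.append(command)

    def rollback(self, command):
        if command in self.config:
            self.config.remove(command)


def fabric_apply(switches, command):
    """Apply a command fabric-wide; roll back all switches on any failure."""
    applied = []
    try:
        for sw in switches:
            sw.apply(command)
            applied.append(sw)
        return True
    except ConnectionError:
        # Undo on every switch that already accepted the command
        for sw in applied:
            sw.rollback(command)
        return False


fabric = [Switch("leaf1"), Switch("leaf2"), Switch("leaf3", reachable=False)]
ok = fabric_apply(fabric, "vlan 100")
print(ok, [sw.config for sw in fabric])  # False [[], [], []]
```

The point is simply that the fabric never ends up half-configured: either every switch carries the change, or none does.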
There is a GUI named “UNUM” for reporting and analytics, but it is not a central controller. The SDN control plane intelligence literally runs inside the switches, leveraging the multi-core CPUs found in most high-performance white box data center switches today – the GUI is really just a portal with workflows to make the fabric even easier to manage.
Pluribus achieves this ease of use and command replication by using an abstracted approach where you configure objects at the fabric level. These objects are mostly familiar entities such as VLANs, subnets, etc. Effectively, this is a declarative type of intent-based networking: declare that you want to deploy a VLAN with fabric scope, and the SDN control plane takes care of it.
This is different from the imperative approach, which requires repeating multi-line configs box-by-box. The declarative approach can be used either with or without BGP EVPN. Pluribus allows the declarative service objects to be instantiated as BGP EVPN service configurations on each switch, with the services established across the fabric via the BGP EVPN control plane. This greatly simplifies the job of getting services configured on every switch in the fabric. Alternatively, Pluribus can employ its SDN control plane to instantiate the services across the fabric in a protocol-free approach, which simplifies things further by avoiding the BGP EVPN complexity altogether.
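As a rough illustration of the declarative idea – hypothetical names throughout, with NX-OS-style lines used only for flavor, not Netvisor syntax – a single fabric-scope service object can be mechanically expanded into identical per-switch service config:

```python
# Sketch of declarative expansion: one fabric-level VLAN/VNI intent object
# rendered into the same per-switch config lines for every leaf.
# (Hypothetical names; the rendered lines are NX-OS-flavored for familiarity.)

def expand_vlan_object(obj, switches):
    """Render one fabric-scope VLAN/VNI object into per-switch config lines."""
    rendered = {}
    for sw in switches:
        rendered[sw] = [
            f"vlan {obj['vlan']}",
            f"  vn-segment {obj['vni']}",
            "interface nve1",
            f"  member vni {obj['vni']}",
        ]
    return rendered


intent = {"vlan": 100, "vni": 10100, "scope": "fabric"}
configs = expand_vlan_object(intent, ["leaf1", "leaf2", "leaf3"])
# One declared object -> identical service config on every leaf in scope
print(configs["leaf1"])
```

The operator declares the object once; the per-switch imperative lines are derived, not typed.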
An example may help with that. Configuring one VLAN and SVI in Cisco VXLAN takes a fair number of commands. Pluribus reduces that to a couple of commands, which can be applied automatically across all of the leaf switches in the fabric rather than one at a time. In effect, all the leaf switches become one logical switch (see the last diagram in this blog for illustrative purposes). The following diagram illustrates this:
I’m told people generally issue commands from one of their 100/400 G switches, or via the UNUM GUI’s forms.
The Pluribus OS includes integrated telemetry, accessible locally on any switch or via the other UNUM software offering, UNUM Insight Analytics, which uses an Elasticsearch database as a data store for up to 30 days’ worth of per-flow data.
Additional technical capabilities include multi-tenant “slicing”, virtual routers (more than VRFs), BGP EVPN, multivendor integration, VXLAN overlays, “extensive” QoS, NetFlow/IPFIX, control plane protection (“CPTP”), “advanced security”, flow tracing across the fabric, integration with Red Hat OpenStack, integrated telemetry, etc.
Which switch hardware? Pluribus Freedom Series, Dell, Celestica, Edge-Core and other ONIE-compatible switches.
This approach has pros and cons. I happen to like CLI because you can do diffs and change control on it. While Pluribus does have a CLI, it is typically used for fabric-wide commands, which is different from the box-by-box CLI with which many of us are familiar and comfortable. One limitation and consideration: if part of the fabric is not responding, you may have to apply a command to a group excluding the “missing” switches, then “catch them up” (roll-forward) when connectivity is restored – or when a new switch is added to the fabric. The reliability aspect is relevant since you can build fabrics that span multiple datacenters.
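The roll-forward (“catch up”) idea can be sketched in the same spirit: if the fabric keeps an ordered log of fabric-wide commands and tracks what each switch has applied, a rejoining or newly added switch simply replays what it missed. Again, a hypothetical Python sketch, not Pluribus code:

```python
# Sketch (not Pluribus code) of roll-forward: commands that did not reach a
# switch are replayed from the fabric's command log when it (re)joins.

class FabricLog:
    def __init__(self):
        self.log = []      # ordered fabric-wide command history
        self.applied = {}  # switch name -> index of last applied command

    def record(self, command, online_switches):
        """Log a fabric command and note which switches received it."""
        self.log.append(command)
        for sw in online_switches:
            self.applied[sw] = len(self.log)

    def catch_up(self, switch):
        """Return the commands a (re)joining switch still needs to apply."""
        start = self.applied.get(switch, 0)  # new switches replay everything
        pending = self.log[start:]
        self.applied[switch] = len(self.log)
        return pending


flog = FabricLog()
flog.record("vlan 100", ["leaf1", "leaf2"])
flog.record("vlan 200", ["leaf1"])  # leaf2 was unreachable for this one
print(flog.catch_up("leaf2"))       # ['vlan 200']
print(flog.catch_up("leaf3"))       # ['vlan 100', 'vlan 200']
```

A switch that missed one command replays just that command; a brand-new switch replays the full history.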
Pluribus has been around for a while. They sell into enterprise and SLED, but a solid portion of their customer base is telecom/mobile providers. Ericsson is a strategic partner, using Pluribus in 100+ Tier 1 customer sites. An Internet Exchange uses Pluribus to control management networks at 15 sites across Europe.
Competition: Cisco ACI is really the main competitor offering an SDN fabric, though of course it requires external controllers. There are also the other NOS providers that support BGP EVPN, like Cumulus (now part of Nvidia) and Arista, as well as external automation solutions like Apstra (now part of Juniper), etc.
Main Use Cases
Here’s Pluribus’ list of main use cases:
- Leaf and Spine Fabrics (L2, L3, EVPN)
- Edge DCI fabrics (Data Center Interconnect)
- Multi-Site Datacenter Unification
- Edge Compute
- Layer 1 Automation (a configuration-driven cabling patch panel, in effect)
- Metro Ethernet Cloud Services
Pluribus told me that they’ve seen a lot of growth lately in the DCI space, e.g. one CSP with 4 (soon 6) datacenters using multi-tenant slicing and distributed L2/L3 services.
The first presentation focused on 3 different scenarios during the showcase:
- Building a fabric with Pluribus’ standards-based but highly automated BGP EVPN implementation.
- Interoperating with other 3rd party fabrics/switches using a standards-based BGP EVPN gateway function for brownfield insertion/expansion.
- Building a fabric and deploying network services in the overlay with Pluribus’ Adaptive Cloud Fabric SDN control plane to reduce fabric operations complexity.
Pluribus automates the provisioning of BGP EVPN by turning it into a “fabric object”. This allows a NetOps team to deploy BGP EVPN on border nodes that connect a Pluribus ACF fabric to an external 3rd party fabric running standards-based BGP EVPN. The border nodes handle translating EVPN information between the two formats (BGP and Pluribus’ SDN control protocol). Alternatively, if you wish, you can build the Pluribus fabric itself using their automated approach to BGP EVPN; while this reduces the number of commands by a factor of ~6, it still has to be done one node at a time.
The second presentation was a live demonstration of how to set up connectivity between a Pluribus fabric and a Cisco Nexus switch running BGP EVPN. Once the VXLAN tunnels were established, stretched L2 and L3 services were deployed and then verified via ping and SSH connectivity, respectively. The idea of this demo was to show all 3 elements from the first presentation: 1) what a standard BGP EVPN config looks like on a Cisco Nexus, 2) the Pluribus automated (and more compact) BGP EVPN config running on a gateway, enabling EVPN interop with the Nexus, and finally 3) the simplified operations of the full Pluribus SDN approach with Adaptive Cloud Fabric. The demonstration also included some exposure to the UNUM Fabric Manager and UNUM Insight Analytics. Time was short, so interop with Pluribus’ Cumulus switch was omitted. See the video for more details!
Pluribus Networks’ EVPN now makes it easy to connect a Pluribus EVPN fabric to other standards-based BGP EVPN fabrics. This could provide a way to migrate to less costly white box switches or to manage a mixed-vendor EVPN network with pods based on different vendors.
The following slide compares three ways of building EVPN fabrics:
- Non-Pluribus CLI (one VNI) – the most typical way EVPN is deployed
- Automated Standards-based BGP EVPN on Pluribus
- Full Pluribus SDN control plane available with their Adaptive Cloud Fabric
Note the different numbers of CLI commands that would have to be pushed out for each of the three cases and the number of SSH sessions.
Comments are welcome, whether in agreement or constructive disagreement with the above. I enjoy hearing from readers and carrying on deeper discussion via comments. Thanks in advance!