- Verify, Or Die Trying: Observations on Change Management
- Assure Network Security Policy and Compliance in the Data Center with Cisco Network Assurance Engine
- Change Doesn’t Have To Be a Four Letter Word
- Configuration and Hardware Assurance in the Datacenter with Cisco Network Assurance Engine
- Hands On with Cisco Network Assurance Engine
- Cisco Network Assurance Engine: From Download to Value in 60 Minutes (or less)
- Networking Has Changed, Have You?
Previous posts in this series have discussed the theory behind Cisco’s Network Assurance Engine (NAE) and how it can help detect issues stemming from changes in the network. This post, in contrast, is mostly hands-on with the software. It includes screenshots showing just how thorough NAE’s network analysis is.
Network Assurance Engine Working With ACI
The first release of NAE starts with assuring and diagnosing Cisco’s Application Centric Infrastructure (ACI). Some may ask why ACI – an intent-based networking solution – should have a need for a tool to identify problems. The software is only as good as the humans responsible for programming it, and ACI offers the opportunity to create some very complex configurations which can easily stretch beyond what most people can manage to keep straight in their heads.
As a reminder, the underlying ACI programming model looks like this:
While new to network engineers at first glance, this model provides the flexibility to control and manage network flows between groups of devices (or to external paths) by specifying higher-level intent or policies. It’s certainly possible, however, to configure what appear to be the correct policies and still make mistakes. With great power to automate network behavior at scale comes the power to make mistakes with a large blast radius!
Cisco NAE extracts information from both the ACI network controllers and the actual network configurations and state, so it has insight into both the planned connectivity and the actual one. One of the highly effective visualizations NAE produces shows the contracts in place between end point groups (EPGs) in ACI, as well as their enforcements down in hardware. In the example below, it should be clear that all EPG connectivity stays within the same tenant (e.g. between EPGs in prod, and between EPGs in non-prod).
As it turns out, in this particular example the petstore database tier in the prod tenant was meant to have a contract with the petstore database tier in the non-prod tenant. When the contract was set up, however, it was configured with a scope that made it visible only within its own tenant, so it could not be used between prod and non-prod, which are two different tenants. NAE infers the connectivity intent and notices the inherent policy conflicts within the contracts. It instantly highlights this contract issue and raises two related SmartEvents:
And here’s one of the SmartEvents, describing the problem and suggesting ways to fix it:
The event shows which objects are affected and it’s clear that in order for the contract to be able to work between prod and non-prod, the scope of the contract “NP-PS_DB-P-PS_DB-contract” has to be changed to global. An administrator can go in and make a change to fix the contract, and as a result the EPG connectivity diagram now shows a contract between non-prod and prod:
Drilling down by clicking on the contract line, it’s possible to confirm the details of any contract:
So, has everything been fixed now? Well, not necessarily. In this case, the contract has been corrected, but another unexpected issue has raised its head now that connectivity is allowed between prod and non-prod. This is where the continuous analysis of Cisco’s NAE is useful. Each periodic network state snapshot is displayed as a selectable “epoch” on a timeline, and it’s fast to switch from one epoch to another and see what has changed.
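The idea of comparing one epoch with the next can be sketched in a few lines of Python. This is only an illustration, not NAE's data model or API: the `Event` class, the event names, and the two example epochs are all hypothetical, and each epoch is reduced to a plain set of events so that "what changed" becomes a set difference.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a SmartEvent (not NAE's real schema)."""
    name: str
    affected_object: str


def new_events(previous: Set[Event], current: Set[Event]) -> Set[Event]:
    """Events raised in the current epoch that were absent from the previous one."""
    return current - previous


def resolved_events(previous: Set[Event], current: Set[Event]) -> Set[Event]:
    """Events from the previous epoch that no longer appear in the current one."""
    return previous - current


# Hypothetical epochs: the scope issue is fixed, and a subnet issue appears.
epoch_before = {Event("CONTRACT_SCOPE_TOO_NARROW", "NP-PS_DB-P-PS_DB-contract")}
epoch_after = {Event("OVERLAPPING_SUBNET", "10.65.0.0/24")}

print("new:", new_events(epoch_before, epoch_after))
print("resolved:", resolved_events(epoch_before, epoch_after))
```

Treating each snapshot as an immutable set keeps the comparison symmetric: the same two helpers answer both "what did my change break?" and "what did my change fix?".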
The dashboard identifies the SmartEvents, and the third one here is new in the latest analysis:
Drilling down, the single aggregated event entry expands to two individual SmartEvents:
Drilling down into one of the SmartEvents, the problem is again clearly spelled out:
The SmartEvent shows there is a contract between the prod db tier and the non-prod db tier for the petstore, but in order to allow that contract to function it is necessary to have a route between these EPGs. Unfortunately, the same subnet – 10.65.0.0/24 – has been used in both the prod and non-prod environments. Reasonably enough, the suggested next steps are to confirm if this connectivity is needed, and if so to fix the IP duplication problem. If not, then the contract between the two should be removed.
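The overlap itself is easy to demonstrate with Python's standard `ipaddress` module. The sketch below is not how NAE performs the check; it only shows the underlying test. The subnet 10.65.0.0/24 comes from the example above, while the labels and the third subnet are made up for illustration.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(subnets):
    """Return pairs of labels whose CIDR blocks overlap.

    `subnets` maps a free-form label to a CIDR string.
    """
    parsed = {label: ipaddress.ip_network(cidr) for label, cidr in subnets.items()}
    return [
        (label_a, label_b)
        for (label_a, net_a), (label_b, net_b) in combinations(parsed.items(), 2)
        if net_a.overlaps(net_b)
    ]


subnets = {
    "prod/petstore-db": "10.65.0.0/24",      # subnet from the SmartEvent
    "non-prod/petstore-db": "10.65.0.0/24",  # same subnet reused in non-prod
    "prod/petstore-web": "10.66.0.0/24",     # hypothetical, for contrast
}

print(overlapping_subnets(subnets))
```

While the two tenants are isolated, the duplicate range is harmless. It only becomes a problem once a contract implies that traffic should be routed between them, which is why the issue surfaced only after the scope fix.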
Cisco has drawn on its engineering, advanced services, and technical services expertise to enable NAE to recognize over 5,000 issues, and even a small sample of the event categories suggests that the analysis is thorough. For example:
It’s not that humans can’t find these kinds of errors on their own, but doing so can be hugely onerous. With the breadth of intent that can be expressed within Cisco ACI, and the scale of modern networks, tracking down a communication problem could be a very time-consuming activity. Cisco NAE is like a virtual expert that aggregates and reasons through the massive state and policy space on your behalf. It can identify those issues quickly by checking the network regularly for thousands of configuration or dynamic state issues.
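To give a feel for what a single such check might look like, here is a minimal sketch of the contract scope rule from the earlier example. It is an assumption-laden illustration, not NAE's implementation: the function name is hypothetical, and only the two scopes discussed in this post ("tenant" and "global") are modeled.

```python
def contract_usable(scope: str, provider_tenant: str, consumer_tenant: str) -> bool:
    """Can a contract with this scope connect EPGs in the two given tenants?

    Only the scopes mentioned in this post are modeled; anything else is
    rejected rather than guessed at.
    """
    if scope == "global":
        # A global contract is visible across tenants.
        return True
    if scope == "tenant":
        # A tenant-scoped contract is visible only inside one tenant.
        return provider_tenant == consumer_tenant
    raise ValueError(f"scope not modeled in this sketch: {scope!r}")


# The misconfiguration from the example: tenant scope across two tenants.
print(contract_usable("tenant", "prod", "non-prod"))
# The fix: change the scope of the contract to global.
print(contract_usable("global", "prod", "non-prod"))
```

One rule like this is trivial to evaluate by hand. The difficulty lies in evaluating thousands of them against every contract, EPG, and subnet on every analysis cycle.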
Finding the Unknown
Years ago I knew someone who was a pre-sales engineer for a packet capture and analysis product. Their sales pitch was along the lines of “Give me 15 minutes to capture some traffic and I’ll tell you 10 things you didn’t know about your network.” The challenge was usually met with success. Cisco Network Assurance Engine would likely work with a similar approach: let NAE run for a few analysis cycles, and I’d wager that in almost every network it will identify issues the administrators were unaware of.
Network configuration continues to grow more complex. At the same time, administrators move further away from the underlying configurations as multiple abstraction layers sit between them and the hardware. The need for, and utility of, a tool that performs a sanity check of the network, and indeed exhaustively verifies it as it has been implemented, will likely grow, and such verification may well become the de facto way we build and operate modern networks. Cisco’s Network Assurance Engine provides that verification today, and Cisco continues working to extend the range of networking and controller devices it can ingest for analysis.