Across the Internet, there are thousands of network outages each year. Many of these are caused by delayed mitigation of issues. No matter the size of the business, or the duration of downtime, outages are hugely expensive. According to Gartner, the average cost is roughly $5,600 per minute. That amounts to a loss of more than $300,000 per hour for businesses, and it hurts the customers’ digital experience.
About 12 years ago, Cisco started to dabble in AI/ML with the mission of enhancing its technologies. Having tried it in several areas, Cisco last year applied it to networking to reduce unplanned downtime through proactive troubleshooting.
At the Tech Field Day Extra at Cisco Live EMEA 2023 event hosted in Amsterdam, Cisco presented Predictive Networks, a Cisco solution that proactively identifies potential anomalies and issues in the network with the help of AI/ML. JP Vasseur, VP of Engineering, ML and Data Science, showed off the solution with a live demo.
Divergent Attitudes
For the past 30 years or so, operators have run networks reactively, resolving issues after they show up. But the modern audience has an especially low tolerance for an uneven digital experience, and delays in remediation, no matter how minuscule, are synonymous with a poor one. As Mr. Vasseur pointed out, reaction time has shrunk in recent years to a matter of nanoseconds, yet we are still far from offering users a consistent network experience around the clock. AI/ML unlocks that possibility by making remediation predictive.
But when it comes to AI/ML, Mr. Vasseur noted, people are highly polarized. There are believers who think AI/ML is the only way to build intelligent systems, and heretics who remain unconvinced that AI/ML is realistic or has real-world use cases. Their skepticism is often inflamed by ludicrous theories floating around the Internet, such as the claim that AI/ML is a ploy by big tech companies to replace humanity with robots and is a straight-up threat to society.
The Truth is Somewhere in the Middle
Cisco believes that the truth is in the middle. Having started earlier than many of its peers, Cisco has been able to test multiple AI/ML approaches during its decade-long journey. Through trial and error, in the course of which it rolled out several AI-based technologies, the company has acquired the expertise to figure out what works and what doesn’t, and to determine whether AI/ML is the right technology for a given solution.
For some solutions, it is the right pick; for others, not so much, said Mr. Vasseur. So Cisco is pragmatic in how it applies AI/ML.
Reliable Predictions
In the early phases, Cisco selectively applied AI/ML to build IoT, cloud and security solutions. In the next phase, it is focused on implementing the technology in networking.
Three years ago, Cisco started working on a project whose goal was to predict network issues. It faced some early resistance because of the general skepticism around the practicality of predicting the behavior of something as vast and dynamic as the Internet. Nevertheless, the teams pushed ahead.
The project had three clearly defined goals – predictive diagnostics, deep telemetry and 100% accuracy.
Teams at Cisco spent a year analyzing paths across the Internet and service provider networks in search of signals that could be used to predict issues before they occur. Once they were able to triangulate those signals, they homed in on the gap that exists between precision and recall, and closed it.
“We all know that there’s a lack of trust sometimes, people are on the fence. So, we thought if we are making predictions, we want to make sure that we don’t have any false positive,” said Vasseur.
Through training and retraining of ML models, the technology learned to identify patterns in historical data and use them to make accurate predictions about network behavior.
But proactive networking is not a one-and-done approach. Nor is it the answer to every potential issue. That’s why Cisco advises pairing it with the traditional reactive mechanism to get the full benefit of both.
“The goal was by no means to replace the reactive mechanism. It’s to say for some issues, can I predict them before they happen. If not, then I can fall back on reactive mechanism. So, this is really complementary,” said Vasseur.
Cisco narrowed down the scope to achieve perfection. Instead of building a machine learning algorithm that predicts all issues with relatively low accuracy, it designed a system that concentrates on a small percentage of issues with a high level of accuracy.
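The trade-off described above, fewer predictions in exchange for no false positives, can be illustrated with a small sketch. This is not Cisco's model; the scores, labels and thresholds below are made-up values chosen only to show how raising a decision threshold pushes precision up while recall drops.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when an issue is predicted for score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)      # correct alarms
    fp = sum(1 for s, y in pairs if s >= threshold and not y)  # false alarms
    fn = sum(1 for s, y in pairs if s < threshold and y)       # missed issues
    # Convention (an assumption here): no predictions at all counts as precision 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores and ground truth (True = an issue really occurred).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, True, False, False]

broad = precision_recall(scores, labels, threshold=0.5)    # predicts more, some wrong
strict = precision_recall(scores, labels, threshold=0.85)  # predicts less, none wrong

print("broad :", broad)
print("strict:", strict)
```

With the stricter threshold the sketch raises no false alarms but catches only half of the real issues, which is why a reactive fallback is still needed for the rest.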
Cisco Predictive Networks
With the mandate to make the right recommendations all the time, Cisco built Predictive Networks. It’s a SaaS-based solution that requires neither hardware nor software. It has a simple architecture consisting of a predictive engine in the cloud that gathers telemetry from the data sources it is connected to, and outputs recommendations.
“You will have a tab that says Predictive Networks. When you click on it, you see all the recommendations. In the second step, you will be able to apply the recommendations that sound good to you, and it will close the loop automatically. It does that for all sites at the same time, looking at the holistic view,” explained Vasseur.
With the help of multiple ML models, Predictive Networks makes sense of the data it ingests. The algorithm looks at all possible paths to the applications, and calculates the probability of congestion or an SLA violation on every single path. Based on what it sees, it recommends rerouting traffic to alternate paths.
The models also consume information about application behavior from Layer 7 every 10 minutes, categorize it as Good, Degraded or Bad, and use it to make the final predictions.
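As a rough illustration of that last step, the sketch below takes per-path violation probabilities as given and turns them into rerouting recommendations. The site names, path names, probabilities and the 0.8 risk threshold are all hypothetical, and the selection rule is an assumption for illustration, not Cisco's algorithm.

```python
def recommend_reroutes(violation_prob, current_path, risk_threshold=0.8):
    """Suggest a safer path for each site whose active path is likely to violate its SLA.

    violation_prob: {site: {path: predicted probability of an SLA violation}}
    current_path:   {site: name of the path currently carrying traffic}
    """
    recommendations = {}
    for site, probs in violation_prob.items():
        active = current_path[site]
        if probs[active] < risk_threshold:
            continue  # active path looks healthy, nothing to recommend
        best = min(probs, key=probs.get)  # path with the lowest predicted risk
        # Only recommend a move if the alternative is itself below the threshold;
        # otherwise leave the site to the reactive mechanism.
        if best != active and probs[best] < risk_threshold:
            recommendations[site] = (active, best)
    return recommendations


# Hypothetical predictions for three sites.
violation_prob = {
    "amsterdam": {"mpls": 0.92, "internet": 0.10, "lte": 0.35},
    "paris": {"mpls": 0.05, "internet": 0.20},
    "madrid": {"mpls": 0.95, "internet": 0.97},
}
current_path = {"amsterdam": "mpls", "paris": "mpls", "madrid": "internet"}

recs = recommend_reroutes(violation_prob, current_path)
print(recs)
```

In this made-up data only one site gets a recommendation: one is healthy, and one has no alternative that is predicted to be safe.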
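A minimal sketch of that kind of labeling might look like the following. The 10-minute window comes from the description above; the metrics used (latency and packet loss) and the cut-off values are assumptions made for the example, since the presentation did not spell out how the three categories are derived.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # the 10-minute interval mentioned above


def classify(latency_ms, loss_pct):
    """Label one window. Thresholds are illustrative assumptions."""
    if latency_ms > 300 or loss_pct > 5:
        return "Bad"
    if latency_ms > 150 or loss_pct > 1:
        return "Degraded"
    return "Good"


def label_windows(samples):
    """Group (timestamp_s, latency_ms, loss_pct) samples into 10-minute windows
    and label each window by its average latency and loss."""
    buckets = defaultdict(list)
    for ts, latency, loss in samples:
        buckets[ts // WINDOW_SECONDS].append((latency, loss))
    labels = {}
    for window, values in sorted(buckets.items()):
        avg_latency = sum(v[0] for v in values) / len(values)
        avg_loss = sum(v[1] for v in values) / len(values)
        labels[window * WINDOW_SECONDS] = classify(avg_latency, avg_loss)
    return labels


# Hypothetical application samples: (seconds since start, latency in ms, loss in %).
samples = [
    (0, 80, 0.0), (300, 100, 0.2),    # first window
    (600, 200, 0.5), (900, 180, 0.4), # second window
    (1200, 400, 6.0),                 # third window
]
window_labels = label_windows(samples)
print(window_labels)
```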
Predictive Networks’ common data model and algorithm are independent of the telemetry sources. That makes scaling easy. Users can decide how far and wide they want it to look in the network, and change the sources of telemetry without retraining the models.
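One common way to get that kind of independence is to normalize every telemetry source into a single record type before the models see it. The sketch below shows the idea only; the record fields, source names and input formats are hypothetical and do not reflect Cisco's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathSample:
    """Common record consumed by the models, whatever the source (hypothetical)."""
    path_id: str
    latency_ms: float
    loss_pct: float


def from_source_a(record):
    # Hypothetical source that reports latency in seconds and loss as a fraction.
    return PathSample(record["path"], record["latency_s"] * 1000, record["loss"] * 100)


def from_source_b(record):
    # Hypothetical source that already reports milliseconds and percent.
    return PathSample(record["id"], record["rtt_ms"], record["loss_pct"])


# Adding a new telemetry source means adding one adapter here;
# nothing downstream of PathSample has to change.
ADAPTERS = {"source_a": from_source_a, "source_b": from_source_b}


def normalize(source, record):
    """Convert a raw record from a named source into the common model."""
    return ADAPTERS[source](record)


a = normalize("source_a", {"path": "p1", "latency_s": 0.25, "loss": 0.5})
b = normalize("source_b", {"id": "p1", "rtt_ms": 250.0, "loss_pct": 50.0})
print(a == b)
```

Because both adapters produce the same record, anything trained on `PathSample` data does not need to know which source a sample came from.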
Cisco is in the process of building a large data lake for Predictive Networks that can be fed by its in-house platforms like Viptela, Meraki and ThousandEyes.
Wrapping up
Predictive networking technologies are highly effective in detecting and analyzing issues before they happen. Cisco’s Predictive Networks is one of the early specimens of that technology, and it has a fascinating number of use cases. It is a hero product in its power to identify future issues with zero false positives. Users get structured reports of all recommended actions that make for easy consumption. Last but not least, the kind of predictive observability it offers translates to an improved digital experience and helps avert potential risks and losses.
Be sure to check out all of Cisco’s presentations from the Tech Field Day Extra at Cisco Live EMEA 2023 to learn about everything Cisco is currently doing in networking.