As networks have grown bigger, operating them has become a worry of its own. Operators in charge of running the networks run into all sorts of performance and security issues, and those amount to a lot of problem-solving over the course of a workweek.
Last week at the Networking Field Day event, Selector Software addressed this situation in their presentation. VP of Solutions, Debashis Mohanty, showcased the Selector AI platform in an introductory session. Mohanty explained the key capabilities of the solution, and walked the audience through some real-world customer outcomes.
Living with Complexity
Network problems all seem to emerge from a common root: an outdated approach to operations. In the past, network operators had to navigate through a thicket of unorganized data to find issues to debug. These days, it only takes looking at dedicated data sources to spot an anomaly. There are a ton of monitoring tools in the market that put this information at the fingertips of operators.
But in a quite unexpected way, this has opened a whole can of worms. Teams running the networks began reporting operational complexity of an unprecedented kind. With modern networks having far more problems than their predecessors, operators already spend big chunks of their workdays browsing through data: logs, configs, events, metrics, alerts, what have you. Supplied by an array of disparate tools, these data are as siloed as their sources. There are as many dashboards to look at as there are monitoring solutions deployed in the network.
But navigating through these disparate sets of data is only the beginning. The real pain starts with the manual analysis and extrapolation.
“As soon as you get to the right dashboards, you have to write a structured query language or a SQL construct to get the data back, and the folks who are in the frontline trying to figure out what’s happening may not know the exact SQL language to get that insight. So, they go back and forth between these multitude of dashboards and try to figure out what’s happening.”
Once operators have passed that test, the information needs to be handed off to the relevant teams, who take it from there. This process from start to finish can take anywhere from days to weeks, and during an outage, every second wasted is a dollar lost.
Selector Reimagines the Process with AI
Selector AI is an AI-based observability platform that provides actionable operational intelligence for multi-domain network infrastructures in real time. It uses machine learning algorithms to automatically identify abnormal events and failures, and correlates them to their root causes with pre-built workflows, which reduces the Mean Time to Repair (MTTR). In doing so, it eliminates data silos, dials down the alert noise and reduces overall downtime.
“The three main pillars of our product are – collect, correlate and collaborate,” explained Mohanty.
Selector AI coalesces data from heterogeneous sources over a variety of protocols. With an AI-based data analytics approach, it speedily performs the grunt work of studying the data and identifying the warning signs to arrive at insights. Without wasting any time, it passes the insights on to the teams using collaboration channels like Slack and Microsoft Teams.
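To make the "correlate" step a little more concrete, here is a minimal sketch of one simple way events from different sources could be grouped together: by device and by closeness in time. This is an illustration only, not Selector's algorithm. The event fields (`device`, `timestamp`, `source`), the `correlate` function and the 60-second window are all assumptions made for the example.

```python
from collections import defaultdict


def correlate(events, window_seconds=60):
    """Group events per device into bursts.

    Hypothetical event shape: {"device": str, "timestamp": int, "source": str}.
    Two consecutive events on the same device land in the same group when
    they are at most `window_seconds` apart.
    """
    # Bucket events by the device they were reported for.
    by_device = defaultdict(list)
    for event in events:
        by_device[event["device"]].append(event)

    groups = []
    for device_events in by_device.values():
        # Walk each device's events in time order and split on large gaps.
        device_events.sort(key=lambda e: e["timestamp"])
        current = [device_events[0]]
        for event in device_events[1:]:
            if event["timestamp"] - current[-1]["timestamp"] <= window_seconds:
                current.append(event)
            else:
                groups.append(current)
                current = [event]
        groups.append(current)
    return groups


events = [
    {"device": "r1", "timestamp": 0, "source": "syslog"},
    {"device": "r1", "timestamp": 30, "source": "metrics"},
    {"device": "r1", "timestamp": 500, "source": "alerts"},
    {"device": "r2", "timestamp": 10, "source": "syslog"},
]
groups = correlate(events)
```

In this toy input, the syslog and metrics events on `r1` fall into one group because they are 30 seconds apart, while the later alert on `r1` and the lone event on `r2` each form their own group. A real platform would presumably correlate on far richer signals than time and device name.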
ML-Based Data Analytics
Selector AI seeks to make the journey from data to insights short, swift and less cumbersome. To that end, it has a set of well-defined, realistic goals. The first goal is to provide answers to the range of questions that engineers may have.
“Our primary focus when we built the product was to provide curated, contextual, effective answers when someone asks the questions.”
The data it ingests includes, but is not limited to, metrics, configs, alerts, event logs, flows and tables. Using its robust data integration, Selector AI ingests data in all formats from pre-integrated sources, data lakes, and databases including Splunk, InfluxDB, GitHub and more.
“When we started the company in the middle of the pandemic, everyone was remote, and all company operations were happening remotely. So, we integrated with Slack and Microsoft Teams so that engineers can ask questions and get responses easily. Our bot is a part of these channels, and users can do the debugging collaboratively,” said Mohanty.
The platform automatically baselines a large number of metrics and KPIs, which makes it possible to track their behavior closely and send out alert notifications as soon as something deviates from the norm.
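As a rough illustration of what baselining a metric can look like, the sketch below keeps a rolling window of recent samples and flags any value that strays too far from the window's mean. It is a generic rolling z-score check, not Selector's implementation. The function name `detect_anomalies`, the window size and the threshold are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return the indices of samples that deviate from a rolling baseline.

    The baseline is the mean of the last `window` normal samples. A sample
    is flagged when it lies more than `threshold` standard deviations away.
    """
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(samples):
        if len(history) == window:
            baseline = mean(history)
            spread = stdev(history)
            # A perfectly flat window has zero spread, so nothing is flagged.
            if spread > 0 and abs(value - baseline) > threshold * spread:
                anomalies.append(index)
                continue  # keep the outlier out of the baseline
        history.append(value)
    return anomalies


# Hypothetical interface-utilization samples with one obvious spike.
utilization = [10, 11, 10, 12, 11, 10, 50, 11, 10]
spikes = detect_anomalies(utilization)
```

Here only the spike at index 6 is flagged. Leaving flagged values out of the history is a deliberate choice in this sketch, so that a single outlier does not inflate the baseline for the samples that follow.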
So that operators don’t have to go through their day tackling an alert storm before getting to triage and troubleshooting, the platform ranks the alerts in order of priority. It notifies the teams of the most critical alerts of the day first, and lower-priority ones follow after that.
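The idea of ranking alerts can be sketched in a few lines. The example below sorts alerts by severity, and breaks ties by how many devices are affected. The severity levels, the `Alert` fields and the tie-breaking rule are assumptions made for illustration. The presentation did not detail how Selector AI scores alerts.

```python
from dataclasses import dataclass

# Hypothetical severity scale: a lower rank means more urgent.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "info": 3}


@dataclass
class Alert:
    name: str
    severity: str
    affected_devices: int


def prioritize(alerts):
    """Return alerts ordered from most to least urgent."""
    return sorted(
        alerts,
        key=lambda alert: (
            # Unknown severities sort after every known level.
            SEVERITY_RANK.get(alert.severity, len(SEVERITY_RANK)),
            # Within a severity level, wider impact comes first.
            -alert.affected_devices,
        ),
    )


alerts = [
    Alert("High CPU on edge router", "minor", 1),
    Alert("BGP session down", "critical", 4),
    Alert("Link flapping", "major", 2),
    Alert("Data center power loss", "critical", 40),
]
ranked = prioritize(alerts)
```

With this input, the two critical alerts come first, with the one affecting 40 devices ahead of the one affecting 4, followed by the major and then the minor alert.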
Selector AI stores the insights so that users can access them later if they need to.
“It’s like a DVR, you can go in the past, see what happened, what things are correlated, and go back and forth.”
Selector AI is vendor-agnostic and supports several deployment models, including cloud, on-premises, hybrid infrastructure, and customer VPC. It is also delivered as a SaaS solution. With the exception of a few dashboards, it is fully managed by Selector.
Selector AI can be bought on a flexible monthly or annual subscription. For anybody interested, there is a free 30-day trial available on their website.
Selector AI ends the struggle of IT teams managing the network by replacing the manual steps of searching, analyzing and communicating with built-in automation. It visibly improves the process by introducing functions like proactive anomaly detection, identification of actionable correlations, and prioritization of alerts, among others. When teams don’t have to spend the bulk of their time looking at data across a plethora of dashboards, and insights are ready to use, it leads to faster and better decision-making. With Selector AI dialing down the complexity that hinders organizational success, teams can take action and resolve outages much faster, leaving customers happy.