If you ever meet an IT professional who says they have never had an infrastructure outage in their career, they are either lying or in the first week of their career. Let’s face it, despite our best intentions and tremendous attention to detail, bad things happen in IT. Sometimes it’s due to hardware failure or a software bug. Sometimes it’s human error that we didn’t anticipate or design for in our systems. Regardless, we know that outages are inevitable, and the true measure of an IT practitioner is how quickly they can recover from an outage and how well they can minimize its impact on the business they support.
Outages on the Rise?
As modern companies become more reliant on the technology that IT provides, the effects of IT outages become more apparent. It’s difficult to say whether the frequency of outages has increased or whether they are simply more noticeable due to the increased reliance on IT infrastructure and applications. Regardless, when an AWS region goes down, or an airline suffers a data center outage, the effects are felt far and wide, and IT personnel and business decision-makers are left to deal with the fallout, identify root causes, and plan to mitigate the impact of future outages.
A Logical Response
When it comes to understanding and preventing outages, LogicMonitor knows a thing or two. Founded in 2007 by former IT practitioners who used to monitor infrastructure themselves, LogicMonitor is a SaaS monitoring platform for both on-premises and cloud infrastructure. The platform is highly extensible, with over 2000 integrations for everything from operating systems to networks to storage and more. In an effort to enhance its platform further and understand customer needs, the company recently performed a survey of customers.
The results of the survey are frankly not surprising. Companies that have suffered more outages or “brownouts” (service degradation) reported higher costs as a result. Survey respondents identified six primary costs:
- Lost revenue
- Lost productivity
- Compliance costs
- Mitigation costs
- Damage to the brand
- Lowered stock price
What is more surprising than the results of outages is the frequency of outages. According to the survey, a typical organization has experienced five outages and five brownouts within the past three years. Even more stunning is the fact that the survey found that 51% of outages and 53% of brownouts were avoidable.
In an era of digital transformation, is IT doing a poor job of providing reliable service, or are other factors at play? Could lack of budget, incomplete visibility, or outdated operating procedures be to blame for such brittle applications and infrastructure? Regardless of the cause, it is incumbent on IT to understand the risks, mitigate them where possible, and recover quickly from the outages that cannot be prevented. This is the problem that LogicMonitor seeks to resolve.
Ken’s Conclusion
It may not be possible to eliminate outages entirely, but that doesn’t mean IT practitioners should stop trying to do so. The better an organization understands its infrastructure’s dependencies and potential weaknesses, the better they can avoid outages that can be prevented, and the better they can recover from the outages they cannot. LogicMonitor would like to be the partner that enables IT to do just that.