In IT we swear by two sets of numbers: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). The RTO defines how long it should take to recover a system in the event of a failure, and the RPO defines how much data, measured in time, the business can afford to lose. Together these numbers drive the backup schedule.
The issue arises when IT professionals stop basing their recovery on the RTO and RPO and focus only on the backup settings. The RTO and RPO should do more than drive the backup schedule for the system. The entire point of backing up systems is to be able to restore those systems in the event of a failure.
People working in IT need to get back to thinking about the time that it takes to recover a system. All too often IT professionals focus on the amount of time needed to back up the SQL Server database, when they should be focusing on the amount of time needed to restore the system and bring it back online. The result is that people think about how often backups should be taken and how long those backups take to complete, rather than about how long a restore will take.
In many cases this can be detrimental to the system. All too often, backups that are written quickly to a deduplication device take a massive amount of time to restore, because the backups have to be rehydrated before they can be restored. This leads companies to be down for longer than their RTO allows.
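To make the gap between backup speed and restore speed concrete, here is a minimal sketch of the arithmetic. The function names and the figures in the example are hypothetical and chosen only for illustration; the restore throughput should come from a measured restore on your own hardware, not from the backup job's write speed.

```python
from datetime import timedelta


def estimated_restore_time(backup_size_gb: float,
                           restore_throughput_mb_s: float) -> timedelta:
    """Estimate restore duration from backup size and sustained restore throughput.

    Both inputs are assumptions supplied by the caller; for a deduplication
    device the throughput should include the time spent rehydrating.
    """
    if restore_throughput_mb_s <= 0:
        raise ValueError("restore throughput must be positive")
    seconds = backup_size_gb * 1024 / restore_throughput_mb_s
    return timedelta(seconds=seconds)


def meets_rto(estimate: timedelta, rto: timedelta) -> bool:
    """Return True when the estimated restore fits inside the RTO."""
    return estimate <= rto


# Hypothetical example: a 2 TB backup restored at a sustained 100 MB/s,
# checked against a 4 hour RTO.
estimate = estimated_restore_time(backup_size_gb=2048, restore_throughput_mb_s=100)
print(estimate, meets_rto(estimate, rto=timedelta(hours=4)))
```

With these made-up numbers the restore would take roughly 5.8 hours, so a 4 hour RTO would be missed no matter how quickly the backup itself was written.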
Missing the RTO is a major issue when restoring a system. Missing the RTO means that the system is taking longer to bring back online than the IT department has agreed is acceptable. If that RTO can’t be met, then the IT department is out of compliance for the restore. Depending on the system and the sector that the company is in, this could have financial or legal implications.
All of this comes down to not planning for how long the restore will take, which is what determines whether the RTO can be met. As IT professionals we need to ensure that when we plan our backup efforts, we take into account how long it will take to process the restore.
Being able to set realistic expectations and have everyone accept these expectations is a key part of deciding on an RTO for a system. Often management will simply dictate what the RTOs will be, based on how long the backup processes in place will take. IT people need to push back against these effectively random RTOs so that realistic deadlines are set for system restores. To ensure that the RTO can be met when an actual failure (and restore) happens, test restores should be done on a regular basis. If the test restore of the system takes longer than the RTO, then it can be assumed that a production restore, were it to actually happen, would take about the same amount of time, and therefore miss the RTO.
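As a rough illustration of checking test restores against the agreed RTOs, the sketch below flags every test restore that ran longer than its RTO. The `RestoreRun` class, the `rto_violations` function, and the system names and durations are all hypothetical; in practice the durations would come from timing real test restores.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RestoreRun:
    """One timed test restore (hypothetical record type for this sketch)."""
    system: str
    duration: timedelta


def rto_violations(runs, rtos):
    """Return (system, duration, rto) for each test restore that exceeded its RTO."""
    violations = []
    for run in runs:
        rto = rtos[run.system]  # raises KeyError if a system has no agreed RTO
        if run.duration > rto:
            violations.append((run.system, run.duration, rto))
    return violations


# Hypothetical test results and agreed RTOs.
runs = [
    RestoreRun("sales_db", timedelta(hours=5)),
    RestoreRun("hr_db", timedelta(minutes=30)),
]
rtos = {"sales_db": timedelta(hours=4), "hr_db": timedelta(hours=1)}

for system, duration, rto in rto_violations(runs, rtos):
    print(f"{system}: test restore took {duration}, RTO is {rto}")
```

Any system reported here needs either a faster restore path or a renegotiated RTO before a real failure happens.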
Setting and meeting realistic RTOs can benefit the entire company. By setting realistic goals instead of arbitrary ones, IT has recovery windows that it is able to meet, and the business has a realistic view of how long systems will be down in the event of a failure.