Development should get the same care as production… after all, it’s production for someone.
IT usually has several environments: Production, Pre-Prod, Staging, User Acceptance Testing, QA, and Development. These environments are normally treated with descending importance the further down the list they sit. Production gets regular backups throughout the day, typically has a disaster recovery environment, and has SLAs that limit how long systems can be down.
We should treat our development environments just like our production environments, only without the log backups in most cases. The logic is that while development and other down-level environments aren’t customer-facing, they are what some of the company’s employees use for their day-to-day work, which makes those systems production for those users.
While the development systems aren’t customer-facing, the developers who use them can’t afford for them to be down for long periods. We can quantify the cost of an outage as the average hourly salary per developer multiplied by the number of developers in the company and by the hours of downtime. Let’s say that the average developer salary in your company is $50 an hour (that’s a blend of Jr. Developers, Developers and Sr. Developers) and you have 50 developers on staff. Some quick math says that if a development system is down for one hour, the outage costs the company $2,500 in lost productivity because the developers aren’t able to work. If the development system is down for an eight-hour working day, the company loses $20,000 in productivity. These numbers add up quickly, and even faster for companies that have more developers working on a single system.
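The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `outage_cost` and the eight-hour working day are assumptions, and the $50 rate and 50 developers are the example figures from the paragraph above, not real data.

```python
def outage_cost(hourly_rate: float, developers: int, hours_down: float) -> float:
    """Lost productivity: hourly rate x number of developers x hours of downtime."""
    return hourly_rate * developers * hours_down


HOURLY_RATE = 50       # blended average developer salary, dollars per hour (example figure)
DEVELOPERS = 50        # developers blocked by the outage (example figure)
WORKDAY_HOURS = 8      # assumed length of a working day

one_hour = outage_cost(HOURLY_RATE, DEVELOPERS, 1)
one_day = outage_cost(HOURLY_RATE, DEVELOPERS, WORKDAY_HOURS)

print(f"One hour down: ${one_hour:,.0f}")   # One hour down: $2,500
print(f"One day down:  ${one_day:,.0f}")    # One day down:  $20,000
```

Plugging in your own headcount and blended rate gives a rough number to bring to a conversation about how much to spend on protecting a development system.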
Because of these costs, it makes sense to treat development systems similarly to production systems. This way losses in the event of a system outage can be minimized while letting the developers get back to their work as quickly as possible.
You should have processes in place to back up down-level systems so that they can be restored in the event of an outage. You should also agree with the system owners on the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) of the development systems. While these systems do not need a 5-minute RPO and RTO, reasonable targets need to be set so that the developers aren’t sitting around waiting in the event of a system outage.
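One way to make an agreed RPO and RTO concrete is to write them down and check them against what the backup process actually delivers. The sketch below is a minimal illustration, not a prescription: the `RecoveryTargets` class, the helper names, and the 24-hour RPO and 4-hour RTO are all hypothetical values chosen for the example, and your own targets should come from the agreement with the system owners.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTargets:
    rpo: timedelta  # maximum acceptable data loss, measured as time since the last backup
    rto: timedelta  # maximum acceptable time to bring the system back online


def meets_rpo(last_backup: datetime, now: datetime, targets: RecoveryTargets) -> bool:
    """True if restoring the most recent backup would lose no more data than the RPO allows."""
    return now - last_backup <= targets.rpo


def meets_rto(estimated_restore: timedelta, targets: RecoveryTargets) -> bool:
    """True if the estimated restore time fits within the RTO."""
    return estimated_restore <= targets.rto


# Hypothetical targets for a development system (assumed values, not a recommendation).
dev_targets = RecoveryTargets(rpo=timedelta(hours=24), rto=timedelta(hours=4))

now = datetime(2024, 1, 2, 9, 0)
last_backup = datetime(2024, 1, 1, 22, 0)  # nightly backup, 11 hours before `now`

print(meets_rpo(last_backup, now, dev_targets))      # True
print(meets_rto(timedelta(hours=6), dev_targets))    # False
```

A check like this, run on a schedule, flags a development system whose backups have quietly stopped before an outage makes the gap expensive.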
The end goal for all systems is to have 100% uptime. The reality is that we won’t have that for any system. We need a proper RPO and RTO for all systems, including down-level, non-production systems, so that they stay online as much as possible and our co-workers can stay productive. For the business, the result is that as little employee salary as possible is spent waiting for a non-production system to come back online. By treating our down-level systems like we treat our production systems, we can decrease outage losses and increase the availability of the systems that our co-workers need to get their work done.