Database development is hard, challenging work. As someone who makes a living tuning database systems, I frequently see SQL code and database schemas that have clearly not been optimized. To my trained eye, this usually indicates that the application was tested against only a small subset of the production data volume. That likely produced acceptable performance in testing, and then the application began to melt down as data volume increased. This is just one of many problems associated with testing in database development. In this post, you will learn about the challenges and costs of managing data in non-production environments, organized around three major themes: the cost of the infrastructure required for development, process concerns, and the security requirements that limit access to production data.
The better way to perform application and database testing is to populate testing environments with full copies of production data. There are arguments against this practice, chiefly security and cost, but often the only way to generate accurate explain plans and understand the impact of application changes is to test against the full data set. As mentioned above, testing with a small volume of data can lead to dramatic performance anomalies when code moves to production; testing against full production copies avoids that, but it creates its own headaches for the IT organization in the form of infrastructure costs and security risks.
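To make that concrete, here is a minimal sketch of how a plan can be examined in Oracle; the table, columns, and bind variable are hypothetical. The same statement can produce a very different plan against a small test data set than it does against production volumes.

```sql
-- Generate the optimizer's plan for a hypothetical query and display it.
-- Against a few thousand test rows the optimizer may favor a full table scan;
-- against billions of production rows it may switch to an index (or vice versa).
EXPLAIN PLAN FOR
  SELECT o.order_id, o.order_total
  FROM   orders o
  WHERE  o.customer_id = :cust_id
  AND    o.order_date  > SYSDATE - 30;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```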
Costs in IT
When speaking about IT costs, two of the biggest line items go hand in hand: storage and database licensing. Developers and application teams need many environments for testing, development, quality assurance (QA), and ultimately production. Even for medium-sized databases, this adds a great deal of infrastructure expense; for very large databases (VLDBs), it can add up to massive costs. In the case of one Fortune 500 telecommunications firm, a single application's databases totaled multiple petabytes of data and represented nearly 10% of the IT budget. Since databases depend on fast storage, enterprise-class storage is required, and it can cost as much as $3,000 to $5,000 per terabyte.
While public cloud storage is often cheaper per unit than on-premises storage, the recurring monthly cost of large amounts of disk is a line item that will garner management attention. Cloud storage has the benefit of being effectively unlimited; you never have to worry about buying more hard drives, but you do have to keep paying for what you use.
Infrastructure costs also mean that high availability and disaster recovery options such as Real Application Clusters (RAC) and Data Guard are rarely configured in lower environments. While those environments typically do not need that level of availability, the result is that applications are not regularly tested for failover, which can lead to unexpected behavior in production.
There is also a great deal of overhead associated with managing these databases. Refreshing an environment, even when automated, requires the DBA to monitor time-consuming backups and restores using RMAN. The restore alone can take many hours for larger databases, or even days for very large ones.
Supporting the Infrastructure
Another important note: in most organizations the DBA is responsible for supporting the data infrastructure for all applications, while each development team supports only its own application. This leads to conflicting goals and priorities: the DBA is trying to optimize workload and cost across the entire data estate, while the development team wants what is best for its application. To balance these needs, a process is typically put in place in which the development team opens a work ticket for the DBA team, which starts with taking a backup (or perhaps restoring last night's backup) and works through a series of steps before delivering the database to the developers. This process is time-consuming and limits the developers' flexibility to test against recent data when they are troubleshooting a data-related problem.
Another challenge for developers in these environments is that, because of the high infrastructure overhead, the entire development team has to share the same copy of the data. Conflicts arise over data and schema changes, as multiple developers may try to make changes at the same time. With smaller databases, each developer may get their own copy of the schema; with larger data sets, that is not practical. Similarly, reducing the size of the database in development is difficult because of referential integrity and the relationships between data values. The widespread adoption of DevOps practices across development organizations has only increased these pressures.
Database Security
Security is a major concern in lower environments. In many cases, regulations prevent production data from being viewed there at all. In the specific case of GDPR, since customers have a "right to be forgotten," delete processing needs to occur across every environment that holds a copy of the data, which requires additional development. As a general rule, lower tiers also have more relaxed security controls, with more people holding broader privileges than they would in production.
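As a simplified sketch of what that processing can look like (the schema is hypothetical, and real applications usually have many more dependent tables), the same cleanup has to run in production and in every non-production copy of the data:

```sql
-- Hypothetical "right to be forgotten" cleanup for a single customer.
-- The same script must be executed in production and in every lower
-- environment that holds a copy of the customer's data.
DELETE FROM order_items
 WHERE order_id IN (SELECT order_id FROM orders WHERE customer_id = :cust_id);
DELETE FROM orders    WHERE customer_id = :cust_id;
DELETE FROM customers WHERE customer_id = :cust_id;
COMMIT;
```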
Many organizations use data masking solutions, either from Oracle (a potentially costly licensing option) or from third-party vendors, to obfuscate sensitive personal data using a variety of techniques. Data masking applies rules based on column values to prevent unauthorized users from seeing full records. It can be used either to completely anonymize data or to expose only a subset of a value, for example allowing call center personnel to see just the last four digits of a credit card number.
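As a simplified illustration of that last-four-digits pattern (not the vendor tooling itself, and with hypothetical table, column, and role names), a masked view can stand in front of the base table for less-privileged roles:

```sql
-- Simplified masking illustration: expose only the last four digits of the
-- card number to roles that query the view rather than the base table.
CREATE OR REPLACE VIEW customer_payment_v AS
SELECT customer_id,
       'XXXX-XXXX-XXXX-' || SUBSTR(credit_card_number, -4) AS credit_card_masked
FROM   customer_payment;

GRANT SELECT ON customer_payment_v TO call_center_role;
```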
Conclusion
While virtualization, and even containers, have made deploying multiple database environments faster, the limiting factor is still the time it takes to restore backups, which in turn limits how quickly unit tests can be rerun. Some storage arrays and virtualization solutions offer the ability to take snapshots of disks or volumes, but this often bypasses data masking and role-based access control. These snapshots are not "application-aware" and commonly require the database to be shut down to capture a consistent state. Another potential issue with snapshots is runaway growth: because they are not actively managed, the snapshot keeps growing on disk as data changes in the development environments.
DBAs feel the brunt of this pain. Even with automated restore processing, they are typically the ones requesting more storage area network (SAN) space from the storage team, getting paged when a restore fails, and dealing with the added complexity of additional environments. From a security perspective, both the DBAs and the development team are subject to audits that could identify unmasked data or unnecessary access. Resolving these challenges should be a goal of every IT organization.