Recently, I have been talking to many customers about data center resiliency as part of ConvergeOne’s Data Center Foundation Workshops. Before beginning an implementation, it’s important to know where customers stand, so I start by asking whether they could recover from a data center failure in their current state.
The next step is to broach the subject of recovery time objective (RTO). A customer’s data center may be recoverable, but how long does it take to bring data and systems back to their previously functional production state? The answer could be minutes, hours, days, weeks, or more. The longer the recovery time, the greater the threat to the business’s ability to generate revenue or accomplish its mission. This builds the business case for investing in technology focused on resiliency, since that investment is what shortens the recovery time.
Systems can be interrupted in many ways: data corruption, data erasure, power and cooling failure, system hardware failure, site failure, regional catastrophic events, fire, flooding, and cyber and physical intrusion. I could go on and on. Any one of these events could happen to an organization, so it must be prepared to act when (not if) one occurs. That way, when it does happen, the organization can predict, or even know definitively, how long it will take to bring its data and systems back to their last known good functional state.
Many organizations have backup and recovery systems sophisticated enough to protect data, or even whole systems, within a very short backup window. These systems write to fast backup storage media and then archive everything to a secondary location or the cloud. For some organizations, this qualifies as the entire disaster recovery plan. It works well when an organization needs to restore a small subset of data, as recovery operations could be mere clicks away. Everyone’s happy with that.
On a grander scale, when there’s a complete site failure, recovering to the original state may be more challenging and tedious. It could require restoring data to an alternate compute and storage system at another site or in the public cloud. Now we are talking true disaster recovery, right? Let’s think this through. Let’s say your organization has an alternate site and the equipment to run the latest version of the data in its consistent state. Can users still access the applications they need to continue your organization’s mission?
Some organizations have a true disaster recovery solution, rather than a plain backup and recovery system. They replicate their data to a secondary compute, storage, or hyper-converged system at a secondary site or in the public cloud. Replication happens in real time, synchronously or asynchronously, with the ability to fail over on demand for planned and unplanned events. Complex networking across the enterprise allows them to shift workloads between sites, and the users follow the apps and data seamlessly without a blip. The result: A recovery time that is TRULY zero! Business continues as normal, as if no event occurred.
Where does your organization fit on the RTO spectrum? Do you know your recovery time objective? Is your data recoverable immediately, or in minutes, hours, days, or weeks? Knowing the answers to these questions will help you answer the most important question of all: Could your organization survive a data center failure?
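For a rough first pass at those questions, a back-of-the-envelope calculation can help. The Python sketch below is only an illustration under stated assumptions: it treats a full restore as a fixed setup time plus data size divided by sustained restore throughput, and then maps the result onto the spectrum above. The function names, the decimal unit conversion, and the bucket thresholds are all hypothetical choices made for this example, not part of any ConvergeOne tool or methodology.

```python
def estimate_restore_hours(data_tb, throughput_mb_per_s, setup_hours=0.0):
    """Rough restore time: fixed setup time plus transfer time.

    Assumes decimal units (1 TB = 1,000,000 MB) and a constant,
    sustained restore throughput. Real restores rarely behave this neatly.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput_mb_per_s must be positive")
    data_mb = data_tb * 1_000_000
    transfer_hours = data_mb / throughput_mb_per_s / 3600
    return setup_hours + transfer_hours


def rto_bucket(hours):
    """Place an estimated recovery time on the RTO spectrum.

    The thresholds are illustrative assumptions, not an industry standard.
    """
    if hours < 1 / 60:       # under one minute
        return "immediate"
    if hours < 1:
        return "minutes"
    if hours < 24:
        return "hours"
    if hours < 24 * 7:
        return "days"
    return "weeks or more"


if __name__ == "__main__":
    # Hypothetical scenario: 50 TB restored at 500 MB/s,
    # after 4 hours of standing up alternate equipment.
    hours = estimate_restore_hours(50, 500, setup_hours=4)
    print(f"Estimated recovery time: {hours:.1f} hours ({rto_bucket(hours)})")
```

Even this crude estimate makes the point: in the hypothetical scenario, transfer time alone is more than a day, before anyone has confirmed that users can actually reach their applications. An actual recovery time can only be confirmed by testing a real recovery.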
The ConvergeOne Data Center Foundation Workshop focuses on the resiliency of your data center infrastructure. During this workshop, our expert team will analyze your data center infrastructure and provide you with a consolidated list of recommendations and next steps to improve its resiliency.