Businesses are increasingly using cloud services (SaaS, PaaS, and IaaS) in their IT environments. These services offer more flexibility on costs and can be more attractive than conventional IT infrastructure. In 2016, in France, 48% of companies employing more than 250 people used cloud services, an increase of 12 percentage points compared with 2014. The greater availability of cloud infrastructure is often seen as an opportunity. However, the risk of a service provider’s data center failing is rarely addressed, even though cloud services run on data centers that are very much physical, not “in the cloud”. These data centers face the same threats as traditional ones: natural disasters, human error, etc. How, then, can disaster recovery be provided for cloud infrastructures?
SAAS DISASTER RECOVERY: THE SERVICE PROVIDER’S RESPONSIBILITY TO PUT IN PLACE
SaaS (Software as a Service) is software that is made available on, and consumed directly from, the internet, and managed by one or more providers. The customer does not have the means to carry out disaster recovery itself in the event of a disaster (no access to raw data, source code, or applications that could be used to duplicate the infrastructure, etc.), so it has to rely on the provider’s goodwill.
SaaS disaster recovery levels vary with the provider’s maturity
Three major trends are emerging:
- Providers that include a disaster recovery plan in their standard offering: recovery at a remote data center, usually supplemented by outsourced backup. However, they rarely commit to recovery times.
Examples include the major SaaS players (such as Office 365, Salesforce, and SAP), as well as some intermediate players (such as Evernote and Xero);
- Providers that offer outsourced backup only. There is no clearly established disaster recovery plan as such, so the customer has to question the provider’s ability to restore the backup files if the main site suffers a disaster.
Examples include intermediate providers (such as Zervant and Sellsy);
- Providers that do not mention the issue or have nothing in place. Since backup is not even raised, it is safer to assume that nothing is being done.
Small players are usually in this situation.
Getting contracts right is key
In the vast majority of cases, SaaS contracts contain no provisions on how the provider will manage disaster recovery, even though providers may stress their ability to handle that risk. In fact, contracts usually include standard force majeure (“Act of God”) clauses stipulating that the provider is not liable for a breach of its contractual obligations if the breach is caused by an event beyond its reasonable control. The legal risks must therefore be addressed when framing the agreement, and such clauses should be removed to ensure an appropriate level of cover.
Just as they do when framing conventional contracts, customers must ensure that clear service level agreements are in place, in particular for disaster recovery. These need to cover:
- Recovery time (Recovery Time Objective, RTO) and data loss (Recovery Point Objective, RPO) in the event of a disaster;
- The provider’s disaster recovery plan, including crisis management procedures, and an obligation to run conclusive annual tests of the plan using real-world scenarios, with the customer entitled to review the test report;
- Financial penalties and the right to terminate the contract (in particular, with a provision to recover usable data) if commitments are breached.
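RTO and RPO commitments only protect the customer if the annual test results can be checked against them. The sketch below is one possible way to do that, assuming the test report records when the disaster was declared, when service was restored, and the timestamp of the newest transaction still present after recovery; the class names, field names, and figures are hypothetical, not taken from any provider’s report format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrTestResult:
    """Hypothetical fields read from a provider's disaster recovery test report."""
    disaster_declared: datetime     # moment the main site was declared lost
    service_restored: datetime      # moment users could work again on the recovery site
    last_recovered_write: datetime  # newest transaction present after recovery


@dataclass
class SlaTargets:
    rto: timedelta  # maximum tolerated downtime
    rpo: timedelta  # maximum tolerated window of lost data


def check_sla(result: DrTestResult, sla: SlaTargets) -> dict:
    """Compute the measured RTO/RPO and whether each meets the contractual target."""
    measured_rto = result.service_restored - result.disaster_declared
    measured_rpo = result.disaster_declared - result.last_recovered_write
    return {
        "measured_rto": measured_rto,
        "measured_rpo": measured_rpo,
        "rto_met": measured_rto <= sla.rto,
        "rpo_met": measured_rpo <= sla.rpo,
    }


# Illustrative test: 5h30 of downtime against a 4-hour RTO,
# 15 minutes of lost data against a 1-hour RPO.
sla = SlaTargets(rto=timedelta(hours=4), rpo=timedelta(hours=1))
test = DrTestResult(
    disaster_declared=datetime(2017, 6, 1, 9, 0),
    service_restored=datetime(2017, 6, 1, 14, 30),
    last_recovered_write=datetime(2017, 6, 1, 8, 45),
)
report = check_sla(test, sla)  # rto_met is False, rpo_met is True
```

A breached target in such a check is exactly what the penalty and termination clauses above would be triggered by.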
IAAS/PAAS DISASTER RECOVERY: THE CUSTOMER’S RESPONSIBILITY TO PUT IN PLACE
Infrastructure as a Service (IaaS) is a standardized, automated offering of computing, storage, and network resources owned and hosted by a provider and made available to the customer on demand. According to Gartner’s definition, a Platform as a Service (PaaS) offering is similar to IaaS but differs in that it covers only the software development stack (database, EDI, business process management…). Unlike with SaaS, disaster recovery remains the customer’s responsibility in both cases: IaaS/PaaS providers make services available in various data centers, and the customer is responsible for how they are used and configured. Customers using these services have two options: entrust recovery to a specialist provider, or manage it themselves.
The cloud disaster recovery market is not yet mature
Cloud disaster recovery providers are referred to by the acronym DRaaS: Disaster Recovery as a Service. Initially, DRaaS providers offered cloud-based recovery of the information system hosted in an “on-premise” data center. Today, they also offer recovery for infrastructure already in the cloud, such as AWS or Azure. Maturity still varies widely, depending on the provider and on the cloud used. Some DRaaS providers require recovery to run on their own cloud, which means they cannot offer recovery for PaaS services.
As with SaaS, contracts contain no default provisions on recovery, so any guarantees on data loss or recovery time have to be negotiated. Providers generally promise to tailor their offer to the customer’s requirements! To make sure recovery will work as expected, the customer must schedule regular disaster recovery tests (we recommend once a year).
Operating your own disaster recovery plan, using tools offered by the supplier
As with “on-premise” infrastructure, you will need to think about and define your DRP strategy right from the design phase. This strategy must provide for tests that give a sufficient level of confidence in the plan.
Implementation is simplified by the tools cloud providers offer and by the high degree of standardization in cloud environments. The major players (for example, AWS and Azure) have published white papers setting out the key guidelines for such a project.
Conceptually, these DRP strategies remain close to those used in “on-premise” data centers.
There are four main ones:
- backup and restore: simple backups of data and images of machines on a remote site, which are restored if an incident occurs;
- pilot light: replication of databases and the provision of machines, in the form of images, ready to be used if an incident occurs;
- warm standby: full replication of the main site (data and machines); the recovery site is undersized in performance terms but ready to scale up if an incident occurs;
- multi-site (or active-active): the two sites are identical and share the load from users. If an incident occurs, the remaining site can scale up to cover all users.
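Choosing among these four strategies comes down to finding the cheapest one whose recovery time and data loss still meet the targets. The sketch below illustrates that trade-off; the RTO/RPO estimates and cost ranks are illustrative placeholders that only reflect the usual ordering (faster recovery costs more), not figures from any provider, and should be replaced with values measured on your own environment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Strategy:
    name: str
    est_rto_hours: float  # estimated recovery time (placeholder, to be measured)
    est_rpo_hours: float  # estimated data-loss window (placeholder, to be measured)
    relative_cost: int    # 1 = cheapest; a ranking only, not real prices


# Illustrative placeholder values only.
STRATEGIES = [
    Strategy("backup and restore", 24.0, 24.0, 1),
    Strategy("pilot light", 4.0, 1.0, 2),
    Strategy("warm standby", 1.0, 0.25, 3),
    Strategy("multi-site", 0.1, 0.0, 4),
]


def cheapest_strategy(rto_hours: float, rpo_hours: float,
                      options: Sequence[Strategy] = STRATEGIES) -> Optional[Strategy]:
    """Return the cheapest strategy whose estimates meet both targets, or None."""
    eligible = [s for s in options
                if s.est_rto_hours <= rto_hours and s.est_rpo_hours <= rpo_hours]
    return min(eligible, key=lambda s: s.relative_cost, default=None)
```

Running the selection per application rather than for the whole information system is one way to arrive at the hybrid solutions mentioned next.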
Hybrid solutions, tailored to recovery time requirements and to cost and complexity constraints, can also be considered.
The cloud’s real contribution to DRP lies in the many tools it offers to simplify implementation and activation.
Data replication, for instance, is simplified by asynchronous geo-replication options (where multiple copies are replicated to other regions). The achievable RPO depends on the types of data and tools involved. Beyond this option, local data redundancy is almost always included.
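Why the RPO of asynchronous geo-replication is not zero can be shown with a toy model: the primary region acknowledges a write immediately, and the copy reaches the remote region only after a replication lag. The sketch below assumes a fixed lag, which real services do not guarantee; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AsyncReplicatedStore:
    """Toy model: each write reaches the remote replica `lag_s` seconds after commit."""
    lag_s: float
    primary: List[Tuple[float, str]] = field(default_factory=list)  # (commit_time, value)

    def write(self, t: float, value: str) -> None:
        # The primary acknowledges immediately; replication happens later.
        self.primary.append((t, value))

    def replica_at(self, t: float) -> List[str]:
        # Only writes committed at least `lag_s` seconds before t have arrived.
        return [v for (ct, v) in self.primary if ct + self.lag_s <= t]

    def lost_on_failover(self, t: float) -> List[str]:
        # Writes committed before t but still in flight when the primary region fails.
        return [v for (ct, v) in self.primary if ct <= t < ct + self.lag_s]


store = AsyncReplicatedStore(lag_s=5.0)
for t, v in [(0, "a"), (3, "b"), (8, "c"), (12, "d")]:
    store.write(t, v)
# If the primary region fails at t=14, "d" (committed at t=12) has not yet
# been replicated and is lost: the effective RPO is bounded by the lag.
```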
The high degree of standardization also makes it possible to automate recovery: the scripts and APIs made available by providers can automate infrastructure deployment, resize instances (according to a previously defined configuration), distribute load and traffic, handle IP addressing, etc., considerably shortening the activation time of a backup site.
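An automated activation is essentially an ordered runbook in which each step depends on the previous one. The sketch below shows that structure with hypothetical stub functions standing in for the provider’s deployment, scaling, and traffic-routing APIs; it does not call any real cloud SDK.

```python
from typing import Callable, List, Tuple

# Each step is a (description, action) pair; an action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_runbook(steps: List[Step]) -> List[str]:
    """Execute recovery steps in order and stop at the first failure."""
    log: List[str] = []
    for description, action in steps:
        ok = action()
        log.append(f"{'OK' if ok else 'FAILED'}: {description}")
        if not ok:
            break  # later steps depend on earlier ones (traffic needs running instances)
    return log


# Hypothetical stubs: in a real plan these would call the provider's APIs.
def deploy_from_template() -> bool:
    return True  # deploy the pre-defined infrastructure template


def resize_instances() -> bool:
    return True  # scale instances up to the production configuration


def redirect_traffic() -> bool:
    return True  # repoint DNS / load balancing to the recovery site


RECOVERY_RUNBOOK: List[Step] = [
    ("deploy infrastructure from template", deploy_from_template),
    ("resize instances to production size", resize_instances),
    ("redirect user traffic to recovery site", redirect_traffic),
]
```

Because the same runbook is used for tests and for real incidents, each annual test also validates the automation itself.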
The monitoring and alerting tools on offer are designed to support day-to-day operations; they can also detect an incident as quickly as possible or, in some cases, partially automate the activation of a backup site.
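A common way to turn monitoring into a (partial) failover trigger is to require several consecutive failed health checks before acting, so that a single transient error does not activate the backup site. The sketch below assumes that rule; the threshold and the `on_failover` callback are hypothetical, and in a partially automated setup the callback might only page the on-call team for approval rather than start the recovery.

```python
from typing import Callable


class FailoverMonitor:
    """Trigger failover after `threshold` consecutive failed health checks."""

    def __init__(self, threshold: int, on_failover: Callable[[], None]) -> None:
        self.threshold = threshold
        self.on_failover = on_failover
        self.consecutive_failures = 0
        self.failed_over = False

    def record(self, healthy: bool) -> None:
        if self.failed_over:
            return  # activation already triggered; ignore further probes
        if healthy:
            self.consecutive_failures = 0  # any success resets the count
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.failed_over = True
            self.on_failover()


actions = []
monitor = FailoverMonitor(threshold=3, on_failover=lambda: actions.append("activate"))
for healthy in [False, False, True, False, False, False]:
    monitor.record(healthy)
# The success in the middle resets the count; the final three failures trigger it once.
```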
Lastly, because new resources can be provisioned within a few minutes, the recovery site does not have to run at full capacity permanently, which minimizes the associated OPEX. This approach can cut the cost of DRP infrastructure by 40 to 70%.
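The source of these savings can be made explicit with a simple model: an always-on duplicate site is paid for all year, whereas a cloud recovery site keeps only a fraction of its capacity running (for example, replicated databases) and pays for the rest only during tests and incidents. The sketch below assumes purely linear pay-per-hour pricing, and the figures in the example are hypothetical, not benchmarks.

```python
from typing import Tuple


def dr_cost_comparison(full_site_monthly: float,
                       standby_fraction: float,
                       activation_hours_per_year: float,
                       hours_per_year: float = 8760.0) -> Tuple[float, float, float]:
    """Compare an always-on secondary site with an on-demand cloud recovery site.

    full_site_monthly: monthly cost of running a full-size secondary site 24/7
    standby_fraction: share of that capacity kept running permanently
    activation_hours_per_year: hours per year the full capacity runs (tests, incidents)
    Returns (always_on_annual, on_demand_annual, fractional_saving).
    """
    always_on = 12 * full_site_monthly
    # Permanent share is paid all year; the remainder only while activated.
    on_demand_share = (standby_fraction
                       + (1 - standby_fraction) * activation_hours_per_year / hours_per_year)
    on_demand = always_on * on_demand_share
    return always_on, on_demand, 1 - on_demand / always_on


# Hypothetical case: 30% of capacity always on, full capacity for 48 hours a year.
always_on, on_demand, saving = dr_cost_comparison(10_000, 0.3, 48)
# saving is roughly 0.696, i.e. about 70% under these assumptions
```

With a larger permanent share (a warm standby rather than a pilot light), the saving shrinks towards the lower end of the range.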
Toward greater support by providers?
During 2017, Azure plans to offer recovery for virtual machines hosted on its own platform by enhancing its “Site Recovery” service. In its current form, Site Recovery supports traditional site recovery by using the Azure cloud to host the secondary site; Microsoft wants to extend the service into a Recovery as a Service option. This tool would automatically deploy the secondary site (active-passive), replicate data automatically, and make testing easier.
This option was available as a “public preview” at the end of May 2017. None of the other main IaaS/PaaS providers has an equivalent project under way.
THE CLOUD AND PROVIDER SYSTEMIC RISK
Disaster recovery for cloud-based services is handled differently depending on the type of service used. SaaS recovery is the provider’s responsibility and must be managed through contracts, while IaaS/PaaS recovery, simplified by the tools available, remains the customer’s responsibility.
As recent incidents have shown, an entire hosting region of a provider can fail. Even though these incidents were short-lived or had minor impacts, the possibility of a widespread failure cannot be ignored, so the issue of cyber-resilience still has to be dealt with. Using a second cloud provider can cover the risk of the destruction, or a major outage, of the first provider’s infrastructure. This solution is very complex, because portability between providers remains difficult. For now, few companies have attempted it, although Snapchat is one example: it uses Google’s cloud for production and plans to use Amazon’s for its DRP within five years.