Azure App Service zone and region disaster recovery

Introduction

This article provides a high-level description of Azure App Service zone and region disaster recovery options. If you run a single instance of an Azure App Service, you risk downtime whenever that instance goes down for any reason. To be covered against Azure zone-level disasters, you need a zone-redundant App Service Plan with at least three instances, so that the instances run in different zones. For Azure region-level disasters, you can recover by using Web App snapshots. Recovering from a region-level failure is a manual process and involves separate manual steps for each of your App Services (Web Apps).

Zone redundancy for each Azure service is discussed at: https://docs.microsoft.com/en-us/azure/availability-zones/az-region. Zone redundancy for Azure App Service is discussed at: https://docs.microsoft.com/en-us/azure/app-service/how-to-zone-redundancy.

Azure App Service Recovery cases

Zone-level failure

For this scenario you need to configure your App Service Plan for zone-level redundancy, so that a single zone is no longer a single point of failure. Currently, you need to use an ARM template to create a zone-redundant App Service Plan. The ARM template is only needed for the initial creation; once created, the App Service Plan can be viewed and managed via the Azure portal and CLI tooling. You need a minimum of three (3) App Service Plan instances to be covered for zone-level disasters. If you only have one or two instances provisioned without zone-level redundancy and your current zone is impacted, you will need to manually restore your App Services to a new plan in another zone in your region. You can do this by using App Service backups or App Service snapshots. Bear in mind that snapshots do not include a linked SQL database, so you should plan separately for backing up your database in order to have a complete restore after recovering from a snapshot.

The only changes needed in an ARM template to specify a zone redundant App Service are the new zoneRedundant property (required) and optionally the App Service plan instance count (capacity) on the Microsoft.Web/serverfarms resource. If you don't specify a capacity, the platform defaults to three. The zoneRedundant property should be set to true and capacity should be set based on the workload requirement, but no less than three. A good rule of thumb to choose capacity is to ensure sufficient instances for the application such that losing one zone of instances leaves sufficient capacity to handle expected load. To decide instance capacity, you can use the following calculation:

Since the platform spreads VMs across 3 zones and you need to account for the failure of at least 1 zone, multiply the peak workload instance count by a factor of zones/(zones-1), or 3/2. For example, if your typical peak workload requires 4 instances, you should provision 4 * 3/2 = 6 instances, so that losing one zone still leaves (2/3 * 6 instances) = 4 instances.
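The calculation above can be sketched in a few lines of Python. This is an illustrative helper only, not part of any Azure SDK: the function names and the decision to round up to a whole instance are assumptions, while the 3-zone spread and the minimum of three instances come from the text above.

```python
def zone_redundant_capacity(peak_instances: int, zones: int = 3, minimum: int = 3) -> int:
    """Return the instance count to provision so that losing one zone
    still leaves at least `peak_instances` instances running."""
    if peak_instances < 1:
        raise ValueError("peak_instances must be at least 1")
    if zones < 2:
        raise ValueError("zone redundancy needs at least 2 zones")
    # Integer ceiling of peak_instances * zones / (zones - 1).
    needed = (peak_instances * zones + zones - 2) // (zones - 1)
    # Never go below the minimum required for a zone redundant plan.
    return max(minimum, needed)


def surviving_instances(instances: int, zones: int = 3) -> int:
    """Instances left after the most heavily populated zone fails,
    assuming instances are spread as evenly as possible across zones."""
    largest_zone = (instances + zones - 1) // zones  # ceiling division
    return instances - largest_zone


print(zone_redundant_capacity(4))  # 6, matching the example in the text
print(surviving_instances(6))      # 4
```

Rounding up means a peak of 5 instances yields 8 provisioned instances rather than 7.5, which keeps the surviving capacity at or above the peak even when the instances cannot be divided evenly between zones.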

Assuming three or more instances in a zone-redundant App Service Plan, Azure App Service offers zone-level redundancy at the application layer. During normal operations, network traffic is load balanced between app instances in different zones of the region. If any of the zones becomes unavailable, traffic is routed to the other zones in the region. Application backups should still always be configured as per the customer requirements.

You can start with one instance in your App Service Plan. If the number of concurrent users or the hardware resource requirements increase, the pricing plan can be upgraded accordingly (scale up or scale out). Independently of the zone-level redundancy configuration, each App Service Plan includes the following features:

  • Custom domains / SSL. Configure and purchase custom domains with SNI and IP SSL bindings.
  • Scale out. Up to x number of instances. Subject to availability. Remember that you need at least three (3) instances to have zone-level redundancy.
  • Staging slots. Up to x number of staging slots to use for testing and deployments before swapping them into production.
  • Daily backups with agreed backup frequency and retention window.
  • Traffic Manager. Improve performance and availability by routing traffic between multiple instances of your app, when you have two or more instances.
  • Storage. X number of GB disk storage shared by all apps deployed in the App Service plan.

Please note that you cannot convert an existing App Service Plan that is not zone redundant into a zone-redundant one. You need to create a new App Service Plan via the ARM template mentioned above and migrate your App Services to that new App Service Plan.
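As a rough illustration of what such an ARM template could contain, the Python sketch below builds a minimal template with a Microsoft.Web/serverfarms resource and prints it as JSON. The zoneRedundant property and the capacity setting come from the description earlier in this article; the plan name, location, SKU, apiVersion and the helper function itself are assumptions chosen for the example and should be checked against the current ARM reference before use.

```python
import json


def zone_redundant_plan(name: str, location: str, capacity: int = 3) -> dict:
    """Build a Microsoft.Web/serverfarms resource with zone redundancy enabled."""
    if capacity < 3:
        raise ValueError("a zone redundant plan needs a capacity of at least 3")
    return {
        "type": "Microsoft.Web/serverfarms",
        "apiVersion": "2020-12-01",  # assumption: verify against the ARM reference
        "name": name,
        "location": location,
        # Assumption: a Premium v3 SKU; the instance count goes in sku.capacity.
        "sku": {"name": "P1v3", "tier": "PremiumV3", "capacity": capacity},
        "properties": {"zoneRedundant": True},
    }


template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    # Hypothetical plan name and region, used only for illustration.
    "resources": [zone_redundant_plan("my-zone-redundant-plan", "westeurope", capacity=6)],
}

print(json.dumps(template, indent=2))
```

The printed JSON can be saved to a file and used as the starting point for the one-time deployment that creates the new plan; the existing App Services are then migrated to it as described above.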

Please also note that you can use App Service snapshots to restore an App Service after a zone failure, by following the instructions at https://docs.microsoft.com/en-us/azure/app-service/manage-disaster-recovery. The emergency mode mentioned in that article applies to both zone-level and region-level disasters (see below). The only requirement is that you have a premium App Service Plan configured (which supports App Service snapshots).

Region-level failure

Ideally you should plan for a multi-region design of your Azure App Service that uses paired regions, following Microsoft architecture best practices as discussed in these articles: https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/app-service-web-app/multi-region and https://docs.microsoft.com/en-us/azure/best-practices-availability-paired-regions. If cost or other restrictions rule out a multi-region design and you opt for a single-region App Service design instead, be aware that App Service (Web App) recovery is still supported in case of a region failure. This applies regardless of whether you have one or two App Service instances without zone-level redundancy or three or more instances in a zone-redundant App Service Plan.

In all these cases there is an option for region-level recovery. Disaster recovery after a region failure is feasible based on application snapshots, but it requires manual effort and will incur downtime. When a disaster brings an entire Azure region offline, all App Service apps hosted in that region are placed in disaster recovery mode. Features are available to help you restore the app to a different region or recover files from the impacted app. App Service resources are region-specific and can't be moved across regions. You must restore the app to a new app in a different region, and then create mirroring configurations or resources for the new app.

To restore a Web App to a different region in case of region failure, follow the process below:

  1. Create a new App Service app in a different Azure region than the impacted app. This is the target app in the disaster recovery scenario.
  2. In the Azure portal, navigate to the impacted app's management page. In a failed Azure region, the impacted app shows a warning text. Click the warning text.
  3. In the Restore Backup page, configure the restore operation. In a disaster scenario, you can only restore the snapshot to an app in a different Azure region. For the Restore site configuration option, choose Yes.
  4. Configure everything else in the target app to mirror the impacted app and verify your configuration.
  5. When you're ready for the custom domain to point to the target app, remap the domain name.

Last but not least, remember to always apply the following Microsoft recommendations:

Sources

https://docs.microsoft.com/en-us/azure/app-service/how-to-zone-redundancy

https://docs.microsoft.com/en-us/azure/app-service/manage-disaster-recovery

https://docs.microsoft.com/en-us/azure/app-service/web-sites-restore

https://docs.microsoft.com/en-us/azure/app-service/app-service-web-restore-snapshots