Azure database for MySQL zone and region disaster recovery
The Azure database for MySQL instance can be scaled up and down depending on your organization requirements without downtime, as described at: . This article describes all possible failure events (except for region-level failure), which are automatically managed by the Azure database for MySQL service, without admin intervention. Azure database for MySQL offers zone-level redundancy at the database layer. The following sections discuss Azure database for MySQL zone and region disaster recovery options.
- Point in Time Restore from backup.
- Geo-restore from geo-replicated backups. The Azure database for MySQL service offers geo-redundant storage (GRS) backups, which in turn allow for region-level redundancy with some expected downtime for restoring database backups from another region in case of a region failure. Azure Database for MySQL automatically creates server backups and stores them in user configured geo-redundant storage. Backups can be used to restore your server to a point-in-time. Azure Database for MySQL takes backups of the data files and the transaction log. These backups allow you to restore a server to any point-in-time within your configured backup retention period.
- Read replicas.
Azure database for MySQL disaster recovery cases
In case of a zone-level disaster, there is no administrator intervention needed. All necessary steps are carried out automatically by the Azure database for MySQL service to ensure service recovery from another availability zone in the affected region. Azure Database for MySQL is suitable for running mission critical databases that require high uptime. Built on Azure architecture, the service has inherent high availability, redundancy, and resiliency capabilities to mitigate database downtime from planned and unplanned outages, without requiring you to configure any additional components. Azure database for MySQL accounts for both planned and unplanned downtime and has built-in mechanisms to mitigate both planned and unplanned downtime.
For MySQL zone-level failures and for planned service downtime, Azure customers are fully covered and MySQL instance automatically recovers without admin intervention. The Azure database for MySQL is running as a container service within Azure ecosystem, each consisting of multiple VMs and is continuously being monitored by our Azure Infrastructure and whenever there are unhealthy instances found, they are automatically taken out of usage for repair. When this occurs, databases hosted on that VM are failed-over to their secondaries to continue processing requests. This process is completely automated, and doesn’t require any user intervention.
For better illustrating this, we can consider the following scenario. You have a single MySQL server in the West Europe region. A MySQL zone-level disaster is triggered and the corresponding failover is illustrated in the diagram below.
If there is an issue with the machine where the MySQL instance is running, Azure will automatically migrate to another MySQL server, bring up a new MySQL instance and attach it to the storage. If data is corrupted, Microsoft always keep 3 copies for the MySQL service in geo-redundant storage, so the MySQL instance is attached to another storage. These processes normally take a couple of seconds.
In case of a region-level disaster, you should apply one of the aforementioned recovery options, i.e. geo-restore from geo-replicated backups or usage of read replicas. Geo-restore is only possible if you provisioned the server with geo-redundant backup storage. If you wish to switch from locally redundant to geo-redundant backups for an existing server, you must take a dump using mysqldump of your existing server and restore it to a newly created server configured with geo-redundant backups. Failure of a region is a rare event.
However, if you need protection from a region failure, you can configure one or more read replicas in other regions for disaster recovery (DR). Details about creating and managing read replicas can be found in this article. In the event of a region-level failure, you can manually promote the read replica configured on the other region to be your production database server. Bear in mind that if using read replicas, you will be charged with additional costs.
For MySQL region-level failures, when MySQL read replicas are not available and you are running with MySQL single server, the restoration process is manual, and it works just like the point in time restore, here’s the documentation that you should follow for a manual restoratoin: Backup and restore – Azure Database for MySQL | Microsoft Docs. As per https://docs.microsoft.com/en-us/azure/mysql/concepts-high-availability, MySQL replica is not required, however, the geo-restore is a process that takes a very long time, so in case you have a big MySQL instance, Microsoft recommends that you have a cross region replica running, so you will not get a very long downtime, while performing the restore to that region. In this case, when the original region goes offline, you should trigger a geo-restore and this will rebuild all the data in a new Azure region and attach it to a new MySQL instance.
For MySQL region-level failures, when MySQL read replicas are running, all you have to do is stop MySQL replication. As soon as you stop replication, the replica will become a standalone server, and your MySQL operations can continue, so there is no need for a geo-restore.
In case of logical/user errors in the database schema or data, such as accidentally dropped tables or incorrectly updated data, involves performing a point-in-time recovery (PITR), by restoring and recovering the data until the time just before the error had occurred. If you want to restore only a subset of databases or specific tables rather than all databases in the database server, you can restore the database server in a new instance, export the table(s) via mysqldump, and then use restore to restore those tables into your database.
In Azure database for MySQL, there are two types of backup and restore available:
- Point-in-time restore is available with either backup redundancy option and creates a new server in the same region as your original server utilizing the combination of full and transaction log backups.
- Geo-restore is available only if you configured your server for geo-redundant storage and it allows you to restore your server to a different region utilizing the most recent backup taken.