Syncing SQL databases
There are various ways to keep SQL Server databases and SQL Database/SQL Managed Instance (SQL MI) databases in-sync asynchronously that I will discuss in this blog. The main use cases for syncing databases are to distribute reads (i.e. a second copy for reporting), for a hybrid application (having some data on-prem and some in Azure), for a globally distributed application (reduce network latency by having data in a region close to you), or to keep a test or development database in sync with a production database. The options below are not for disaster recover (DR), unless indicated otherwise.
Some of these options where discussed in my blog Distributed Writes, which focused on scaling out database writes.
- An ETL tool such as Azure Data Factory (ADF) or SQL Server Integration Services (SSIS). This will require building of the solution, which could be complex due to having to determine what records have changed in the source database (see Methods for populating a data warehouse)
- Azure SQL Data Sync: This is a Microsoft Windows Azure web service that provides bi-directionally data synchronization capabilities for SQL databases. It allows data to be synchronized between on-premises SQL Server databases and Azure SQL databases; in addition, it can also keep multiple Azure SQL databases in sync. It does not support Azure SQL Managed Instance at this time. Note there is a minimum 5-minute frequency between syncs
- SQL Server Always On availability group (which supports Hybrid, Cross-OS and Distributed), SQL Database Active Geo-Replication, or SQL Managed Instance Auto-failover group: One-way replication that automatically copies your data to one or more secondary replicas. Note this could be used for DR
- SQL Server Replication: There are six types of replication to copy data from one database to another. SQL Server supports all six. Azure SQL Database can only be used as the push subscriber in a one-way transactional or snapshot replication topology (see Replication to Azure SQL Database). The SQL Server instance acting as publisher and/or distributor can be an instance of SQL Server running on-premises, an Azure SQL Managed Instance, or an instance of SQL Server running on an Azure virtual machine in the cloud. In preview is the ability of SQL MI to publish to an on-prem SQL Server as well as allowing an on-prem SQL Server to publish to SQL MI
- SQL Server backup/restore, export/import of a bacpac file for SQL Server/SQL Database/SQL MI, or a database snapshot for SQL Server are options with the drawbacks of doing a full copy instead of just an incremental copy, and having longer periods where the destination database is not available. This could be used as an inexpensive solution for DR, with the tradeoff of possible data loss and long downtime
- Use a 3rd party product: Attunity Replicate for SQL Server (now Qlik Replicate), striim, ScaleArc for SQL Server, StarQuest Data Replication, RedGate SQL Data Compare. Or product-specific replication, such as DBAmp for Salesforce to SQL Server or bryteflow for SAP to SQL Server
- Azure Database Migration Service (DMS) has an online (continuous sync) option for copying data from SQL Server to Azure SQL Database or SQL MI with minimal downtime
- If you need to copy data for latency and/or data sovereignty reasons, Azure Arc might be a solution (see Azure Stack and Azure Arc for data services)
- Create a messaging and queuing application in SQL Server Service Broker where all writes are placed on the queue and sent to different servers. But a warning that the broker is slow and is rarely used by customers
- Create a message queue using an asynchronous Azure Event Hub or Azure Service Bus or Azure Event Grid (see Choose between Azure messaging services – Event Grid, Event Hubs, and Service Bus)
- All of the above options require no changes to your source database. If you are in a position that would allow you to spend the time to change your database to another solution, then Cosmos DB could be used to greatly reduce the complexity of copying data since it can transparently replicate data to all the regions you specify nearly instantaneously with no downtime
When discussing which of these options may be best for your situation, make sure you understand, for each option, if it is push/pull, if it requires new infrastructure, the data latency, and the cost.
Great content, May I know the difference between snowflake and ADF?