Distributed Writes
In SQL Server, scaling out reads (i.e. using active secondary replicas via Always On Availability Groups) is a lot easier than scaling out writes. So what are your options when you have a tremendous number of writes that scaling up will not handle, no matter how big your server is? There are a number of options that allow you to write to many servers (instead of writing to one master server) that I’ll call distributed writes. Here are some ideas:
- Peer-to-Peer transactional replication (or Multi-master replication) with SQL Server. See Peer-to-Peer – Transactional Replication
- Sharding in Azure SQL Database via elastic database tools, which requires coding. See Building scalable cloud databases and the video Horizontal Scaling in Azure SQL Database. You can also implement sharding in code for SQL Server (a minimal routing sketch follows this list)
- Using Azure SQL Data Sync for Azure SQL databases and SQL Server instances
- Merge replication in SQL Server. See Merge Replication
- Create a messaging and queuing application in SQL Server Service Broker where all writes are placed on the queue and sent to different servers
- Create a message queue using an asynchronous Azure Event Hub
- Use a 3rd party product: Attunity Replicate for SQL Server (now Qlik Replicate), ScaleArc for SQL Server, StarQuest Data Replication
- Instead of using SQL Server, use a NoSQL or multi-model database service like Azure Cosmos DB (no coding involved – think of it as auto-sharding), which now supports multi-master
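To make the sharding option more concrete, here is a minimal sketch of hash-based shard routing in application code. Everything in it is a placeholder assumption (the shard connection strings, the Orders table, the customer-id key); in a real deployment you would likely use the elastic database tools' shard map manager rather than a hard-coded list.

```python
import hashlib
import pyodbc  # assumes the Microsoft ODBC Driver for SQL Server is installed

# Hypothetical connection strings, one per shard (each shard is its own database).
SHARDS = [
    "Driver={ODBC Driver 18 for SQL Server};Server=shard0.example.com;Database=Sales;Trusted_Connection=yes;",
    "Driver={ODBC Driver 18 for SQL Server};Server=shard1.example.com;Database=Sales;Trusted_Connection=yes;",
]

def shard_for(customer_id: str) -> str:
    """Map the sharding key to a shard with a stable hash, so the same customer always lands on the same database."""
    bucket = int(hashlib.sha1(customer_id.encode()).hexdigest(), 16) % len(SHARDS)
    return SHARDS[bucket]

def insert_order(customer_id: str, amount: float) -> None:
    # Each write goes only to the shard that owns this customer, spreading write load across servers.
    conn = pyodbc.connect(shard_for(customer_id))
    try:
        conn.execute(
            "INSERT INTO dbo.Orders (CustomerId, Amount) VALUES (?, ?)",
            customer_id, amount,
        )
        conn.commit()
    finally:
        conn.close()
```

The tradeoff is that cross-shard queries and rebalancing become your problem, which is exactly the pain called out in the availability discussion further down.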
Merge replication vs peer-to-peer replication
What is the difference between using merge replication and peer-to-peer replication? The main difference is that for merge replication there is only one publisher and one or more subscribers, but in peer-to-peer replication all nodes are both publishers and subscribers.
Secondly, peer-to-peer replication is transactional, which means it transmits transactionally consistent changes, while merge replication is trigger-based. Under the covers they also use different agents (transactional replication uses the Log Reader and Distribution Agents, while merge replication uses the Merge Agent).
Merge replication has conflict resolution (you can specify conflict resolution priority), but peer-to-peer doesn’t – it only offers conflict detection. When a conflict occurs, peer-to-peer generates an alert (if conflict detection is enabled) and stops replication, while allowing both instances to keep working independently until the conflict is resolved. In production, it is advisable to make schema changes only from the original node.
In peer-to-peer replication all nodes hold identical data, while in merge replication they can differ (subscribers can receive different subsets of the data from the publisher).
They both do basically the same job – providing scale-out, disaster recovery, and, in cases where updates are rare and locking is not a big concern, also high availability through data redundancy. Peer-to-peer replication is sometimes regarded as a replacement for merge replication.
A related topology is bidirectional transactional replication, where two servers each publish to and subscribe from the other. In SQL Server 2005 (9.x) and later versions this topology is also supported by peer-to-peer transactional replication, but bidirectional replication can provide improved performance. See New Whitepaper: Deciding Between Merge and Peer-to-Peer Replication – SQL Server Best Practices (sqlbp.com) and Considerations on Bidirectional Transactional Replication.
Cosmos DB
The one option out of all those above that does not require coding and can support a large number of writes per second is Azure Cosmos DB. All the other options can require significant coding and/or can only handle a limited number of writes per second. This is because Cosmos DB stores data as documents (JSON) in which all the information needed is included, so no joins are required and documents can be spread across multiple servers (see Partition and scale in Azure Cosmos DB and A technical overview of Azure Cosmos DB). This is in contrast to relational databases, which use multiple tables that must be joined; if those tables sit on different nodes, a lot of data shuffling occurs, causing performance problems.
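As a rough illustration of that point, here is a minimal sketch using the azure-cosmos Python SDK: you pick a partition key and write self-contained JSON documents, and the service spreads the partitions across servers for you. The account endpoint, key, container names, and document shape below are hypothetical.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Hypothetical account endpoint and key.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<account-key>")

db = client.create_database_if_not_exists(id="salesdb")
container = db.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),  # documents are distributed by this key
    offer_throughput=400,
)

# The document carries everything it needs (customer, line items), so no joins are required
# and writes can be spread across many physical partitions/servers.
container.upsert_item({
    "id": "order-1001",
    "customerId": "c-42",
    "items": [{"sku": "widget", "qty": 3, "price": 9.99}],
})
```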
To go into greater detail on the benefits of Cosmos DB over SQL Server for distributed writes:
- Consistency
- Peer-to-peer SQL replication introduces pains around data consistency and conflict resolution
- Availability
- Sharding with SQL introduces pains around maintaining availability when increasing/decreasing the degree of scale-out. Frequently, downtime is involved due to the need to re-balance data across shards
- SQL requires rigid schemas and indexes to be defined upfront. Every time schema or index updates are needed, you will incur the heavy operational cost of running CREATE INDEX and ALTER TABLE scripts across all database shards and replicas. Furthermore, this introduces availability issues while schemas are being altered.
- Handling sustained heavy write ingestion
- Putting queueing mechanisms in front of SQL only gives you a buffer for handling spikes in writes, but at the end of the day the database itself needs to support sustained heavy write ingestion in order to consume the buffered events. What happens if events come into the buffer faster than you drain it? You will need a database specifically designed for heavy write ingestion (a producer-side sketch follows this list)
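To illustrate the buffering point: a queue such as Azure Event Hubs absorbs write spikes, but a consumer still has to drain it into a store that can keep up. Below is a minimal sketch of the producer side with the azure-eventhub Python SDK; the connection string, hub name, and event payload are placeholders.

```python
import json
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection string and hub name.
producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hubs-connection-string>",
    eventhub_name="writes",
)

# Events pile up in the hub (the buffer); a separate consumer must drain them
# into the database at least as fast as they arrive, or the backlog grows without bound.
with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps({"orderId": 1001, "customerId": "c-42", "amount": 29.97})))
    producer.send_batch(batch)
```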
Azure Cosmos DB solves these by:
- Providing 5 well-defined consistency models to help developers tune the right consistency vs. performance tradeoff for their scenario (see the sketch below)
- Scaling on demand and supporting a flexible data model while maintaining high availability (99.99% availability SLA); scale-out and partition management are taken care of by the service on behalf of the user
- Using log-structured techniques to be a truly latch-free database that can sustain heavy write ingestion with durable persistence
In the end, eliminating schema and index management and JOINs is a necessary byproduct of the scale-out that Azure Cosmos DB provides.
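For example, the consistency tradeoff mentioned in the first bullet above can be chosen per client in the SDK. A minimal sketch (with a hypothetical endpoint and key) of selecting one of the five levels when creating the client:

```python
from azure.cosmos import CosmosClient

# Choose one of the five consistency levels (Strong, BoundedStaleness, Session,
# ConsistentPrefix, Eventual) to trade consistency against latency and throughput.
client = CosmosClient(
    "https://myaccount.documents.azure.com:443/",  # hypothetical endpoint
    credential="<account-key>",
    consistency_level="Session",
)
```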
After the initial post of this blog, I was asked “Why not just use SQL 2016 in-Memory tables for heavy write systems (info)?” and received a great reply from a Cosmos DB product manager:
SQL in-memory is only viable when:
- Request and data volume are small enough to fit on a single machine. You still have the fundamental problem of hard limits due to the scale-up model.
- The scenario does not need durability, reliability, or availability – which are requirements for >99% of mission-critical customer scenarios.
Durability
- If data is kept only in memory, you experience data loss upon any intermittent failure that requires the computer to restart (e.g., OS crash, power outage, the OS deciding to reboot for an update, etc.). In order for data to be durable, it needs to be persisted to disk. In order to offer resiliency against disk failures, you will need to replicate to a number of disks
- For durable scenarios, memory only acts as a buffer to absorb spikes. In order to achieve sustained write ingestion, you will need to flush the buffer as fast as you input into the buffer. Now you have a bottleneck on disk I/O unless you scale out
- This is why they immediately have to point out that this is for “applications where durability is not required”; durability is a requirement for >99% of data scenarios
- Data loss and data corruption should be treated as a cardinal sin for any data product
Scale
- This model is still a scale-up model – in which there are many hard limits
- What happens when the data volume doesn’t fit in memory (which tends to be very small relative to disk storage)? You need to scale out
- What happens when the request volume exceeds what memory bandwidth can serve? You need to scale out
- This is why the throughput numbers in the blog are orders of magnitude smaller than what customers are doing every day on Cosmos DB, and the question of storage size is quietly ignored
Expensive
- Memory is 100x more expensive than SSD. Achieving high throughput in a scale-out system will not only yield better scale and durability characteristics but also incur much lower costs for any large-scale scenario
More info:
Very thought provoking, thanks. Y’know, it occurs to me that another strategy is possible, which is writing much of your transaction to a simplified write cache table instead of spreading it across ten normalized tables. That way you stay within a primary Azure SQL database but the simplified logic gives you more throughput. You clean it up minutes or hours later, as the case may be.
It does seem that SQL Server / Azure SQL has some absolute bottlenecks on writes having to do with logging, and on heavy write systems these can become an issue, but the numbers are not well documented, especially on newer systems using SSDs.
Glad it was thought provoking, as that was my main goal with the post. Your idea is similar to the queues I mentioned, which have the problem of a limit on how much the queue can hold, plus the delay in “cleaning up the queue”. Thanks for your comment!
Why not just use SQL 2016 in-Memory tables for heavy write systems? It’s even more powerful for non-durable tables. https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/12/02/transaction-commit-latency-acceleration-using-storage-class-memory-in-windows-server-2016sql-server-2016-sp1/
I’ve been very disappointed with all my proof of concept attempts to use in-memory tables. The speed increase has been very modest, and the bother to avoid the various limitations much larger than I dreamed.
I’m not even clear on whether using memory tables avoids any bottlenecks on logging, only on delayed persistence of the main db.
I received a great reply to this question from a Cosmos DB product manager and have added it to the blog post above.
So then is the idea of Cosmos DB that it helps to scale in-memory? I had no idea that was the point. But what about when it’s then time to query the data, does it then also need to be persisted back to a relational database? Or does it all have to be memory resident across N servers in Cosmos DB? I think Microsoft has not made any of this clear – certainly not to me! And I did look, somewhat, a few months ago.
I also looked back at the link SThomms posted, and is the storage-class memory (SCM) mentioned there actually available? I don’t think it is; Intel has pushed off theirs in Purley until next year, and I’ve heard nothing about the non-Intel alternatives.
To clarify – the previous comment illustrates why SQL’s in-memory solution is not a general-purpose solution to the problem context for this blog post (how to build a general-purpose solution to support a sustained high rate of write ingestion).
Azure Cosmos DB is a general-purpose distributed database that happens to address supporting a sustained high rate of ingestion incredibly well. In fact, Azure Cosmos DB had its birth in the same product engineering team as SQL, where one of the goals for the team was to solve the scale-out problem more holistically (as opposed to addressing scale-out as an afterthought, as has been done in the traditional RDBMS space). By getting a clean slate to re-design our solution, we were freed from the burdens of having to support a legacy design. To read more, check out this blog post: https://azure.microsoft.com/en-us/blog/a-technical-overview-of-azure-cosmos-db/