Understanding Cosmos DB coming from a relational world
Cosmos DB is an awesome product that is mainly used for large-scale OLTP solutions. Any web, mobile, gaming, or IoT application that needs to handle massive amounts of data, reads, and writes at a globally distributed scale with near-real-time response times for a variety of data is a great use case (it can be scaled out to support many millions of transactions per second). Because it fits in the NoSQL category and is a scale-out solution, it can be difficult to wrap your head around how it works if you come from the relational world (e.g. SQL Server). So this blog will lay out the differences in how Cosmos DB works.
First, a quick comparison of terminology to help you understand the difference (see Work with databases, containers, and items in Azure Cosmos DB):
| RDBMS | Cosmos DB (SQL API) | Cosmos DB (Gremlin/Graph API) |
| --- | --- | --- |
| Database | Database | Database |
| Table(s), view(s) | Container/Collection | Graph |
| Row | Item/Document (JSON) | Node/Edge |
| Column | Property | Property |
| Foreign Key | Reference | Edge |
| Join | Embedded document | .out() |
| Sharding Key | Partition Key | Partition Key |
From Welcome to Azure Cosmos DB and other documentation, here are some key points to understand:
- You can distribute your data to any number of Azure regions, with the click of a button. This enables you to put your data where your users are, ensuring the lowest possible latency to your customers
- When a new region gets added, it is available for operations within 30 minutes anywhere in the world (assuming your data is 100 TBs or less).
- To control the exact sequence of regional failovers in case of an outage, Azure Cosmos DB enables you to associate a priority with the various regions associated with the database account
- Azure Cosmos DB enables you to configure the regions (associated with the database) for “read”, “write” or “read/write” regions.
- For Cosmos DB to offer strong consistency in a globally distributed setup, it would need to synchronously replicate the writes or synchronously perform cross-region reads. The speed of light and wide-area network reliability dictate that strong consistency results in higher latencies and reduced availability of database operations. Hence, in order to offer guaranteed low latencies at the 99th percentile and 99.99% availability for all single-region accounts and all multi-region accounts with relaxed consistency, and 99.999% availability on all multi-region database accounts, it must employ asynchronous replication. This in turn requires that it also offer well-defined, relaxed consistency models – weaker than strong (to offer low latency and availability guarantees) and ideally stronger than “eventual” consistency (with an intuitive programming model)
- Using Azure Cosmos DB’s multi-homing APIs, an app always knows where the nearest region is and sends requests to the nearest data center. All of this is possible with no config changes. You set your write-region and as many read-regions as you want, and the rest is handled for you
- As you add and remove regions to your Azure Cosmos DB database, your application does not need to be redeployed and continues to be highly available thanks to the multi-homing API capability
- You configure the preferred location array (or current location) on the connection policy to tell the client which set of regions to connect to. The multi-homing API is the mechanism by which the client knows which regions in that array are valid. So if you configure the client to connect to [West US, East US, East US 2] in that order, the multi-homing API tells the client, for example, that East US and East US 2 are online but West US was not configured on the server or has an outage, so the client automatically fails over to an online region (a connection sketch appears after this list). The app itself is deployed to some compute infrastructure, such as AKS, a VM, or an app service, and deploying to any of these Azure services requires you to pick a region (e.g. the app runs in an AKS cluster in West US). You can deploy apps to multiple regions (e.g. stand up AKS in East US and West US): East US AKS should prefer talking to Cosmos DB East US, and West US AKS should prefer talking to Cosmos DB West US. The end user of the app is then routed to East US AKS vs West US AKS by something like Traffic Manager
- It supports multiple data models, including but not limited to document, graph, key-value, table, and column-family data models
- APIs for the following data models are supported with SDKs available in multiple languages: SQL API, MongoDB API, Cassandra API, Gremlin API, Table API
- 99.99% availability SLA for all single-region database accounts, and 99.999% read availability on all multi-region database accounts. Deploy to any number of Azure regions for higher availability and better performance
- For a typical 1KB item, Cosmos DB guarantees end-to-end latency of reads under 10 ms and indexed writes under 15 ms at the 99th percentile within the same Azure region. The median latencies are significantly lower (under 5 ms). So you will want to deploy your app and your database to multiple regions so that users all over the world get the same low latency. If your app is in one region but the Cosmos DB database is in another, you will incur additional latency between the regions (see Azure Latency Test to determine what that latency would be, or view existing latency in the Azure Portal: choose Azure Cosmos DB, then your database, then Metrics -> Consistency -> SLA -> Replication latency)
- Developers reserve throughput of the service according to the application’s varying load. Behind the scenes, Cosmos DB will scale up resources (memory, processor, partitions, replicas, etc.) to achieve that requested throughput while maintaining the 99th percentile of latency for reads under 10 ms and for writes under 15 ms. Throughput is specified in request units (RUs) per second. The number of RUs consumed for a particular operation varies based upon a number of factors, but fetching a single 1KB document by id consumes roughly 1 RU. Delete, update, and insert operations consume roughly 5 RUs assuming 1KB documents. Big queries and stored procedure executions can consume 100s or 1000s of RUs based upon the complexity of the operations needed. For each collection (bucket of documents), you specify the RUs
- Throughput directly affects how much you are charged, but it can be tuned up dynamically to handle peak load and back down to save costs when more lightly loaded, using the Azure Portal, one of the supported SDKs, or the REST API (see the throughput sketch after this list)
- Request Units (RUs) are used to guarantee throughput in Cosmos DB. You pay for what you reserve, not what you use. RUs are provisioned by region and can vary by region as a result, but they are not shared between regions. This requires you to understand usage patterns in each region where you have a replica
- For applications that exceed the provisioned request unit rate for a container, requests to that container are throttled until the rate drops below the reserved level. When a throttle occurs, the server preemptively ends the request with RequestRateTooLargeException (HTTP status code 429) and returns the x-ms-retry-after-ms header indicating the amount of time, in milliseconds, that the user must wait before reattempting the request (a retry sketch appears after this list). So, you will get 10 ms reads as long as requests stay under the provisioned RUs
- Cosmos DB provides five consistency levels: strong, bounded-staleness, session, consistent prefix, and eventual. The further to the left in this list, the greater the consistency but the higher the RU cost, which essentially lowers available throughput for the same RU setting. Session level consistency is the default. Even when set to a lower consistency level, any arbitrary set of operations can be executed in an ACID-compliant transaction by performing those operations from within a stored procedure. You can also change the consistency level for each request using the x-ms-consistency-level request header or the equivalent option in your SDK (see the consistency sketch after this list)
- Azure Cosmos DB accounts that are configured to use strong consistency cannot associate more than one Azure region with their Azure Cosmos DB account
- There is no support for GROUP BY or other aggregation functionality found in relational database systems (a workaround is to use the Spark to Cosmos DB connector)
- No database schema/index management – it automatically indexes all the data it ingests without requiring any schema or indexes and serves blazing fast queries. By default, every field in each document is automatically indexed generally providing good performance without tuning to specific query patterns. These defaults can be modified by setting an indexing policy which can vary per field.
- Industry-leading, financially backed, comprehensive service level agreements (SLAs) for availability, latency, throughput, and consistency for your mission-critical data
- There is a local emulator running under MS Windows for developer desktop use (was added in the fall of 2016)
- Capacity options for a collection: Fixed (max of 10GB and 400 – 10,000 RU/s), Unlimited (1,000 – 100,000 RU/s). You can contact support if you need more than 100,000 RU/s. There is no limit to the total amount of data or throughput that a container can store in Azure Cosmos DB
- Costs: SSD Storage (per GB): $0.25 GB/month; Reserved RUs/second (per 100 RUs, 400 RUs minimum): $0.008/hour (for all regions except Japan and Brazil which are more)
- Global distribution (also known as global replication/geo-redundancy/geo-replication) is for delivering low-latency access to data to end users no matter where they are located around the globe and for adding regional resiliency for business continuity and disaster recovery (BCDR). When you choose to make containers span across geographic regions, you are billed for the throughput and storage for each container in every region and the data transfer between regions
- Cosmos DB implements optimistic concurrency, so there are no locks or blocks; instead, if two transactions collide on the same data, one of them will fail and will be asked to retry (the update sketch after this list shows the etag-based pattern)
- You can set up a policy to geo-fence a database to specific regions. This geo-fencing capability is especially useful when dealing with data sovereignty compliance that requires data to never leave a specific geographical boundary
- Backups are taken every four hours and two are kept at all times. Also, in the event of database deletion, the backups will be kept for thirty days before being discarded. With these rules in place, the client knows that in the event of some unintended data modification, they have an eight-hour window to get support involved and start the restore process
- Cosmos DB is an Azure data storage solution which means that the data at rest is encrypted by default and data is encrypted in transit. If you need Role-Based Access Control (RBAC), Azure Active Directory (AAD) is supported in Cosmos DB
- Within Cosmos DB, partitions are used to distribute your data for optimal read and write operations. It is recommended to create a granular key with highly distinct values. The partitions are managed for you: Cosmos DB will split or merge partitions to keep the data properly distributed. Keep in mind your key needs to support distributed writes and distributed reads (a container-creation sketch showing the partition key appears after this list)
- Until recently, writes could only be made to one region, but multi-region writes are now in private preview. See Multi-master at global scale with Azure Cosmos DB. With Azure Cosmos DB multi-master support, you can perform writes on containers of data (for example, collections, graphs, tables) distributed anywhere in the world. You can update data in any region that is associated with your database account. These data updates can propagate asynchronously. In addition to providing fast access and write latency to your data, multi-master also provides a practical solution for failover and load-balancing issues. To compare scaling out writes with Cosmos DB versus SQL Server, check out Distributed Writes
- New Azure Cosmos DB Explorer is in public preview – A full screen standalone web-based version of the Data Explorer many of you already use in Azure Portal for Cosmos DB
- A big difference is that with Cosmos DB you will create a denormalized data model. Take a person record, for example. You will embed all the information related to a person, such as their contact details and addresses, into a single JSON document. Retrieving a complete person record from the database is now a single read operation against a single container and for a single item. Updating a person record, with their contact details and addresses, is also a single write operation against a single item (a sample document appears after this list). By denormalizing data, your application will typically have better read performance and allow for a scale-out architecture since you don’t need to join tables. Check out this video Data modelling and partitioning in Azure Cosmos DB: What every relational database user needs to know
- Embedding data works nicely for many cases, but there are scenarios when denormalizing your data will cause more problems than it is worth. In a document database, you can have information in one document that relates to data in other documents. It is not recommended to build systems that would be better suited to a relational database in Azure Cosmos DB, but simple relationships are fine and you can create a normalized data model for them, with the tradeoff that it can require more round trips to the server to read data (but improve the efficiency of write operations since less data is written). In general, use normalized data models when: representing one-to-many relationships, representing many-to-many relationships, or when related data changes frequently
- When using a normalized data model, your application will need to handle creating the reference document. One way would be to use a change feed that triggers on the creation of a new document – the change feed essentially triggers an Azure function that creates the relationship record (a change-feed sketch appears after this list)
- When using a normalized data model, your application will need to query the multiple documents that need to be joined (costing more money because it uses more RUs) and do the joining within the application (i.e. join a main document with documents that contain the reference data), as you cannot do a “join” between documents within Cosmos DB (a client-side join sketch appears after this list). Since every time you display a document it needs to search the entire container for the reference data, it is best to put the other document type (the reference data) in a different container so you can have different partition keys for each type. Another option is to create a separate collection to satisfy specific queries, for example, a collection for products based on category and another collection for products based on geography. Both of those collections are sourced from one “main” or “source” collection that is being updated (by the front end or another app), and the change feed attached to that source pushes out to the other collections used for queries. This means duplicating data, but storage is cheap and you save costs to retrieve data (think of those extra collections as covering indexes in the relational database world). Be aware that while each read is guaranteed under 10 ms, if you have to read multiple documents to get all the information you need and join it together, displaying a record could take a lot longer
- The key factor in deciding when to use a normalized data model is how frequently the data will change. If the data only changes once a year, it may not be worthwhile to create a reference document; instead, just update all the documents. But be aware that the update has to be done from the client side, spread over the affected documents and done in batches, as one big UPDATE statement does not exist in Cosmos DB. You will need to retrieve the entire document from Cosmos DB, update the property/properties in your application, and then call the ‘Replace’ method in the Cosmos DB SDK to replace the document in question (see CosmosDb – Updating a Document (Partially), and the update sketch after this list)
- Because there is currently no concept of a constraint, foreign-key or otherwise, any inter-document relationships that you have in documents are effectively “weak links” and will not be verified by the database itself. If you want to ensure that the data a document is referring to actually exists, then you need to do this in your application, or through the use of server-side triggers or stored procedures on Azure Cosmos DB
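To make some of the bullets above concrete, the sketches below use the Python SDK (azure-cosmos, v4). They are minimal illustrations rather than production code, and all account, database, container, and field names are placeholders. This first one connects with a preferred-location list (the multi-homing behavior described above), performs a point read by id and partition key, and then inspects the RU charge of the response:

```python
from azure.cosmos import CosmosClient

URL = "https://myaccount.documents.azure.com:443/"  # placeholder endpoint
KEY = "<primary-key>"                               # placeholder key

# preferred_locations drives the multi-homing behavior: the SDK talks to
# West US first and fails over to the later regions if it is unavailable.
client = CosmosClient(
    URL, credential=KEY,
    preferred_locations=["West US", "East US", "East US 2"],
)

container = client.get_database_client("mydb").get_container_client("people")

# A point read (id + partition key) is the cheapest operation (~1 RU for 1 KB).
person = container.read_item(item="12345", partition_key="12345")

# The RU cost of the last operation comes back in a response header.
charge = container.client_connection.last_response_headers["x-ms-request-charge"]
print(person["id"], charge)
```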
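Provisioned throughput can be changed at runtime, per the bullet on tuning RUs up for peak load and back down to save costs. A sketch, assuming the container proxy from the previous example:

```python
# Scale up for peak load, then back down when the load drops.
container.replace_throughput(2000)        # raise to 2,000 RU/s
throughput = container.get_throughput()   # read back the current setting
print(throughput.offer_throughput)
container.replace_throughput(400)         # scale back down to the minimum
```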
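The SDKs retry throttled requests automatically, but the 429 behavior described above can also be handled by hand. A sketch of honoring the x-ms-retry-after-ms header (reading headers via e.response is an assumption about the underlying azure-core error type):

```python
import time
from azure.cosmos import exceptions

try:
    container.upsert_item({"id": "12345", "personId": "12345", "name": "Ada"})
except exceptions.CosmosHttpResponseError as e:
    if e.status_code == 429:  # RequestRateTooLarge: over the provisioned RUs
        # Wait as long as the server asked, then retry once.
        wait_ms = int(e.response.headers.get("x-ms-retry-after-ms", "1000"))
        time.sleep(wait_ms / 1000.0)
        container.upsert_item({"id": "12345", "personId": "12345", "name": "Ada"})
    else:
        raise
```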
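Consistency can be relaxed below the default (session) to lower RU cost and latency, as described above. A sketch of setting it at the client level; per-request overrides go through the x-ms-consistency-level header, and support for them varies by SDK version:

```python
from azure.cosmos import CosmosClient

# Every request from this client uses eventual consistency unless overridden.
client = CosmosClient(URL, credential=KEY, consistency_level="Eventual")
```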
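The partition key is chosen at container creation, and provisioned RUs are set at the same time. A sketch, with /personId as an assumed key that is granular and highly distinct:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(URL, credential=KEY)
database = client.create_database_if_not_exists("mydb")

# A granular key with many distinct values spreads reads and writes
# evenly across the physical partitions Cosmos DB manages for you.
container = database.create_container_if_not_exists(
    id="people",
    partition_key=PartitionKey(path="/personId"),
    offer_throughput=400,  # provisioned RU/s for this container
)
```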
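A denormalized person document, as described above, embeds contact details and addresses so that one point read returns everything and one write updates everything (field names are illustrative):

```python
person = {
    "id": "12345",
    "personId": "12345",  # also the partition key value in this sketch
    "name": "Ada Lovelace",
    "contactDetails": [
        {"type": "email", "value": "ada@example.com"},
        {"type": "phone", "value": "555-0100"},
    ],
    "addresses": [
        {"type": "home", "city": "London", "country": "UK"},
    ],
}
container.upsert_item(person)  # a single write persists the whole record
```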
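For the normalized-model case, the change feed can maintain the reference documents. An Azure Function with a Cosmos DB trigger is the more common pattern; this is a bare polling sketch, with reference_container standing in for a second container:

```python
# Read changes from the beginning of the container's change feed and
# upsert a small reference document for each changed item.
for change in container.query_items_change_feed(is_start_from_beginning=True):
    ref = {
        "id": f"ref-{change['id']}",
        "personId": change["personId"],
        "name": change["name"],
    }
    reference_container.upsert_item(ref)
```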
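Since there is no join between documents, the application stitches related documents together itself, paying RUs for each extra query. A sketch, assuming an order document that carries a list of product ids and a separate products_container holding the reference data:

```python
# Fetch the main document, then the reference documents it points to.
order = container.read_item(item="order-1", partition_key="order-1")

products = list(products_container.query_items(
    query="SELECT * FROM p WHERE ARRAY_CONTAINS(@ids, p.id)",
    parameters=[{"name": "@ids", "value": order["productIds"]}],
    enable_cross_partition_query=True,
))

order["products"] = products  # the "join" happens client-side
```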
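Finally, the read-modify-replace update pattern, combined with the optimistic concurrency described above: passing the document's _etag makes the replace fail with HTTP 412 if another writer changed the document first, in which case you re-read and retry:

```python
from azure.core import MatchConditions
from azure.cosmos import exceptions

# There is no partial UPDATE: read the whole document, change it in the
# application, and replace it.
doc = container.read_item(item="12345", partition_key="12345")
doc["city"] = "Seattle"

try:
    container.replace_item(
        item=doc["id"],
        body=doc,
        etag=doc["_etag"],  # the version we read above
        match_condition=MatchConditions.IfNotModified,
    )
except exceptions.CosmosHttpResponseError as e:
    if e.status_code == 412:
        # Another writer won the race: re-read the document and retry.
        pass
    else:
        raise
```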
Azure Cosmos DB allows you to scale throughput (as well as storage) elastically across any number of regions depending on your needs or demand.
Imagine a single Azure Cosmos DB container horizontally partitioned (across three resource partitions within a region) and then globally distributed across three Azure regions.
An Azure Cosmos DB container gets distributed in two dimensions (i) within a region and (ii) across regions. Here’s how (see Partition and scale in Azure Cosmos DB for more info):
- Local distribution: Within a single region, an Azure Cosmos DB container is horizontally scaled out in terms of resource partitions. Each resource partition manages a set of keys and is strongly consistent and highly available, being physically represented by four replicas (also called a replica set) with state machine replication among those replicas. Azure Cosmos DB is a fully resource-governed system, where a resource partition is responsible for delivering its share of throughput for the budget of system resources allocated to it. The scaling of an Azure Cosmos DB container is transparent to the users: Azure Cosmos DB manages the resource partitions and splits and merges them as needed as storage and throughput requirements change
- Global distribution: If it is a multi-region database, each of the resource partitions is then distributed across those regions. Resource partitions owning the same set of keys across the various regions form a partition set. Resource partitions within a partition set are coordinated using state machine replication across the multiple regions associated with the database. Depending on the consistency level configured, the resource partitions within a partition set are configured dynamically using different topologies (for example, star, daisy-chain, tree, etc.)
The following links can help with understanding the core concepts better: Request units in Azure Cosmos DB, Performance tips for Azure Cosmos DB and .NET, Tuning query performance with Azure Cosmos DB, Partitioning in Azure Cosmos DB using the SQL API, Leverage Azure CosmosDB metrics to find issues.
The following links will help with migrating SQL Server to Cosmos DB: Understanding the differences between NoSQL and relational databases, Data modeling in Azure Cosmos DB, How to model and partition data on Azure Cosmos DB using a real-world example, Migrate relational data into Azure Cosmos DB using Azure Data Factory.
You can Try Azure Cosmos DB for Free without an Azure subscription, free of charge and commitments. For a good training course on Cosmos DB check out Developing Planet-Scale Applications in Azure Cosmos DB and Learning Azure Cosmos DB.
More info:
Relational databases vs Non-relational databases
A technical overview of Azure Cosmos DB
Azure Cosmos DB vs. SQL Server: Scalability Differences
Introduction to SQL for Cosmos DB
Cosmic drawings illustrating the main concepts of Azure Cosmos DB
How does consistency levels affects the latency of Azure CosmosDB