Data Mesh Topologies
As a follow-up to my blog post Common Data Mesh exceptions, I want to discuss the types of data mesh topologies I am seeing being built. I put them into three categories, though there are many variations on these three (some of which are mentioned in my exceptions post). Ole Olesen-Bagneux has just posted a similar discussion on LinkedIn about data mesh types, which I encourage you to check out.
In the figure below, the architectures on the left are the most centralized, and the architectures become more distributed as you move to the right:
Mesh Type 1
In this setup, all domains use identical technology, restricted to a single cloud provider’s offerings. Each domain maintains its own infrastructure but shares a central enterprise data lake, where each domain has a dedicated container or folder. This setup is common because of its performance benefits and its simplified security, monitoring, and disaster recovery, and it involves minimal product variation within domains. Mesh Type 1 is the most prevalent of the three due to its practicality in data and integration, and it is the approach usually taken by customers using Microsoft Fabric.
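To make the "one shared lake, one folder per domain" layout concrete, here is a minimal sketch in Python. The lake root, the domain names, and the `domain_folder` and `can_write` helpers are all hypothetical names chosen for illustration; a real deployment would enforce this through the cloud provider's access controls rather than a path check.

```python
from pathlib import PurePosixPath

# Hypothetical root of the shared enterprise data lake (illustrative only).
LAKE_ROOT = PurePosixPath("/enterprise-lake")


def domain_folder(domain: str) -> PurePosixPath:
    """Return the dedicated folder a domain owns inside the shared lake."""
    return LAKE_ROOT / domain


def can_write(domain: str, target: str) -> bool:
    """A domain may write only to its own folder or to paths beneath it."""
    folder = domain_folder(domain)
    path = PurePosixPath(target)
    return path == folder or folder in path.parents


# The "sales" domain can write under its own folder but not under "finance".
print(can_write("sales", "/enterprise-lake/sales/orders/2024.parquet"))
print(can_write("sales", "/enterprise-lake/finance/ledger.parquet"))
```

Because every domain lives under the same root, security, monitoring, and disaster recovery can be configured once for the lake, which is the simplification described above.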
Mesh Type 2
This type is similar to Type 1 in technology use, but here each domain has its own data lake. This fully decentralized approach faces challenges in linking the data lakes and in maintaining performance when integrating data across domains, which is why I see most customers using Mesh Type 1. In my opinion, Mesh Type 2 fits closely with what Ole calls the pragmatic data- and integration mesh. Some customers are starting to use Microsoft Fabric with OneLake shortcuts to build a Mesh Type 2.
Mesh Type 3
Domains in this architecture have the freedom to choose any technology and cloud provider, with each having its separate data lake. This leads to a diverse tech environment with varying security protocols, a need for expertise in multiple products, and challenges in governance and infrastructure automation. Combining data across these varied systems is complex. Due to these challenges, Mesh Type 3, though visionary, is considered impractical for widespread adoption. I have not seen any company even attempt it. This fits into the category of what I call the pure data mesh or what Ole calls the visionary data mesh.
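The distinctions among the three types come down to two questions: do all domains share the same technology and cloud provider, and do they share one data lake or each have their own? The sketch below encodes that reading in Python. The `Domain` fields, the `classify_mesh` function, and the sample values are assumptions made for illustration, and anything that does not match one of the three categories is simply reported as a variation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    cloud: str   # cloud provider the domain runs on (illustrative label)
    stack: str   # technology stack the domain uses (illustrative label)
    lake: str    # identifier of the data lake holding the domain's data


def classify_mesh(domains: list[Domain]) -> str:
    """Map a set of domains to Mesh Type 1, 2, or 3, or flag a variation."""
    if not domains:
        raise ValueError("at least one domain is required")
    same_tech = len({(d.cloud, d.stack) for d in domains}) == 1
    lakes = {d.lake for d in domains}
    shared_lake = len(lakes) == 1
    own_lakes = len(lakes) == len(domains)

    if same_tech and shared_lake:
        return "Mesh Type 1"   # identical tech, one central lake
    if same_tech and own_lakes:
        return "Mesh Type 2"   # identical tech, one lake per domain
    if not same_tech and own_lakes:
        return "Mesh Type 3"   # any tech or cloud, one lake per domain
    return "variation"         # a mix not covered by the three categories


type1 = [Domain("sales", "azure", "fabric", "central"),
         Domain("finance", "azure", "fabric", "central")]
type3 = [Domain("sales", "azure", "fabric", "sales-lake"),
         Domain("finance", "aws", "other-stack", "finance-lake")]
print(classify_mesh(type1))
print(classify_mesh(type3))
```

The explicit "variation" result reflects the point made earlier: these three categories are anchors, and many real architectures sit somewhere between them.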
The bottom line is that you should design an architecture that works best for your use case, based on the size, speed, and type of data. That architecture could combine certain features of a data mesh with features of a data fabric or data lakehouse. I go into much more detail about this in my book Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh, which you can order on Amazon.
James, I have mostly seen the Type 1 implementation in organizations. Type 2 and Type 3 would be cost-intensive.