Power BI as an enterprise data warehouse solution?
With Power BI continuing to get many great new features, including the latest in Datamarts (see my blog Power BI Datamarts), I’m starting to hear customers ask “Can I just build my entire enterprise data warehouse solution in Power BI?” In other words, can I just use Power BI and all its built-in features instead of Azure Data Lake Storage Gen2, Azure Data Factory (ADF), Azure Synapse, Databricks, etc.? The short answer is “No”.
First off, as far as Datamarts go, I do not recommend using this new capability as a competitor to, or replacement for, an existing data warehouse. A Datamart is a self-service capability, recommended for business users and analysts, that can connect to many data sources or to any data warehouse to pull data and create a Datamart for business use. IT departments, on the other hand, build and manage a data warehouse. I think in many cases you will see Datamarts pulling data from an enterprise data warehouse, but I can also see Datamarts pulling from data sources that are not yet in the data warehouse (in which case a Datamart could then possibly be used as a source for an enterprise data warehouse).
The bottom line is that Datamarts are an evolution of the Power BI service, best used for the “departmental citizen analyst” persona and ideal for smaller workloads, while DBAs, data engineers, and data architects should use Azure Synapse and other powerful Azure tools for larger workloads.
Here is a summary of the major reasons why Power BI should not be used for enterprise data warehouses:
ETL: Power BI allows for self-service data prep via dataflows, but it is not nearly as robust as ADF. ADF has two kinds of data flows: mapping data flows and wrangling data flows. ADF wrangling data flows and Power BI dataflows are very similar and both use the Power Query engine. However, ADF wrangling data flows are more powerful – see How Microsoft Power Platform dataflows and Azure Data Factory wrangling dataflows relate to each other. And ADF mapping data flows are way more powerful than Power BI dataflows – see Power BI Dataflows vs ADF Mapping Data Flows.
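To make the orchestration gap concrete: ADF pipelines (which host mapping and wrangling data flows) can be triggered and monitored programmatically, which Power BI dataflows are not designed for. Here is a minimal sketch using the ADF Python SDK; the subscription, resource group, factory, and pipeline names are hypothetical placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Hypothetical identifiers -- substitute your own
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-dwh"
FACTORY_NAME = "adf-dwh"
PIPELINE_NAME = "pl_load_sales"

# Authenticate and create the ADF management client
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Kick off a pipeline run, passing a runtime parameter
run = adf_client.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME,
    parameters={"loadDate": "2022-06-01"},
)

# Check on the run's status
status = adf_client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
print(f"Pipeline run {run.run_id}: {status.status}")
```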
Data size – Datamarts support a max data size of 100GB
Performance tuning – There are no knobs to turn to improve performance in Power BI. That is fine for small workloads, but those knobs are essential for large workloads (yes, a DBA is still needed for the things you can do in Synapse, like index creation, workload management, adjusting SQL syntax, etc.)
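To illustrate the kind of knobs a DBA gets in a Synapse dedicated SQL pool (and doesn’t get in Power BI), here is a sketch that chooses a hash distribution and indexes for a large fact table. It assumes pyodbc and a dedicated SQL pool; the server, database, credentials, and table names are all hypothetical:

```python
import pyodbc

# Hypothetical connection details for a Synapse dedicated SQL pool
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"
    "DATABASE=mydwh;UID=dwh_admin;PWD=<secret>"
)
conn.autocommit = True  # run each DDL statement outside an explicit transaction
cursor = conn.cursor()

# Distribute the fact table on a join key and store it as a clustered
# columnstore index -- tuning choices that simply don't exist in Power BI
cursor.execute("""
    CREATE TABLE dbo.FactSales
    WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
    AS SELECT * FROM stg.Sales;
""")

# Add a secondary index for a selective lookup pattern
cursor.execute(
    "CREATE INDEX IX_FactSales_OrderDateKey ON dbo.FactSales (OrderDateKey);"
)
```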
Inconsistent performance: While Power BI offers a central modeling layer that can include disparate data sources, it falls back on each data source’s capabilities – for example, whether or not the source supports predicate pushdown for DirectQuery. Hence, reporting and model-building efforts will be an inconsistent performance experience. Trying to build an entire corporate strategy around this model also leaves little space for citizen ETL/ELT developers who don’t want visualizations or dashboards as the final outcome (for example, when the data should be surfaced in the CRM system, or when the need is a one-time answer).
Computing resources: Since Power BI is model-centric, the compute has limitations: model size limits have upper bounds (1 GB, 100 GB, 400 GB), capacity SKUs range from EM1 to EM3 and P1/A4 to P5/A8, and total vCores range from 1 to 128. Synapse, on the other hand, supports massively parallel processing (MPP) and is capable of processing and storing huge volumes of data at scale (terabytes to petabytes). It also has a resource model via workload classifiers and static/dynamic resource allocation. It addresses complex scenarios where heavy data processing can occur at the same time as high-concurrency reporting – all under different workload isolations. This level of compute-selection flexibility is not present in Power BI. Then there are the additional features you get with ADF: different groups can have their own TCO-based compute for performing their data processing and cleansing, or tie it in with other components such as AI/ML workbooks whose net output can be Power BI. ADF has much more compute available than Power BI by way of clusters, and has compute environments to process or transform data, all of which shut down once inactive.
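As a concrete example of that workload isolation, here is a sketch of the workload-management DDL a DBA might run in a dedicated SQL pool so reporting queries keep guaranteed resources while heavy loads run at the same time; the group, classifier, and login names are hypothetical, and the connection mirrors the earlier sketch:

```python
import pyodbc

# Hypothetical connection to the same dedicated SQL pool as above
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"
    "DATABASE=mydwh;UID=dwh_admin;PWD=<secret>"
)
conn.autocommit = True
cursor = conn.cursor()

# Reserve 20% of pool resources (capped at 40%) for a reporting workload group
cursor.execute("""
    CREATE WORKLOAD GROUP wgReporting
    WITH (MIN_PERCENTAGE_RESOURCE = 20,
          CAP_PERCENTAGE_RESOURCE = 40,
          REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5);
""")

# Route queries from the reporting service account into that group
cursor.execute("""
    CREATE WORKLOAD CLASSIFIER wcReporting
    WITH (WORKLOAD_GROUP = 'wgReporting',
          MEMBERNAME = 'reporting_svc',
          IMPORTANCE = HIGH);
""")
```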
Different universes: If you are using Power BI to build your solution, then obviously you are limited to just the features within Power BI. Azure, on the other hand, is designed for scale and, most importantly, for integration within the Microsoft platform and with other platforms. There is a supermarket selection of services to choose from in the Azure marketplace (even such options as a VM with SQL Server). In the Power BI world, you are limited to what’s provided for you, which can be a blessing or a curse depending on what you want to build. Azure lets you integrate with first-party services (e.g., Databricks) and third-party services (e.g., Profisee for master data management), which is quite important for an enterprise solution and very different from the strategy Power BI offers.
Who will use it? Just one department, or the entire company? Power BI is focused on use by a department. Synapse is focused on use by the entire company, and has features to accommodate this, such as source control to support a large number of people using it.
Finally, in the Power BI Datamart scenario you also lose all the goodness that you get from the data lake and data warehouse, such as:
- Data exhaust: You are only really storing the curated datasets in the Datamart – you lose all of the raw data if you go straight to the Datamart (a common way to preserve it is shown in the sketch after this list)
- Flexibility: Anytime you are accessing the data for data science or ad-hoc workloads, you are using Power BI compute. This is going to be either expensive or throttled, and most likely both
- Features: You lose out on the ability to do complex transformations or use near-real-time data sources
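For the data exhaust point above, a common pattern is to land untouched source extracts in Azure Data Lake Storage Gen2 before any curation, so the raw data is never lost and can be replayed for data science or future pipelines. Here is a minimal sketch with the Azure Storage SDK for Python; the account, container, and file paths are hypothetical:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Hypothetical ADLS Gen2 account and "raw" zone container
service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_system = service.get_file_system_client("raw")

# Land the source extract as-is, partitioned by load date,
# before any transformation or curation happens downstream
with open("sales_extract_2022-06-01.json", "rb") as source:
    file_client = file_system.get_file_client("sales/2022/06/01/extract.json")
    file_client.upload_data(source, overwrite=True)
```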
In the end, I think Datamarts are a great solution as a presentation layer for small (under 100GB) curated datasets built from data that has been ingested and integrated in Synapse or Databricks. Power BI can also be a great option for building a POC and/or getting a quick win, then using that POC as the blueprint/requirements for building an enterprise solution.
Hopefully this blog helped you understand when it is better to build a solution via “self-service” in Power BI versus having IT build an enterprise solution using tools like Synapse. Would love to hear your thoughts!
More info:
What is the point of a data warehouse if Power BI has ETL Capabilities?