The need for having both a DW and cubes
I have heard some people say if you have a data warehouse, there is no need for cubes (when I say “cubes” I am referring to tabular and multidimensional OLAP models). And I have heard others say if you have OLAP cubes, you don’t need a data warehouse. I strongly disagree with both these statements, as almost all the customers I see that are building a modern data warehouse use both in their solutions. Here are some reasons for both:
Why have a data warehouse if you can just use a cube? (i.e. is the trade-off of duplication of data and more ETL worth it?)
- Breaking down complex steps so easier to build cube
- Cube is departmental view (cube builder not thinking enterprise solution)
- Easier to clean/join/master data in data warehouse
- Processing cube is slow against sources
- One place to control data for consistency and have one version of the truth
- Use by tools that need relational format
- Cube does not have all data
- Cube may be behind in data updates (needs processing)
- Data warehouse is place to integrate data
- Risk of having multiple cubes doing same thing
- Data warehouse keeps historical records
- Easier to create data marts from data warehouse
- Need a relational database to track Slowly Changing Dimensions (SCD)
- Security in a data lake is file/folder
- Use the data warehouse as a place to store the metadata since people are more familiar with a relational database
Why do I also need a cube if I have a data warehouse? (or, reasons to report off cubes instead of the data warehouse) (a summary from my prior blog post of Why use a SSAS cube?):
- Semantic layer
- Handle many concurrent users
- Aggregating data for performance
- Multidimensional analysis
- No joins or relationships
- Hierarchies, KPI’s
- Row-level Security
- Advanced time-calculations
- Slowly Changing Dimensions (SCD)
- Required for some reporting tools
The typical architecture I see looks like this:
Pingback:#BI101: The need for having both a DW and cubes – Welcome to Ray Kim's 'blog
Hi James, what would be the limitations of using normal Azure SQL (obviously without polybase) instead of DW?
DW isn’t cost effective for smaller orgs.
Hi Simon,
If you database is “small” (under 4TB) and you don’t need the power of a MPP solution, then go with Azure SQL Database. And with Managed Instance in public preview and near 100% SQL compatibility, it’s a great option with it’s biggest limitation is that SQL DW can be 20x-50x times faster.
Awesome, done! Preview request is pending.
Azure SQL Database Managed Instance looks like it on parity with SQL Server 2017 but, once again, it excludes Polybase! Why does MS keep excluding it?
They should just replace polybase with Power Query imho. It can federate equally to Hadoop, plus do a million more things.
Excellent commentary;
Let the DW ‘house’ the data and OLAP interpret the data.
Doug
James, I’d like to see more exploration into the use of ROLAP, now that the backend databases are getting more sophisticated/faster. If your onsite DW can be supported using SSD drives, ROLAP may offer a faster SSAS cube build out, with only a single data update to the DW. Or online via Azure SQL data warehouse, with MPP, that may offer a great alternative. That provides both a traditional DW with a rapid SSAS build out. And I am thinking multi-dimensional, not tabular. I may be that last holdout, but multi-dimensional still seems far superior when the business needs get complex.
James,
I see that now we can build AAS TAbular Cubes on top of ADL. In this case what are the scenario you see SQL DW or SQL Azure is needed to provide a Semantic layer with AAS ?
https://azure.microsoft.com/en-us/blog/using-azure-analysis-services-with-azure-data-lake-storage/
Hi Raghu,
Those reasons where in the blog under “Why have a data warehouse if you can just use a cube?”. And I’ll add that if you are going against ADL, you will need to make sure the data is cleaned and transformed in the data lake before pushing it to a cube.
Thanks for a great commentary. I now have a second source for some of the comments I have been espousing at work. Great graphic!! It clarifies a lot for many.
colleagues.
Pingback:Power BI: Dataflows | James Serra's Blog