Create quick queries quickly with SSAS in Azure
In a perfect world a data warehouse would return results for every query very quickly, no matter the size or complexity of the data warehouse. In reality, sometimes we won't get the performance we need from the data warehouse, and the solution is to use a SQL Server Analysis Services (SSAS) cube, which also has other benefits (see Why use a SSAS cube?). Thanks to Azure, creating a cube can be done in a matter of hours, giving you quick results for just about any query no matter how big the data warehouse gets.
To create a cube quickly, create a VM in Azure and install SQL Server and SSAS (or choose a VM from the marketplace with SQL Server and SSAS already installed), assuming the following restrictions:
- No high availability needed (can be added in the future by creating another VM in an availability set, syncing the cubes, and using an Azure load balancer)
- No disaster recovery needed (can be added in the future by creating another VM in another region and syncing the cubes). The solution for now would be to back up the cube and restore it if needed, or simply rebuild and reprocess the cube
- Number of concurrent connections under a hundred. More connections can be handled in the future by scaling up the VM and/or creating multiple VMs with synced cubes behind a load balancer (keeping in mind VMs can be shut down during off hours to save cost). The number of concurrent connections is highly variable based on the type of querying and caching; see the SQL Server Best Practices Article
- Using the SSAS tabular model, with a cube smaller than 180GB. Cube compression ranges from 10-100x; the 180GB figure assumes a conservative average of 10x and using 60% of the server's memory. A larger cube can be handled with an SSAS multidimensional model, as it does not require the cube to fit in memory the way tabular does. A multidimensional cube can use HOLAP storage mode to support more source data: the cube does not keep a copy of the detail records from the source system, greatly reducing cube size and processing time. Drilling down to the detail records is still possible; the cube accomplishes this by querying the source system
- An available maintenance window at night to process the tabular cube (in the future you can add a "cube processing VM" to process the cube and then sync the cubes, so queries never hit a cube while it is being processed and get slowed down)
- The cube only needs to be updated once a day (in the future you can use a VM to process and sync the cube, as well as use partitioning)
- Future expansion can be via Azure Analysis Services (in preview), as well as creating multiple cubes by subject area and using a true scale-out architecture
- It’s a balance between spending extra developer hours to save cost and choosing more hardware up front
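The 180GB figure in the tabular restriction above follows from simple arithmetic. A minimal sketch, where the function name and defaults are my own and the 10x compression and 60% memory headroom are the conservative assumptions stated in the list:

```python
def max_raw_source_gb(server_ram_gb, compression_ratio=10, memory_fraction=0.6):
    """Estimate the largest raw source data a tabular model can handle.

    Tabular cubes live in memory, so the compressed model should use at
    most `memory_fraction` of the server's RAM. Compression of 10-100x
    is typical; 10x is the conservative figure used in the post.
    """
    max_cube_gb = server_ram_gb * memory_fraction   # compressed cube size limit
    return max_cube_gb * compression_ratio          # equivalent raw source data

# A VM with ~300 GB of RAM allows a ~180 GB compressed cube,
# i.e. roughly 1.8 TB of raw source data at 10x compression:
print(max_raw_source_gb(300))  # 1800.0
```

Under more optimistic compression (say 50x), the same VM could front far more raw data, which is why the 180GB ceiling is described as conservative.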
To choose the appropriate size VM, see Hardware Sizing a Tabular Solution (SQL Server Analysis Services), Sizes for virtual machines in Azure, Largest VM in the Cloud, and How Much RAM Do I Need For My SSAS Tabular Server?
More info:
SQL Server 2008 R2 Analysis Services Operations Guide
Why a Semantic Layer Like Azure Analysis Services is Relevant (Part 1)
Hi James,
Thanks for the info. When you say high availability can be done in the future, does that mean shared storage for an SSAS FCI cluster will be released soon? Otherwise the Azure ILB wouldn't have a listener IP address for HA, am I right, or am I missing a trick? Though we have already looked into third-party shared storage options.
Thanks
Hi Ahsan,
High availability for the cube involves creating another VM with SSAS installed, and syncing a cube on each VM (so two copies of the data). There is no shared storage involved.
Hi James,
I’m curious as to why you have suggested the IaaS approach for this when we now have Azure Analysis Services in PaaS. Surely this would be a quicker route?
Cheers
Hi Steve,
I did mention Azure Analysis Services in the blog, but it’s still in preview, so I didn’t want to suggest using it now. When it’s GA I’ll update the blog.
Yes, but how would you handle two different server names, since these will be two different machines? For example, we have 3,500 users who use Power BI Desktop, connecting to an SSAS tabular cube. Where do they connect, machine 1 or machine 2? As there is no cluster and no single listener, doesn’t this create massive problems and become troublesome for users, as they don’t know which machine to connect to? And it isn’t specific to Power BI Desktop; it could be any tool. Could you please shed some light on this? Thanks
Any type of load balancer will do the trick.
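To illustrate the idea behind that answer: clients are given one logical endpoint (the load balancer's address), and the balancer fans connections out over the synced replicas, so users never choose a machine themselves. A toy round-robin sketch, where the replica names are hypothetical and a real Azure load balancer would do this at the network level rather than in client code:

```python
from itertools import cycle

# Hypothetical VM names; in practice clients connect to the single
# load-balancer front-end address, not to an individual replica.
SSAS_REPLICAS = ["ssasvm1", "ssasvm2"]
_next_replica = cycle(SSAS_REPLICAS)

def resolve_endpoint():
    """Return the replica that should serve the next connection,
    alternating between the synced cubes (simple round-robin)."""
    return next(_next_replica)

print(resolve_endpoint())  # ssasvm1
print(resolve_endpoint())  # ssasvm2
print(resolve_endpoint())  # ssasvm1
```

Because both VMs hold synced copies of the same cube, it does not matter which replica a given user lands on.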