Azure SQL Data Warehouse
Analytics Platform System (APS) is Microsoft’s massively parallel processing (MPP) data warehouse technology. This has only been available as an on-prem solution (see video Overview of Microsoft Analytics Platform System). Until now. At the recent Microsoft Build Developer Conference, Executive Vice President Scott Guthrie announced the Azure SQL Data Warehouse (SQL DW). This is a cloud data warehouse-as-a-service (DWaaS) that will compete with Amazon’s Redshift.
But it has some additional benefits over Redshift:
- With Redshift you must scale your data warehouse by increasing both the compute and storage units. With SQL DW, compute and storage is decoupled so you can scale them individually. This is a very different economic model that can save customers a lot of money as you don’t have to purchase additional storage when you just need more compute power, or vice-versa
- The ability to pause compute when not in use so you only pay for storage, as opposed to Redshift in which you are billed 24/7 for all the VM’s that make up the nodes in your cluster
- With Redshift you have to pick a pre-defined size and it can take hours to days to resize. With SQL DW you can start small and grow or shrink in seconds
- And keep in mind with Microsoft you can have a hybrid architecture that can use an on-prem APS combined with a SQL DW, allowing you to keep sensitive data on-prem and non-sensitive data in the cloud if you wish. With Redshift you only have the option of keeping all your data in the cloud
- There is also a lot more compatibility with SQL DW as it supports many features that Redshift does not, such as indexes, stored procs, SQL UDFs, partitioning, and constraints
SQL DW is built with the same technology as APS, except that instead of using SQL Server 2014 it uses version 12 of Azure SQL Database. It also includes PolyBase. PolyBase allows APS and SQL DW to query data in a Hadoop cluster, either directly or by pushing some of the work to Hadoop itself so the query is actually run using the Hadoop clusters CPU’s. The Hadoop data is made to look as if it were local to the data warehouse, so that end-users can use their existing skill sets to query it via SQL or any reporting tool that using SQL (like Excel, SSRS, Power BI, etc). PolyBase can integrate with Hadoop in this manner via a Microsoft HDInsight cluster that can either be inside APS or in the cloud, or via a Hortonworks or Cloudera cluster.
SQL DW will work with existing data tools including Power BI for data visualization, Azure Machine Learning for advanced analytics, Azure Data Factory for data orchestration and Azure HDInsight.
The preview for Azure SQL Data Warehouse will be available later this calendar year. You can sign up to be notified when the Azure SQL Data Warehouse preview becomes available.
I will have a lot more blogs about this new service in the coming months, so stay tuned!
More info:
Microsoft BUILDs its cloud Big Data story
Microsoft Announces Azure SQL Database elastic database, Azure SQL Data Warehouse, Azure Data Lake
Introducing Azure SQL Data Warehouse
Short introduction video SQL Data Warehouse – YouTube
Top reasons why enterprises should choose Azure SQL Data Warehouse
Video on new SQL Server 2016 features and SQL DW (minute 59 with demo): The SQL Server Evolution
Hi James
We are looking into using the azure DW as dev/test environment for our on-prem APS.
Can tsql, ddl and ssis/pdw destination adapter be used directly without any rewrite?
Thanks
Peter
Hi Peter,
I can’t say for certain yet, but there should be no changes needed in tsql and ddl. Not sure of the SSIS adaptors yet, but shoot me an email and I’ll find out for you.
Hi James,
Couple of questions for you, maybe you’ve come across some information that can shed light on this 🙂
1. I’m guessing that since Azure SQL Data Warehouse includes PolyBase, it is implied that it can simultaneously query traditional star-based warehouse data AND data residing in an Azure Data Lake, and combine it’s results and visualize it in e.g. PowerBI? That would be so sweet.
2. Have you seen anything with regards to migrating an existing on premises MS SQL 2008 warehouse directly into Azure SQL Data Warehouse, and treat it as a known baseline for subsequent analytics?
3. Also, what about combining traditional warehouse data, raw data in an azure lake, and ad-hoc data from third party APIs? I’d like to be able to perform ad-hoc analytics based on not only my own data in a warehouse and a lake, but also the data coming from multiple third party aggregators, but _not_ have to write custom API clients to copy the third party data into my lake first.
Cheers,
Linus
Hi Linus,
1. PolyBase can query Azure SQL Data Warehouse, Hortonworks HDP, Cloudera CHD, and Azure HDInsight and the results combined in Power BI. It seems very likely that PolyBase can query Azure Data Lake but I have not heard for sure yet
2. The migration from a data warehouse in MS SQL 2008 would be the same to either APS on-prem or Azure SQL Data Warehouse
3. This sounds like a good future use case for PolyBase – being able to query data from third party APIs. Unfortunately it’s not there yet
Hi James,
Recently I learned fundamentals of Machine Learning on Microsoft Azure when I had 1 month free access.
The question I wan to continue to use the platform for Learning but it sounds to be expensive for the subscription. Could you suggest me the best way to go about it. My aim is not make it expensive but I am ready to pay reasonable amount to use the platform.
Thanks in advance
Vijay
Pingback:SQL Server 2016 | Garrett Edmondson
Pingback:Copying data from Azure Blob Storage | James Serra's Blog
Pingback:Getting data into Azure Blob Storage | James Serra's Blog
Hi James,
How does Azure SQL DW compare to Azure SQL?
Magnificent goods from you, man. I’ve have in mind your stuff previous to and
you’re simply too great. I really like what you’ve got right here, certainly like what
you are stating and the best way by which you assert it.
You make it enjoyable and you still take care of to stay it smart.
I can not wait to learn far more from you. That is really a terrific site.
Pingback:Data loading into Azure SQL Data Warehouse - SQL Server - SQL Server - Toad World
Pingback:Data loading into Azure SQL Data Warehouse | James Serra's Blog
Hi James,
Can we consider Azure SQL DW as the cloud version of APS ? Recently in a MS led training , the instructor mentioned that it is NOT so but unable see any major difference between the two of them except cost and that APS has compute and storage coupled while it is de-coupled in Azure SQL DW. This would really help me to understand and then position the 2 accordingly.
Thanks
Hi Sandeep,
Yes, you can consider it “APS in the cloud” in that they are both MPP servers. SQL DW has some differences that you pointed out.