Azure Synapse Analytics confusion

I see a lot of confusion about which features are available today in Azure Synapse Analytics (formerly called Azure SQL Data Warehouse) and which features are coming in the future. Below is a picture, followed by my description of it, that hopefully clears things up:

So, if you log into the Azure portal today and use Synapse Analytics, you are using the GA version and nothing is different; it’s simply a name change from SQL DW. For the purposes of this picture we will call the GA version “v1” (the v1, v2, and v3 labels in my diagram are not in any way officially part of the product naming and are just used in this blog). This v1 will get certain new features over time (shown on the left) that I blogged about here.

In private preview are other new features, shown as v2 in my diagram. If you join the private preview, your Azure subscription will be whitelisted and you will have these new features available to you. You will not see these new features unless you are accepted into the private preview. The major new features in v2 include Azure Synapse Studio (a single pane of glass that uses workspaces to access databases, ADLS Gen2, ADF, Power BI, Spark, SQL Scripts, notebooks, monitoring, security), Apache Spark, on-demand T-SQL, and T-SQL over ADLS Gen2.

Much further down the road will be “Gen3”, or v3 in my diagram. The biggest feature in that version will be the multi-master cluster, where user workloads can operate over the same shareable relational data set while having independent clusters to serve those various workloads. This allows for very high concurrency. This was demo’d at Ignite by Rohan Kumar showing 10k concurrent queries (video at 0:28).

Any of the features that get added to the v1 version will also be added to v2 and v3.

Hope this helps!

More info:

Understanding Azure Synapse Analytics

Comments

Azure Synapse Analytics confusion — 12 Comments

BHASKAR SHARMA on April 14, 2020 at 3:10 am said:

Awesome write-up James on the roadmap.
I have a concern about Synapse monitoring: the Query Store is not capturing resource utilization at the query level, and all resource-related columns (CPU, memory, I/O) are populated as NULL. Will this be resolved in a coming release?

Chris Bailiss on April 28, 2020 at 8:22 am said:

Hi James
Helpful to get concise clarity as usual in your posts.
One question re: the diagram…
For v1, you highlight what are currently “private preview” features.
For v2, you just list “preview” features. This implies these currently aren’t in private preview; however, I believe many are (such as on-demand queries and associated pricing).
Just checking whether these features are now in public preview or whether I am reading too much into an implied distinction in the diagram.
Chris
PS – please ignore preview comment.

- James Serra on April 28, 2020 at 9:23 am said:
  
  Hi Chris,
  
  Ah, I see your confusion. Apparently I still caused some confusion 🙂 All the v2 features are in private preview. I have updated the picture. Thanks for letting me know!
  
Emir Tuncer on April 29, 2020 at 3:15 pm said:

Great explanation of Synapse at first look. Thanks James.

Darryll Petrancuri on May 7, 2020 at 5:53 pm said:

Good Day James:

My biggest question is what SQL engine will be behind Synapse. I simply cannot see moving forward with the MPP based engine and limited language surface area. From my perspective the only correct answer here is Hyperscale (Socrates).

Can you comment?

- James Serra on May 7, 2020 at 9:08 pm said:
  
  Hi Darryll,
  
  It’s the same MPP engine that was in SQL DW. Nothing has changed on that front.
  
  - DARRYLL D PETRANCURI on May 7, 2020 at 9:23 pm said:
    
    Hello James
    
    That’s unfortunate and really doesn’t make much sense.
    
    It would be the perfect time to migrate to the Hyperscale engine. It would be much more competitive with Snowflake and offer superior capabilities in many ways.
    
    - James Serra on May 7, 2020 at 9:26 pm said:
      
      Hyperscale is an SMP engine, so much slower for data warehouse type queries. In what ways do you feel Synapse is not competitive with Snowflake?
      
      - Darryll Petrancuri on May 8, 2020 at 8:09 am said:
        
        James:
        
        It is not an SMP engine. I don’t know where you’re getting your information, but you need to read the Project Socrates whitepaper (available at https://www.microsoft.com/en-us/research/uploads/prod/2019/05/socrates.pdf)
        
        Respectfully,
        
        Darryll
        
      - Darryll Petrancuri on May 8, 2020 at 8:22 am said:
        
        On its public-facing web pages, Microsoft contradicts its own white paper published by Microsoft Research (https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tier-hyperscale-faq)
        
        I believe the people who wrote the content do not truly understand Hyperscale and / or have been told such in order to toe the company line. This becomes evident when you read the white paper. I find this disturbing but understandable, because it would cut into Synapse if Microsoft insists on continuing to stick to the legacy MPP engine instead of doing the right thing and moving to the Hyperscale engine, which might be accomplished in relatively short order, in my humble opinion.
        
      - Darryll Petrancuri on May 8, 2020 at 8:34 am said:
        
        James,
        
        Just to be clear, I know there is no control node present in the architecture to support distributed compute at this time, but the architecture is not a classic SMP architecture because of the way storage and compute are decoupled. Accordingly, it seems to me that it would be within reach for Microsoft to take the control node concepts and technology from DW and adapt them to work with Hyperscale with an acceptable amount of effort, if it’s truly necessary to do so. I know there are language surface area considerations with respect to distribution, but that should not be a reason not to move forward with Hyperscale as a replacement at the core.
        
      - Darryll Petrancuri on May 8, 2020 at 8:44 am said:
        
        James:
        
        One last point with respect to the lack of the control node: the core architecture of Hyperscale already addresses everything necessary to do distributed workloads without having to rely on distribution sharding of the data, because of how it deals with data replication.
        

James Serra's Blog

Big Data and Data Warehousing
