Synapse database templates info and tips
I had previously blogged about Azure Synapse Analytics database templates, and wanted to follow up with some notes and tips on that feature, as I have been using it a lot lately:
- Purview does not yet pull in the metadata for database templates (table/field descriptions and table relationships). Right now it pulls in the metadata as if it were a SQL table or a file in ADLS; both carry only the basic information supported by those types. The SQL one is probably preferred
- Power BI does not import the table and field descriptions when connecting to a lake database (where the database templates are stored), but it does import the table relationships. You can see the table descriptions by hovering over the table names in the navigator when importing tables using the “Azure Synapse Analytics workspace (Beta)” connector. Note that you are not able to see the table descriptions when hovering over the table names using the “Azure Synapse Analytics SQL” connector. Also note that the “Select Related Tables” button does not work in the navigator
- When using Power BI to connect to database templates, make sure the tables you want to use have data and are stored in Parquet format, or you will get the message “DataSource.Error: Microsoft SQL: External table ‘dbo.TableName’ is not accessible because content of directory cannot be listed”. Note that DirectQuery does not work with the “Azure Synapse Analytics workspace (Beta)” connector (see Power BI connectors to Azure Synapse), but it does work with the “Azure Synapse Analytics SQL” connector
- The storage format for database templates defaults to delimited text, but I recommend storing the data in Parquet format (via the “Data format” drop-down in the model properties). Delta format is not supported yet
- When using the database template designer, think of it as building multiple virtualization layers (data models) over the data in your data lake – it’s not a robust modeling tool like erwin or ER/Studio
- You can use the Map data tool (which is in beta) instead of Spark notebooks, the ADF copy tool, or ADF data flows to map source data to the database template (mapping is by far the most time-consuming part). Once you are done with your Map data transformations, click the “Create pipeline” button, which generates an Execute Pipeline activity that calls a mapping data flow; you can then debug and run your transformation. You can think of the Map data tool as a wizard that shortcuts the process of building a mapping data flow
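The source-to-template column mapping that the Map data tool automates can be sketched in plain Python. This is only an illustration of the concept — all entity, column, and transform names below are hypothetical, not anything Synapse generates:

```python
# Minimal sketch of mapping source rows into a database template entity,
# the kind of work the Map data tool (or a mapping data flow) automates.
# All table/column names here are made-up examples.

source_rows = [
    {"cust_id": "1", "cust_nm": "Contoso ", "created": "2022-01-15"},
    {"cust_id": "2", "cust_nm": "Fabrikam", "created": "2022-03-02"},
]

# mapping: template column -> (source column, transform to apply)
mapping = {
    "CustomerId":   ("cust_id", int),
    "CustomerName": ("cust_nm", str.strip),
    "CreatedDate":  ("created", str),
}

def map_row(row):
    """Apply the column mapping and transforms to one source row."""
    return {target: fn(row[src]) for target, (src, fn) in mapping.items()}

customer_entity = [map_row(r) for r in source_rows]
print(customer_entity[0])
# {'CustomerId': 1, 'CustomerName': 'Contoso', 'CreatedDate': '2022-01-15'}
```

In the real tool this mapping is defined visually and executed by the generated mapping data flow rather than hand-written code.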
- There is no ability to share models between different Synapse workspaces within the template designer (you can use the Git integration to share the model JSON files instead). In the future, APIs may become available to accomplish this
- Create multiple domains from a database template model instead of just using one gigantic model. For example, instead of using the Consumer Goods model as one model, break it up into multiple domains, each with its own model, such as HR and Finance
- Documentation is in the models themselves via the description fields in the database template designer, along with all the relationships you can see visually in the designer. Note that many of the models have the same entities/tables
- Within Synapse, you can use GitHub or Azure DevOps Git to store the database models as JSON files. While the actual data that is copied to the models is stored in the lake database, the metadata for the models is not visible in the lake database, only in Synapse (as well as in the JSON files in Git)
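Because the models live in Git as JSON, you can inspect them programmatically. Here is a hedged sketch using a tiny stand-in document — the real JSON schema Synapse emits is much richer, and the field names below are simplified assumptions, not the actual format:

```python
import json

# Illustrative stand-in for a lake database model file as it might appear
# in the Git repo. The real schema differs; names here are assumptions.
model_json = """{
  "name": "Finance",
  "entities": [
    {"name": "Invoice",  "description": "A bill for goods or services."},
    {"name": "Customer", "description": "A party that purchases goods."}
  ]
}"""

model = json.loads(model_json)

# List each entity with its description, e.g. for a quick documentation dump.
for entity in model["entities"]:
    print(f'{model["name"]}.{entity["name"]}: {entity["description"]}')
```

A script like this could, for example, generate a markdown data dictionary from the checked-in model files.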
- You might want to create SQL views that query the data models in order to limit permissions. These views can be created in Synapse serverless, or via a just released feature that allows you to create views inside a lake database (see Azure Synapse Analytics August Update 2022)
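As a sketch of the permissions idea above, the snippet below generates the T-SQL for a view that exposes only a subset of columns. The schema, table, and column names are hypothetical; you would run the resulting statement in Synapse serverless SQL (or create the view in the lake database):

```python
# Hedged sketch: build a CREATE VIEW statement that exposes only selected
# columns of a lake database table, to limit what report users can see.
# All object names below are hypothetical.

def build_view_sql(view_name, source_table, columns):
    """Return a CREATE VIEW statement exposing only the given columns."""
    col_list = ",\n    ".join(columns)
    return (
        f"CREATE VIEW dbo.{view_name} AS\n"
        f"SELECT\n    {col_list}\n"
        f"FROM {source_table};"
    )

sql = build_view_sql(
    "vw_CustomerPublic",
    "FinanceLakeDb.dbo.Customer",    # lake database table (hypothetical)
    ["CustomerId", "CustomerName"],  # deliberately omits sensitive columns
)
print(sql)
```

You can then grant users access to the view instead of the underlying table.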
- Each database template has a version number that you can see in the properties blade when in the database template designer
- All the database template models are in third normal form (3NF) and stored in a lake database (ADLS Gen2). There is currently no way to automatically replicate the models to a dedicated pool from within the database template designer
- You can have a Power BI data modeler import the database template tables and relationships into a Power BI dataset. Purview will list those datasets (see Microsoft Power BI and Microsoft Purview work better together) and you can request access to the datasets via Purview, but the access needs to be given manually, as granting access to Power BI datasets is not yet automated. Power BI report builders can then choose a dataset to build their reports, so they do not have to create the table relationships themselves
- If needed, you can create your own star schemas. A star schema has fewer tables and joins than 3NF, which reduces complexity for end users when creating reports and improves performance (although the performance gain will be minimal if importing the data into Power BI). You can use “Transform Data” in the Power BI navigator to fire up Power Query and build the star schemas, or build them with Synapse pipelines and store the results in the data lake or in a Synapse dedicated pool. You can equate 3NF with a snowflake schema; see the differences here: Star Schema vs. Snowflake Schema (zentut.com)
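The 3NF-to-star-schema flattening described above can be sketched with plain Python dictionaries. In practice this would be done with Power Query, a mapping data flow, or a Synapse pipeline; all table and column names here are made up:

```python
# Sketch of flattening a snowflaked 3NF structure into a star schema.
# Customer -> Region is a snowflake arm; we collapse it into one
# dimension so the fact table needs a single join instead of two.
# All names are hypothetical.

customers = {1: {"CustomerName": "Contoso", "RegionId": 10}}
regions   = {10: {"RegionName": "West"}}

orders = [
    {"OrderId": 100, "CustomerId": 1, "Amount": 250.0},
    {"OrderId": 101, "CustomerId": 1, "Amount": 125.0},
]

# Denormalized customer dimension: region attributes folded in.
dim_customer = {
    cid: {**cust, "RegionName": regions[cust["RegionId"]]["RegionName"]}
    for cid, cust in customers.items()
}

# Fact table keeps the measure plus the key into the dimension.
fact_sales = [
    {"OrderId": o["OrderId"], "CustomerId": o["CustomerId"], "Amount": o["Amount"]}
    for o in orders
]

print(dim_customer[1])  # {'CustomerName': 'Contoso', 'RegionId': 10, 'RegionName': 'West'}
```

The same collapse is what removes joins for report builders: one hop from fact to dimension instead of chains of lookup tables.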
- Concerning a way to set up roles and responsibilities for a database template project: you can have data modelers, working with domain SMEs, create a spreadsheet (blueprint) mapping source to target along with transformation rules. Once completed, the spreadsheet is handed off to data engineers, who write the code to copy the source data into the database template and perform the transformations
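The blueprint hand-off above lends itself to a simple machine-readable format. Below is a hedged sketch that treats the mapping spreadsheet as CSV and groups it per target entity, the way an engineer might organize it per pipeline — the column and rule names are illustrative only:

```python
import csv
import io

# Illustrative "blueprint" spreadsheet exported as CSV. Source systems,
# columns, and transformation rules below are hypothetical examples.
blueprint_csv = """source_table,source_column,target_entity,target_column,rule
crm.cust,cust_nm,Customer,CustomerName,trim
crm.cust,cust_id,Customer,CustomerId,cast int
erp.inv,inv_amt,Invoice,Amount,none
"""

rows = list(csv.DictReader(io.StringIO(blueprint_csv)))

# Group the mappings per target entity, one group per pipeline to build.
by_entity = {}
for r in rows:
    by_entity.setdefault(r["target_entity"], []).append(
        (r["source_column"], r["target_column"], r["rule"])
    )

for entity, cols in by_entity.items():
    print(entity, cols)
```

Keeping the blueprint in a parseable format also lets engineers validate it (e.g. check that every required template column is covered) before writing any pipeline code.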
For more information about database templates or to get started, visit aka.ms/SynapseDBTemplates.
More info:
What are Synapse Analytics Database Templates and why should you use them