Microsoft Purview new data governance features
Last week, Microsoft began rolling out the public preview of a new, fully reimagined Microsoft Purview data governance solution. Data governance has become far more important as organizations collect more and more data. Data governance involves the processes, policies, and standards that ensure the effective use of information in an organization. It focuses on maintaining data quality, managing the data life cycle, ensuring compliance with regulations, and defining roles and responsibilities for data management. This framework helps organizations achieve their goals by enhancing decision-making, facilitating compliance, and improving data security and privacy.
This new SaaS experience is intended to supersede the previous Microsoft Purview Data Catalog experience. It addresses the key areas of customer feedback and supports the requirements of a modern data governance solution through these new features:
- Business-friendly catalog for business users across data roles (Chief Data Officer, data stewards, data asset creators, and data consumers), anchored to durable business concepts and to business objectives and key results (OKRs)
- A unified and extensible experience that works how customers work, backed by compliant, self-serve data access
- AI-enabled experiences and automation to dramatically scale business success
- Built to enable a culture of federated data governance through simplicity and actionable insights
The new data governance experience is available in the new unified Microsoft Purview portal. If you have not yet transitioned to this new experience, please reference this useful article on how to transition. You can easily toggle between the new Purview portal experience and the previous one from within the user interface (upper right-hand corner). The new experience has already rolled out to a few regions, and additional regions will become available over the course of several weeks (see Data Governance for the Age of AI for the region rollout dates).
In short, the “Data Catalog” section of Microsoft Purview was redesigned and updated with new features, including a “Data management” section and a “Data estate health” section. Within the data management section, you can easily define and assign business-friendly terminology (such as Finance and Claims). That business-friendly language carries through the whole data governance experience: Data Products (collections of data assets used for a business function), Business Domains (which own Data Products), Data Quality (assessment of quality), Data Access, and Data Estate Health (reports and insights). Let’s explore each of these new features in detail:
- Data search – Browse, search, and discover data assets across your organization. Enter keywords that help narrow down your search, such as name, data type, classifications, and glossary terms, or explore by source type or collection. You can filter the results in many ways, such as by asset type, classification, contact, or endorsement. This is basically the same as the search in the older portal
- Data product search – Discover, understand, and access data products from across your organization. You can search for data products in the catalog, explore data products by business domain, view data product details, and view data asset details. In the data product search bar, you can perform a natural language search for any of your organization’s data products. For example: “I’m looking for daily sales data for retail stores worldwide to analyze sales trends for the last six months.” This page also shows all your data access requests on the “My data access” tab. It makes finding assets much easier: instead of the data search page, which returns all assets, this page lets you drill down through business domains and data products to find the relevant assets
- Business domains – Organize data products into meaningful groups (such as Sales or Marketing) and link them to business concepts. Business domains are used to manage business concepts and define data products. A business domain is a framework for managing your business-related information in the catalog, typically organized around a common business purpose or business capability. It is a boundary that aligns your data estate to your organization; think of it as a mini catalog inside your data catalog. This is different from the domains under Data Map, which are technical domains for logical groupings (called collections) such as by project, asset type, or ownership (be sure to check out What’s coming for domains?). Business domains can be mapped to collections in your data map. Mapping your business domain to a collection means that the assets associated with the business concepts in that domain will come from that collection or its children. Glossary terms are also defined within business domains. The steps to build out your business domains are: 1) Create and manage business domains to curate your data catalog, 2) Assign owners and stewards for a business domain, 3) Relate business domains to physical data collections and domains in the data map, 4) Create glossary terms for the business domain (note that you can use the built-in Copilot to suggest terms), 5) Understand and take timely actions to keep your business domains in a healthy state, 6) Define business objectives and key results (OKRs), such as a 10% rise in sales or a 3% reduction in support cases (whose progress you track manually in Purview), 7) Define critical data elements (CDEs), which are logical groupings of important pieces of information that make data easier to understand and promote standardization (for example, a “Customer ID” critical data element can map “CustID” from one table and “CID” from another table into the same logical container)
- Data products – Manage groups of data assets packaged together for specific use cases. Data products are essentially logical business concepts. Each data product is assigned to a business domain, and assets such as tables, files, and Power BI reports are assigned to the data product (Copilot can be used for suggestions). No more requesting access to 15 different tables you might need to build a data model: once one user does the research to create a viable data product, all other users can benefit from that work. They can find (and request access to) the data in that product and have everything they need in one place. A business domain can house many data products, but a data product is managed by a single business domain and can be discovered across many business domains. For example, a data scientist can create a data product that lists all the assets used to create their data model. The description provides a full use case, with examples or suggestions on how to use the data. The data scientist is now a data product owner and has improved data consumers’ search experience by giving them everything they need in this one data product. Data products streamline governance for data assets as well: when users find a data product, they can request access to it, which (after approval) provides access to all the associated data assets. The resulting hierarchy looks like: Business domains -> Data products -> Assets. An example would be Sales -> Global Sales Revenue for 2023CY -> Global Sales for 2023 (Power BI report). You can also assign previously created glossary terms to the data product
- Data quality – Identify and fix data quality issues. The new data quality model enables your organization to set data quality rules top down with business domains, data products, and the data assets themselves. This is done using no-code/low-code rules, including out-of-the-box (OOB) rules. Examples of rules include checks for duplicate rows, empty fields, and unique values. Copilot can be used to suggest rules. Data access policies can be set on a business domain, a data product, or a glossary term, and are also used to manage access requests to a data product (a request triggers a workflow that asks the owners of the data resource to grant you access to the data product). Any time a glossary term is applied to a data product, all the associated policies are automatically applied. These policies are where you determine permitted access (“access time limit”), approval requirements (“manager approval required” or “privacy and compliance review required”), and digital attestations (“permit data copies”). Once rules and policies are applied, the data quality model generates data quality scores at the asset, data product, or business domain level, giving you snapshot insights into your data quality relative to your business rules. Within the data quality model, there are two metadata analysis capabilities: 1) data profiling, which gives quick insights from a sample set, and 2) data quality scans, which are in-depth scans of full data sets. These capabilities use your defined rules or built-in templates to reason over your metadata and give you data quality insights and recommendations. Also available are data quality actions, which identify problems that you should address to improve data quality in your data estate. For example, “Data profile outlier values detected” and “Data asset quality rule score has fallen below threshold”. You assign the action to a person, and when that person fixes the issue with a tool such as Azure Data Factory (ADF), you mark the action as resolved.
Also, there are data quality alerts that notify Microsoft Purview users about important events or unexpected behavior detected around the quality of the data. When you create alerts for assets, you’ll receive email notifications about data quality scores. An example alert would send a notification if the data quality score for the sales domain (containing the customer and fact_sale data assets) dropped below 50%. Lastly, there are data quality rules you can choose for CDEs, such as “unique values” and “empty/blank fields”
- Data access – This page is where you manage requests to access data products in your business domains by approving or declining them (which can also be done from an email notification). Before any requests arrive, you set up access management for your data products so that users who request them can be given access: on the data products page, select a data product, and then on that data product’s page select “Manage policies”. There you can define access policies in many ways, as mentioned above in the data quality section: for example, setting the usage purposes, choosing who approves the request, requiring manager approval, requiring acknowledgement of terms of use, determining whether copies of the data are permitted, and setting the maximum access duration. The selected values affect what data consumers see on their access request form and the actions they need to take. Note that in this preview experience, the approvers of the request must provide access to the individual data assets manually. To request access to a data product, a user selects the “Request access” button on the data product details page
- Health controls – Track your journey to complete data governance by monitoring health controls. Health controls measure your current governance practices against standards and give your data estate a score. Some example controls are metadata completeness, cataloging, classification, access entitlement, and data quality. A data officer can configure the rules that determine the score and define what constitutes a red, yellow, or green indicator, ensuring your rules and indicators reflect the unique standards of your organization. An example would be checking whether data assets are classified, with a target value of 80%, or whether data assets are mapped to data products for discoverability in the catalog, with a target of 90%
- Health actions – Steps you can take to improve data governance across your data estate. This new action center aggregates and summarizes governance-related actions by role, data product, or business domain. All anomalies noted by health controls are translated into actions that you can assign to an owner, that include recommendations for resolving them, and that you can track and address from within Microsoft Purview. Actions arise when usage or implementation is out of alignment with the defined controls. This interactive summary makes it easy for teams to manage and track actions: simply click an action to make the required change. Cleaning up outstanding actions helps improve the overall posture of your data governance practice, which is key to making governance a team sport. Examples include data assets missing a classification, or a data product not linked to any data assets
- Metadata quality – A low-code/no-code experience that lets data stewards and members of the Chief Data Officer’s office write any logic to test metadata health and quality. It comes with a set of predefined logic for each health control, and you can add more rules to that existing logic. At the next refresh of the health controls, the new metadata quality logic is applied to all the metadata within its scope (a data product, a business domain, or any other entity). For example, you can create a rule that a data product must have published terms of use or must have a description
- Reports – Data governance is a practice that is nurtured over time. Aggregated insights help you put the “practice” into your data governance practice by showcasing the overall health of your governed data estate. These reports provide deep insight across a variety of dimensions: asset insights (an overview of assets by type and collection, and their curation status), catalog adoption (to understand at a glance how your data catalog is being used), classification insights (an overview of assets classified and the types of classifications), data stewardship (for governance- and quality-focused users, like data stewards and chief data officers, to understand governance health gaps such as asset curation and asset ownership), glossary insights (health and use of glossary terms), and sensitivity label insights (an overview of assets that have sensitivity labels applied and the types of labels applied). Coming soon are data governance health and data quality health reports
- Roles and permissions – Admin roles give users permission to view data and complete tasks in Microsoft Purview. Give users only the access they need by assigning the least-permissive role. Roles include Data Governance Administrators, Business Domain Creators, Data Health Owners, and Data Health Readers. These roles are called application-level permissions (with the application being the Data Catalog). Note that these relate to using features in the Data Catalog and are separate from the role assignments for collections and from the roles for each business domain (called business domain level permissions). For details on all the permissions, check out Permissions in the new Microsoft Purview portal preview
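The health control examples above (assets classified with an 80% target, assets mapped to data products with a 90% target) boil down to a percentage compared against a target. Here is a minimal Python sketch of that idea. It is not Purview's implementation or API: the `Asset` class, the function names, and the 10-point yellow band below the target are all hypothetical, since the post only says that a data officer defines what counts as red, yellow, or green.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Asset:
    # Hypothetical, simplified view of a cataloged data asset.
    name: str
    classifications: List[str] = field(default_factory=list)
    data_product: Optional[str] = None


def control_score(assets: List[Asset], check: Callable[[Asset], bool]) -> float:
    """Percentage of assets that pass a health-control check."""
    if not assets:
        return 100.0  # assumption: an empty scope counts as passing
    passed = sum(1 for a in assets if check(a))
    return 100.0 * passed / len(assets)


def indicator(score: float, target: float, yellow_band: float = 10.0) -> str:
    """Map a score to red/yellow/green; the yellow band is an assumed convention."""
    if score >= target:
        return "green"
    if score >= target - yellow_band:
        return "yellow"
    return "red"


def is_classified(asset: Asset) -> bool:
    return len(asset.classifications) > 0


def is_mapped_to_product(asset: Asset) -> bool:
    return asset.data_product is not None


assets = [
    Asset("customer", ["Email Address"], "Customer 360"),
    Asset("fact_sale", ["Credit Card Number"], "Global Sales"),
    Asset("dim_store", ["Address"], "Global Sales"),
    Asset("staging_orders", ["Person's Name"]),
    Asset("tmp_export"),
]

classified = control_score(assets, is_classified)  # 4 of 5 assets
mapped = control_score(assets, is_mapped_to_product)  # 3 of 5 assets
print(classified, indicator(classified, target=80))  # 80.0 green
print(mapped, indicator(mapped, target=90))  # 60.0 red
```

Whether an empty scope should score 100% or 0% is a design choice; this sketch treats "nothing to check" as passing.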
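The critical data element example in the business domains section ("Customer ID" covering `CustID` in one table and `CID` in another) is essentially a mapping from one logical name to several physical columns, which also lets a single rule such as "empty/blank fields" run across all of them. Below is a minimal Python sketch assuming in-memory tables; the table names, the `CDES` dictionary, and the helper functions are hypothetical and not part of Purview.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical CDE definition: logical name -> physical (table, column) pairs.
CDES: Dict[str, List[Tuple[str, str]]] = {
    "Customer ID": [("crm_customers", "CustID"), ("billing_invoices", "CID")],
}


def cde_values(cde: str, tables: Dict[str, List[Row]]) -> List[object]:
    """Collect every value of a CDE across all physical columns mapped to it."""
    values: List[object] = []
    for table, column in CDES[cde]:
        for row in tables[table]:
            values.append(row.get(column))
    return values


def blank_count(values: List[object]) -> int:
    """An 'empty/blank fields' style check applied to the logical element."""
    return sum(1 for v in values if v is None or str(v).strip() == "")


tables = {
    "crm_customers": [{"CustID": "C001"}, {"CustID": "C002"}],
    "billing_invoices": [{"CID": "C001", "Amount": 120}, {"CID": "", "Amount": 75}],
}

customer_ids = cde_values("Customer ID", tables)
print(customer_ids)  # ['C001', 'C002', 'C001', '']
print(blank_count(customer_ids))  # 1
```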
Finally, I wanted to stress that it’s important to understand this new data quality life cycle:
- Assign user(s) data quality steward permissions in your data catalog to use all data quality features
- Register and scan a data source in your Microsoft Purview Data Map
- Add your data asset to a data product
- Set up a data source connection to prepare your source for data quality assessment. The currently supported data source types are ADLS Gen2 (delta format), Azure SQL Database, and Fabric Lakehouse (delta table)
- Configure and run data profiling for an asset in your data source. Data profiling is the process of examining the data available in different data sources and collecting statistics and information about this data
- When profiling is complete, browse the results for each column in the data asset to understand your data’s current structure and state
- Set up data quality rules based on the profiling results and apply them to your data asset. Data quality rules are essential guidelines that organizations establish to ensure the accuracy, consistency, and completeness of their data. These rules help maintain data integrity and reliability
- Configure and run a data quality scan on a data product to assess the quality of all supported assets in the data product and produce a score. Your data stewards can use that score to assess the data health and address any issues that might be lowering the quality of your data
- Review your scan results to evaluate your data product’s current data quality
- Repeat the profiling, rule setup, and scanning steps above periodically over your data asset’s life cycle to ensure it’s maintaining quality
- Continually monitor your data quality
- Review data quality actions to identify and resolve problems
- Set data quality notifications to alert you to quality issues
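Data profiling, as described in the life cycle above, means collecting statistics about each column of a data asset. The following minimal Python sketch shows the kind of per-column statistics involved (row count, missing values, distinct values, and min/max/mean for numeric columns). It runs over plain in-memory rows and is only an illustration: the function names and the chosen statistics are assumptions, not what Purview's profiler actually computes or how it samples data.

```python
from typing import Dict, List

Row = Dict[str, object]


def profile_column(values: List[object]) -> Dict[str, object]:
    """Basic statistics for one column; None and empty strings count as missing."""
    present = [v for v in values if v is not None and v != ""]
    profile: Dict[str, object] = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    numbers = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    # Only report numeric statistics when every present value is numeric.
    if numbers and len(numbers) == len(present):
        profile["min"] = min(numbers)
        profile["max"] = max(numbers)
        profile["mean"] = sum(numbers) / len(numbers)
    return profile


def profile_table(rows: List[Row]) -> Dict[str, Dict[str, object]]:
    """Profile every column that appears in any row, in first-seen order."""
    columns: List[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return {c: profile_column([row.get(c) for row in rows]) for c in columns}


rows = [
    {"store": "Seattle", "daily_sales": 100},
    {"store": "Dallas", "daily_sales": None},
    {"store": "Seattle", "daily_sales": 300},
]
result = profile_table(rows)
for column, stats in result.items():
    print(column, stats)
```

Results like these are what you would browse column by column before deciding which data quality rules to set.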
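Setting rules, running a scan that produces a score, and getting notified when the score drops (the life cycle steps above, plus the "below 50%" alert example in the data quality section) can be illustrated with a minimal Python sketch. It assumes a simple scoring model that is not necessarily Purview's: each rule returns the fraction of rows that pass, and the score is the average across rules expressed as a percentage. The rule names, `quality_score`, and `should_alert` are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# A rule returns the fraction of rows (0.0 to 1.0) that pass it.
Rule = Callable[[List[Row]], float]


def not_blank(column: str) -> Rule:
    """'Empty/blank fields' style rule: share of rows with a value in `column`."""
    def rule(rows: List[Row]) -> float:
        if not rows:
            return 1.0
        ok = sum(1 for r in rows if r.get(column) not in (None, ""))
        return ok / len(rows)
    return rule


def unique_values(column: str) -> Rule:
    """'Unique values' style rule: share of rows whose value occurs only once."""
    def rule(rows: List[Row]) -> float:
        if not rows:
            return 1.0
        counts: Dict[object, int] = {}
        for r in rows:
            counts[r.get(column)] = counts.get(r.get(column), 0) + 1
        ok = sum(1 for r in rows if counts[r.get(column)] == 1)
        return ok / len(rows)
    return rule


def no_duplicate_rows(rows: List[Row]) -> float:
    """'Duplicate rows' style rule: share of rows that do not repeat an earlier row."""
    if not rows:
        return 1.0
    seen = set()
    ok = 0
    for r in rows:
        key = tuple(sorted(r.items()))  # column names are unique, so only keys are compared
        if key not in seen:
            ok += 1
            seen.add(key)
    return ok / len(rows)


def quality_score(rows: List[Row], rules: List[Rule]) -> float:
    """Average rule pass rate as a percentage (an assumed scoring model)."""
    return 100.0 * sum(rule(rows) for rule in rules) / len(rules)


def should_alert(score: float, threshold: float = 50.0) -> bool:
    """Alert when the score drops below the threshold."""
    return score < threshold


customers = [
    {"CustomerID": 1, "Name": "Ana"},
    {"CustomerID": 2, "Name": ""},
    {"CustomerID": 2, "Name": "Ben"},
    {"CustomerID": 1, "Name": "Ana"},
]
rules = [not_blank("Name"), unique_values("CustomerID"), no_duplicate_rows]
score = quality_score(customers, rules)
print(score, should_alert(score))  # 50.0 False
```

Here the three rules pass 75%, 0%, and 75% of rows, so the score sits exactly at 50 and a strict "below 50%" alert does not fire.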
For more details on the new governance features, check out New Microsoft Purview Data Catalog (Preview) as well as the videos on the Microsoft Purview YouTube channel.
As you can see, Microsoft Purview has shifted its focus from just cataloging data and applying policies to also managing logical concepts (business domains and data products) and providing data governance through quality checks on the data and ensuring compliance.
More info:
Introducing modern data governance for the era of AI
Get ready for the next enhancement in Microsoft Purview governance
Scalable Data Management with Microsoft Fabric and Microsoft Purview
Microsoft Purview’s reimagined data governance experience
Episode 5: Connecting the dots with Microsoft Purview
Integrated Data Quality experience in Purview Data Governance solution