Microsoft Purview FAQ
I get many of the same questions about Microsoft Purview, so I wanted to list those common questions here along with their answers. If your question is not answered here, please put it in the comments and I will reply.
Microsoft Purview is now the combination of multiple Microsoft products. Can you explain the differences?
Let’s break Microsoft Purview down into three sections of features that were formerly other products to clarify things:
- Data governance: This covers data catalog, data quality (preview), data lineage, data management, and data estate insights (preview). The product that had these features was formerly called Azure Purview.
- Data security: This covers data loss prevention, insider risk management, information protection, and adaptive protection. The product that had these features was formerly called Microsoft Information Protection (MIP).
- Data compliance: This covers compliance manager, eDiscovery and audit, communication compliance, data lifecycle management, and records management. The product that had these features was formerly called Microsoft Information Governance.
For simplicity, you can refer to the data governance features as Azure Purview, and to the data security and data compliance features as M365 Purview (M365 is Microsoft 365, previously called Office 365). I will use these names in this blog post. Azure Purview generally works with products that contain structured data such as SQL Database, ADLS, and Cosmos DB, collecting metadata and classifying data. M365 Purview usually deals with unstructured data such as email and Word and Excel documents, applying sensitivity labels and securing documents so only those with appropriate privileges can view them (Azure Purview has very limited features to secure data).
Can you explain the Microsoft Purview compliance portal and the Microsoft Purview governance portal?
In Microsoft Purview, you can use either the classic portal or the new unified portal (which GA’d on 8/1/24). In the classic portal, the Microsoft Purview governance portal (https://web.purview.azure.com/) and the Microsoft Purview compliance portal (https://compliance.microsoft.com/) are completely separate. The new unified portal (https://purview.microsoft.com/), which you enable by flipping the “New Microsoft Purview portal” switch at the top of any Microsoft Purview web page, combines the two. See Learn about the Microsoft Purview portal.
Customers using the Azure government portal (GCC High, DoD) have URLs ending in .us instead of .com. These customers, along with customers in GCC, will be rolled out to the new unified portal starting 8/30/24 (see roadmap item). However, they will not see the Data Map and Data Catalog solution icons in the new unified portal (GCC certification has yet to happen and is expected in H1CY25). The compliance portal will be deprecated starting 11/4/24 (but will still be available for 36 months), and there is no ETA for deprecating the governance portal.
In addition to a new, easier-to-use menu layout, the unified portal has new data governance features such as business domains, data products, data quality, data product search, data access, health controls, and metadata quality (these will start to GA on September 1st, 2024 in the 26 commercial regions where Microsoft Purview is available; see rollout schedule). This takes Azure Purview from a PaaS solution to Microsoft Purview as a SaaS solution. The one government region that Microsoft Purview is in (USGov Virginia) won’t have the new data governance features until late this year or early next year.
Are the new portal and new data governance features available for customers using GCC?
While a GCC customer can turn on the new Purview portal starting 8/30/24, the Data Map and Data Catalog icons will not be visible. GCC customers will have those icons working in the new portal, as well as the new data governance features, once GCC certification is complete, which is expected in H1CY25. This is because Purview for GCC customers is considered part of the GCC M365 suite due to the M365 Purview features (i.e. Microsoft Information Protection). For GCC customers, the M365 Purview features use resources in the Gov cloud, while the Azure Purview governance features use resources in the commercial cloud. This means Azure Purview under the GCC tenant is able to scan resources residing in Azure commercial.
Do you have a slide that covers the first two questions?
Sure:
What things can you use the “Request access” feature on (to request access for a data asset)?
The “Request access” feature can be used for two things: 1) requesting access to a physical asset, and 2) requesting access to a data product that may contain multiple physical assets. Note that #2 is only available in the new portal and currently does not automatically assign read permissions to the assets; that must all be done manually. To review requests, go to Data Catalog → Data management → Data access.
For a physical asset, will “Request access” automatically assign read permissions to an asset for the requester?
For a few asset types (Azure Blob Storage, ADLS Gen2, Azure SQL Database), when you click “Request access” and the request is approved, and the data source is registered for data policy enforcement, Purview automatically assigns read permission on the asset to the requester. It does this through an auto-generated data access policy that is applied against the respective data source, and you can see approved asset requests on the Self-service access policies screen. Otherwise, the request just creates a task assigned to a user or a Microsoft Entra group, who must manually provide access to the asset for the requester and then approve the request. For the “Request access” option to be available for an asset, a self-service access workflow must be created and assigned to the collection where the asset is registered. “Request access” can be made available for any asset, but access is granted automatically only for the few asset types listed above; for the rest, access must be provided manually. Also note that automatic read permission can only be applied to the requester: applying read permission to a group, or moving the requester to a group that has read permission, is not supported. [NOTICE: automatic permission provisioning is not yet available in the new portal; it is slated for Q2CY25.] For the classic portal, automatic permission provisioning is in private preview for SQL MI, SQL Server 2022, Azure Databricks (Unity), and Snowflake.
As an example, an end-user can be browsing folders in Purview and find one that contains files the end-user would like to use. That person would request access to the folder through Purview (via the “Request access” button), which triggers a self-service data access workflow, and if the access is approved, that person would be able to use a tool outside of Purview to read the files (such as Power BI). This can also be done for storage accounts, containers, folders, individual files, databases, or tables in a database.
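The rules described for physical assets can be summarized as a small decision function. This is only a sketch of the logic in this answer, not Purview code: the function name, the parameter names, and the source type labels are made up for illustration.

```python
# Source types the post lists as supporting automatic read provisioning
# in the classic portal. These strings are illustrative labels, not API values.
AUTO_PROVISION_SOURCE_TYPES = {"Azure Blob Storage", "ADLS Gen2", "Azure SQL Database"}


def provisioning_mode(source_type: str,
                      policy_enforcement_enabled: bool,
                      new_portal: bool = False) -> str:
    """Return how read access is granted once a request goes through the workflow.

    "automatic": an auto-generated data access policy grants read access
                 to the requester after approval.
    "manual":    a task is created and the assignee must grant access by hand.
    """
    if (
        not new_portal  # automatic provisioning is not yet available in the new portal
        and source_type in AUTO_PROVISION_SOURCE_TYPES
        and policy_enforcement_enabled  # source registered for data policy enforcement
    ):
        return "automatic"
    return "manual"


print(provisioning_mode("ADLS Gen2", policy_enforcement_enabled=True))
print(provisioning_mode("Snowflake", policy_enforcement_enabled=True))
```

The first call falls in the automatic path; the second falls back to a manual task because Snowflake is not one of the generally available source types listed above.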
Can Purview scan on-prem file systems?
Yes, Microsoft Purview does have the capability to scan on-premises file systems. The metadata curated at the end of the scan process can include data asset names such as table names or file names, file size, columns, and data lineage among other details. To leverage this functionality, you would typically configure a self-hosted integration runtime which provides a bridge between your on-premises network and the cloud-based Microsoft Purview service. Once this is set up, you can proceed to register and configure your on-premises file system as a data source in Purview for scanning.
What security permissions are needed to register and scan a data source?
You’ll need to be a Data Source Admin and have one of the other Microsoft Purview Data Map roles (for example, Data Reader or Data Share Contributor) to register a source and manage it in the Microsoft Purview data map. See our Microsoft Purview Permissions page for details on roles and adding permissions (or Understand access and permissions in the classic Microsoft Purview governance portal | Microsoft Learn if using the classic portal). When you go to register a source, you choose one of your subscriptions, and the items for that source that you have access to are displayed. For example, for ADLS Gen2, a list of storage account names would be displayed. Most data sources have prerequisites to register and scan them in Microsoft Purview. For example, to scan ADLS Gen2, the storage account must have the role “Storage Blob Data Reader” assigned to the Microsoft Purview account name. For a list of all available sources, and links to source-specific instructions to register and scan them, see our supported sources article. Click on a source to get details on how to register and scan it.
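As an illustration of the ADLS Gen2 prerequisite, here is a minimal sketch that builds an Azure CLI `az role assignment create` command granting “Storage Blob Data Reader” on a storage account. It assumes the assignee is the object ID of the Purview account's managed identity; the helper names and every ID, resource group, and account name below are hypothetical placeholders. The sketch only prints the command rather than running it.

```python
import shlex


def storage_account_scope(subscription_id: str, resource_group: str, storage_account: str) -> str:
    """Build the Azure resource ID of a storage account, used as the role scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
    )


def role_assignment_command(purview_principal_id: str, scope: str) -> list[str]:
    """Return the az CLI arguments that assign Storage Blob Data Reader at the given scope."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", purview_principal_id,  # assumed: Purview managed identity object ID
        "--role", "Storage Blob Data Reader",
        "--scope", scope,
    ]


# Placeholder values for illustration only.
scope = storage_account_scope(
    "00000000-0000-0000-0000-000000000000", "rg-data", "mydatalake"
)
command = role_assignment_command("11111111-1111-1111-1111-111111111111", scope)
print(shlex.join(command))
```

The same assignment can be made in the Azure portal under the storage account's Access control (IAM) settings; the script form is just easier to repeat across many storage accounts.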
Does Purview connect to Databricks Unity Catalog?
Yes, see https://learn.microsoft.com/en-us/purview/register-scan-azure-databricks-unity-catalog. For how they work together, see Best data governance tool: Databricks Unity Catalog or Microsoft Purview? | LinkedIn.
Can Purview scan Databricks tables without Unity Catalog?
Microsoft Purview can scan Databricks tables without requiring Unity Catalog, but using Unity Catalog offers significant benefits that enhance data governance and metadata management. Without Unity Catalog, Microsoft Purview connects directly to Azure Databricks and can scan tables to gather metadata (using the Hive Metastore), such as table schemas and columns, and perform basic classification tasks. However, this method may have limitations in metadata detail and advanced governance features.
Unity Catalog provides a centralized and standardized metadata layer for all data assets within Databricks, which Microsoft Purview can leverage to access richer metadata and comprehensive data classifications. This integration simplifies metadata management and enhances data governance by offering fine-grained access control, auditing, and compliance features. Unity Catalog streamlines the integration process by centralizing metadata, making it easier for Purview to scan and classify data assets effectively.
However, it’s important to note that Microsoft Purview does not currently support lineage extraction from Unity Catalog. This means that while Purview can identify and classify data assets within Unity Catalog, it cannot track the flow of data between different stages or transformations within Databricks. For lineage extraction, Purview relies on scanning the Hive Metastore within Databricks, which does provide lineage information. Note that Microsoft Purview requires either the Hive Metastore or Unity Catalog to scan Databricks. See Connect to and manage Azure Databricks Unity Catalog and Connect to and manage Azure Databricks.
Can Purview scan PDF documents or Word documents?
This applies to M365 Purview:
Yes, Microsoft Purview (formerly known as Microsoft Information Protection) can scan PDF documents, Word documents, and other types of files for sensitive information. Microsoft Purview offers a comprehensive range of compliance and risk management solutions, including information protection, data loss prevention (DLP), governance, and compliance capabilities across your Microsoft 365 and Office 365 services, as well as other environments.
For PDF documents, it uses Optical Character Recognition (OCR) to scan content in images for sensitive information (see Learn about optical character recognition in Microsoft Purview). This feature is optional and can be enabled at the tenant level. Once enabled, you can select the locations where you want to scan images. Please note that each page in a PDF file is charged separately. For example, if there are 10 pages in a PDF file, an OCR scan of the PDF file counts as 10 separate scans.
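To make the per-page counting concrete, here is a tiny sketch that totals the OCR scans counted for a batch of PDF files. The function name is made up for illustration, and it models only the page-counting rule stated above, not actual pricing.

```python
def billable_ocr_scans(page_counts: list[int]) -> int:
    """Return the number of OCR scans counted for a batch of PDFs.

    Each page is counted as a separate scan, so the total is the sum of pages.
    """
    if any(count < 0 for count in page_counts):
        raise ValueError("page counts must be non-negative")
    return sum(page_counts)


print(billable_ocr_scans([10]))        # one 10-page PDF -> 10
print(billable_ocr_scans([10, 3, 1]))  # three PDFs -> 14
```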
This applies to Azure Purview:
For Word documents, the scanning process establishes a connection to the data source and captures technical metadata like names, file size, columns, and so on. It also extracts schema for structured data sources, applies classifications on schemas, and applies sensitivity labels if your Microsoft Purview Data Map is connected to a Microsoft Purview compliance portal. The scanning process can be triggered to run immediately or can be scheduled to run on a periodic basis to keep your Microsoft Purview account up to date.
It doesn’t store the documents themselves but rather metadata about these documents and insights into the sensitive information they contain.
When it comes to a Word document (or any other document type like PDFs), Microsoft Purview can:
- Scan and classify the document based on its content, identifying sensitive information.
- Catalog the classification and metadata about the document in the Purview Data Map. This includes information like where the document is stored, the type of sensitive information it contains, and how it’s classified.
- Enable governance policies to be applied based on this classification, such as protection actions, access controls, and monitoring.
However, the actual content of the Word document remains stored in its original location. The Purview Data Catalog focuses on managing the metadata and governance policies around the document, rather than the document file itself. The text content of a Word document does not get stored in the Microsoft Purview Data Catalog.
In Azure Purview, scanning a data source such as ADLS will return just basic file info for documents. The document types it will scan in ADLS are: DOC, DOCM, DOCX, DOT, ODP, ODS, ODT, PDF, POT, PPS, PPSX, PPT, PPTM, PPTX, XLC, XLS, XLSB, XLSM, XLSX, XLT (the structured data types it scans are: CSV, JSON, PSV, SSV, TSV, GZIP, TXT, XML, PARQUET, AVRO, ORC). See File types supported for scanning.
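If you want to check ahead of time which files in a storage location fall into each group, the lists above can be turned into a simple lookup. This is a rough sketch: the function name and category labels are made up, and it matches on the literal type names from this post, so check the File types supported for scanning article for the authoritative list.

```python
from pathlib import PurePosixPath

# Document types from the list above (basic file info only).
DOCUMENT_TYPES = {
    "DOC", "DOCM", "DOCX", "DOT", "ODP", "ODS", "ODT", "PDF", "POT", "PPS",
    "PPSX", "PPT", "PPTM", "PPTX", "XLC", "XLS", "XLSB", "XLSM", "XLSX", "XLT",
}
# Structured data types from the list above.
STRUCTURED_TYPES = {
    "CSV", "JSON", "PSV", "SSV", "TSV", "GZIP", "TXT", "XML", "PARQUET", "AVRO", "ORC",
}


def scan_category(path: str) -> str:
    """Classify a file path as "document", "structured", or "not listed"."""
    extension = PurePosixPath(path).suffix.lstrip(".").upper()
    if extension in DOCUMENT_TYPES:
        return "document"
    if extension in STRUCTURED_TYPES:
        return "structured"
    return "not listed"


for path in ["reports/q1.DOCX", "raw/events.parquet", "img/logo.png"]:
    print(path, "->", scan_category(path))
```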
Azure Purview can scan cloud storage locations like Blob storage, ADLS, AWS S3, etc. but SharePoint and OneDrive for Business locations are only scanned with M365 Purview.
Can you explain the following concepts: Domains, Business Domains, Collections, Data Products, and Data Assets?
Is there customer training?
Here is click-through training:
- Overview – https://purviewdatagovernance.storylane.io/share/a13g1ie2izob
- Setup Business Domain – https://purviewdatagovernance.storylane.io/share/920vqytm1ech
- Scan the Data Estate – https://purviewdatagovernance.storylane.io/share/of6j1c1q2c2l
- Publish Data Products – https://purviewdatagovernance.storylane.io/share/8ubzzhh1npwl
- Improve data with Data Quality – https://purviewdatagovernance.storylane.io/share/ul046l6beqse
- Monitor the practice with DEH – https://purviewdatagovernance.storylane.io/share/wkojjbsxb90l
- Enable data democratization for consumers – https://purviewdatagovernance.storylane.io/share/mzzwqtfgdpre
Are there Purview best practices?
Yes, starting at Microsoft Purview (formerly Azure Purview) accounts architecture and best practices.
What is the difference between classifications and sensitivity labels in Purview?
See prior blog.
What Azure database and report products can use Microsoft Purview Information Protection sensitivity labels?
See prior blog.
Is there a way to enforce access control on database columns with sensitive data?
See prior blog.
How do sensitivity labels work in Power BI?
See prior blog.