Classifications and sensitivity labels in Microsoft Purview
I see a lot of confusion about how classifications and sensitivity labels work in Microsoft Purview. This blog post will help clear that up, but first I must address the confusion around Purview itself now that multiple products have been renamed to Microsoft Purview. I decided to use a question-and-answer format that will hopefully clear things up (I was very confused too!):
Microsoft Purview is now the combination of multiple Microsoft products. Can you explain the differences?
Let’s break Microsoft Purview down into three sections of features that were formerly other products to clarify things:
- Data governance: This deals with data catalog, data quality (preview), data lineage, data management, and data estate insights (preview). The product that had these features was formerly called Azure Purview
- Data security: Covers data loss prevention, insider risk management, information protection, and adaptive protection. The product that had these features was formerly called Microsoft Information Protection (MIP)
- Data compliance: This covers compliance manager, eDiscovery and audit, communication compliance, data lifecycle management, and records management. The product that had these features was formerly called Microsoft Information Governance
For simplification, when talking about the data governance features, you can call the features dealing with them Azure Purview, and the features dealing with data security and data compliance M365 Purview (M365 is Microsoft 365, previously called Office 365). I will reference these names in this blog post. Azure Purview generally works with products that contain structured data such as SQL Database, ADLS, and Cosmos DB, collecting metadata and classifying data. M365 Purview usually deals with unstructured data such as email and Word and Excel documents, applying sensitivity labels and securing documents so only those with appropriate privileges can view them (Azure Purview has very limited features to secure data).
Azure Purview and M365 Purview are two very different products combined into one, which is confusing because they have little in common except for sensitivity labels, hence the reason for this blog post. Within Microsoft, when a customer asks for a “Microsoft Purview” demo, we always have to ask whether they want a demo of the Azure Purview features or the M365 Purview features, as both are very large products with very different features. Interestingly, to add to the confusion, my Microsoft cohort who demos the M365 piece is also named James!
One more thing of note: In Microsoft Purview, you have the choice of using the classic portal or the new unified portal (which GA’d on 8/1/24). With the classic portal, the Microsoft Purview governance portal (https://web.purview.azure.com/) and the Microsoft Purview compliance portal (https://compliance.microsoft.com/) are completely separate. The new unified portal, available by flipping the “New Microsoft Purview portal” switch at the top of any web page in Microsoft Purview, combines the two portals (https://purview.microsoft.com/). See Learn about the Microsoft Purview portal. Customers using the Azure government portal have URLs ending in .us instead of .com, and they will be moved to the new unified portal starting 8/30/24 (see roadmap item). The compliance portal will be deprecated starting 11/4/24, and there is no ETA for deprecating the governance portal. In addition to a new, easier-to-use menu layout, the unified portal has new data governance features such as business domains, data products, data quality, data product search, data access, health controls, and metadata quality (these will GA on September 1st, 2024). This takes Azure Purview from a PaaS solution to Microsoft Purview, which is a SaaS solution. The new data governance features are currently deployed in 16 of the 26 commercial regions where Microsoft Purview is available, with the remaining 10 to be done by November. The one government region that Microsoft Purview is in (USGov Virginia) won’t have the new data governance features for a while.
What is the difference between classifications and sensitivity labels in Purview?
In Microsoft Purview, classifications and sensitivity labels serve distinct purposes. Classifications categorize data based on its content, such as identifying credit card numbers or Social Security Numbers (SSNs). They can be applied to specific data points like columns in a database. Dozens of built-in classifications include personally identifiable information (PII) and financial data, and you can create custom classifications for something like “Customer ID”.
Sensitivity labels, on the other hand, define how data should be handled and protected, using built-in labels such as Public, General, Confidential, or Highly Confidential, or custom labels you create such as Secret or Top Secret. They enforce protection policies such as encryption and access control and can be applied to various data types including documents, emails, and databases. For databases, sensitivity labels can be applied at the column level so that data within those columns is handled according to the defined protection policies. For instance, a “Confidential” label might encrypt the data in a column and restrict access to authorized users, while a “Public” label would have no restrictions.
The key differences between the two are their objectives and functions. Classifications aim to identify and tag data, whereas sensitivity labels focus on protecting data. Classifications are automatically applied by scanning data, while sensitivity labels can be applied manually (for example, see Apply sensitivity labels to your files and email) or automatically based on policies. Essentially, classifications help organize data, while sensitivity labels ensure its protection. For more info see Classifications vs sensitivity labels.
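To make the distinction concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how Purview's scanning engine works: the regex patterns, the `classify` function, and the `LABEL_HANDLING` table are all hypothetical names I made up for this example. The point is that a classification answers "what kind of data is this?", while a label answers "how must this data be handled?".

```python
import re

# Hypothetical, heavily simplified patterns. Purview's built-in
# classifiers are far more elaborate than a single regex.
CLASSIFICATION_PATTERNS = {
    "U.S. Social Security Number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "Credit Card Number": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}

# Hypothetical handling rules: a label describes protection, not content.
LABEL_HANDLING = {
    "Public": {"encrypt": False, "restrict_access": False},
    "Highly Confidential": {"encrypt": True, "restrict_access": True},
}


def classify(text: str) -> set[str]:
    """Return the names of every classification whose pattern matches the text."""
    return {
        name
        for name, pattern in CLASSIFICATION_PATTERNS.items()
        if pattern.search(text)
    }


if __name__ == "__main__":
    # Classification identifies the content...
    print(classify("Customer SSN: 123-45-6789"))
    # ...while a label states how that content should be protected.
    print(LABEL_HANDLING["Highly Confidential"])
```

Notice that nothing in `classify` protects anything. It only tags the data, and the protection behavior lives entirely with the label.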
Classification rules are created within Azure Purview and automatically applied to data sources when the data sources are scanned by Azure Purview (see Data classification in the Microsoft Purview governance portal). Classifications can be applied to tables (for structured data such as CSV, TSV, JSON, SQL Table, etc.) or files (for unstructured data such as DOC, PDF, TXT, etc., see File types supported for scanning). Table assets are not automatically assigned classifications at the table level, because classifications are only automatically assigned to their columns, but you can manually apply classifications to table assets at the table level. A classification can be automatically applied to a file asset. For example, if you have a file named multiple.docx and it has a National ID number in its content, during the scanning process Microsoft Purview adds the classification EU National Identification Number to the file asset’s detail page (a file can have multiple classifications applied to it). To see all the built-in classifications in Azure Purview, check out System classifications in Microsoft Purview.
Sensitivity labels are created within M365 Purview (see Create and publish sensitivity labels). Also within M365, you can set up auto-labeling for items (Office files, Power BI items, files in ADLS, emails), defining the conditions under which you want your label to be automatically applied to your data (see apply a sensitivity label to data automatically). So you can, for example, automatically apply a Highly Confidential label to any content that contains customers’ personal information, such as credit card numbers, social security numbers, or passport numbers.
When an Office document (as an example) has a sensitivity label that was applied manually or by auto-labeling, and is then scanned by Azure Purview into the Microsoft Purview Data Map, the label will be applied to the data asset within Azure Purview. While the sensitivity label is applied to the actual file in Microsoft Purview Information Protection, it’s only added as metadata in the Microsoft Purview Data Map.
Within M365, when creating sensitivity labels, you can choose to set up auto-labeling for schematized data assets (such as Azure SQL Database, Azure Synapse, and Cosmos DB), which will automatically apply sensitivity labels to your data in the Microsoft Purview Data Map. You choose the sensitive info types (SIT) that you want to apply to your label, such as driver’s license number, SSN, or passport number (for example, if an SSN is found, the data asset is marked Highly Confidential). Once you create a sensitivity label, you need to scan your data in the Microsoft Purview Data Map to automatically apply the labels you created, based on the auto-labeling rules you defined. Only columns can be labeled as sensitive, not tables or databases. Scanning an asset in the Microsoft Purview Data Map applies the labels to assets in the catalog based on the SIT found in the data during the scan. It uses the Azure Purview scanning engine to find the SIT, the same scan process it uses to find classified data based on the Azure Purview classification rules. Sensitivity labels are applied only to the asset metadata in the Microsoft Purview Data Map and aren’t applied to the actual files or database columns. These sensitivity labels don’t modify your files or databases in any way. Applying sensitivity labels manually to data sources within Azure Purview is not supported. For more info, see Labeling in the Microsoft Purview Data Map (preview).
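The auto-labeling flow described above can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Purview code: `AUTO_LABEL_RULES`, `LABEL_PRIORITY`, `ColumnAsset`, and `apply_auto_labels` are hypothetical names, and the "most restrictive matching label wins" tie-break is an assumption I added for the example. What it does show is that the label lands on the catalog metadata for a column, while the source database is never touched.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical auto-labeling rules: sensitive info type (SIT) -> label.
AUTO_LABEL_RULES = {
    "U.S. Social Security Number": "Highly Confidential",
    "Passport Number": "Highly Confidential",
    "Customer ID": "Confidential",
}

# Assumed ordering from least to most restrictive, used to break ties.
LABEL_PRIORITY = ["Public", "General", "Confidential", "Highly Confidential"]


@dataclass
class ColumnAsset:
    """Catalog metadata for one scanned column (not the column's data)."""
    name: str
    detected_sits: set[str]          # SITs the scan found in this column
    label: Optional[str] = None      # label recorded in the catalog only


def apply_auto_labels(columns: list[ColumnAsset]) -> list[ColumnAsset]:
    """Set each column's catalog label from the SITs found during the scan."""
    for column in columns:
        candidates = [
            AUTO_LABEL_RULES[sit]
            for sit in column.detected_sits
            if sit in AUTO_LABEL_RULES
        ]
        if candidates:
            # Assumption: the most restrictive matching label wins.
            column.label = max(candidates, key=LABEL_PRIORITY.index)
    return columns


if __name__ == "__main__":
    scanned = [
        ColumnAsset("ssn", {"U.S. Social Security Number", "Customer ID"}),
        ColumnAsset("notes", set()),
    ]
    for column in apply_auto_labels(scanned):
        print(column.name, "->", column.label)
```

Labels are only assigned per column here, which mirrors the column-only behavior described above.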
To see what data sources and file types support classification and which support sensitivity labeling, check out Microsoft Purview Data Map available data sources.
What Azure database and report products can use Microsoft Purview Information Protection sensitivity labels?
Microsoft Purview Information Protection sensitivity labels provide a simple and uniform way for your users to tag sensitive data within SQL Server, SQL Databases (Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics), and Power BI (dashboards, reports, semantic models, dataflows, and paginated reports). For databases, only columns can be tagged as sensitive, not tables or whole databases. Any sources tagged with sensitive data in this way will automatically have the sensitivity labels imported into the Microsoft Purview Data Map in Azure Purview when scanned via Azure Purview. Note that SQL Server and SQL Databases offer both SQL Information Protection policy and Microsoft Information Protection policy to apply sensitivity labels. Labels applied via SQL Information Protection policy are NOT imported into the Azure Purview Data Map; only sensitivity labels applied via Microsoft Information Protection policy are. Microsoft Purview Information Protection labels let users classify sensitive data consistently across different Microsoft applications, instead of each application classifying data in its own way.
Is there a way to enforce access control on database columns with sensitive data?
Yes! Azure SQL Database supports the ability to enforce access control on columns with sensitive data that have been labeled using Microsoft Purview Information Protection sensitivity labels (support for Azure Blob storage, ADLS Gen2, and AWS S3 is coming soon). This enables personas like enterprise security/compliance admins to configure and enforce access control actions on sensitive data in their databases, ensuring that data with a particular sensitivity label can’t be accessed by unauthorized users. To configure and enforce Purview access policies, the database must be registered in the Microsoft Purview Data Map and scanned by Azure Purview, so that Microsoft Purview Information Protection sensitivity labels get assigned by Azure Purview to the database columns containing sensitive data. Once sensitivity labels are assigned, the user can configure Microsoft Purview Information Protection access policies to enforce deny actions on database columns with a specific sensitivity label, restricting access to sensitive data in those columns to only an allowed user or group of users. Any attempt by an unauthorized user to run a T-SQL query to access columns in an Azure SQL database with a sensitivity label scoped to the policy will fail. This feature requires that existing Microsoft Purview accounts be upgraded to the Microsoft Purview single tenant model and the new portal experience, using the enterprise version of Microsoft Purview. See Enabling access control for sensitive data using Microsoft Purview Information Protection policies (public preview) and Enable data policy enforcement on your Microsoft Purview sources and Authoring and publishing protection policies (preview).
How do sensitivity labels work in Power BI?
Sensitivity labels from Microsoft Purview Information Protection provide a simple way for your users to classify critical content in Power BI without compromising productivity or the ability to collaborate. They can be applied in both Power BI Desktop and the Power BI service, making it possible to protect your sensitive data from the moment you first start developing your content on through to when it’s being accessed from Excel via a live connection. Sensitivity labels are retained when you move your content back and forth between Desktop and the service in the form of .pbix files.
In the Power BI service, sensitivity labels, once enabled, can be applied to semantic models, reports, dashboards, and dataflows. When labeled data leaves Power BI, either via export to Excel, PowerPoint, PDF, or .pbix files, or via other supported export scenarios such as Analyze in Excel or live connection PivotTables in Excel, Power BI automatically applies the label to the exported file and protects it according to the label’s file encryption settings. This way your sensitive data can remain protected, even when it leaves Power BI. You can require your organization’s Power BI users to apply sensitivity labels to content they create or edit in Power BI.
In addition, sensitivity labels can be applied to .pbix files in Power BI Desktop, so that your data and content is safe when it’s shared outside Power BI (for example, so that only users within your organization can open a confidential .pbix that has been shared or attached in an email), even before it has been published to the Power BI service. See Restrict access to content by using sensitivity labels to apply encryption for more detail.
In the Power BI service, sensitivity labeling does not affect access to content. Access to content in the service is managed solely by Power BI permissions. While the labels are visible, any associated encryption settings (configured in the Microsoft Purview compliance portal) aren’t applied. They’re applied only to data that leaves the service via a supported export path, such as export to Excel, PowerPoint, or PDF, and download to .pbix.
In Power BI Desktop, sensitivity labels with encryption settings do affect access to content. If a user doesn’t have sufficient permissions according to the encryption settings of the sensitivity label on the .pbix file, they won’t be able to open the file. In addition, in Desktop, when you save your work, any sensitivity label you’ve added and its associated encryption settings will be applied to the saved .pbix file.
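The difference between the service and Desktop can be summed up as a small decision table. Here is a Python sketch of that table; the `Location` enum and the `can_access` function are hypothetical names I invented to model the behavior described above, and they are not part of any Power BI API.

```python
from enum import Enum


class Location(Enum):
    SERVICE = "content viewed in the Power BI service"
    DESKTOP_PBIX = ".pbix file opened in Power BI Desktop"
    EXPORTED_FILE = "file exported from the service (Excel, PowerPoint, PDF)"


def can_access(
    location: Location,
    has_power_bi_permission: bool,
    label_has_encryption: bool,
    user_allowed_by_label: bool,
) -> bool:
    """Model which control decides access in each location."""
    if location is Location.SERVICE:
        # In the service, only Power BI permissions matter.
        # The label is visible but its encryption settings are not applied.
        return has_power_bi_permission
    # Outside the service (a .pbix file or an exported file),
    # the label's encryption settings decide who can open the file.
    if label_has_encryption:
        return user_allowed_by_label
    return True


if __name__ == "__main__":
    # Same user, same label: allowed in the service, blocked on the .pbix file.
    print(can_access(Location.SERVICE, True, True, False))
    print(can_access(Location.DESKTOP_PBIX, True, True, False))
```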
For more info, see Sensitivity labels from Microsoft Purview Information Protection in Power BI.
More info:
Understanding Sensitivity Labels: Set Up and Management Across Power BI, Azure Purview, and O365 (video)
Microsoft 365 Information Protection & How it REALLY Works! (video)