More Azure Blob Storage enhancements
(updated 12/6/20)
I recently blogged about Query Acceleration for ADLS, which also applies to Azure Blob storage. Now there are more new features for blob storage that I will talk about. To see which of these features is also available in ADLS Gen2, check out Blob storage features available in Azure Data Lake Storage Gen2.
Blob index preview: Recently announced in preview, blob index is a managed secondary index that allows you to store multi-dimensional object attributes to describe your data objects for Azure Blob storage. This allows you to categorize and find data based on attribute tags set on the data. Cool! To populate the blob index, you define key-value tag attributes on your data, either on new data during upload or on existing data already in your storage account. These blob index tags are stored alongside your underlying blob data. The blob indexing engine then automatically reads the new tags, indexes them, and exposes them to a user-queryable blob index. Blob Index not only helps you categorize, manage, and find your blob data but also provides integrations with other Blob service features, such as Lifecycle management, allowing you to move data to cooler tiers or delete data based on the tags applied to your blobs.
The below scenario is an example of how Blob Index works:
- In a storage account container with a million blobs, a user uploads a new blob “B2” with the following blob index tags: < Status = Unprocessed, Quality = 8K, Source = RAW >
- The blob and its blob index tags are persisted to the storage account and the account indexing engine exposes the new blob index shortly after
- Later on, an encoding application wants to find all unprocessed media files that are at least 4K resolution quality. It issues a FindBlobs API call to find all blobs that match the following criteria: < Status = Unprocessed AND Quality >= 4K AND Source = RAW >
- The blob index quickly returns just blob “B2,” the sole blob out of one million blobs that matches the specified criteria. The encoding application can quickly start its processing job, saving idle compute time and money
Blob index will eventually work for ADLS Gen2. There is no cost for the indexing engine. For more info including signing up for the preview, see Manage and find data on Azure Blob Storage with Blob Index.
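To make the query semantics in the scenario above concrete, here is a minimal in-memory sketch in Python. It is not the Azure SDK: `find_blobs`, `to_filter_expression`, and the sample tags are hypothetical. It assumes, as I understand the blob index behaves, that tag values are strings compared lexicographically, and it renders the conditions in the blob index filter syntax (`"Key" op 'Value'` joined with `AND`), which is the kind of string you would pass to `BlobServiceClient.find_blobs_by_tags` in the Python SDK.

```python
import operator

# Comparison operators supported in this sketch; tag values are compared as strings.
OPS = {"=": operator.eq, ">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "<=": operator.le}

def find_blobs(index, conditions):
    """Return blob names whose tags satisfy every (key, op, value) condition (ANDed)."""
    return [
        name
        for name, tags in index.items()
        if all(key in tags and OPS[op](tags[key], value) for key, op, value in conditions)
    ]

def to_filter_expression(conditions):
    """Render conditions as a blob index style filter: "Key" op 'Value' AND ..."""
    return " AND ".join(f"\"{key}\" {op} '{value}'" for key, op, value in conditions)

# Hypothetical index: blob name -> blob index tags.
index = {
    "B1": {"Status": "Processed", "Quality": "8K", "Source": "RAW"},
    "B2": {"Status": "Unprocessed", "Quality": "8K", "Source": "RAW"},
    "B3": {"Status": "Unprocessed", "Quality": "1080p", "Source": "RAW"},
}
query = [("Status", "=", "Unprocessed"), ("Quality", ">=", "4K"), ("Source", "=", "RAW")]

print(find_blobs(index, query))  # ['B2']
print(to_filter_expression(query))
```

One consequence of string comparison: a blob tagged `Quality = 16K` would not match `Quality >= '4K'`, because `"1"` sorts before `"4"`. Fixed-width values avoid that surprise.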
Geo-Zone-Redundant Storage (GZRS): GZRS and Read-Access Geo-Zone-Redundant Storage (RA-GZRS) are now generally available. GZRS writes three copies of your data synchronously across multiple Azure availability zones, similar to zone-redundant storage (ZRS), so you keep read and write access even if a datacenter or availability zone is unavailable. In addition, GZRS asynchronously replicates your data to the secondary region of the geo pair to protect against regional unavailability. RA-GZRS exposes a read endpoint on this secondary replica, allowing you to read data if the primary region becomes unavailable. To learn more, see Azure Storage redundancy.
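A quick way to compare the redundancy options is to ask what still works during an outage, before anyone fails over. The sketch below is my own simplified model, not an official matrix: it assumes an LRS or GRS account loses access when the single zone holding its primary copies goes down, and that a region outage takes out the whole primary. The `RA-` variants add a readable secondary endpoint, and the geo-replicated options are the ones eligible for customer-initiated failover.

```python
# Simplified model of Azure Storage redundancy options (my own summary, not an official matrix).
# sku -> (zone-redundant in primary region, geo-replicated, read access to secondary)
REDUNDANCY = {
    "LRS": (False, False, False),
    "ZRS": (True, False, False),
    "GRS": (False, True, False),
    "RA-GRS": (False, True, True),
    "GZRS": (True, True, False),
    "RA-GZRS": (True, True, True),
}

def access_during(sku, outage):
    """Operations still available before any failover.

    outage is "zone" (one availability zone in the primary region is down; for
    LRS/GRS we assume it is the zone holding the primary copies) or "region".
    """
    if outage not in ("zone", "region"):
        raise ValueError(f"unknown outage {outage!r}")
    zone_redundant, _geo, read_secondary = REDUNDANCY[sku]
    if outage == "zone" and zone_redundant:
        return {"read", "write"}  # the remaining zones keep serving the primary
    # The primary is unavailable; only an RA-* secondary endpoint can serve reads.
    return {"read"} if read_secondary else set()

def can_fail_over(sku):
    """Customer-initiated failover needs a geo-replicated secondary."""
    return REDUNDANCY[sku][1]

for sku in REDUNDANCY:
    print(sku, sorted(access_during(sku, "zone")), sorted(access_during(sku, "region")))
```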
Account failover (GA 6/21/20): Customer-initiated storage account failover is now generally available, allowing you to determine when to initiate a failover instead of waiting for Microsoft to do so. When you perform a failover, the secondary replica of the storage account becomes the new primary. The DNS records for all storage service endpoints (blob, file, queue, and table) are updated to point to this new primary. Once the failover is complete, clients automatically begin reading from and writing to the storage account in the new primary region, with no code changes. Customer-initiated failover is available for GRS, RA-GRS, GZRS, and RA-GZRS accounts. To learn more, see Disaster recovery and account failover.
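The "no code changes" point follows from failover being a DNS change. This toy model (the `GeoAccount` class, account name, and regions are hypothetical) keeps one hostname per service and repoints all of them at the former secondary. It sets the secondary to `None` afterward because, as I understand the behavior at the time of writing, the account is no longer geo-replicated after a failover until you reconfigure it.

```python
SERVICES = ("blob", "file", "queue", "table")

class GeoAccount:
    """Toy model of customer-initiated account failover (DNS-based)."""

    def __init__(self, name, primary_region, secondary_region):
        self.primary = primary_region
        self.secondary = secondary_region
        # Each service endpoint hostname resolves to the current primary region.
        self.dns = {f"{name}.{svc}.core.windows.net": primary_region for svc in SERVICES}

    def failover(self):
        if self.secondary is None:
            raise RuntimeError("no secondary region to fail over to")
        # The secondary becomes the new primary; no geo-secondary remains in this model.
        self.primary, self.secondary = self.secondary, None
        for host in self.dns:
            self.dns[host] = self.primary

def resolve(account, host):
    """Clients always use the same hostname; only the DNS answer changes."""
    return account.dns[host]

acct = GeoAccount("contoso", "eastus", "westus")
host = "contoso.blob.core.windows.net"
print(resolve(acct, host))  # eastus
acct.failover()
print(resolve(acct, host))  # westus
```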
Versioning (GA 9/7/20): Versioning automatically maintains prior versions of an object and identifies them with version IDs. You can restore a prior version of a blob to recover your data if it is erroneously modified or deleted. A version captures a committed blob state at a given point in time. When versioning is enabled for a storage account, Azure Storage automatically creates a new version of a blob each time that blob is modified or deleted. Versioning and soft delete work together to provide you with optimal data protection. To learn more, see Blob versioning. Also check out Comparing Azure Storage Blob Versions and Snapshots.
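Here is a minimal in-memory model of how versioning protects against overwrites and deletes. `VersionedBlob` is hypothetical, and real version IDs are timestamps rather than the integers used here. Each write creates a new version, a delete removes only the current blob, and restoring copies an earlier version back over the base blob, which itself creates a new version.

```python
import itertools

class VersionedBlob:
    """Simplified model: every write creates a new version; delete keeps prior versions."""

    def __init__(self):
        self._clock = itertools.count(1)  # stand-in for timestamp-based version IDs
        self.versions = {}                # version_id -> content
        self.current_id = None            # None means the base blob is deleted

    def write(self, content):
        vid = next(self._clock)
        self.versions[vid] = content
        self.current_id = vid
        return vid

    def delete(self):
        # Previous versions survive; only the current blob goes away.
        self.current_id = None

    def read(self):
        if self.current_id is None:
            raise KeyError("blob not found")
        return self.versions[self.current_id]

    def restore(self, vid):
        # Promote an earlier version by copying it over the base blob (a new write).
        return self.write(self.versions[vid])

blob = VersionedBlob()
v1 = blob.write(b"good data")
blob.write(b"corrupted")  # erroneous overwrite
blob.delete()             # accidental delete
blob.restore(v1)
print(blob.read())        # b'good data'
```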
Point in time restore (GA 9/27/20): Point in time restore for Azure Blob Storage lets storage account administrators restore a subset of containers or blobs within a storage account to a previous state. An administrator can roll data back to a specific past date and time after an application corrupts data, a user inadvertently deletes content, or to reset data after a test run of a machine learning model. Point in time restore makes use of Blob Change feed (GA 9/13/20). To learn more, see Point in time restore.
Blob Change feed (GA 9/13/20): Change feed enables recording of all blob creation, modification, and deletion operations that occur in your storage account. More info.
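Point in time restore and change feed fit together because the change feed is an ordered log of blob changes, so the state of a set of blobs at any past moment can be derived from it. The sketch below replays a simplified log forward up to the restore point, filtering by prefix to mimic restoring only a subset of blobs. The `ChangeEvent` shape is a hypothetical simplification (real change feed records carry more fields), and the actual service works in the other direction, reverting changes made after the restore point with the help of versioning and soft delete, so treat this as an illustration of the idea only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ChangeEvent:
    event_time: int                  # simplified timestamp
    event_type: str                  # "BlobCreated" or "BlobDeleted" in this sketch
    blob: str                        # "container/blob" path
    content: Optional[bytes] = None  # new content for BlobCreated events

def state_at(events, restore_point, prefix=""):
    """Rebuild {blob: content} as of restore_point for blobs under prefix."""
    state = {}
    for ev in sorted(events, key=lambda e: e.event_time):
        if ev.event_time > restore_point:
            break  # ignore everything after the restore point
        if not ev.blob.startswith(prefix):
            continue  # outside the subset being restored
        if ev.event_type == "BlobCreated":
            state[ev.blob] = ev.content  # create or overwrite
        elif ev.event_type == "BlobDeleted":
            state.pop(ev.blob, None)
    return state

feed = [
    ChangeEvent(1, "BlobCreated", "data/a.csv", b"v1"),
    ChangeEvent(2, "BlobCreated", "data/b.csv", b"x"),
    ChangeEvent(3, "BlobCreated", "data/a.csv", b"corrupted"),
    ChangeEvent(4, "BlobDeleted", "data/b.csv"),
    ChangeEvent(4, "BlobCreated", "logs/run.log", b"..."),
]
print(state_at(feed, restore_point=2, prefix="data/"))
# {'data/a.csv': b'v1', 'data/b.csv': b'x'}
```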
Routing preferences preview: Configure a routing preference to direct network traffic for the default public endpoint of your Storage account using the Microsoft global network or using the public internet. Optimize for premium network performance by using the Microsoft global network, which delivers low-latency path selection with high reliability and routes traffic through the point-of-presence closest to the client. Alternatively, route traffic through the point-of-presence closest to your storage account to lower network costs and minimize traversal over the Microsoft global network. Routing configuration options for your Storage account also enable you to publish additional route-specific endpoints. Use these new public endpoints to override the routing preference specified for the default public endpoint by explicitly routing traffic over a desired path. Learn more.
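The route-specific endpoints are alternative hostnames for the same account. A minimal helper, assuming the `-microsoftrouting` and `-internetrouting` hostname suffixes described in the routing preference documentation at the time of writing (verify against the endpoint list for your account), and assuming the route-specific endpoints have been published:

```python
# Hostname suffixes for route-specific blob endpoints (assumed; check your account's endpoints).
ROUTE_SUFFIXES = {
    "default": "",                     # follows the account's default routing preference
    "microsoft": "-microsoftrouting",  # Microsoft global network routing
    "internet": "-internetrouting",    # internet routing
}

def blob_endpoint(account: str, route: str = "default") -> str:
    """Build the blob service URL for a storage account and routing choice."""
    if route not in ROUTE_SUFFIXES:
        raise ValueError(f"unknown route {route!r}")
    return f"https://{account}{ROUTE_SUFFIXES[route]}.blob.core.windows.net"

print(blob_endpoint("mystorageacct", "internet"))
# https://mystorageacct-internetrouting.blob.core.windows.net
```

A client that wants the cheaper internet path for bulk downloads, while keeping the default endpoint for everything else, would simply be configured with the second URL.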
Object replication (GA 9/13/20): Object replication is a new capability for block blobs that lets you asynchronously replicate your data from your blob container in one storage account to another anywhere in Azure. Object replication unblocks a new set of common replication scenarios:
- Minimize latency – have your users consume the data locally rather than issuing cross-region read requests
- Increase efficiency – have your compute clusters process the same set of objects locally in different regions
- Optimize data distribution – have your data consolidated in a single location for processing/analytics and then distribute only resulting dashboards to your offices worldwide
- Minimize cost – tier down your data to Archive upon replication completion using lifecycle management policies to minimize the cost
Please refer to Object Replication documentation for more details.
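To show what a replication rule does, here is a toy, synchronous model of one rule with a prefix filter. `ReplicationRule` and `replicate` are hypothetical names; the real service copies asynchronously and, as I understand its requirements, needs blob versioning on both accounts and change feed on the source account.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReplicationRule:
    source_container: str
    destination_container: str
    prefix_match: tuple = ()  # empty tuple means "replicate every blob"

def replicate(rule, source_account, destination_account):
    """Copy matching block blobs.

    Accounts are modeled as dicts: container -> {blob name: (blob_type, data)}.
    Returns the names that were copied.
    """
    src = source_account.get(rule.source_container, {})
    dst = destination_account.setdefault(rule.destination_container, {})
    copied = []
    for name, (blob_type, data) in src.items():
        if blob_type != "BlockBlob":
            continue  # object replication applies to block blobs only
        if rule.prefix_match and not name.startswith(rule.prefix_match):
            continue  # outside the rule's prefix filter
        dst[name] = (blob_type, data)
        copied.append(name)
    return copied

east = {"media": {
    "sales/2020.csv": ("BlockBlob", b"q1,q2"),
    "tmp/scratch.bin": ("BlockBlob", b"..."),
    "sales/disk.vhd": ("PageBlob", b"..."),
}}
west = {}
rule = ReplicationRule("media", "media-replica", ("sales/",))
print(replicate(rule, east, west))  # ['sales/2020.csv']
```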
Azure Blob access time tracking and access time-based lifecycle management preview: Some data in Azure Blob storage is written once and read many times after that. To accurately manage the lifecycle of this data, it is crucial to know the last access/read time. Microsoft has announced the public preview of blob access time tracking and access time-based lifecycle management. Once access time tracking is enabled, each blob has a new property called last access time, which is updated when the blob is read. Azure Blob lifecycle management supports using last access time as a filter to transition data between access tiers and manage data retention. You can minimize your storage cost automatically by setting up a policy based on last access time to:
- Transition your data from a hotter access tier to a cooler access tier (hot to cool, cool to archive, or hot to archive) if there is no access for a period
- Transition your data from the cool tier to the hot tier immediately if there is an access on the data
- Delete your data if there is no access for an extended period
The preview is available in France Central, Canada East, and Canada Central regions. Learn more about lifecycle management.
It is free to turn on access time tracking for your storage accounts. Customers are charged for an operation categorized as an “other operation” each time the last access time is updated. It is free to set up lifecycle management policies for your storage accounts. Customers are charged the regular operation cost for the Set Blob Tier API calls. Delete operations are free. For more information about pricing, see Block Blob pricing.
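The three policy options listed above map onto a lifecycle management rule that uses `daysAfterLastAccessTimeGreaterThan` conditions. The sketch builds such a rule as a Python dict (the rule name, thresholds, and `logs/` prefix are hypothetical) and adds a simplified evaluator for a single blob. `enableAutoTierToHotFromCool` corresponds to moving cool data back to hot on access. When several actions qualify, my understanding is that lifecycle management applies the least expensive one, so the evaluator checks delete first, then archive, then cool. Verify the field names against the current policy schema before using this.

```python
import json

def access_time_policy(cool_after, archive_after, delete_after, prefixes):
    """Build a lifecycle management policy keyed on days since last access."""
    return {
        "rules": [{
            "enabled": True,
            "name": "access-time-tiering",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "actions": {
                    "baseBlob": {
                        # Move a cool blob back to hot as soon as it is read.
                        "enableAutoTierToHotFromCool": True,
                        "tierToCool": {"daysAfterLastAccessTimeGreaterThan": cool_after},
                        "tierToArchive": {"daysAfterLastAccessTimeGreaterThan": archive_after},
                        "delete": {"daysAfterLastAccessTimeGreaterThan": delete_after},
                    }
                },
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": list(prefixes)},
            },
        }]
    }

def action_for(days_since_access, policy):
    """Action the first rule would pick for one blob (simplified evaluation)."""
    actions = policy["rules"][0]["definition"]["actions"]["baseBlob"]
    # Least expensive qualifying action wins: delete, then archive, then cool.
    for name in ("delete", "tierToArchive", "tierToCool"):
        if days_since_access > actions[name]["daysAfterLastAccessTimeGreaterThan"]:
            return name
    return None

policy = access_time_policy(cool_after=30, archive_after=90, delete_after=365, prefixes=["logs/"])
print(json.dumps(policy, indent=2))
print(action_for(100, policy))  # tierToArchive
```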
Soft Delete for Containers public preview: Enterprises, partners, and IT professionals store business-critical data in Azure Blob Storage. We are committed to providing the best-in-class data protection and recovery capabilities to keep your applications running. Soft delete for containers expands upon Azure Blob Storage’s existing capabilities such as soft delete for blobs, account delete locking, and immutable blobs, making our data protection and restore capabilities even better.
When container soft delete is enabled for a storage account, any deleted container and its contents are retained in Azure Storage for the period that you specify. During the retention period, you can restore previously deleted containers and any blobs within them.
Container soft delete is available in preview in the following regions: France Central, Canada East, and Canada Central. There is no additional charge to enable container soft delete. Data in soft deleted containers is billed at the same rate as active data. Learn more about soft delete for containers (preview).
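A toy model of the retention behavior, with days as plain integers and a hypothetical `ContainerStore` class: a deleted container keeps its blobs and can be restored until the retention period lapses, after which it is gone for good.

```python
class ContainerStore:
    """Toy model of container soft delete with a retention period in days."""

    def __init__(self, retention_days):
        self.retention_days = retention_days
        self.active = {}        # container name -> {blob name: data}
        self.soft_deleted = {}  # container name -> (deleted_on_day, blobs)

    def delete(self, name, today):
        # The container and its blobs are kept, just hidden.
        self.soft_deleted[name] = (today, self.active.pop(name))

    def restore(self, name, today):
        deleted_on, blobs = self.soft_deleted.pop(name)
        if today - deleted_on > self.retention_days:
            # Past the retention period the container is permanently deleted.
            raise KeyError(f"{name} is past its retention period")
        self.active[name] = blobs  # the container comes back with its blobs

store = ContainerStore(retention_days=7)
store.active["images"] = {"logo.png": b"..."}
store.delete("images", today=0)
store.restore("images", today=5)
print(store.active["images"])  # {'logo.png': b'...'}
```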
Blob storage with new 200 TB object sizes: We recently announced the preview of our new maximum blob size of 200 TB (specifically 209.7 TB), up from our current limit of 5TB, which is a 40x increase! At over 200TB per object, this is much larger than the 5TB maximum object size offered by other vendors. This increase allows workloads that currently require multi-TB files to be moved to Azure without the extra work of breaking up these large objects. More info
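The 209.7 TB figure and the 40x factor both fall out of the block blob limits. Assuming, as I understand those limits, a maximum of 50,000 blocks per blob, with the maximum block size going from 100 MiB to 4,000 MiB in this preview, the arithmetic is:

```python
MIB = 1024 ** 2      # bytes per mebibyte
TIB = 1024 ** 4      # bytes per tebibyte
MAX_BLOCKS = 50_000  # maximum committed blocks per block blob

old_max = MAX_BLOCKS * 100 * MIB   # 100 MiB blocks
new_max = MAX_BLOCKS * 4000 * MIB  # 4,000 MiB blocks (preview)

print(f"old: {old_max / 10**12:.2f} TB ({old_max / TIB:.2f} TiB)")  # old: 5.24 TB (4.77 TiB)
print(f"new: {new_max / 10**12:.1f} TB ({new_max / TIB:.1f} TiB)")  # new: 209.7 TB (190.7 TiB)
print(f"increase: {new_max // old_max}x")                           # increase: 40x
```

So the "5TB" is the rounded old limit, and the 40x comes directly from the 40x larger maximum block size.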
Azure Storage blob inventory public preview: Provides an overview of your blob data within a storage account. Use the inventory report to understand your total data size, age, encryption status, and so on. Enable blob inventory reports by adding a policy to your storage account. Add, edit, or remove a policy by using the Azure portal. Once enabled, an inventory report is automatically created daily. More info
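Once reports start arriving, you will likely want to aggregate them. Here is a minimal sketch that totals a CSV report, assuming columns named `Name`, `Content-Length`, `Creation-Time`, and `AccessTier` and ISO 8601 timestamps. The field names, timestamp format, and sample rows are assumptions, so check the schema of the reports your policy actually produces.

```python
import csv
import io
from datetime import datetime, timezone

def summarize_inventory(csv_text, as_of):
    """Total size, per-tier size, and age in days of the oldest blob in an inventory CSV."""
    total = 0
    oldest_days = 0
    by_tier = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        size = int(row["Content-Length"])
        total += size
        by_tier[row["AccessTier"]] = by_tier.get(row["AccessTier"], 0) + size
        # fromisoformat() before Python 3.11 does not accept a trailing "Z".
        created = datetime.fromisoformat(row["Creation-Time"].replace("Z", "+00:00"))
        oldest_days = max(oldest_days, (as_of - created).days)
    return {"total_bytes": total, "bytes_by_tier": by_tier, "oldest_days": oldest_days}

# Hypothetical two-row report.
report = """Name,Content-Length,Creation-Time,AccessTier
logs/app.log,1024,2020-11-01T00:00:00Z,Hot
backup/db.bak,4096,2020-06-01T00:00:00Z,Archive
"""
print(summarize_inventory(report, as_of=datetime(2020, 12, 1, tzinfo=timezone.utc)))
```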
More info:
Azure Blob Storage enhancing data protection and recovery capabilities