Azure Data Lake Store Gen2 is GA
Azure Data Lake Store (ADLS) Gen2 was made generally available on February 7th. In short, ADLS Gen2 is the best of the previous version of ADLS (now called ADLS Gen1) and Azure Blob Storage. ADLS Gen2 is built on Blob storage and because of that it is a “ring 0” service and is available in all regions. To create an ADLS Gen2 account, see Quickstart: Create an Azure Data Lake Storage Gen2 storage account. Note that creating Blob storage or ADLS Gen2 both follow the same process, which is to create a storage account. The difference is there is an Advanced tab with an ADLS Gen2 section where you set Hierarchical namespace (HNS) to Enabled to make the account ADLS Gen2 instead of Blob.
ADLS Gen2 has most of the features of both ADLS Gen1 and Blob storage (with the features not supported yet listed below). Features currently supported include limitless storage capacity, Azure Active Directory integration, hierarchical file system, and read-access geo-redundant storage.
When to use Blob vs ADLS Gen2
New analytics projects should use ADLS Gen2, and current Blob storage should be converted to ADLS Gen2. The exception is non-analytical use cases that only need object storage rather than hierarchical storage (e.g. video, images, backup files). For those you can use Blob Storage and save a bit of money on transaction costs: storage costs will be the same between Blob and ADLS Gen2, but transaction costs will be a bit higher for ADLS Gen2 due to the overhead of namespaces.
Upgrade guidance
Soon, existing Blob storage accounts (general purpose v2) will have a seamless mechanism to convert them to ADLS Gen2 and get all the features of ADLS Gen2 (e.g. go to the Configuration screen and set Hierarchical namespace to Enabled). If your Blob storage account is general purpose v1, you will first need to upgrade it to general purpose v2 before you can set Hierarchical namespace to Enabled. For now you can’t upgrade to ADLS Gen2 this way, so you will need to copy your data from Blob storage to ADLS Gen2.
For existing customers of ADLS Gen1, no new features will be added to ADLS Gen1. You can stay on ADLS Gen1 if you don’t need any of the new capabilities, or you can move to ADLS Gen2, where you can leverage all the goodness of the combined capabilities and save money (ADLS Gen2 pricing will be roughly half of ADLS Gen1). You can upgrade when you choose to do so. Upgrading means copying the data from ADLS Gen1 to ADLS Gen2 and making any app modifications required to run on ADLS Gen2; migration tools are coming soon. See Upgrade your big data analytics solutions from Azure Data Lake Storage Gen1 to Azure Data Lake Storage Gen2 for help on migrating. That article also lists some Azure products and 3rd-party products that don’t yet support ADLS Gen2, which may make you want to hold off on migrating.
What works and what does not work
You might want to stay with Blob storage for now if there is a feature that is not yet supported in ADLS Gen2 such as soft delete, snapshots, object level storage tiers (Hot, Cool, and Archive), and lifecycle management (see Known issues with Azure Data Lake Storage Gen2). If you are using a 3rd-party app or an Azure app, make sure that it supports ADLS Gen2 (see Upgrade your big data analytics solutions from Azure Data Lake Storage Gen1 to Azure Data Lake Storage Gen2). If you are using the WASB or ADLS driver, it will be as simple as switching to the new ADLS Gen2 driver and changing configs.
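As a rough illustration of the "changing configs" part when moving from the WASB driver, the sketch below rewrites a WASB-style path into the form the ADLS Gen2 (ABFS) driver expects. It assumes the public Azure endpoint suffixes (blob.core.windows.net and dfs.core.windows.net) and that the data has already been copied to an ADLS Gen2 account with the same account and container names; the function name wasb_to_abfs is hypothetical and not part of any driver or SDK.

```python
from urllib.parse import urlsplit, urlunsplit

# WASB scheme -> ABFS scheme, keeping the TLS variant (wasbs -> abfss).
_SCHEME_MAP = {"wasb": "abfs", "wasbs": "abfss"}

# Assumption: public Azure cloud endpoint suffixes.
_BLOB_SUFFIX = ".blob.core.windows.net"
_DFS_SUFFIX = ".dfs.core.windows.net"


def wasb_to_abfs(uri: str) -> str:
    """Rewrite wasb[s]://container@account.blob.core.windows.net/path
    to abfs[s]://container@account.dfs.core.windows.net/path.

    Hypothetical helper: it only rewrites the string and does not
    check that the target account or filesystem exists.
    """
    parts = urlsplit(uri)
    if parts.scheme not in _SCHEME_MAP:
        raise ValueError(f"not a WASB URI: {uri!r}")

    # The authority has the form container@account.blob.core.windows.net
    container, sep, host = parts.netloc.partition("@")
    if not sep or not host.endswith(_BLOB_SUFFIX):
        raise ValueError(f"unexpected WASB authority: {parts.netloc!r}")

    account = host[: -len(_BLOB_SUFFIX)]
    new_netloc = f"{container}@{account}{_DFS_SUFFIX}"
    return urlunsplit(
        (_SCHEME_MAP[parts.scheme], new_netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(wasb_to_abfs("wasbs://data@myacct.blob.core.windows.net/raw/file.csv"))
```

A path rewrite like this is only one piece of the switch; authentication settings for the new driver also need to be configured, and those depend on your cluster and are not shown here.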
Blob Storage APIs and ADLS Gen2 APIs aren’t interoperable with each other yet in an ADLS Gen2 account (an account with the HNS enabled). If you have tools, applications, services, or scripts that use Blob APIs, and you want to use them to work with all of the content that you upload to your account, then hold off on upgrading until Blob APIs become interoperable with ADLS Gen2 APIs. Keep in mind that a storage account without a hierarchical namespace does not have access to ADLS Gen2 specific features, such as directory and filesystem access control lists.
Note that the file size limit in ADLS Gen2 is 5 TB.
Replication options
Below is an excellent slide that describes the four replication options available with Azure Blob Storage, which are also available for ADLS Gen2. Notice the SLA, which is a guarantee of the percentage of time Azure will successfully process requests to read data from or write data to storage. Also notice the durability, which specifies the chances of losing data, which are incredibly small.
Check out the Azure Data Lake Storage Gen2 overview video for more info as well as A closer look at Azure Data Lake Storage Gen2 and finally check out the Gen2 documentation.
More info:
Individually great, collectively unmatched: Announcing updates to 3 great Azure Data Services
hi James – Thanks for the article.
Can you please confirm this statement – “For now you can’t upgrade to ADLS Gen2 this way so you will need to copy your data from Blob storage to ADLS Gen2.” – this can be done today as one-click in the Azure Portal, or via PowerShell, CLI etc – https://docs.microsoft.com/en-us/azure/storage/common/storage-account-upgrade
Is this correct?
No, you will need to copy the data using a tool such as Azure Data Factory.