Copying data from Azure Blob Storage
In a previous blog I talked about copying on-prem data to Azure Blob Storage (Getting data into Azure Blob Storage). Let’s say you have copied the data and it is sitting in Azure Blob Storage (or an Azure Data Lake) and you now want to copy it from Azure Blob Storage into either SQL Server on an Azure Virtual Machine (SQL Server IaaS), SQL DW, or SQL DB. Below I cover the various ways to do this by listing the technology and the supported destinations:
- PolyBase (SQL DW, SQL Server 2016 IaaS). PolyBase allows you to use T-SQL statements to access data stored in Hadoop or Azure Blob Storage and query it in an adhoc fashion. For SQL DW, see Load data with PolyBase. For SQL Server 2016 IaaS, see PolyBase
- SQOOP (SQL DW, SQL DB, SQL Server IaaS). SQOOP is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. See Use Sqoop with Hadoop in HDInsight (Windows)
- Azure Data Factory (ADF) (SQL DW, SQL DB, SQL Server IaaS). ADF is a cloud service for processing structured and unstructured data from nearly any source. For SQL DW, see Move data to and from Azure SQL Data Warehouse using Azure Data Factory. For SQL DB, see Move data to and from Azure SQL using Azure Data Factory. For SQL Server IaaS, see Move data to and from SQL Server on-premises or on IaaS (Azure VM) using Azure Data Factory. UPDATE: Released on March 18th was a Copy Wizard within ADF that gives you an interactive data movement experience to easily move data between Azure Blob Storage, Azure SQL Database, Azure SQL Data Warehouse, On-Premises SQL Server, Azure Data Lake, Oracle, MySQL, DB2, Sybase, PostgreSql and Teradata using a simple and code free wizard. It supports both one-time and scheduled copy operations.
- SSIS (SQL DW, SQL DB, SQL Server IaaS). SSIS (SQL Server Integration Services) is a platform for building enterprise-level data integration and data transformations solutions. See Microsoft SQL Server 2014 Integration Services Feature Pack for Azure and Data Flow and How to use SQL server Integration services (SSIS) to migrate data from SQL server to SQL Azure
- BCP: BCP is a utility that bulk copies data between an instance of Microsoft SQL Server and a data file in a user-specified format. Copy flat files out of Azure Blob using AzCopy or Azure Storage Explorer then import flat files using BCP (SQL DW, SQL DB, SQL Server IaaS). For SQL DW, see Load data with bcp. For details on how to use BCP, see bcp Utility
- Data Warehouse Migration Utility. This can be used not if your data is in blob storage but rather in SQL Server IaaS or SQL DB. Download here. Use this to migrate schema and data from SQL Server and SQL DB to the new SQL Data Warehouse (SQL DW) service. After schema translation and migration this tool also gives users the ability to create BCP scripts to run that will automatically migrate the data
If loading data into SQL DW, check out loading techniques at Data loading into Azure SQL Data Warehouse.
Here is an important point to keep in mind when reviewing your options if you are building a Big Data platform: all these options are copying data, but you can use PolyBase to query the data as it sits in Azure Blob Storage and avoid the ETL time and storage of copying the data.
Note all the above technologies that work against Azure Blob Storage will also work against Azure Data Lake except for PolyBase, which is not supported yet.
More info:
Load data into SQL Data Warehouse
Comments
Copying data from Azure Blob Storage — No Comments
HTML tags allowed in your comment: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>