Difference between ETL and ELT
ETL is the most common method used when transferring data from a source system to a data warehouse. But there are cases where you might want to use ELT. What is the difference between the two? Read on:
Extract, Transform, and Load (ETL) is a process that involves extracting data from outside sources, transforming it to fit operational needs (sometimes using staging tables), then loading it into the end target database or data warehouse. This approach is reasonable as long as many different databases are involved in your data warehouse landscape. In this scenario you have to transport data from one place to another anyway, so it’s a legitimate way to do the transformation work in a separate specialized engine. Typically the entire process is done in SSIS using data flows.
Extract, Load, Transform (ELT) is a process where data is extracted for the source, then loaded into a staging table in the database, transforming it where it sits in the database and then loading it into the target database or data warehouse. When the data is extracted from the source into the staging tables, it is a raw copy, meaning you keep the column names the same as in the source database and you don’t convert data, calculate new data fields, etc. You may however filter unneeded rows and columns as you extract data so that you don’t waste resources on unneeded data. The process of moving data from the source to the staging tables is done in SSIS using data flows, but the process of moving the data from the staging tables to the data warehouse can be done with T-SQL instead for a performance boost along with the fact that it is usually easier to code than using SSIS data transformations.
ELT should be used, instead of ETL, in these various cases :
- There are big volumes of data
- The source database and the target database are the same
- The database engine is well adapted for that kind of processing, such as PDW, which is great at loading massive amounts of data very quickly
So in short, when using ETL the transformations are processed by the ETL tools, while in ELT the transformations are processed by the target datasource.
ELT has the benefit of minimizing the processing on the source since no transforming is being done, which can be extremely important if the source is a production system where you could be impacting the user experience as opposed to a copy of the source (via replication, database snapshot, etc). The negative of this approach is it may take longer to get the data into the data warehouse as with the staging tables you have an extra step in the process, and you will need more disk space for the staging tables.
More info:
Difference between ETL and ELT
Designing an ETL process with SSIS: two approaches to extracting and transforming data
Great, fast read, making the comparison simple, thanks.James
I am glad I came across this post…because my team is actually using ELT…Thanks, Jackie
We largely use ELT in large part so that we have the unaltered data in our database for reference if there are any problems that need troubleshooting.
What I don’t like about ETL is that detecting errors can be more difficult, and then fixing them can be even more difficult because this may require modifications to SSIS packages or other such things.
I prefer to have the load process as “dumb” as possible, just load EXACTLY what is in the file into the database, change NOTHING. Then, once in the database, its easier to see the data, profile the data, make assessments and in the even of unexpected data, address it as needed instead of having largely hidden processes dealing with it.
I will say though that we deal largely with custom data loads, for a data feed that is well defined and consistent, then ETL makes more sense. When loading “new” data, or data from new sources, etc., I prefer ELT.
Great input, thanks!
If one were a strong ETL developer/engineer/architect, could one easily transition into an ELT environment and vice-versa?
How long would it take to get accustomed assuming we are in a vacuum?
Thanks in advance
Pingback:Comparison between ETL and ELT « BLOGGERZSPOT.COM
Pingback:ETL or ELT… or both?? | Microsoft OLAP Blog
Hi,
Many other sources tell me exact opposite of the definitions you’ve provided above.. can you please elaborate? What I know and read in multiple posts, that ETL is about “Extracting data from source systems” , landing data in staging, then performing some “Transformations” and finally “Loading into target system”.
While ELT is when data is “Extracted from source systems”, and directly “Loaded to target system” and “Transformations” are applied there..
Please let me know if you meant the same or I’m missing something?
Thanks & Regards.
Very good brief and this post helped me alot. Say thank you I searching for your facts. Thanks for sharing with us!
Pingback:ETL or ELT… or both?? « ORAYLIS Blog
Pingback:ETL or ELT… or both?? |
Pingback:Big Data Architectures | James Serra's Blog
Pingback:Integration Services Performance Best Practices – Writing to the Destination – My Blog