When to use T-SQL or SSIS for ETL
When doing ETL, you have the choice of using T-SQL or SSIS. What things should you consider when deciding which one to use? Here are some of the major design considerations to think about. These points were derived from Tim Mitchell’s (site) presentation:
- Performance – With T-SQL, everything is processed within the SQL engine. With SSIS, you are bringing all the data over to the SSIS memory space and doing the manipulation there. So if speed is an issue, usually T-SQL is the way to go, especially when dealing with a lot of records. Something like a JOIN statement in T-SQL will go much faster than using lookup tasks in SSIS. Another example is a MERGE statement in T-SQL has much better performance than a SCD task in SSIS for large tasks
- Features/capabilities – Some features can only be done in either T-SQL or SSIS. You can shred text in SSIS, but can’t in T-SQL. For example, text files with an inconsistent number of fields per row can only be done in SSIS. So certain tasks may force you into using one or the other
- Current skill set – Are the people in your IT department more familiar with SSIS or T-SQL?
- Ease of development/maintenance – Of course, whatever one you are most familiar with will be the easiest, but if your skills at both are fairly even, then SSIS is usually easier to use because it is graphical, but sometimes you can develop quicker in T-SQL. For example, having to join a bunch of tables will require a bunch of tasks in SSIS, where in T-SQL it is one statement. So it might be easier to create the tasks to join the tables in SSIS, but it will take longer to build then writing a T-SQL statement
- Complexity – SSIS can be more complex because you might need to create many tasks to accomplish your objective, where in T-SQL it might just be one statement, like in the example above for joining tables
- Extensibility – SSIS has better extensibility because you can create a script task that uses C# that can do just about anything, especially for non-database related tasks. T-SQL is limited because it is only for database tasks. SSIS also has logging, which T-SQL does not
- Likelihood of depracation/breaking changes – Minor issue, but T-SQL is always removing features in each release that will have to be rewritten
- Types/architecture of sources and destinations – SSIS is better if you have multiple types of sources. For example, it works really well with Oracle, XML, flat-files, etc. SSIS was designed from the beginning to work well with other sources, where T-SQL is designed for SQL Server and it requires more steps to access other sources, and there are additional limitations when doing so
- Local regulations – Are there some company standards you have to adhere to that would limit which tool you can use?
If you decide T-SQL is the way to go and you just want to execute a bunch of T-SQL statements, it’s still a good idea to wrap them in SSIS Execute SQL Tasks because you can use logging, auditing and error handling that SSIS provides that T-SQL does not. You can also easily run SSIS Execute SQL Tasks in parallel so you are able to run stored procedures in parallel.
Note that most people use a hybrid approach, where you use SSIS, but for certain situations use an Execute SQL Task in SSIS to execute T-SQL instead of using a data flow. And that Execute SQL Task can also call a stored procedure. Some like this because it allows you to make a change in the stored procedure, avoiding having to make a change in SSIS and re-deploying the package. The negative side of this is that instead of having everything contained within SSIS, you instead have to jump back and forth between SSIS and SSMS when you are building or debugging a project. It’s a balance you will have to weigh as you go along.
More Info:
Video ETL Head-To-Head: T-SQL vs. SSIS by Tim Mitchell
Row Insert from SSIS package Vs Transact-SQL Statements
SSIS vs T-SQL – which one is fastest for ETL tasks?
On The Board #8 : ETL in T-SQL vs. SSIS
BI Best Practices: SQL vs. SSIS Tools (in SSIS ETL packages)
SQL Server Data Migration Approaches: SSIS vs. SQL Server Stored Procedure
Hi James,
Good article. I wrote a lengthy reply but accidentally hit CTRL-W and lost the whole damn lot 🙂
I can’t be bothered writing it all out again but I will bring up what I think is one vital point; that is, you haven’t explicitly mentioned the one true differentiator of SSIS – its ability to combine data from different data sources and operate upon it *in a single operation*.
Everything else is by-and-large subjective.
Extensibility? T-SQL has got SQLCLR.
Complexity? That simply depends on one’s opinion.
Ease of dev – Again, a matter of opinion.
The one absolute differentiator of SSIS dataflows is their heterogeneity.
JT
P.S. Can you give an example of the following:
“text files with an inconsistent number of fields per row can only be done in SSIS”
I don’t see what about this scenario makes it easier to deal with this in SSIS (except for the fact that it has the Flat File source – but I don’t think that’s what you’re getting at)
Thank you for writing this article down. Out of frustration to prove to the hiring manger that I am right on T-SQL is faster and outperformance SSIS in many different ways although SSIS has man advantages over T-SQL and your article covers that.
(but do I need to prove to stupid people that they don’t know the stuff or should I move on)
Two points I want to add to your article are these.
1. SSIS can only handle row by row process (procedural) where as T-SQL can handle both procedural and set-based.
2. SSIS can’t handle recursion such parent/child relationship.
I know your post is an year ago but thought of replying,the right way to do it is using tsql inside a storeprocedure and place it in SSIS package,1)the ssis package can handle setbased all you need to do is first load into a temporary table and bulk update to actual table which is same when it comes to tsql 2)handling recursion such parent/child relationship need to be done with tsql in the source task or lookup task of ssis depending on what you need.
Pingback:SSIS Lookup or T-SQL Join « Derek Tech Cornor
Pingback:Presentation Slides for Building an Effective Data Warehouse Architecture | James Serra's Blog
Its really grt post and made me to learn , ‘ how to use non-equi join using lookup and merge join transformation. Thanks a lot.
Pingback:SQL Server Agent job steps vs SSIS - SQL Server - SQL Server - Toad World
Really excellent article! Thank you.
I agree with Jamie Thomson’s comment.
Another point worth a mention is error-handling when importing/exporting rows. It is a lot easier to handle erroneous data and process it differently in SSIS as compared to SQL.
Pingback:Preparation for the 70-467 SQL BI exam | x86x64
Pingback:SQL Server:SSIS vs Stored Procedure – Andy Tsai學習筆記
Hi James,
I have a question about ETL. When designing an ETL we can do data transformation steps in database (procedures or SQL) and we can do this steps in a ETL tool (PowerCenter, Pentaho DI, SSIS…), as far as I know there are steps that have better performance in ETL tool and others have better performance in database. For example there are ETL tools that make the sorts or aggregation faster than database procedures or SQL.I would like to know if there are more steps that go different in a ETL tool than in a database, in order to have some criteria to make a design decision.
Thanks in advance
Hi J.Vidal!
Regarding your question on if there are more steps that go different in a ETL tool than in a database, I think, our “SSIS vs ETL” article will provide a comprehensive reply: https://skyvia.com/blog/ssis-vs-etl-best-ssis-alternatives-of-2021-2022
The greatest pro T-SQL point is conversion to another database, like Oracle. There are tools to convert T-SQL code. There are no tools that I’m aware of that will convert SSIS packages.
So , not only is T-SQL faster in performance, but if the business is even considering moving from SQL Server to Oracle then T-SQL is definitely the way to go.
Good point Todd. There are also tools to convert from Oracle to T-SQL 🙂
I’m just curious how this blog post holds up in 2023. Have there been any big developments that might change your points listed in the post?
Thanks!
I agree with using sql will make the transforms much faster. Export data into raw files, and import data into staging tables, then use sql to transform into your public schema. We saw improvement of more than 10x speed by moving our transforms to sql.