Parallel execution in SSIS
Parallel execution in SSIS improves performance on computers that have multiple physical or logical processors. To support parallel execution of different tasks in a package, SSIS uses two properties: MaxConcurrentExecutables and EngineThreads. If you are like me, you probably did not even know about these two properties, and therefore were unaware of the opportunity to make your SSIS packages execute faster. A description of each property:
The MaxConcurrentExecutables property is a property of the package. It sets the maximum number of executables (tasks and containers) that can run in parallel within that package. The default value is -1, which equates to the number of physical or logical processors plus 2.
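To make the -1 default concrete, here is a minimal Python sketch of how the setting resolves to an actual limit. The function name and the processor_count parameter are hypothetical and used only for illustration; SSIS does this internally. The sketch also assumes that zero and negative values other than -1 are rejected as invalid.

```python
import os


def effective_max_concurrent_executables(setting, processor_count=None):
    """Resolve a MaxConcurrentExecutables setting to a concrete limit.

    Illustrative helper only, not part of any SSIS API.
    """
    if processor_count is None:
        # Fall back to the logical processor count of the current machine.
        processor_count = os.cpu_count() or 1
    if setting == -1:
        # Default: number of physical or logical processors plus 2.
        return processor_count + 2
    if setting < 1:
        # Assumption: zero and other negative values are treated as invalid.
        raise ValueError("MaxConcurrentExecutables must be -1 or at least 1")
    return setting


# A server with 4 logical processors and the default setting of -1.
print(effective_max_concurrent_executables(-1, processor_count=4))
```

With 4 logical processors the default resolves to 6 concurrent executables, while an explicit value such as 2 is used as-is.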
The EngineThreads property is a property of each Data Flow task. This property defines how many threads the data flow engine can create and run in parallel. The EngineThreads property applies equally to both the source threads that the data flow engine creates for sources and the worker threads that the engine creates for transformations and destinations. Therefore, setting EngineThreads to 10 means that the engine can create up to ten source threads and up to ten worker threads. The default is 5 in SQL Server 2005 and 10 in SQL Server 2008, with a minimum value of 2.
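The arithmetic behind EngineThreads can be sketched the same way. The Python helper below is hypothetical, not an SSIS API; it only restates the rule above that one value caps both thread pools, and it assumes values below the minimum of 2 are rejected. The engine may schedule threads differently in practice, so treat the numbers as upper bounds.

```python
def engine_thread_budget(engine_threads):
    """Return the upper bounds implied by an EngineThreads value.

    Illustrative helper only: the same value caps source threads and
    worker threads separately.
    """
    if engine_threads < 2:
        # Assumption: values below the documented minimum of 2 are invalid.
        raise ValueError("EngineThreads has a minimum value of 2")
    return {
        "max_source_threads": engine_threads,
        "max_worker_threads": engine_threads,
        "max_total_threads": 2 * engine_threads,
    }


# EngineThreads = 10 (the SQL Server 2008 default).
print(engine_thread_budget(10))
```

So a single Data Flow task left at the SQL Server 2008 default can use up to 20 threads in total, which is worth keeping in mind when several Data Flow tasks run concurrently.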
One other thing to consider: if you are using the Execute Package Task, the child package can be run in-process or out-of-process by setting the ExecuteOutOfProcess property. If a child package is executed out-of-process, you will see another DtsHost.exe process start. These processes will remain “live”, using up resources, for quite a while after execution is complete.
If a child package executes in-process, a bug in one of its tasks will cause the master package to fail. That is not the case when it executes out-of-process. On 32-bit systems a process can consume up to 2 GB of virtual memory, so executing out-of-process means each child process can claim its own 2 GB portion of virtual memory. However, if you are simply using many packages to structure your solution in a more modular fashion, executing in-process is probably the way to go, because you avoid the overhead of launching more processes.
More info:
Improving the Performance of the Data Flow
Designing Your SSIS Packages for Parallelism (SQL Server Video)
I have a parent package running multiple child packages (about 20). MaxConcurrentExecutables is set to -1, and the child packages execute in-process.
SQL Server 2008 R2, 64-bit server. When I run the parent package I get the following error: Error 0xC0011008 while preparing to load the package. Error loading from XML. No further detailed error information can be specified for this problem because no Events object was passed where detailed error information can be stored.
When I set MaxConcurrentExecutables to 1 or 2 the job runs fine, but beyond that it fails on arbitrary child packages.
Could you point out what mistake I might be making? I need to gain performance by running the packages in parallel.
“the child package to be executed can be run in-process or out-of-process by use of the ExecuteOutOfProcess property”
Unfortunately, I don’t think setting ExecuteOutOfProcess to true makes it run asynchronously. It’s a bit of a misleading property. You have to use an Execute Process task to accomplish true parallel execution.
Hello James,
I have a package that reads a database table containing 10 rows. I want to process each row in parallel using separate tasks. I am also not sure how many parallel tasks I need to add to the package to achieve this. Is it possible? If yes, please guide me.
Interesting thread (no pun intended).
I recently encountered an issue with a Data Flow component in one of our legacy packages. The data source pulled from a simple procedure that returned a mid-sized rowset from several joins. The issue turned out to be that parallelism was causing the query to clock itself.
When the package ran at night alongside many other jobs, (the theory is that) resource contention caused fewer threads to be used and the package succeeded. During the day, we could see many threads being spawned and blocking the same spid at the DB level.
We are experimenting with a MAXDOP(1) hint in the query, which seems to resolve the problem.
I am curious if setting the EngineThreads to 1 in the Data Flow tasks would have the same effect? There are considerations like parent packages that make it tough to isolate specific task settings for iterative testing. Does anyone have direct experience with similarities (or differences) between a MAXDOP setting in the query vs. an EngineThreads setting in the package task?
Correction : “caused the query to Block itself”.
Scenario: We need to build a package that loads data from a flat file into target tables in an Oracle database, where one of the fields has a timestamp + milliseconds data type.
We were able to build the package and load the data using the timestamp data type for that column; however, we were not able to load the data with the timestamp + milliseconds data type.
Could someone help me out here?
Regards
Srivatsav
I have what I consider a more robust solution for running a single SSIS package in parallel. Please check out the article: https://www.mssqltips.com/sqlservertip/4329/performance-improvements-to-process-a-large-number-of-files-with-ssis/