Monday, August 30, 2010

SSIS-Parallel Processing

Parallel execution improves performance on computers that have multiple physical or logical processors. To support parallel execution of different tasks in the package, SSIS uses two properties: MaxConcurrentExecutables and EngineThreads.



MaxConcurrentExcecutables Property

The MaxConcurrentExecutables property is a property of the package. This property defines how many tasks can run simultaneously; by specifying the maximum number of SSIS threads that can execute in parallel per package. The default value is -1, which equates to the number of physical or logical processors plus 2. Using a package which calls another package, in this example MaxConcurrentExecutables has its default value set as -1 and the server running the package has 8 processors, all 10 tasks (taking the Execute Package task in the example, though it applies in the same way to other tasks as well) are executed in parallel, as shown below:

If MaxConcurrentExecutables was changed to 4 in the above package and run it on the same server, then only 4 tasks will runn in parallel at a time (Note the image below shows tasks are executed in a batch of 4, once 4 tasks are executed another batch of 4 tasks will be executed)



Note:

If SSIS runs on a dedicated server and there are a lot of operations that run in parallel, it can be beneficial to increase this setting if some of the operations (threads) do a lot of waiting for external systems to reply. However if there's no dedicated SSIS machine and the data integration application runs alongside several other applications, reducing this setting should avoid resource conflicts.

EngineThreads Property

The EngineThreads property is a property of each Data Flow task. This property defines how many threads the data flow engine can create and run in parallel. The EngineThreads property applies equally to both the source threads that the data flow engine creates for sources and the worker threads that the engine creates for transformations and destinations. Therefore, setting EngineThreads to 10 means that the engine can create up to ten source threads and up to ten worker threads.

Note:

A source thread pulls the data from the data source whereas worker thread performs transformation and upload data into the destination.



To understand how this property works, consider the sample package with three Data Flow tasks. Each Data Flow task contains ten source-to-destination execution trees. If the EngineThreads are set to 10 on each Data Flow task, all 30 execution trees can potentially run simultaneously (considering all three data flow tasks run in parallel because of MaxConcurrentExecutables property of the package has either -1 or 3).

As discussed above, the execution tree is a unit of work for a worker thread in the SSIS process. A worker thread may execute one or more execution trees, but each transformation listed in the execution tree will be executed on the same thread. A thread will process one buffer at a time, executing it against all transforms in the execution tree before working on the next buffer in the flow, at which point it would pass the current buffer to another thread executing another execution tree and it would pull a new data buffer from its buffer list which was queued from an upstream component (either a data source or the last asynchronous transform before this execution tree started).

Because an execution tree is tied to a thread, it is often recommended to increase the EngineThreads setting to equal or greater than the number of execution trees listed in the PipelineExecutionTrees log event.

The default is 5 in SQL Server 2005 and 10 in SQL Server 2008, with a minimum value of 2. However, the engine will not use more threads than it needs, regardless of the value of this property. The engine may also use more threads than specified in this property, if necessary to avoid concurrency issues.

However, the general rule is to not run more threads in parallel than the number of available processors. Running more threads than the number of available processors can hinder performance because of the frequent context-switching between threads.



Note:

Parallelism is a double-edge sword, on one hand a properly designed package noticeably improves performance by parallel running several tasks of the package or several execution trees of data flow; however a poorly designed package can severely impact performance if the number of threads exceed the number of available processors, throughput may be harmed due to an excessive amount of context switching. Therefore it's recommended that testing takes place considering available system resources or environment, loaded on the server, if the server is only ETL server or it is a database server as well.

8 comments:

  1. As per the explaination :

    The default value is -1, which equates to the number of physical or logical processors plus 2. In the example MaxConcurrentExecutables has its default value set as -1 and the server running the package has 8 processors, all 10 tasks will run in parallel (8 physical processor +2 = 10), so does it mean that we dont have to change the MaxConcurrentExecutables value to 8 ? so that it can run all 10 DFT in parallel ?

    Also if we want to run 4 pkg in parallel , we have to change the MaxConcurrentExecutables value to 4 ? why ?

    Can some one explain this property in detail ,

    thanks

    ReplyDelete
  2. It was just awesome pieace of information. thanks for sharing!

    ReplyDelete
  3. Nice article, Ravi. Would you say that this remains correct for SSIS on SQL Server 2016, 2017, and 2019?

    Is there any mechanism to set the value of MaxConcurrentExecutables and EngineThreads from outside the package? A project setting? Environment variable?

    Promoting from development to test to prod should not require a package to be changed.

    ReplyDelete
    Replies
    1. Hi, Were you able to get the solution.Please let me know as I new to SSIS and facing the same issue.

      Delete
  4. This comment has been removed by the author.

    ReplyDelete
  5. Caesars Entertainment and Hollywood Casino in Las Vegas - Dr.
    › › May 3, 2017 — May 3, 2017 Caesars Entertainment and Hollywood 보령 출장샵 Casino in Las Vegas is one of the largest 창원 출장안마 and most prestigious entertainment 수원 출장안마 and 평택 출장샵 gaming destinations in 구리 출장샵 the world.

    ReplyDelete