Monday, August 30, 2010

SSIS Multicast Transformation vs Conditional Split Transformation

In a data warehousing scenario, it's not unusual to replicate data from a source table to multiple destination tables, and sometimes it's even required to distribute the data of a source table to two or more tables depending on some condition, for example splitting data based on location. So how can we achieve this with SSIS? SSIS provides several built-in transformations to achieve these kinds of requirements; for details click here.

Working with SQL Server Service Broker (SSBS) When the Initiator and Target are on the Same Database

Introduction

In my previous article, I introduced you to SQL Server Service Broker: what it is, how it works, what its different components are and how they are related to each other. Now it's time to roll up our sleeves and write some SSBS applications. In this article, I will be creating an application in which the Initiator and Target are both in the same database. In the next couple of articles, I will be helping you to write SSBS applications in which the Initiator and Target are not in the same database.

Problem Statement

There are two applications: one is an Order application and the other is an Inventory application. Before accepting any order from the users, the Order application needs to make sure that the product being ordered is available in the store, but it does not want to wait for that check. The Order application will request the product stock status asynchronously and continue doing its other work. On the other side, the Inventory application will listen for requests coming into the queue, process them and respond back to the Order application, again asynchronously.

All in one database

Depending on the arrangement of Initiator and Target, the architecture can be grouped into three categories:

                •Initiator and Target in the same database

                •Initiator in one database and Target in another database on the same instance

                •Initiator and Target in separate instances

In this example, I will demonstrate how to create an SSBS application when both the Initiator and Target are in the same database.



Though we can enable Service Broker for an existing database and create SSBS objects in it, for simplicity I will be creating a new database for this demonstration.

USE master;
GO
--Create a database for this learning session; it will make clean-up
--easier. You can create the SSBS objects in any existing database as
--well, but then you would have to drop all the objects individually
--rather than dropping a single database
IF EXISTS(SELECT 1 FROM sys.databases WHERE name = 'SSBSLearning')
 DROP DATABASE SSBSLearning
GO
CREATE DATABASE SSBSLearning
GO
 
--By default a new database will have Service Broker enabled, which you can verify 
--with the is_broker_enabled column of the resultset below
SELECT name, service_broker_guid, is_broker_enabled, is_honor_broker_priority_on 
FROM sys.databases WHERE name = 'SSBSLearning'
 
--If your database is not enabled for Service Broker because you have 
--changed the default setting in the model database, you can still enable
--Service Broker for the database with this statement
ALTER DATABASE SSBSLearning
      SET ENABLE_BROKER;
      --WITH ROLLBACK IMMEDIATE
GO
----To disable service broker for a database
--ALTER DATABASE SSBSLearning
--      SET DISABLE_BROKER;
--GO

Once you have created a database or enabled Service Broker for an existing database, you need to create the Service Broker objects: first Message Types, then Contracts, which use the created message types. Then you need to create Queues and finally Services, which are nothing but endpoints that sit on top of queues to send and receive messages.

USE SSBSLearning;
GO
--Create message types which allow only valid XML messages to be sent
--and received; SSBS validates whether a message is well-formed XML 
--by loading it into the XML parser
CREATE MESSAGE TYPE
       [//SSBSLearning/ProductStockStatusCheckRequest]
       VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE
       [//SSBSLearning/ProductStockStatusCheckResponse]
       VALIDATION = WELL_FORMED_XML;
GO
 
--Create a contract which will be used by the service to validate
--which message types are allowed for the Initiator and the Target.
--Because the communication is started by the Initiator,
--SENT BY INITIATOR or SENT BY ANY is mandatory
CREATE CONTRACT [//SSBSLearning/ProductStockStatusCheckContract]
      ([//SSBSLearning/ProductStockStatusCheckRequest]
       SENT BY INITIATOR,
       [//SSBSLearning/ProductStockStatusCheckResponse]
       SENT BY TARGET
      );
GO
 
--Create a queue, which is an internal physical table that holds 
--the messages passed to the service. By default it is 
--created in the default filegroup; if you want to create it in 
--another filegroup you need to specify the ON clause with 
--this statement. You can use a SELECT statement to query this 
--queue (or special table) but you cannot use other DML statements
--like INSERT, UPDATE and DELETE. You need to use the SEND and RECEIVE
--commands to send messages to a queue and receive them from it
CREATE QUEUE dbo.SSBSLearningTargetQueue;
GO
 
--Create a service, which is a logical endpoint that sits on top 
--of a queue and on which messages are sent or received. When 
--creating a service you also specify the contract which will be
--used to validate messages sent to that service
CREATE SERVICE
       [//SSBSLearning/ProductStockStatusCheck/TargetService]
       ON QUEUE dbo.SSBSLearningTargetQueue
       ([//SSBSLearning/ProductStockStatusCheckContract]);
GO
 
--A Target can also send messages back to the Initiator, hence
--you create a queue for the Initiator as well
CREATE QUEUE dbo.SSBSLearningInitiatorQueue;
GO
 
--Likewise you need to create a service which sits 
--on top of the Initiator queue and is used by the Target to send messages
--back to the Initiator
CREATE SERVICE
       [//SSBSLearning/ProductStockStatusCheck/InitiatorService]
       ON QUEUE dbo.SSBSLearningInitiatorQueue;
GO
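
If you want to confirm that all of the objects were created as expected before moving on, you can query the Service Broker catalog views. This quick sanity check is not part of the walkthrough itself, just a convenience:

--Verify the SSBS objects created above via the catalog views
SELECT name, validation_desc FROM sys.service_message_types WHERE name LIKE '//SSBSLearning/%';
SELECT name FROM sys.service_contracts WHERE name LIKE '//SSBSLearning/%';
SELECT name, is_receive_enabled FROM sys.service_queues WHERE name LIKE 'SSBSLearning%';
SELECT name FROM sys.services WHERE name LIKE '//SSBSLearning/%';
GO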

Once you are done creating all of the required SSBS objects for both the Initiator (Order application) and the Target (Inventory application), you can send messages between them. A dialog or conversation is always started by the Initiator, so the Initiator sends the request message first (to check the inventory stock for a product, as mentioned in the problem statement) to the Target, something like this.

--To send message, first you need to initiate a dialog with 
--BEGIN DIALOG command and specify the Initiator and Target
--services which will be talking in this dialog conversation
DECLARE @SSBSInitiatorDialogHandle UNIQUEIDENTIFIER;
DECLARE @RequestMessage XML;
BEGIN TRANSACTION;
 BEGIN DIALOG @SSBSInitiatorDialogHandle
   FROM SERVICE
    [//SSBSLearning/ProductStockStatusCheck/InitiatorService]
   TO SERVICE
    N'//SSBSLearning/ProductStockStatusCheck/TargetService'
   ON CONTRACT
    [//SSBSLearning/ProductStockStatusCheckContract]
   WITH ENCRYPTION = OFF;
 SELECT @RequestMessage =
     N'<Request>
    <ProductID>316</ProductID>
    <LocationID>10</LocationID>
    </Request>';
 --To send message you use SEND command and specify the dialog
 --handle which you got above after initiating a dialog  
 SEND ON CONVERSATION @SSBSInitiatorDialogHandle
   MESSAGE TYPE 
   [//SSBSLearning/ProductStockStatusCheckRequest]
   (@RequestMessage);
 SELECT @RequestMessage AS RequestMessageSent;
COMMIT TRANSACTION;
GO

The request message sent using the above statements is stored in the Target queue until it is processed. You can query the Target queue using a SELECT statement.

--You can query the Target queue with SELECT statement
SELECT * FROM dbo.SSBSLearningTargetQueue;
GO
--If a message cannot be put into the Target queue because the
--Target queue is not enabled, or for any other reason,
--it will be temporarily put into the transmission queue until 
--it can be delivered to the Target queue
SELECT * FROM sys.transmission_queue
GO

Next, the Target (Inventory application) will pick up the request message from its queue, process it and respond back to the Initiator (Order application) with the product inventory status; at the end it will end the dialog conversation, which the Initiator started, as it is no longer required.

--To retrieve a message from a queue you use the RECEIVE command.
--With every message you also get a dialog handle which you can 
--use to reply back to the sender of the message
DECLARE @SSBSTargetDialogHandle UNIQUEIDENTIFIER;
DECLARE @RecvdRequestMessage XML;
DECLARE @RecvdRequestMessageTypeName sysname;
BEGIN TRANSACTION;
 --The WAITFOR command is used to wait for messages to arrive
 --on the queue; TIMEOUT is specified in milliseconds
 WAITFOR
 ( RECEIVE TOP(1)
  @SSBSTargetDialogHandle = conversation_handle,
  @RecvdRequestMessage = CONVERT(XML, message_body),
  @RecvdRequestMessageTypeName = message_type_name
   FROM dbo.SSBSLearningTargetQueue
 ), TIMEOUT 1000;
 
 SELECT @RecvdRequestMessage AS RequestMessageReceived;
 --If the message type is request from Initiator, process the request.
 IF @RecvdRequestMessageTypeName = N'//SSBSLearning/ProductStockStatusCheckRequest'
 BEGIN
   DECLARE @ReplyMessage NVARCHAR(max);
   DECLARE @Quantity smallint
   SELECT @Quantity = Quantity 
   FROM AdventureWorks.Production.ProductInventory
   WHERE ProductID = @RecvdRequestMessage.value('(/Request/ProductID)[1]', 'int')  
   AND LocationID = @RecvdRequestMessage.value('(/Request/LocationID)[1]', 'int') 
   SELECT @ReplyMessage = 
   N'<Reply>
   <Quantity>' + CONVERT(VARCHAR(10), @Quantity) + '</Quantity>
   </Reply>';
    --To send a message back to the sender you again use the SEND command and specify the
    --dialog handle which you got above while retrieving the message from the queue
   SEND ON CONVERSATION @SSBSTargetDialogHandle
     MESSAGE TYPE 
     [//SSBSLearning/ProductStockStatusCheckResponse]
     (@ReplyMessage);
    --To end a dialog you use the END CONVERSATION command; here the dialog
    --is being ended by the Target, and then a message of the
    --http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog message type 
    --will be sent to the Initiator
   END CONVERSATION @SSBSTargetDialogHandle;
 END
 SELECT @ReplyMessage AS ReplyMessageSent;
COMMIT TRANSACTION;
GO

The response message (which the Target sends) arrives in the Initiator queue, which you can see by querying the Initiator queue with a SELECT statement, as you did for the Target queue above.

--You can query the Initiator queue with SELECT statement
--Here you will see two records: one is the response and the other
--one is the end-dialog intimation, with NULL in its message_body column
SELECT * FROM dbo.SSBSLearningInitiatorQueue
GO
--Again, if there is any error during transmission of messages from the Target to the
--Initiator, the messages will be temporarily put into the transmission queue until 
--they can be delivered to the Initiator queue
SELECT * FROM sys.transmission_queue
GO

At this point, the Initiator (Order application) can retrieve the product status response message from its queue, which was written by the Target (Inventory application). Please note, if you use the above statement to look at the Initiator queue, you will see two records; the first one is the response message from the Target and the second one is the end-dialog message, sent when the Target ended the dialog after responding. To end a dialog successfully, it has to be ended by both the Target and the Initiator. Hence, you need to execute the script given below twice; the first time to process the response and the second time to end the conversation on the Initiator side, as it has already been ended by the Target.

--At this point the Initiator queue will hold two records: the first
--one is the response message for the request and the other one is the 
--intimation that the dialog has been ended by the Target.
--You need to execute the piece of code below twice to retrieve both 
--records from the Initiator queue; if the message is of type 
--http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog the dialog will
--be ended by the Initiator as well, otherwise the response message will be processed.
DECLARE @RecvSSBSInitiatorDialogHandle UNIQUEIDENTIFIER;
DECLARE @RecvReplyMessage NVARCHAR(MAX);
DECLARE @RecvMessageType SYSNAME
BEGIN TRANSACTION;
 WAITFOR
 ( RECEIVE TOP(1)
  @RecvSSBSInitiatorDialogHandle = conversation_handle,
  @RecvReplyMessage = CAST(message_body AS NVARCHAR(MAX)),
  @RecvMessageType = message_type_name
   FROM dbo.SSBSLearningInitiatorQueue
 ), TIMEOUT 1000;
 --If the message is of type http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog 
 --the dialog will be ended by the Initiator as well, otherwise the response message will be processed.
 IF (@RecvMessageType = 'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')
  END CONVERSATION @RecvSSBSInitiatorDialogHandle;
 ELSE
  SELECT @RecvReplyMessage AS ReplyMessageReceived;
COMMIT TRANSACTION;
GO
 

Once you are done with the testing and want to drop all the objects which you created, you can use the clean-up scripts below.

--Clean-up code to drop the SSBS objects you created above individually;
--notice the order of dropping the objects:
--it is the reverse of the order in which you created them
DROP SERVICE [//SSBSLearning/ProductStockStatusCheck/InitiatorService]
DROP SERVICE [//SSBSLearning/ProductStockStatusCheck/TargetService]
DROP QUEUE dbo.SSBSLearningInitiatorQueue
DROP QUEUE dbo.SSBSLearningTargetQueue
DROP CONTRACT [//SSBSLearning/ProductStockStatusCheckContract]
DROP MESSAGE TYPE [//SSBSLearning/ProductStockStatusCheckRequest]
DROP MESSAGE TYPE [//SSBSLearning/ProductStockStatusCheckResponse]
GO
--Drop the database which you created above; switch to master first,
--as you cannot drop the database you are currently connected to
USE master;
GO
IF EXISTS(SELECT 1 FROM sys.databases WHERE name = 'SSBSLearning')
 DROP DATABASE SSBSLearning
GO

Note: Needless to say, you must learn and thoroughly test your SSBS application (or the scripts demonstrated above) on your development box before moving to a production environment.
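
While testing, it also helps to keep an eye on the conversations themselves. The query below is a small illustrative check (not part of the scripts above); run it in the SSBSLearning database while the objects still exist to spot dialogs that were never ended on both sides:

--List conversations in the current database along with their state
SELECT conversation_handle, is_initiator, far_service, state_desc
FROM sys.conversation_endpoints;
GO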

SQL Server Service Broker - An Introduction

SQL Server Service Broker (SSBS), introduced with SQL Server 2005 and enhanced in SQL Server 2008, allows you to write queuing/message based applications within the database itself. This article discusses SSBS' important features and how to use it in different scenarios.

Introduction


SQL Server Service Broker (SSBS) is a new architecture (introduced with SQL Server 2005 and enhanced further in SQL Server 2008) which allows you to write asynchronous, decoupled, distributed, persistent, reliable, scalable and secure queuing/message-based applications within the database itself. This integration of a queuing and messaging infrastructure into the database provides several advantages over message/queuing applications that live outside the database. For example, SSBS provides simplified administration, inbuilt security, improved performance, transaction support without making transactions distributed (more details later), and a simplified recovery process, since queued data is backed up as part of the normal database backup. SSBS in itself is a very big topic, requiring several hundred pages to explain. Therefore, I will be covering its important features, how to use it in different scenarios, and the improvements in SQL Server 2008 for SSBS, across several articles.

This introductory article on SSBS discusses, at a very basic level, how it differs from MSMQ, and looks at the SSBS architecture and its different components.

How SSBS differs from MSMQ/Other message based technology


Similar to SSBS, Microsoft Message Queuing (MSMQ) is another technology that enables applications to send messages to and read messages from queues; it comes as a part of the Windows OS. However, SSBS goes beyond the features and benefits that MSMQ provides. For example, in comparison with MSMQ, SSBS provides the following:


          •Faster transactional message processing

          •Reliability and enhanced recoverability

          •Optimized performance

          •Inbuilt security

          •Parallel processing

          •Reduced management and maintenance overhead

MSMQ has been in use for a long time and is a well-tested technology. Even though MSMQ is a preferred technology for writing queuing and message-based applications and provides several advantages over other alternatives, SSBS outweighs MSMQ in several ways in an environment where a SQL Server instance is already available.

Advantages of SSBS over MSMQ/other queuing and message based technology

       •SSBS is a part of the database engine itself and is more efficient for data-intensive queuing operations. It has improved performance and efficient, faster transactional message processing because transaction contexts are managed by SQL Server internally, whereas MSMQ requires the Distributed Transaction Coordinator (DTC) to coordinate transaction context between MSMQ and SQL Server. Remember, DTC uses a two-phase commit protocol.

       •Service Broker queues are maintained in the database as database objects and are backed up during normal database backup processes. This provides reliability and enhanced recoverability. MSMQ stores messages in memory when you optimize it for faster performance, which is fine as long as there is no failure; but if there is a failure or you restart the MSMQ service, the stored messages are gone.

       •SSBS optimizes performance by writing messages directly to the receiver queue (bypassing the sender queue) if the receiver queue is in the same database or in another database on the same server.

       •SSBS handles sending and receiving messages internally within the database, and SSBS commands are executed over the same connection to the database (no need of a separate connection and therefore no need for DTC, as would be required if the queuing were external to the database). This ensures the access permissions of the senders and receivers are checked by the database engine itself. Having a single identity check for messaging and data access provides inbuilt security, which is easier to configure and maintain in one place.

       •SSBS achieves parallel processing by letting us define the number of simultaneous queue readers. Each reader can process the messages of a particular conversation group. To ensure that the messages of a particular conversation group are processed by a single reader, SSBS uses a new kind of lock, called a conversation group lock (a small illustrative activation sketch follows this list).

       •With SSBS, you store both logic and data in the database itself, so the chance of the two getting out of sync when you restore your system after a failure is minimized to a great extent.
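
To make the queue-reader point concrete, here is a minimal, hypothetical sketch of how activation could be turned on for the Target queue created in the walkthrough above. The procedure name dbo.usp_ProcessStockRequest is an assumption (you would write it yourself to RECEIVE and process messages); MAX_QUEUE_READERS caps how many copies of it Service Broker may run in parallel.

--Hypothetical activation setup; dbo.usp_ProcessStockRequest is an assumed
--stored procedure that RECEIVEs and processes messages from the queue
ALTER QUEUE dbo.SSBSLearningTargetQueue
WITH ACTIVATION
(
    STATUS = ON,
    PROCEDURE_NAME = dbo.usp_ProcessStockRequest,
    MAX_QUEUE_READERS = 5,   --up to 5 readers may process messages in parallel
    EXECUTE AS OWNER
);
GO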

MSMQ supports the TCP/IP and HTTP transport protocols whereas SSBS supports only TCP/IP, so if you want your messages to travel over HTTP, MSMQ would be your choice.

Service Broker - Architecture

In SSBS terminology, the sender of the message is called the Initiator and the receiver of the message is called the Target. Depending on the placement of the Initiator and Target, an SSBS application's architecture can be categorized into three different scenarios:

         A. Initiator and Target in the same database in the same SQL Server instance.

         B. Initiator and Target in different databases but still in the same SQL Server instance.

         C. Initiator in a database at one SQL Server instance and Target in another database at another SQL Server instance.

The behavior in scenarios A and B above is almost the same. In these cases, SSBS optimizes performance by writing messages directly to the Target queue. If, while writing a message to the Target queue, it encounters a problem (for example, the Target service is not available, the Target queue is disabled, etc.), it keeps that message in the sys.transmission_queue table temporarily so that it can push the message to the Target queue once it becomes available. In the image below, you can see there are two databases, Initiator and Target, on a SQL Server instance. When the Initiator sends a message (by using the SEND T-SQL command), the message is written directly to the Target queue, from where the Target reads it (by using the RECEIVE T-SQL command). If the Target sends a response back to the Initiator, the message is written directly to the Initiator queue.

The behavior in the third scenario, scenario C, is a bit more complex and interesting. In this case, messages travel from one server to another and the network is involved, so SSBS ensures successful delivery of a message by storing it temporarily in the sys.transmission_queue queue at the Initiator and removing it from there only when it receives a receipt acknowledgement from the Target. This means that as long as messages are in transit they reside in the sys.transmission_queue queue. This is what is depicted in the image below: you can see there is a user-defined queue at the database level, and the sys.transmission_queue queue alongside it to hold messages in transit.

When an application issues the SEND command to send a message, that message is stored in the sys.transmission_queue queue at the Initiator; SSBS then sends that message over the network to the Target and, at the same time, marks the status of this message as waiting for acknowledgement from the Target. On the Target, when the message is received successfully into the Target queue, an acknowledgement is sent back to the Initiator; only at this point is the message deleted from the sys.transmission_queue queue at the Initiator. If the Target issues a SEND command to send a response message back to the sender, the response message goes into the sys.transmission_queue queue at the Target; again SSBS sends that response message over the network to the Initiator and, at the same time, marks the status of this response message as waiting for acknowledgement from the Initiator. On the Initiator, when the response message is received successfully into the Initiator queue, an acknowledgement is sent back to the Target, and only at this point is the message deleted from the sys.transmission_queue queue at the Target.

Please note, the sys.transmission_queue queue is not something you create yourself; each database has its own sys.transmission_queue, and it holds only that database's messages that are waiting to be delivered.
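
As a small illustration (not from the original text), you can peek into the transmission queue of the sending database to see which messages are still waiting to be delivered and why:

--Messages still waiting to be delivered, with the reason in transmission_status
SELECT to_service_name, message_type_name, enqueue_time, transmission_status
FROM sys.transmission_queue;
GO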

SSIS-Parallel Processing

Parallel execution improves performance on computers that have multiple physical or logical processors. To support parallel execution of different tasks in the package, SSIS uses two properties: MaxConcurrentExecutables and EngineThreads.



MaxConcurrentExecutables Property

The MaxConcurrentExecutables property is a property of the package. It defines how many tasks can run simultaneously by specifying the maximum number of SSIS threads that can execute in parallel per package. The default value is -1, which equates to the number of physical or logical processors plus 2. Consider a package which calls other packages through Execute Package tasks: with MaxConcurrentExecutables left at its default of -1 and the package running on a server with 8 processors (8 + 2 = 10 concurrent executables), all 10 tasks (taking the Execute Package task in the example, though the same applies to other tasks as well) are executed in parallel, as shown below:

If MaxConcurrentExecutables is changed to 4 in the above package and it is run on the same server, then only 4 tasks will run in parallel at a time. (Note the image below shows the tasks executed in batches of 4; once 4 tasks have completed, another batch of 4 tasks is executed.)



Note:

If SSIS runs on a dedicated server and there are a lot of operations that run in parallel, it can be beneficial to increase this setting if some of the operations (threads) do a lot of waiting for external systems to reply. However if there's no dedicated SSIS machine and the data integration application runs alongside several other applications, reducing this setting should avoid resource conflicts.

EngineThreads Property

The EngineThreads property is a property of each Data Flow task. This property defines how many threads the data flow engine can create and run in parallel. The EngineThreads property applies equally to both the source threads that the data flow engine creates for sources and the worker threads that the engine creates for transformations and destinations. Therefore, setting EngineThreads to 10 means that the engine can create up to ten source threads and up to ten worker threads.

Note:

A source thread pulls the data from the data source, whereas a worker thread performs transformations and loads data into the destination.



To understand how this property works, consider a sample package with three Data Flow tasks, each containing ten source-to-destination execution trees. If EngineThreads is set to 10 on each Data Flow task, all 30 execution trees can potentially run simultaneously (assuming all three Data Flow tasks run in parallel because the package's MaxConcurrentExecutables property is either -1 or at least 3).

As discussed above, the execution tree is a unit of work for a worker thread in the SSIS process. A worker thread may execute one or more execution trees, but each transformation listed in an execution tree will be executed on the same thread. A thread processes one buffer at a time, executing it against all transforms in the execution tree before working on the next buffer in the flow; at that point it passes the current buffer to another thread executing another execution tree, and pulls a new data buffer from its buffer list, which was queued from an upstream component (either a data source or the last asynchronous transform before this execution tree started).

Because an execution tree is tied to a thread, it is often recommended to set EngineThreads to a value equal to or greater than the number of execution trees listed in the PipelineExecutionTrees log event.

The default is 5 in SQL Server 2005 and 10 in SQL Server 2008, with a minimum value of 2. However, the engine will not use more threads than it needs, regardless of the value of this property. The engine may also use more threads than specified in this property, if necessary to avoid concurrency issues.

However, the general rule is to not run more threads in parallel than the number of available processors. Running more threads than the number of available processors can hinder performance because of the frequent context-switching between threads.



Note:

Parallelism is a double-edged sword. On one hand, a properly designed package can noticeably improve performance by running several tasks of the package, or several execution trees of a data flow, in parallel; on the other hand, a poorly designed package can severely hurt performance: if the number of threads exceeds the number of available processors, throughput may suffer due to an excessive amount of context switching. Therefore it's recommended that you test with the available system resources and environment in mind, taking into account the load on the server and whether it is a dedicated ETL server or also a database server.

Friday, August 27, 2010

SSIS Tasks Explained

Data Flow Task


          •Data Flow Task: This task extracts data from a source, allows for transformations of that data, and then loads the data into a target destination.

Data Preparation Tasks

         •File System Task: This task allows the user to copy/move/delete files and directories on a file system.

         •FTP Task: This task allows the user to copy/move/delete files and directories over FTP.

        •Web Service Task: This task allows the user to execute a Web service method and store the results.

        •XML Task: This task is used to work with XML data. XSLT can be used along with XPath to validate, compare and merge documents. The results of this can then be stored.

       •Data Profiling Task: This task can be used for checking and validating data quality. Profiles can be set up and checked for various data quality issues such as column length issues, column patterns, column statistics, etc.

Workflow Tasks

       •Execute Package Task: This task will run other SQL Server Integration Services packages.

       •Execute Process Task: This task will execute an application or batch file.

       •Message Queue Task: This task allows you to send and receive messages between SSIS packages, or to send messages to an application via an application queue. This task uses Message Queuing (MSMQ)

       •Send Mail Task: This task allows for email messages to be created and sent using an SMTP server.

       •WMI Data Reader Task: This task allows a Package to query, using WQL, computer systems (local and remote) for information regarding that computer.

       •WMI Event Watcher Task: This task watches for WMI events that have occurred on a computer system, and allows the package to take an action if certain criteria are met.

Scripting Tasks

       •Script Task: This task can be used to program functions that are not available in the standard SSIS tasks or transformations. In SSIS 2005 this task can be programmed in VB .NET. In SSIS 2008 VB .NET and C# can be used to program a Script Task.

SQL Server Maintenance Tasks

         •Back Up Database Task: This task will allow you to back up one or many SQL Server databases.

        •Check Database Integrity Task: This task will allow you to check the integrity of all the objects in one or many SQL Server databases.

         •Execute SQL Server Agent Job Task: This task allows for the execution of a SQL Server Agent job.

          •Execute T-SQL Statement Task: This task is similar to the Execute SQL Task, however it only supports Transact SQL Statements. It should be used for SQL Server specific SQL statements.

           •History Cleanup Task: This task allows for the cleanup of historical activity data. It will clean up the history for database maintenance plans, backup activities, restore activities and SQL Server Agent jobs.

          •Maintenance Cleanup Task: This task allows for the cleanup of backup files, and the reports of maintenance plans.

           •Notify Operator Task: This task allows SSIS to notify SQL Server Agent operators. They can be notified by email, pager, or net send.

          •Rebuild Index Task: This task will rebuild an index or indexes on one or many databases.

         •Reorganize Index Task: This task will reorganize an index or indexes on one or many databases.

          •Shrink Database Task: This task will shrink the size of the SQL Server database data and database log files.

           •Update Statistics Task: This task will update the statistics for one or many tables in one or many databases.

SQL Server Tasks

           •Bulk Insert Task: This task offers an efficient way to copy large volumes of data.

           •Execute SQL Task: This task allows the execution of a SQL statement. If the statement returns results, they can be stored in a variable.

           •Transfer Database Task: This task will copy or move a SQL Server database between two instances of SQL Server. It can even be used to make a copy of a database on the same server. Databases can be copied either online or offline.

           •Transfer Error Messages Task: This task will transfer a single or multiple SQL Server user defined error messages between SQL Server instances. It can be setup to transfer specific user messages or all error messages.

           •Transfer Jobs Task: This task will transfer a single or multiple SQL Server Agent jobs between SQL Server instances.

          •Transfer Logins Task: This task will transfer a single or multiple SQL Server logins between SQL Server instances.

          •Transfer Master Stored Procedures Task: This task will transfer a single or multiple SQL Server Master database stored procedures between SQL Server instances.

           •Transfer SQL Server Objects Task: This task will transfer a single or multiple SQL Server database objects between SQL Server instances. Most of SQL Server's DDL objects can be copied with this task.

Analysis Services Tasks

           •Analysis Services Execute DDL Task: This task will run data definition language statements on Analysis Services. This allows for creating, dropping and altering cubes, dimensions and mining models.

           •Analysis Services Processing Task: This task will process Analysis Services Cubes, Dimensions, and Mining Models.

          •Data Mining Query Task: This task will run a DMX (Data Mining Extensions) query that creates a prediction based on new data run against an Analysis Services data mining model.

Thursday, August 26, 2010

Checkpoints, Error Handling, and Transactions in SSIS

Error Handling:
To use error paths, you will need to configure the error output. There are three options for handling errors in the data flow components:


• Setting the error output to Fail Transformation will cause the data flow to fail if an error is encountered.

• Using the Ignore Failure option will allow the row to continue out the normal green data path, but the value that resulted in the error is changed to a NULL in the output.

• Setting the error output to Redirect Row will send the error row out the red error path; this is the only way to handle errors with separate components.

Check Points:


What does it do?

With Checkpoints enabled on a package, SSIS saves the state of the package as it moves through each step or task and, upon failure of the package, writes it to an XML file. If your package fails, you can correct the problem and rerun it from the point of the task that did not run successfully the first time. Once the package completes successfully, the file is no longer needed and is automatically discarded.

How does this benefit you?

Just imagine your package is loading a table with 10 million records. Your package gets past the Data Flow that performs this huge load without any problem (other than the fact that it took two hours). The next task in your package is a Send Mail Task, which fails for some reason.

You correct the problem in the Send Mail Task, but without using Checkpoints your package would still have to run the Data Flow that loads the 10 million records again (taking another two hours) even though you've already done it once. If you had enabled Checkpoints on this package you could simply correct the problem in the Send Mail Task and then run the package again, starting at the Send Mail Task. Sounds great, right?

How do I configure it?

This example will run you through very basic package using Checkpoints. To see these instructions with screenshots go to my regular blog http://blogs.pragmaticworks.com/devin_knight/2009/06/enabling-checkpoints-in-your-ssis-packages.html

Example Overview

• Use three Execute SQL Tasks using the AdventureWorks2009 database (it can really be any database for this example) as the connection manager.

• Configure the package to handle Checkpoints

• Configure the individual tasks to handle Checkpoints

Step 1: Configure Execute SQL Tasks

• Drag three Execute SQL Tasks on your Control Flow.

• Use any database for the Connection property on all three tasks

• Configure Execute SQL Task SQLStatement property: Select 1

• Configure Execute SQL Task 1 SQLStatement property: Select A (Set to intentionally fail)

• Configure Execute SQL Task 2 SQLStatement property: Select 1

Step 2: Configure Package to enable Checkpoints

• Open the properties menu at the package level (Just open properties in the Control Flow without any task or connection manager selected)

• Change the CheckpointFileName property: c:\Checkpoint.xml (Feel free to use the .txt extension when naming the checkpoint file if you want to open it in Notepad and look at it!)

• Change the CheckpointUsage property: IfExists

• Change the SaveCheckpoints property: True

Step 3: Configure Each Task

• Select each task individually and open the properties menu at the task level (Just click the task once then hit F4)

• Change the FailPackageOnFailure property to True

Step 4: Run the Package

• Run the package and you will see the package fail on the second task

• This also created the file c:\Checkpoint.xml. Feel free to open it and take a look! I use the tool XML Notepad to view XML files. It's free.

• You could also save this file with the .txt extension and just view it in regular Notepad, and it still works as a Checkpoint.

• If you run the package a second time it will skip the first task that was successful and start right at the second task

Step 5: Correct the Problem and Rerun Package

• Open Execute SQL Task 1 (the task that failed) and change its SQLStatement property to: Select 1

• The package has now completed and skipped the first step which already succeeded. Imagine if that first step would normally take two hours to run!

Defining Package and Task Transaction Settings

You can set package transactions at the entire package level or at any control flow container level or task level. Transactions in SSIS use the Windows Distributed Transaction Coordinator (DTC); the DTC service needs to be started on the machine for transactions to work. Any service or program that is enabled to work with the DTC can be part of a transaction in SSIS.

To enable a transaction within a package, you need to set the TransactionOption property of the task or container to Required. Figure 1 highlights the properties of a package at the control flow level, which means they apply to the package as a whole. The TransactionOption property is the same on any control flow object.

Figure:1 A task's or container's TransactionOption property must be set to Required to enable a transaction within a package.


When deciding whether and how to implement a transaction, follow these guidelines:

• For transactions to be enabled in SSIS, you need to turn on the DTC service, and the tasks that you want to be part of the transaction must work with the DTC service natively.

• If a series of tasks must be completed as a single unit, in which either all the tasks are successful and committed or an error occurs and none of the tasks are committed, then place the tasks within a Sequence Container, and then set the TransactionOption property of the container to Required.

• A task can inherit the transaction setting of its parent when the TransactionOption property is set to Supported, which is the default setting when creating a task or container.

• You can prevent a task from participating in a transaction by setting its TransactionOption setting to NotSupported.

• Transactions work at the control flow level and not within a data flow. This means that you can turn on a transaction for a data flow task, but you cannot turn it on separately for selected components within the data flow; either the entire data process will be successful or it will be rolled back.

Implementing Restartability Checkpoints

At times, especially if you are working with complicated or long-running packages, you will want the ability to restart a package if it fails and have it start at the point of failure. In other words, you might not want the tasks that were already successful to be run again if the package is restarted. This can be done by enabling checkpoints in the package.

Enabling restartability within a package requires, first, enabling a package to use checkpoints, and, second, setting the specific tasks and containers to write checkpoints. To turn on checkpoints within a package, follow these steps:

1. Within the package, open the Properties window, and then select the Control Flow tab, which will reveal the properties of the package.

2. Set the SaveCheckpoints property at the package level to True. This allows checkpoints to be saved during package execution.

3. In the CheckpointFileName property, provide a valid path and file name to a checkpoint file. Packages use files to maintain their state information, so if a package fails and is then restarted, the package can read the checkpoint file to determine where it left off and to track the state information at the last successful task.

4. Set the CheckpointUsage to IfExists, which causes the package to run from the beginning if the file is not present or to run from the identified point if the file exists.

Practice: Implementing Package and Task Transactions

In this practice, you will turn on transactions at the container level and observe the results when a task fails within the container.

Exercise 1: Enabling Transactions

1. Navigate to the Control Panel/Administrative Tools/Services console, and then start the Distributed Transaction Coordinator service.

2. Open the SSIS project that you created in Chapter 4, and then open the package called MyPackage.dtsx. Or you can use the 'Start Here' project in the Source\Ch 05 folder in the installed practice files.

3. From the View menu in the menu bar, open the Properties window, and then click the pushpin in the Properties window to lock the window in the open position.

4. In the Control Flow Designer, select the Sequence Container (by clicking on it), and then note the Sequence Container properties listed in the Properties window.

5. Set the TransactionOption to Required using the drop-down list.

6. Save the package by clicking the Save icon in the toolbar.

Exercise 2: Observing a Transaction Rollback

1. With the package from Practice 1 still opened, expand the Sequence Container, and drag and drop a new Execute SQL Task to the bottom of the Sequence Container workspace.

2. Connect the green precedence arrow from the Data Flow to the new Execute SQL Task by dragging the green arrow from the bottom of the Data Flow onto the Execute SQL Task.

3. Edit the Execute SQL Task by double-clicking on the task. In the Execute SQL Task Editor, change the Connection property to the AdventureWorks connection.

4. Change the Name property within the Execute SQL Task Editor to Force Failure.

5. Select OK in the Execute SQL Task Editor to return to the Control Flow.

6. With the Force Failure Execute SQL Task still selected, open the Properties window and change the ForceExecutionResult property to Failure. By setting this property to Failure, you are specifying that the task should fail intentionally, which is something you might do for testing purposes.

7. Open a new database query, in SSMS, against the AdventureWorks database.

8. Run the following SQL statement and observe the results: SELECT COUNT(*) FROM Sales_Summary

9. In the Business Intelligence Development Studio (BIDS), execute the SSIS package you just modified, which will intentionally fail at the last step.

10. Stop the package execution, and rerun the query from step 8.

11. Observe that even though the data flow was successful, the data was rolled back because the Sequence Container was configured with a transaction, and the last task within the Sequence Container failed.

Quick Check

1. You add a Sequence Container to a package that contains several tasks, one of which calls a command on a legacy system and another of which is a Data Flow Task that imports data into SQL Server. Even with DTC started and transactions turned on, your Sequence Container fails before the tasks even run. What is the problem?

2. What do you need to set in order to use checkpoint properties at the package level, after you have turned the checkpoint properties on?

Quick Check Answers

1. The transactions featured in SSIS use the DTC service. However, not all systems support DTC, and a transaction cannot be forced on a non-compliant system, so the container will fail. You should remove the legacy task from the sequence container that has the transaction.

2. You need to set the FailPackageOnFailure property to True for tasks to write to the checkpoint file. However, if you want to rerun any successful tasks that occur before the failure task, you need to use a sequence container around the group of related tasks that require transactions.

Handling Package Errors with Event Handlers

In the data flow, using data viewers provides the ability to easily debug problems while processing data. The control flow, however, is different because the focus is on workflow and execution rather than on data and transformations. Leveraging the capabilities in Visual Studio, the control flow supports visual debugging and breakpoint features. Let's first look at the event handlers that SSIS provides; then we will explore the debugging capabilities in the control flow.

Event Handlers

SSIS provides the ability to listen for certain execution events and perform other operations when an event happens (depending on the execution event). For example, if an error happens, the error event handler can send an alert or potentially fix a data problem. Event handlers use the control flow paradigm for workflow processing, which includes all the same control flow tasks and containers that are found in the toolbox of the control flow.

You can define zero, one, or more than one event handler for a package. To add an event handler to a package, you need to select the Event Handler tab in the package designer. Creating a new package event handler requires that you select the executable and the event handler event, as Figure 2 shows.

Figure:2 Selecting the executable and event handler event for a package


The executable is the task or container scope at which the event fires. You can also choose the package itself (the highest-level container) as the executable for an event. The event handler event is the actual event that causes the event workflow to execute. Package event handler types include OnError, OnWarning, OnInformation, OnPreExecute, OnPostExecute, OnPreValidate, OnPostValidate, OnProgress, OnQueryCancel, OnTaskFailed and OnVariableValueChanged.

In addition, event handlers assigned to an executable scope will propagate down to child events when the event fires. If an event is assigned to a container, the child executables include the tasks and containers that are embedded within the parent container. This means that if you assign an OnError event to the package and an OnError event occurs at a task, the event handler would fire for both the task and the package (and for any containers in between). You would use an event handler for tracking error details, for sending failure messages in emails, and for implementing manual rollback logic.


BEST PRACTICES Capturing error information with the OnError event

Each package contains a set of system variables that are updated for the various levels in the package during the package execution. With event handlers, you can capture these variables and values, which provide contextual information, such as the ErrorCode, ErrorDescription, and SourceName (the task), when the event fires.



Using event handlers is a great way to track package execution; they can be used to audit the execution, capturing the errors that occur in a task. In addition, the event handler Send Mail Task can be used for notification; for example, it can notify an administrator of a certain predefined condition that requires a special response.

Exam Tip

Event handlers can be turned off for any task or container by setting the DisableEventHandlers property of the task or container to True. In other words, if you have an event handler defined but you specifically do not want it to be invoked for a specific task, you can turn off event handlers for that task only.



Debugging the Control Flow with Breakpoints

Package debugging lets you know what is going on during the execution of a package in the designer so that you can troubleshoot or validate processing logic. Control flow debugging involves setting breakpoints in the package, which will pause the control flow execution so that you can observe the execution state. SSIS takes advantage of the breakpoint functionality that comes with Visual Studio, which means you have the capabilities to view execution information about the package when you execute a package in the designer.

BEST PRACTICES Breakpoints work in control flow only

Breakpoints function in the control flow but not in the data flow. For scripting, this means that you can set breakpoints only in a control flow Script Task and not in a data flow Script Component.



To set a breakpoint, highlight the task or container, and either press F9 or navigate to the Debug/Toggle Breakpoint menu. You can set multiple breakpoints in a package, and you can embed a breakpoint within a Script Task at a line of code. Figure:3 shows a package that is running but is paused at execution.

Figure:3 A package that is running but paused


In this screen, the arrow next to the breakpoint icon indicates which task the package is currently waiting to run. When you are paused in the debug environment, you can do the following things to help troubleshoot your package:

• Open the Locals window to see all the variable values and the package status. You can find this window in the Debug toolbar, next to the package execution selections. If you have several variables in a package that you actively use to control logic and precedence, you can use a breakpoint to pause the execution, allowing you to troubleshoot variable values handling before the package execution completes.

• When you have completed working during a pause and are in a Script Task, you can continue the execution of the script and package to the next breakpoint by either clicking the Continue button in the Debug toolbar or by pressing F5. Alternatively, you can stop the package from continuing by clicking the Stop button on the toolbar. Breakpoints in a Script Task are very useful because they help you validate the code logic and branching that you are performing with the script.

• When the package is paused on a task (as opposed to within the code of a Script Task as described in the prior bullet) in the control flow, you can also continue running the package to completion (or to the next breakpoint) by selecting the Continue button on the Debug toolbar or by pressing F5.

• Stopping the package during execution, whether the package is paused at a breakpoint or executing, is handled by clicking the Stop button in the toolbar or by pressing the Shift and F5 keys at the same time.

In all, breakpoints are powerful tools for resolving errors and validating functionality. Combined with data views in the data flow, they provide comprehensive debugging support during your package development.

Practice: Identifying Data Flow Errors

In this practice, you will add an error flow path and then identify the error by observing the rows in the output through a data viewer.

Exercise: Adding Error Flow Paths

1. In SSMS, create a new database query connected to the AdventureWorks sample database and run the following code:


USE [AdventureWorks]
GO

TRUNCATE TABLE dbo.Sales_Summary
GO

ALTER TABLE dbo.Sales_Summary WITH CHECK ADD CONSTRAINT [CK_Summary_ProductLine]
CHECK ((upper([ProductLine])='R' OR upper([ProductLine])='M' OR upper([ProductLine])='T'))
GO

2. The above code uses the Sales_Summary table that you created in Chapter 4.

3. Open the package that you modified in the previous practices in this chapter.

4. Navigate to the Data Flow designer and open the OLE DB Destination adapter.

5. In the Data Access Mode drop-down list, change the selection to Table Or View, and then verify that the Sales_Summary table is still selected in the Name Of The Table Or The View drop-down list.

6. Click OK in the OLE DB Destination Editor to return to the package designer.

7. Right-click in the Data Flow workspace, and then choose Execute Task from the drop-down list to run the Data Flow Task. Before stopping the package, navigate to the Progress tab and observe that the OLE DB Destination failed because of a constraint violation.

8. Stop the package to return to design mode.

9. From the SSIS menu in the menu bar, select Variables; this will open the Variable window.

10. Select the leftmost icon in the Variable window toolbar to create a new variable for the package and name the variable ErrorCount.

11. Open the Toolbox, and then drag a Row Count transformation onto the Data Flow workspace.

12. Highlight the OLE DB Destination adapter, and then drag the red error output path and drop it on the Row Count transformation.

13. When the Configure Error Output window appears, change the value in the Error Column drop-down list to Redirect Row, and then click OK to return to the Data Flow designer.

14. Open the Row Count transformation, and then change the VariableName property to User::ErrorCount.

15. Click OK in the Row Count Editor to return to the Data Flow.

16. Right-click the red error path, and then select Data Viewers from the drop-down list.

17. Choose Add in the Data Flow Path Editor, highlight Grid, click OK in the Configure Data Viewer window, and then click OK in the Data Flow Path Editor.

18. Right-click in the Data Flow designer workspace, and select Execute Task from the drop-down list.

19. A new Data Viewer will appear, which will reveal the OLE DB Destination Error Output.

20. Observe that the ProductLine column for all the error rows is S, which violates the defined constraint that you created in step 1.

21. Select Detach in the Data Viewer window, and then stop the package execution.

22. In SSMS, run the following database query, which adds S as a valid value of the ProductLine column in the Sales_Summary table:


USE [AdventureWorks]
GO

ALTER TABLE dbo.Sales_Summary
DROP CONSTRAINT [CK_Summary_ProductLine]
GO

ALTER TABLE dbo.Sales_Summary WITH CHECK ADD CONSTRAINT [CK_Summary_ProductLine]
CHECK ((upper([ProductLine])='R' OR upper([ProductLine])='M' OR upper([ProductLine])='T' OR upper([ProductLine])='S'))
GO

23. Return to BIDS and rerun the Data Flow, observing that the OLE DB Destination is now successful, with no rows being routed to the error path output or data viewer.

Quick Check

1. A Data Conversion transformation is failing in the middle of the data flow execution, and you need to determine what is causing the error. How should you proceed?

2. Your package contains a string variable that you are updating, using a Script Task, to be a file path and file name. Your package is failing at a File System Task that is configured to use the variable to move the file to a different folder on the server. How do you troubleshoot the package?

3. You would like to log all the package errors to a custom database table you have created for auditing purposes. How can you accomplish this task?

Quick Check Answers

1. To determine what is causing the error, configure the Data Conversion transformation error path to Flat File so that any rows that are failing conversion are sent out to a file. Then, create a data viewer on the error path, and run the package in BIDS. This technique will capture the errors in a file and display the rows in the designer for troubleshooting.

2. Because the Script Task can contain embedded breakpoints in the code, set a breakpoint in the script so that you will be able to execute the package and step through the lines of code, observing the value of the variable to check the code and accuracy.

3. By using the OnError event handler assigned at the package level, you can use an Execute SQL Task that calls a stored procedure, passing in the SourceName and ErrorDescription variable values. The procedure can then write these details to a metadata storage table for auditing; a minimal sketch of such a table and procedure follows.
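
Below is a minimal, hypothetical sketch of what such an audit table and stored procedure could look like; the object names (dbo.SSISPackageErrorLog, dbo.usp_LogSSISError) are assumptions. In the Execute SQL Task you would call the procedure with parameters mapped to the System::PackageName, System::SourceName, System::ErrorCode and System::ErrorDescription variables.

--Hypothetical audit table and logging procedure for the OnError event handler
CREATE TABLE dbo.SSISPackageErrorLog
(
    ErrorLogID       INT IDENTITY(1,1) PRIMARY KEY,
    PackageName      NVARCHAR(200),
    SourceName       NVARCHAR(200),
    ErrorCode        INT,
    ErrorDescription NVARCHAR(MAX),
    LoggedAt         DATETIME NOT NULL DEFAULT GETDATE()
);
GO
CREATE PROCEDURE dbo.usp_LogSSISError
    @PackageName      NVARCHAR(200),
    @SourceName       NVARCHAR(200),
    @ErrorCode        INT,
    @ErrorDescription NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.SSISPackageErrorLog (PackageName, SourceName, ErrorCode, ErrorDescription)
    VALUES (@PackageName, @SourceName, @ErrorCode, @ErrorDescription);
END
GO
--In the Execute SQL Task the SQLStatement would be: EXEC dbo.usp_LogSSISError ?, ?, ?, ?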

Thursday, August 19, 2010

Looping over files with the Foreach Loop Container

In SQL Server 2000 Data Transformation Services (DTS) it was a bit of a hack to be able to loop over files of a given type in a certain directory and import them into your destination. It involved a lot of "glue code" and a certain amount of fooling the package into going back to a previous task because it still had work to do. Thankfully, in SQL Server 2005 Integration Services (SSIS) that has all changed, and this article is going to show you how.

The image below shows us how incredibly simple and clean the package will look when finished. There are some things worth pointing out at this stage. In the centre of the screen we see the Foreach Enumerator container and inside that we see the Data Flow task which houses the pipeline. At the bottom in the Connection Managers tray we see our Flat File Connection Manager (My Source File) and our OLEDB Connection Manager (My Destination). The Flat File Connection Manager is the one in which we are most interested for this article. Both of these managers are used in the Data Flow behind the DataFlow task. We will not be detailing the pipeline behind the DataFlow task in this article but it consists of a Flat File Source moving data to an OLEDB destination.

 
Let's begin then by opening up the Foreach enumerator and moving straight to the Collection node in the tree on our left. Below we see our information already populated.
                                
What we see on the screen is pretty self-explanatory but let's go through it anyway. We have chosen to enumerate over a file collection and this is indicated by the value next to the Enumerator property at the top. We need to specify a folder over which to loop and which type of files to look for, and we do that in the centre of the form. At the bottom of the form we are given three options as to what is returned when the loop finds a file in the folder: we can return the whole filename including extension and path, the name and extension, or simply the name of the file found. Because our connection manager is going to need to know exactly where to find the file and what its name is, we have chosen the first option. The final thing we see on this screen is the ability to traverse subfolders. In our example we do not need to do this.

When the Foreach enumerator finds a file it needs to tell us what it found, and it does this by populating a variable. Click on the Variable Mappings node now. Our package currently has no variables able to accept the name of the file, so we are going to create a new one.

The next screen we see allows us to set the values of the variable. As we can see variables can be scoped in SSIS to certain executables in the package or to the package itself.

Here is how our variable looks with all its properties set.

Because the enumerator will only return us at most one value on every iteration we map our variable to an index of 0.
                             
We have now configured everything as far as the Foreach enumerator is concerned. We now need to set the properties of the Flat File Connection Manager. Highlight the manager in the tray at the bottom, right-click and choose Properties.
                                 
The important part of this dialog is highlighted and that is "Expressions". Click on the ellipsis and we will be taken through to the next screen where we can start to create the expression. In the screen that follows, drop down the list in the Property column and choose ConnectionString.

Now hit the ellipsis button to the right and we are taken through to the expression editor, where we will build the actual expression itself.
                         
Our requirements are pretty simple here and all we want to do is retrieve the variable we defined earlier. To do this, simply drag the variable from the list at the top to the expression text box at the bottom. Property expressions can become very complex and we shall no doubt be seeing more of them in future articles. After you have chosen the variable, click OK.
                         
We now see that our expression is mapped to our ConnectionString property. Click OK




Finally we can now see our File Manager's Connection string property being mapped to an expression in the properties of the manager.


When the enumerator finds a file matching our requirements it will set the correct property on the connection manager and this will be used by the pipeline at runtime.