Tuesday, February 1, 2011

Variables Window

Use the Variables window to create and modify user-defined variables and view system variables.
You can optionally display the Variables window by mapping the View.Variables command to a key combination of your choosing on the Keyboard page of the Options dialog box. 


Add Variable

Add a user-defined variable.
Delete Variable
Select a variable from the list, and then click Delete Variable.
Show System Variables
Click to toggle between a list that includes system variables and a list that does not. System variables are predefined. You cannot add or delete system variables or modify their properties. This list is color coded. System variables are gray, and user-defined variables are blue. The list includes variable properties.
Show All Variables
Click to toggle between a list that includes all the variables in a package and a list that filters the variables. If the Show All Variables option is not selected, the list includes only the variables in the scope of the package or the selected container or task; otherwise, the list includes all variables. If the Show System Variables option is selected, the list includes system variables. System variables are predefined. You cannot add or delete system variables or modify their properties. This list is color coded. User-defined variables are blue and system variables are gray. The list includes variable properties.
Choose Variable Columns
Click to open the Choose Variable Columns dialog box where you can change the column selection. Use the Variable Column Chooser dialog box to select the columns whose properties will display in the Variables window.
 
Scope
Select to display the scope of variables.
Data type
Select to display the data types of variables.
Value
Select to display the values of variables.
Namespace
Select to display the namespaces of variables.
Raise event when variable value changes
Select to display Booleans that indicate whether the variables raise the OnVariableValueChanged event.

Name


View the variable name. For user-defined variables, you can update the variable name.

Scope


View the scope of the variable. A variable has either the scope of the entire package, or the scope of a container or task. The scope of the variable must be sufficient so that the variable is visible to any other tasks or components that need to read or set its value. Scope cannot be modified in the Variables window, but depends on the item that was selected in the designer when you created the variable.

Data Type


View the data type of the variable. For user-defined variables, you can select a data type from the list.

Value


View the variable value. For user-defined variables, you can update the variable value. This value can be a literal or an expression.

Namespace


View the namespace name. User-defined variables are initially created in the User namespace, but you can change the namespace name. To display this column and change the namespace, click Choose Variable Columns and select the option in the Choose Variable Columns dialog box.

Raise Change Event


Indicate whether to raise an event when a value changes. By default, the Variables window does not list this column. Click Choose Variable Columns and select the option in the Choose Variable Columns dialog box.

IncludeInDebugDump


Indicate whether the variable value is included in the debug dump files.
For user-defined variables and system variables, the default value for the InclueInDebugDump option is true.
However, for user-defined variables, the system resets the IncludeInDebugDump option to false when the following conditions are met:
  • If the EvaluateAsExpression variable property is set to true, the system resets the IncludeInDebugDump option to false.
    To include the text of the expression as the variable value in the debug dump files, set the IncludeInDebugDump option to true.
  • If the variable data type is changed to a string, the system resets the IncludeInDebugDump option to false.
When the system resets the IncludeInDebugDump option to false, this might override the value selected by the user.

Using SQL Server Configurations to configure Flat File Connection Manager



The following is an example on how to use SQL Server based configurations to configure a Flat File Connection Manager’s ConnectionString property. We will store in an SSIS variable the directory that the Flat File Connection Manager will store/read its file to/from. The file name in this example will be based on another variable. We will use SQL Server configurations to dynamically manage the directory in this case, though you could/should do both.
  • Create a new package.
  • Create a new string variable, “Directory” and give it a default value of “C:\”
  •  Create another new string variable, “FileName” and give it a default value of “test.txt”
NOTE: You don’t need the quotes when typing in the values for the above variables
  •    Right click on the Control Flow, select “Package Configurations…”
  • Check the box to “Enable package configurations”
  • Click “Add…”
  • Change “Configuration type:” to “SQL Server”
  • Click on “New…” on the right of the “Connection:” line to create a new connection.
  • This will bring up a new window, “Configure OLE DB Connection Manager”. Click “New…” here.
  • Enter in the required information on this screen such as server (“localhost” in my case) and database name (“AdventureWorksDW” in my case). I am using “Windows Authentication”
  • Click “OK”
  • Click “OK” again
  • Now we should be back to the “Package Configuration Wizard” screen. Click “New…” on the right of the “Configuration table:” line to create a new configuration table (if you don’t already have one created.)
  • This brings up a “Create Table” screen. Leave the SQL alone, unless you want to change the name of the table it is trying to create. (Which I do — I replace the space with an underscore: “SSIS_Configurations”) Click “OK”
  • Back on the “Package Configuration Wizard” screen, you can type in a “Configuration filter:” value. This is basically a WHERE clause when SSIS tries to query the table from above. On the next screen, we’ll get to choose which properties we’d like to configure via this method, so give this a unique name for this test package. I used “test_package”.
  •   Click “Next >”
  • Collapse all branches on this next window in the “Objects” pane. Make sure that nothing is checked. This is easy to do by checking the top level object (the name of your package) and unchecking it again. Expand the “Variables” sub-tree. Here you’ll see the two variables we created from before. Expand the “Directory” variable. Expand its “Properties” branch. Check the “Value” property under the “Directory” branch. At this point, you should only have one item checked.
  • Click “Next >”
  • Give this configuration a name. This is only used to help you identify which configuration is which in this package, as you can have more than one at any given time. I used “Directory Configuration” in this case. Click on “Finish”
  • Click on “Close”
  •           From here, we have the configuration setup, which will dynamically set the value of the “Directory” variable based on whatever value is in the configuration table. Now let’s set it up on a Flat File Connection Manager object.
·          
  •  In the “Connection Managers” pane at the bottom of the window, right click and select “New Flat File Connection…”
  •  Give it a “Connection manager name” of something descriptive. I used “My Test Flat File” and left the “Description” property blank.
  • Type in the name of the file, fully qualified. I used “C:\test_placeholder.txt”
  •  I used the default options of “Format: Delimited” and “Text qualifier: ” and “Header row delimiter: {CR}{LF}” and “Headers rows to skip: 0″
  • Click on the “Advanced” tab on the left.
  • Click on the “New” button at the bottom to add a new column. Click on it again to add a second column.
  • Click “OK”
Now the Flat File Connection Manager has been setup, though it is not configured to use the variables from before. Let’s do that now.
  •  Right click on the Flat File Connection Manager object in the “Connection Managers” pane and select “Properties”
  •    Click on the “Expressions” property. If you have installed SQL Server 2005 Service Pack 1 or greater, you should now see some ellipses (…) on the right. Click on them.
  • You should now have the “Property Expression Editor” showing. In the left side, click and select the “ConnectionString” property.
  •   Click on the ellipses here as well to bring up the expression editor for this property.
  • Use the following for your expression: @[User::Directory] + @[User::FileName]
  • Click “OK”
  • Close the properties window if you want.
  • Now we have the Flat File Connection Manager object configured to use our variables, which will get their values from our database table. Let’s put it all together in a package we can run.
  • Add a Data Flow task to the Control Flow
  • Double click on the Data Flow task to edit it.
  • Add an “OLE DB Source” component by dragging it to the Data Flow surface.
  • Double click on the “OLE DB Source” component to edit it. The “OLE DB Connection Manager” property should already be populated for you and should be the “AdventureWorksDW” connection manager object we created from before.
  • Change the “Data access mode” to “SQL command” as this is always good practice to do. This allows us to select only the columns from the database that we need and also allows us to add a WHERE clause if necessary. This is the only practice I recommend.
  • Use the following SQL:
  • SELECT DISTINCT FirstName, LastName FROM     DimEmployee
  • Click on the “Columns” tab on the left. This populates the list of columns automatically that we want to expose to the Data Flow.
  • Click “OK”
  • Drag a “Flat File Destination” component to the Data Flow.
  • Connect the two components using the green line from the “OLE DB Source” component.
  • Double click on the “Flat File Destination” component to edit it.
  • Again, just like the previous component, the “Flat File Connection Manager” property has been automatically selected for you.
  • Click on the “Mappings” tab on the left to populate the mappings between the source and destination. Drag “FirstName” from the left to “Column 0″ on the right. Drag “LastName” from the left to “Column 1″ on the right.
  • Click “OK”
Now, we’re ready to execute the package, but let’s understand a few things first. The value currently stored in the SSIS_Configurations table currently reflect the default value you established when you created the “Directory” variable. So, to show that this works, let’s change the value of the variable in our SSIS_Configurations table to something like “C:\TEMP\”. Also, make sure that you actually have a “C:\TEMP\” directory and that you have permissions to write to it. So the package has “C:\” as the value for “Directory,” but the SSIS_Configurations table has “C:\TEMP\”. You can update this value by running an UPDATE statement in Management Studio, or by opening the table directly by right-clicking on it in Management Studio. Either way, I assume you know how to do this.
Run the package! Note that when we initially setup the “Flat File Connection Manager” object, we gave it a file name of “test_placeholder.txt”. After the package executes, you should find a file, “test.txt” in the “C:\TEMP” directory, not the “C:\” directory as you have in your variable.
What’s going on here? Well, the package configurations are applied at run-time and configure the “Directory” variable to be the value stored in the table. We then use that variable along with the “FileName” variable to dynamically configure the “ConnectionString” property of the “Flat File Connection Manager” by using expressions.


Courtesy:www.ssistalk.com


Sunday, January 9, 2011

SSIS Expression Cheat Sheet


SSIS Expression Cheat Sheet

Problems
Expression
Create a file name with today's date
Expression on the Flat File or File Connection Manager:

"C:\\Project\\MyExtract" + (DT_WSTR, 30)(DT_DBDATE)GETDATE() + ".csv"

Expression Output Example: C:\Project\MyExtract2009-03-20.csv
Use a 2 digit date
(ex. "03" for March instead of "3")
RIGHT("0" + (DT_WSTR, 2)MONTH(GETDATE()),2)


Expression Output: 03 (if the month is March)
Multiple condition if statement
In this example, the statement determines that if the ColumnName
is blank or NULL, it will be set to unknown. To make a Logical AND condition, use "&&" instead of the "||" operator.



ISNULL(ColumnName)||TRIM(ColumnName)==""?"Unknown": ColumnName
Returns the first five characters from a zip code
Derived Column Transform in the Data Flow:

SUBSTRING(ZipCodePlus4,1,5)
Remove a given character from a string
(ex. Remove "-" from a social security number)
Derived Column Transform in the Data Flow:

REPLACE(SocialSecurityNumber, "-","")
Uppercase data
Derived Column Transform in the Data Flow:

UPPER(ColumnName)
Replace NULL with another value
Derived Column Transform in the Data Flow:

ISNULL(ColumnName)?"New Value" : ColumnName
Replace blanks with NULL values
Derived Column Transform in the Data Flow:

TRIM(ColumnName)=="" ? (DT_STR, 4, 1252)NULL(DT_STR, 4, 1252) : ColumnName
Remove any non-numeric data from a column
Script Transform in the Data Flow Task with the code as follows (VB 2008):

Imports System.Text.RegularExpressions


Public Overrides Sub Input()_ProcessInputRows(ByVal Row As Input()Buffer)

If Row.ColumnName_IsNull = False Or Row.ColumnName = "" Then
Dim pattern As String = String.Empty

Dim r As Regex = Nothing

pattern = "[^0-9]"

r = New Regex(pattern, RegexOptions.Compiled)

Row.ColumnName = Regex.Replace(Row.ColumnName, pattern, "")
End If
End Sub
Convert text to proper case
(ex. 1st letter in each word is uppercase)
Script Transform with the line of partial code as follows:

Row.OutputName = StrConv(Row.InputName, VBStrConv.ProperCase)
Build dynamic SQL statement
Expression on the SQLStatementSource property of Execute SQL Task:

"SELECT Column From " + @[User::TableName] +WHERE DateFilterColumn = '" + (DT_WSTR,4)YEAR(@[User::DateTimeVar]) +
RIGHT("0" + (DT_WSTR,2)MONTH(@[User::DateTimeVar]),2) + RIGHT("0" + (DT_WSTR,2)DAY(@[User::DateTimeVar]),2) + "'"

Expression Output:SELECT Column FROM MyTable WHERE DateFilerColumn = '20060915'
Calculate beginning of the previous month
Expression on component or task:

(DT_DATE)(DT_DBDATE)DATEADD("dd",-1*(DAY(GETDATE())-1),DATEADD("month", -1, GETDATE()))
Round to the nearest two decimal mark
Expression on Derived Column Transform:

ROUND(YourNumber, 2)

Expression Output Example: 1.2600000

Common SSIS Problems and Solutions


Common SSIS Problems and Solutions

Problems
Solutions
Loop over a list of files & load each one
Tasks Required: Foreach Loop, Data Flow Task

Solution: Configure the Foreach Loop to loop over any particular directory of files. The loop should be configured to output
to a given variable. Map the given variable to a connection manager by using expressions.
Conditionally executing tasks

Solution: Double-click the precedence constraint and set the Evaluation property to Expression and Constraint.
Type the condition that you want to evaluate in the Expression box.
Pass in variables when scheduling or running a package

Solution: Use the /SET command in the DTExec command line or change the Property tab in the Package Execution Utility to have the property path like:
\Package.Variables[User::VariableName].Properties[Value]
Move and rename the file at the same time
Tasks Required: File System Task

Solution: Set the File System task to rename the file and point to the directory you'd like to move the file to. This enables you to rename and move the file in the same step.
Loop over an array of data in a table & perform a set of tasks for each row
Tasks Required: Execute SQL Task, Foreach Loop

Solution: Use an Execute SQL Task to load the array and send the data into an object variable. Loop over the variable in a Foreach Loop by use an ADO Enumerator.
Perform an incremental load of data
Tasks Required: 2 Execute SQL Tasks, Data Flow Task

Solution: Have the 1st Execute SQL Task retrieve a date from a control table of when the target table was last loaded and place that into a variable.
In the Data Flow Task, create a date range on your query using the variable. Then, update the control table using a 2nd Execute SQL Task to specify when the table was
last updated.
Perform a conditional update & insert
Components Required: Data Flow Task, Conditional Split, Lookup Transform or Merge Join, OLE DB Command Transform

Solution: Use the lookup Transform or Merge Join to detemine if the row exists on the destination and ignore a failed match. If the row yields blank
on the key, then you know the row should be inserted into the target (by Conditional Split). Otherwise, the row is a duplicate or an update. Determine if the row
is an update by comparing the source value to the target value in the Conditional Split. The update can be done by an OLE DB Command Transform or by loading the data
into a staging table.

Sunday, January 2, 2011

SQL Server Integration Services SSIS Package Configuration

Problem
As we develop, test and deploy SSIS packages, we need to specify different parameter values for things like database connection strings, file and folder paths, etc. without having to edit the individual SSIS packages to make these changes. How can we take advantage of the configuration capabilities that SSIS provides, particularly the SQL Server package configuration, to accomplish our goal? Read More..

Monday, December 20, 2010

Slowly Changing Dimension Transformation

The Slowly Changing Dimension transformation coordinates the updating and inserting of records in data warehouse dimension tables. For example, you can use this transformation to configure the transformation outputs that insert and update records in the DimProduct table of the AdventureWorksDW2008R2 OLAP database with data from the Production.Products table in the AdventureWorks2008R2 OLTP database. Read more..

Also check this article

Sunday, December 19, 2010

SQL Server Integration Services (SSIS) 15 Best Practices

Listed below are some SQL Server Integration Services (SSIS) best practices:

1.Keep it simple.

Avoid using components unnecessarily. For example:

Step 1. Declare the variable varServerDate.

Step 2. Use ExecuteSQLTask in the control flow to execute a SQL query to get the server date-time and store it in the variable

Step 3. Use the dataflow task and insert/update database with the server date-time from the variable varServerDate.

This sequence is advisable only in cases where the time difference from step 2 to step 3 really matters. If that doesn't really matter, then just use the getdate() command at step 3, as shown below:


--create table #table1 (Lap_Id int, LAP_Date datetime)

Insert into #table1 (Lap_Id, LAP_Date) values (1, getdate())2.Calling a child package multiple times from a parent with different parameter values.

When a child package is executed from a master package, the parameters that are passed from the master need to be configured in the child package. For this, you can use the ‘Parent Package Configuration’ option in the child package. But, for using the ‘Parent Package Configuration’, you need to specify the name of the ‘Parent Package Variable’ that is passed to the child package. If you want to call the same child package multiple times (each time with a different parameter value), declare the parent package variables (with the same name as given in the child package) with a scope limited to ‘Execute Package Tasks’.

SSIS allows declaring variables with the same name but the scope limited to different tasks – all inside the same package!

3.SQL job with many atomic steps.

For the SQL job that calls the SSIS packages, make multiple steps, each doing small tasks, rather than a single step doing all the tasks. In the first case, the transaction log grows too big, and if a rollback happens, it may take the full processing space of the server.

4.Avoid unnecessary typecasts.

Avoid unnecessary type casts. For example, the flat file connection manager, by default, uses the string [DT_STR] data type for all the columns. In case you want to use the actual data types, you have to manually change it. Better to change it at the source-level itself to avoid unnecessary type castings.

5.Transactions.

Usually, the ETL processes handle large volumes of data. In such a scenario, do not attempt a transaction on the whole package logic. However, SSIS supports transaction, and it is advisable to use transactions where the atomicity of the transaction is taken care of.

For example, consider a scenario where a source record is to be spitted into 25 records at the target - where either all the 25 records reach the destination or zero. In this scenario, using a transaction, we can ensure either all the 25 records reach the destination or zero.


6.Distributed transaction spanning multiple tasks.

The control flow of an SSIS package threads together various control tasks. It is possible to set a transaction that can span into multiple tasks using the same connection. To enable this, the “retainsameconnection” property of the Connection Manager should be set to “True”.

7.Limit the package names to a maximum of 100 characters.

When an SSIS package with a package name exceeding 100 chars is deployed into SQL Server, it trims the package name to 100 chars, which may cause an execution failure. So, limit the package names to a maximum of 100 characters.

8.Select * from…

Make sure that you are not passing any unnecessary columns from the source to the downstream. With the OLEDB connection manager source, using the ‘Table or View’ data access mode is equivalent to ‘SELECT * FROM ’, which will fetch all the columns. Use ‘SQL command’ to fetch only the required columns, and pass that to the downstream. At each down-stream component, filter out the unnecessary columns.

9.Sorting.

Sorting in SSIS is a time consuming operation. At the places where we know that data is coming from database tables, it’s better to perform the sorting operation at the database query itself.

10.Excel Source and 64-bit runtime.

The Excel Source or Excel Connection manager works only with the 32 bit runtime. When a package using the Excel Source is enabled for 64-bit runtime (by default, it is enabled), it will fail on the production server using the 64-bit runtime. Go to the solution property pages\debugging and set Run64BitRuntime to False.

11.On failure of a component, stop/continue the execution with the next component.


When a component fails, the property failParentonFailure can be effectively used either to stop the package execution or continue with the next component - exception - stop/continue with the next component in a sequence container. The value of the constraint connecting the components in the sequence should be set to "Completion", and the failParentonFailure property should be set to False (default).

12.Protection.

To avoid most of the package deployment error from one system to another system, set the package protection level to ‘DontSaveSenstive’.

13.Copy pasting the Script component.

Once you copy-paste a script component and execute the package, it may fail. Just open the script editor of the pasted script component, save the script, and execute the package – it will work.

14.Configuration filter – Use as a filter.

It is a best practice to use the package name as the configuration filter for all the configuration items that are specific to a package. It is especially useful when there are so many packages with package-specific configuration items. For the configuration items that are general to many packages, use a generic name.

15.Optimal use of configuration records.

Avoid the same configuration item recorded under different filter/object names. For example, if two packages are using the same connection string, you need only one configuration record. To enable this, use the same name for the connection manager in both the packages. Also, use a generic configuration filter. This is quite convenient at the time of porting from one environment to another (e.g.: from UAT to production).

Courtesy:CodeProject

SQLCat