The Slowly Changing Dimension transformation coordinates the updating and inserting of records in data warehouse dimension tables. For example, you can use this transformation to configure the transformation outputs that insert and update records in the DimProduct table of the AdventureWorksDW2008R2 OLAP database with data from the Production.Products table in the AdventureWorks2008R2 OLTP database. Read more..
Also check this article
Monday, December 20, 2010
Sunday, December 19, 2010
SQL Server Integration Services (SSIS) 15 Best Practices
Listed below are some SQL Server Integration Services (SSIS) best practices:
1.Keep it simple.
Avoid using components unnecessarily. For example:
Step 1. Declare the variable varServerDate.
Step 2. Use ExecuteSQLTask in the control flow to execute a SQL query to get the server date-time and store it in the variable
Step 3. Use the dataflow task and insert/update database with the server date-time from the variable varServerDate.
This sequence is advisable only in cases where the time difference from step 2 to step 3 really matters. If that doesn't really matter, then just use the getdate() command at step 3, as shown below:
--create table #table1 (Lap_Id int, LAP_Date datetime)
Insert into #table1 (Lap_Id, LAP_Date) values (1, getdate())2.Calling a child package multiple times from a parent with different parameter values.
When a child package is executed from a master package, the parameters that are passed from the master need to be configured in the child package. For this, you can use the ‘Parent Package Configuration’ option in the child package. But, for using the ‘Parent Package Configuration’, you need to specify the name of the ‘Parent Package Variable’ that is passed to the child package. If you want to call the same child package multiple times (each time with a different parameter value), declare the parent package variables (with the same name as given in the child package) with a scope limited to ‘Execute Package Tasks’.
SSIS allows declaring variables with the same name but the scope limited to different tasks – all inside the same package!
3.SQL job with many atomic steps.
For the SQL job that calls the SSIS packages, make multiple steps, each doing small tasks, rather than a single step doing all the tasks. In the first case, the transaction log grows too big, and if a rollback happens, it may take the full processing space of the server.
4.Avoid unnecessary typecasts.
Avoid unnecessary type casts. For example, the flat file connection manager, by default, uses the string [DT_STR] data type for all the columns. In case you want to use the actual data types, you have to manually change it. Better to change it at the source-level itself to avoid unnecessary type castings.
5.Transactions.
Usually, the ETL processes handle large volumes of data. In such a scenario, do not attempt a transaction on the whole package logic. However, SSIS supports transaction, and it is advisable to use transactions where the atomicity of the transaction is taken care of.
For example, consider a scenario where a source record is to be spitted into 25 records at the target - where either all the 25 records reach the destination or zero. In this scenario, using a transaction, we can ensure either all the 25 records reach the destination or zero.
6.Distributed transaction spanning multiple tasks.
The control flow of an SSIS package threads together various control tasks. It is possible to set a transaction that can span into multiple tasks using the same connection. To enable this, the “retainsameconnection” property of the Connection Manager should be set to “True”.
7.Limit the package names to a maximum of 100 characters.
When an SSIS package with a package name exceeding 100 chars is deployed into SQL Server, it trims the package name to 100 chars, which may cause an execution failure. So, limit the package names to a maximum of 100 characters.
8.Select * from…
Make sure that you are not passing any unnecessary columns from the source to the downstream. With the OLEDB connection manager source, using the ‘Table or View’ data access mode is equivalent to ‘SELECT * FROM’, which will fetch all the columns. Use ‘SQL command’ to fetch only the required columns, and pass that to the downstream. At each down-stream component, filter out the unnecessary columns.
9.Sorting.
Sorting in SSIS is a time consuming operation. At the places where we know that data is coming from database tables, it’s better to perform the sorting operation at the database query itself.
10.Excel Source and 64-bit runtime.
The Excel Source or Excel Connection manager works only with the 32 bit runtime. When a package using the Excel Source is enabled for 64-bit runtime (by default, it is enabled), it will fail on the production server using the 64-bit runtime. Go to the solution property pages\debugging and set Run64BitRuntime to False.
11.On failure of a component, stop/continue the execution with the next component.
When a component fails, the property failParentonFailure can be effectively used either to stop the package execution or continue with the next component - exception - stop/continue with the next component in a sequence container. The value of the constraint connecting the components in the sequence should be set to "Completion", and the failParentonFailure property should be set to False (default).
12.Protection.
To avoid most of the package deployment error from one system to another system, set the package protection level to ‘DontSaveSenstive’.
13.Copy pasting the Script component.
Once you copy-paste a script component and execute the package, it may fail. Just open the script editor of the pasted script component, save the script, and execute the package – it will work.
14.Configuration filter – Use as a filter.
It is a best practice to use the package name as the configuration filter for all the configuration items that are specific to a package. It is especially useful when there are so many packages with package-specific configuration items. For the configuration items that are general to many packages, use a generic name.
15.Optimal use of configuration records.
Avoid the same configuration item recorded under different filter/object names. For example, if two packages are using the same connection string, you need only one configuration record. To enable this, use the same name for the connection manager in both the packages. Also, use a generic configuration filter. This is quite convenient at the time of porting from one environment to another (e.g.: from UAT to production).
Courtesy:CodeProject
SQLCat
1.Keep it simple.
Avoid using components unnecessarily. For example:
Step 1. Declare the variable varServerDate.
Step 2. Use ExecuteSQLTask in the control flow to execute a SQL query to get the server date-time and store it in the variable
Step 3. Use the dataflow task and insert/update database with the server date-time from the variable varServerDate.
This sequence is advisable only in cases where the time difference from step 2 to step 3 really matters. If that doesn't really matter, then just use the getdate() command at step 3, as shown below:
--create table #table1 (Lap_Id int, LAP_Date datetime)
Insert into #table1 (Lap_Id, LAP_Date) values (1, getdate())2.Calling a child package multiple times from a parent with different parameter values.
When a child package is executed from a master package, the parameters that are passed from the master need to be configured in the child package. For this, you can use the ‘Parent Package Configuration’ option in the child package. But, for using the ‘Parent Package Configuration’, you need to specify the name of the ‘Parent Package Variable’ that is passed to the child package. If you want to call the same child package multiple times (each time with a different parameter value), declare the parent package variables (with the same name as given in the child package) with a scope limited to ‘Execute Package Tasks’.
SSIS allows declaring variables with the same name but the scope limited to different tasks – all inside the same package!
3.SQL job with many atomic steps.
For the SQL job that calls the SSIS packages, make multiple steps, each doing small tasks, rather than a single step doing all the tasks. In the first case, the transaction log grows too big, and if a rollback happens, it may take the full processing space of the server.
4.Avoid unnecessary typecasts.
Avoid unnecessary type casts. For example, the flat file connection manager, by default, uses the string [DT_STR] data type for all the columns. In case you want to use the actual data types, you have to manually change it. Better to change it at the source-level itself to avoid unnecessary type castings.
5.Transactions.
Usually, the ETL processes handle large volumes of data. In such a scenario, do not attempt a transaction on the whole package logic. However, SSIS supports transaction, and it is advisable to use transactions where the atomicity of the transaction is taken care of.
For example, consider a scenario where a source record is to be spitted into 25 records at the target - where either all the 25 records reach the destination or zero. In this scenario, using a transaction, we can ensure either all the 25 records reach the destination or zero.
6.Distributed transaction spanning multiple tasks.
The control flow of an SSIS package threads together various control tasks. It is possible to set a transaction that can span into multiple tasks using the same connection. To enable this, the “retainsameconnection” property of the Connection Manager should be set to “True”.
7.Limit the package names to a maximum of 100 characters.
When an SSIS package with a package name exceeding 100 chars is deployed into SQL Server, it trims the package name to 100 chars, which may cause an execution failure. So, limit the package names to a maximum of 100 characters.
8.Select * from…
Make sure that you are not passing any unnecessary columns from the source to the downstream. With the OLEDB connection manager source, using the ‘Table or View’ data access mode is equivalent to ‘SELECT * FROM
9.Sorting.
Sorting in SSIS is a time consuming operation. At the places where we know that data is coming from database tables, it’s better to perform the sorting operation at the database query itself.
10.Excel Source and 64-bit runtime.
The Excel Source or Excel Connection manager works only with the 32 bit runtime. When a package using the Excel Source is enabled for 64-bit runtime (by default, it is enabled), it will fail on the production server using the 64-bit runtime. Go to the solution property pages\debugging and set Run64BitRuntime to False.
11.On failure of a component, stop/continue the execution with the next component.
When a component fails, the property failParentonFailure can be effectively used either to stop the package execution or continue with the next component - exception - stop/continue with the next component in a sequence container. The value of the constraint connecting the components in the sequence should be set to "Completion", and the failParentonFailure property should be set to False (default).
12.Protection.
To avoid most of the package deployment error from one system to another system, set the package protection level to ‘DontSaveSenstive’.
13.Copy pasting the Script component.
Once you copy-paste a script component and execute the package, it may fail. Just open the script editor of the pasted script component, save the script, and execute the package – it will work.
14.Configuration filter – Use as a filter.
It is a best practice to use the package name as the configuration filter for all the configuration items that are specific to a package. It is especially useful when there are so many packages with package-specific configuration items. For the configuration items that are general to many packages, use a generic name.
15.Optimal use of configuration records.
Avoid the same configuration item recorded under different filter/object names. For example, if two packages are using the same connection string, you need only one configuration record. To enable this, use the same name for the connection manager in both the packages. Also, use a generic configuration filter. This is quite convenient at the time of porting from one environment to another (e.g.: from UAT to production).
Courtesy:CodeProject
SQLCat
Subscribe to:
Posts (Atom)