Defining a Configuration Approach for Integration Services Packages
by John Welch, SQL Server MVP and Chief Architect at Mariner. You can view John’s blog at http://agilebi.com/cs/blogs/jwelch/.
Introduction
It is important to have a defined approach for handling configurations in Integration Services packages. Configurations are important to making packages portable - being able to deploy them in multiple environments without making changes to the package file itself. Developing a consistent approach is also important because it makes maintenance simpler, and supports multiple developers much more easily.
Considerations for Choosing a Configuration Approach
There are several important aspects in defining an approach to configurations.
1. Understand the advantages and limitations of each configuration type. Books Online includes details about each type of configuration (https://msdn.microsoft.com/en-us/library/ms141682(SQL.100).aspx). One limitation has a direct effect on the configuration approach - Parent Package Variable configurations are always applied later than the other types. This means that a Parent Package Variable configuration cannot be used to set values that other configurations depend on.
2. Some configuration types are excluded by environmental factors. For example, in some environments, the people responsible for the servers may not allow new environment variables to be created on them. This eliminates environment variable configurations from consideration. It also prevents the use of indirect configurations, since those rely on environment variables as well. If the deployment environment does not include a SQL Server relational instance, then SQL Server configurations cannot be used. Sometimes, it may be worth seeking an exception to such restrictions, as they add complexity to the configuration solution.
3. How the packages will be run? Will the packages be run as single, standalone packages? Or will multiple packages be run together as a unit, using a parent package to call multiple child packages? If the packages are to be run as single units, the configuration approach needs to avoid the use of parent package variables. In some cases, the packages may need to be capable of running both standalone and as a unit, which can require special handling.
4. How will the location of the configurations be specified? With SQL Server configurations, Integration Services needs to know which database holds the configuration table. With an XML configuration file, Integration Services needs to know the path to the XML file. These values are stored in the package, but they often need to be updated or overridden when the package is deployed.
5. Which properties are going to be grouped in a single configuration file? An Integration Services configuration container can store multiple property settings. For example, an XML configuration file can hold connection string values for two separate connection managers. When the configuration is applied, SSIS will attempt to set the connection strings for all connection managers specified in the file. The ConfigurationFilter value serves the same purpose of grouping property settings together for SQL Server configurations. If you attempt to apply a configuration to a package, and the configuration contains a property that does not exist in the current package, a warning will be generated. For this reason, you should only include multiple property settings in a configuration file or group when you are confident that any package that is using the configuration will have all the properties defined.
6. How often will the configurations need to be changed? Will the actual configuration need to be changed dynamically when the package is executed, for example, by using the /CONFIG, /CONN, or /SET switch of DTEXEC? Or will the configuration be consistent whenever the package is executed in a given environment?
Understanding How Integration Services Applies Configurations
Integration Services applies configurations with a “last in wins” approach. The last configuration applied will overwrite any previous values. However, it will only apply the property settings that are in the configuration container, which means that two configuration files or groups can overlap. Configurations will always be loaded in the order specified in the Package Configurations dialog, with the exception of Parent Package Variable configurations, which are always applied after the other configuration types.
SQL Server 2005 Integration Services
When you execute the package in SQL Server 2005, configurations are applied in this order:
1. The package file is loaded.
2. The configurations specified in the package at design time are applied in the order specified in the package (except for Parent Package Variables).
3. Any options specified from the command line are applied. Any configurations specified on the command line overwrite the current values; however, they do not trigger a reload of the configurations if a changed value impacts a configuration dependency. For example, if the connection string used for SQL Server configurations is updated from the command line at run time, the package will still retain any other values from the design-time SQL Server configuration database.
4. Parent Package Variable Configurations are applied.
5. The package is run.
This order can create a scenario that is difficult to resolve with Integration Services configurations. If you use XML - based configurations, you need to specify the path to the configuration file. If you are using SQL Server configurations, you need to specify the initial connection string to the database that holds the configuration table. Ideally, you would be able to override these values by using the command line switches of DTEXEC. However, since configurations are applied before values specified on the command line, this does not work. For example, if you specified a new connection string for the configuration database at the command line, the SQL Server configurations would be applied from the original connection string. Then the connection string from the command line would be applied. However, the SQL Server configurations would not reload, so there is effectively no way to override the configurations from the command line.
SQL Server 2008 Integration Services
This behavior has been changed in SQL Server 2008. Configurations are applied twice - once before the command line options, and then again after applying the command line options. The configurations are applied in this order:
1. The package file is loaded.
2. The configurations specified in the package at design time are applied in the order specified in the package (except for Parent Package Variables).
3. Command line values are applied.
4. The configurations specified in the package at design time are reloaded in the order specified in the package (except for Parent Package Variables). The configurations will use any updated values from the command line when they are reloaded, so it is possible to change the connection string for a SQL Server configuration.
5. Parent Package Variable Configurations are applied.
6. The package is run.
This is an improvement over the behavior in SQL Server 2005. However, there is still a potential problem with the new functionality. Since the design time configuration is applied twice, it can overwrite a value specified on the command line. This means that you cannot override a configured value from the command line without also overriding the design time configuration location.
Parent and Child Packages
Command line options are not passed from a parent package to a child package. So it is not possible to override the configuration location on a child package by using a DTEXEC command line switch to override it on the parent.
Sample Approaches to Configurations
Now that the various aspects of configurations have been covered, we will review two approaches to configurations. These approaches should cover the most commonly encountered scenarios.
SQL Server Approach
Advantages and Disadvantages
The first approach uses SQL Server as the primary configuration location. It uses an environment variable to get the initial connection string for the SQL Server database. The advantages of this approach are:
- The package can run independently, as there is no dependency on parent package variables or command line options to specify the primary configuration location. The location is handled by the environment variable.
- This approach can also serve as a base for parent-child packages. The core configuration items can be handled through the environment variable and SQL Server configurations, with remaining items passed as Parent Package Variable configurations. However, the packages can still be run in a standalone manner easily.
- Multiple configurations can be stored in the same SQL Server database, so this approach only requires a single environment variable to contain the connection string. If we were using XML configurations, and required multiple configurations, we would have to create an environment variable for each XML configuration file.
The disadvantages of this approach are:
- It requires an instance of SQL Server and an environment variable. If these are prohibited in the deployment environment, this approach is not usable.
- Users of the computer are not protected from seeing the values of environment variables, so this approach should only be used if you are using Windows Authentication to connect to the SQL Server database for configurations. Using Windows Authentication will prevent you from having to store the password in the environment variable.
- Setting up a system environment variable requires system administrator privileges.
Setting Up the Environment Variables
To set up this approach at design time, follow these steps:
1. Before opening the Business Intelligence Developer Studio (BIDS) environment, create a system environment variable named SSIS_ROOT to hold the connection string for the SQL Server database. Enter the connection string into the value for the variable. Directions for creating a system environment variable can be found here: (http://technet2.microsoft.com/windowsserver/en/library/4029b464-c281-49af-84d5-6cd092e61a651033.mspx?mfr=true). This needs to be done before opening BIDS, since it caches the environment variables when it is first opened. It should also be created as a system environment variable, so that all users of the computer will have access to it.
2. Open BIDS and create a new Integration Services project. Open the newly created package.
3. Create an OLE DB connection manager named Configuration that points to the SQL Server database that will be used to store configurations in the development environment.
Figure 1: Creating an OLE DB connection manager for SQL Server configurations
4. Open the Package Configurations dialog box by choosing Package Configurations from the SSIS menu.
Figure 2: Selecting the Package Configurations option on the SSIS menu
5. Select the Enable package configurations and add a new Environment Variable configuration that points to the SSIS_ROOT environment variable. Select Next to continue.
Figure 3: Specifying an environment variable in the Package Configuration Wizard
6. Select the Configuration connection manager’s ConnectionString property. Select Next to continue.
Figure 4: Select the target property in the Package Configuration Wizard
7. Give the configuration an appropriate name, and select Finish to complete the wizard.
Adding the Configurations
Now the package is prepared to have SQL Server configurations added to it. The process below should be repeated for each set of configurations that you want to store for the package.
1. Add a new SQL Server configuration that uses the Configuration connection manager. Note that you can create the configuration table in the database if it does not already exist. Set the configuration filter to an appropriate value. Select Next to continue.
Figure 5: Setting up SQL Server configurations in the Package Configuration Wizard
2. On the next page, select the appropriate properties that you would like to configure for your package. The values selected here will be stored in the SQL Server table, with the same Configuration filter value, so that they will be treated as a unit. Click next to continue.
3. Give the configuration an appropriate name, and select Finish to complete the wizard.
Reviewing the Configurations
The Package Configuration dialog should look similar to the following image, with the first configuration referencing the SSIS_ROOT environment variable, and subsequent configurations referencing SQL Server:
Figure 6: Review configurations in the Package Configurations Organizer
Deploying the Package
To deploy this package to a different environment, follow these steps:
1. Deploy the package files to the new environment, either by copying the files or by deploying the packages to the MSDB database in SQL Server.
2. Create the SSIS_ROOT environment variable in the new environment and set the value to the appropriate connection string for that environment.
3. For an initial deployment, copy the configuration table to the configuration database for the new environment. Update the Configured Values stored in the table to reflect the appropriate values for the new environment. For subsequent deployments, only new or updated values will need to be adjusted.
4. At this point, the packages should be ready to run in the new environment, and to use the configured values from the new environment.
Please note that while the above approach uses a separate environment variable configuration to get the initial connection string for the SQL Server, it is possible to use indirect configurations to achieve the same goal. However, both approaches are functionally equivalent, and the approach used above makes it clearer where the initial connection string is coming from.
Parent Package Variables Approach
Advantages and Disadvantages
The parent package variable approach uses the ability of Integration Services to pass values between packages. It requires a parent package that retrieves all the configuration values, and then calls the child packages. The child packages have only Parent Package Variable configurations defined. The advantages of this approach are:
- It does not require the use of environment variables, SQL Server, or XML files. Everything is driven from the parent package.
- The Parent Package Variable approach is often the most flexible if the packages will be deployed to a large number of environments, where you cannot be sure what options for configurations might be available. Changes based on the external environment affect only the parent package.
- All of the values in the parent package can be set from the command line. These can then be passed to child packages through the Parent Package Variable configurations. For example, you could use an XML configuration (specified at run time through the /CONFIG switch of DTEXEC) to set all the values in the parent package. The file from the /CONFIG switch won’t be passed automatically to the child packages, but the values from the parent package will be passed.
The disadvantages of this approach are:
- The packages are typically not usable in a stand-alone manner. The package can be run with the default values stored in the package, but this may not be adequate in multiple developer environments, or if the packages need to be run standalone in test or production environments. One option to work around this is to provide a generic parent package which can be used to execute any child package with the proper values.
- It does not blend well with other configuration types. Since Parent Package Variables are always applied last, they will overwrite any values from other configurations. They also cannot be used to override the initial connections for other configuration types, as they are applied after the other configurations have already been loaded.
Setting Up the Configurations
To set up this approach at design time, follow these steps:
1. Create a parent package. Define a variable in the parent package for each value that you want to configure in the child packages.
Figure 7: Setting up variables in the parent package
2. Add the appropriate logic to the package to set the variable values to the desired values. This could be by using other configuration types to set the variables, by using a Script Task to set the values by reading from a non-SQL Server database, or by hard coding the values into the packages. For this example, we will add an XML configuration that holds the value for each variable.
Figure 8: Setting up XML configurations in the Package Configurations Organizer
3. Add a connection manager and an Execute Package Task for each child package that needs to be called.
4. In each child package, enable package configurations. Add a Parent Package Variable configuration. Specify the name of the parent variable that you want to use in this package. Click Next to continue.
Figure 9: Setting up Parent Package Variable configurations in the child packages
5. On the properties page, pick the appropriate package property that should be set from the parent variable. Click next to continue.
6. Give the configuration an appropriate name, and select Finish to complete the wizard.
Deploying the Packages
To deploy these packages to a different environment, follow these steps:
1. Deploy the package files to the new environment, either by copying the files or by deploying the packages to the MSDB database in SQL Server.
2. Copy the XML configuration file to the new environment. Update the variable values in the XML file to reflect the appropriate values for the new environment. For subsequent deployments, only new or updated values will need to be adjusted.
3. Run the parent package from DTEXEC and use the /CONFIG switch to specify the location of the XML configuration file in the new environment. The packages should run, and use the updated values from the configuration.
Recommendations for Specific Scenarios
If the deployment environment supports environment variables and has a SQL Server database available, the SQL Server approach discussed in this article should be used. If a SQL Server database is not available, then XML files with indirect configurations (based on environment variables) can be used in a very similar way.
If the deployment environment does not allow environment variables, but a consistent path can be provided in each environment, then XML configurations can be used. The path should be identical in each environment (for example, “C:\SSIS_Config\”), so that a local copy of the configuration file can be deployed to each environment without having to modify the path to point to a new configuration location.
If environment variables are not allowed, and a consistent local path cannot be used, the Parent Package Variable approach is often the best approach, as it isolates any environmental changes to the parent package.
This is not an exhaustive list of the possibilities, but these are approaches that have worked well for me in a variety of situations.
Conclusion
This article has described two sample approaches to configurations in SQL Server Integration Services. In addition, it has presented many of the factors that must be considered before selecting a configuration approach. The samples here are usable as they are, but in many cases, you will want to modify or blend multiple approaches. Integration Services configurations are very flexible, and, used properly, can be a powerful way to make packages more portable and reusable.
About the author**. John Welch is Chief Architect with Mariner, a consulting firm that specializes in enterprise reporting & analytics, data warehousing and performance management solutions. John has been working with business intelligence and data warehousing technologies for 6 years, with a focus on Microsoft products in heterogeneous environments. He is a Microsoft Most Valued Professional (MVP), an award given due to his commitment to sharing his knowledge with the IT community. John is an experienced speaker, and has given presentations at Professional Association for SQL Server (PASS) conferences, Software Development West (SD West), Software Management Conference (ASM/SM), and others. John has also been published in DM Review, SQL Server Professional, and XML Developer.