Connectivity and SQL Server 2005 Integration Services
SQL Server Technical Article
Bob Beauchemin
Technical Reviewers: Deniz Erkan, Bob Bojanic, Ted Lee, Darren Green, Roger Toren
February 2007
Applies to:
Microsoft SQL Server 2005 SP1
Summary: This white paper enumerates the connectivity options for databases with Microsoft SQL Server 2005 Integration Services and mentions special considerations that are required when using some database sources with some of the more complex transformations. (21 printed pages)
Click to download the Word version of this article.
Contents
Introduction
Elements of SSIS Connectivity
Generic Connectivity Issues
Database Connector-Specific Details
Non-Database Data Connectors
Extensibility
Conclusion
Appendix: Data-Connectivity Providers and SSIS
Introduction
Microsoft SQL Server 2005 Integration Services (SSIS) exists to implement enterprise-level extract, transformation, and loading functionality (ETL) for SQL Server and heterogeneous data sources. As part of the SQL Server 2005 product, its main focus is to provide seamless integration between SQL Server–based online transaction processing (OLTP) systems and data warehouses, SQL Server Analysis Services cubes, and data-mining models. The rich variety of transformation and analysis functionality and the data-flow engine in SSIS also make it an attractive choice with databases other than SQL Server. This white paper seeks to enumerate and clarify the data-source choices.
There are many different facets for measuring ETL quality. SSIS data flows often run with large amounts of data during off hours. On occasion, the speed of the transformation and loading determines how much data can be processed and made available for analysis and data mining. So, one way to measure the quality of data-source support is to consider how fast data can be extracted from and loaded into the data source in question. When using SSIS, extracting from data sources other than SQL Server is usually more important than loading into data sources other than SQL Server and Analysis Services. SSIS can execute in 32-bit mode or 64-bit mode. Using 64-bit mode results in more available linear memory, which often translates into faster execution and data transformation. Thus, 64-bit support for any data source is beneficial.
Another measure of quality is how well the data type and precision of the data can be maintained. Data-type conversations that result in loss of precision or the inability of a product to support all data types of the underlying data stores affect the quality of the analytical outcome. In addition to precision, having to perform a time and buffer-consuming data conversation on a per-row basis can adversely affect the speed of the ETL process.
Finally, although it is possible to compose ETL workflows programmatically, most users use the Business Intelligence Development Studio (BI Development Studio) tool to design and debug their workflows. Graphic user interface–based tools use data-source metadata (for example, lists of tables, connections, and procedure parameters) to compose data input and output steps. Quality of support for metadata is a win for heterogeneous data sources when using SSIS and BI Development Studio. To summarize, when we discuss connectivity options we'll concentrate on the following features for different data sources:
- 64-bit driver support in addition to 32-bit driver support
- Data-type support
- Metadata richness
- Speed of execution of the database driver, especially data extraction and loading
This white paper is roughly divided into three sections. The first section covers the built-in SSIS components that involve data connectivity and the data sources that each component supports. The second section covers generic connectivity issues and addresses questions such as, "If I want to connect SSIS to a database and the database supports OLE DB, ADO.NET, and JDBC, which connectors can I use?" and "What type of connectors would result in the richest BI Development Studio experience and minimize data-type conversions?" The last section covers the specifics of using SQL Server and of using selected popular databases and data stores, on a per-database basis. That section covers relational databases, files, and XML, rather than specifics of SSIS interaction with SQL Server Analysis Services and data mining.
Elements of SSIS Connectivity
In SQL Server 2005 Integrations Services, almost all packages will be designed and tested using the Business Intelligence Development Studio tool. Therefore, I'll start by looking at how data-source providers interact with the components at design time and during testing.
SSIS Components
SSIS packages consist of series of one or more control flow tasks. Data input, output, and transformations are accomplished by using data-flow tasks, and the Data Flow task is where we'll center our discussion of connectivity. Data Flow tasks use one or more data-flow sources and produce one or more data-flow destinations. Data providers are the mechanism to feed data to a data-flow source (SSIS Source Adapter) and to harvest data from a data-flow destination (SSIS Destination Adapter); they interact with data adapters as shown in Figure 1.
Figure 1. Interactions between data providers and the SSIS data flow (Click on the picture for a larger image)
Information about how to locate and connect to a data source or destination (also called connection string information) is encapsulated in a connection manager. The data sources and destinations process sets of rows, which are called datasets in SSIS (not to be confused with the ADO.NET DataSet class). A special Data Flow task, the Execute OLE DB Command task, can execute the SQL command of your choice once per row of the dataset. This is another way to access a database in a Data Flow task. Some Data Flow transformation components, such as the Lookup component, use secondary reference data sources.
In addition to the Data Flow task, the Execute SQL task and the Bulk Insert task are special control flow tasks can use databases. These tasks are usually used to set up a data flow. For example, the Execute SQL Task can be used to truncate the data in staging tables.
SSIS uses the SQL Server Import and Export Wizard, which can create a simple data-centric SSIS package, consisting of an Execute SQL task (if needed for set up), a Data Flow task containing a source and destination, and a Data Conversion transformation if the data types used require it.
SSIS components can generally be categorized as using data sources for input, output, and for reference.
The following SSIS components use database data as input:
- Execute SQL task
- DataReader source
- OLE DB source
- OLE DB Command (depending on type of command)
- SQL Server Import and Export Wizard (this uses the Execute SQL task and DataReader source or OLE DB Source)
The following SSIS components can use a database as output:
- Bulk Insert task
- OLE DB destination
- OLE DB Command (depending on type of command)
- SQL Server destination
- SQL Server Import and Export Wizard (this uses an OLE DB destination)
The following SSIS transformations reference an OLE DB data source:
- Lookup
- Fuzzy Lookup
- Fuzzy Grouping
- Slowly Changing Dimension
- Term Extraction (if the exclusion term feature is used)
- Term Lookup (for the reference table)
The DataReader destination and the Recordset destination do not use a database connection.
In conjunction with the built-in data sources, destinations, tasks, and transformations, you can write custom components for SSIS by using compiled Microsoft .NET or COM code, or the Script component and Visual Basic.NET script. The Script component is by far the easiest to use, and can be used to write a custom data source, transformation, or data destination component. As an example of the Script component's usefulness, script can be used to create a destination that writes directly to a database using ADO.NET. Although there is not a built-in ADO.NET destination component, a custom script destination for a database that is only accessible using ADO.NET can be produced with a few lines of code.
Not all SSIS components can use each of the different data sources. Table 1 summarizes which generic data provider types are useable with each component. I'll have more to say about provider types later in this white paper.
Table 1. SSIS components and provider types
Component | SQL Server only | OLE DB | ADO.NET (includes ODBC) | ADO | Flat file | Office Excel | SQL Server Mobile |
Import/Export Wizard
Source |
- | Y | Y | N | Y | Y | N |
Import/Export Wizard
Destination |
- | Y | N | N | Y | Y | N |
Execute SQL Task | - | Y | Y | Y | N | Y | Y |
Bulk Insert Task | Y | N | N | N | N | N | N |
Data Flow Source | - | Y | Y | N | Y | Y | N |
Data Flow Destination | Y | Y | N* | N** | Y | Y | Y |
SQL Server Destination | Y | N | N | N | N | N | N |
OLE DB Command | - | Y | N | N | N | N | N |
Lookup Reference Tables | - | Y | N | N | N | N | N |
Fuzzy Lookup Reference Tables | Y | N | N | N | N | N | N |
Fuzzy Grouping
Work Tables |
Y | N | N | N | N | N | N |
Slowly Changing Dimension Outputs | - | Y | N | N | N | N | N |
Term Extraction Work Tables | Y | N | N | N | N | N | N |
Term Lookup Work Tables | Y | N | N | N | N | N | N |
Term Lookup Reference Tables | - | Y | N | N | N | N | N |
*Although ADO.NET may not be used directly for loading database unless a custom SSIS script component is written, a data destination does exist for the ADO.NET DataReader object.
**Although ADO may not be used directly for loading databases unless a custom SSIS script component is written, a data destination does exist for the ADO Recordset object.
†This; can also be a Microsoft Office Access table.
Connection Managers
Almost all of the SSIS built-in components are configured by using connection managers. There are a plethora of built-in connection managers. You can also write your own to plug in to the architecture. Only a subset of the built-in connection managers applies to databases and traditional data access. The database-specific connection managers are the following:
- ADO
- ADO.NET
- EXCEL
- FILE, MULTIFILE, FLATFILE, MULTIFLATFILE
- MSOLAP90 (Analysis Services)
- ODBC
- OLEDB
- SQLMOBILE
Note The SSIS connection manager SQLMOBILE uses SQL Server Compact Edition as a data source; the original name for SQL Server Compact Edition was SQL Server Mobile Edition.
When you add a connection manager to an SSIS package by choosing New Connection from the context menu, a list of all of the connection manager types displays, as shown in Figure 2. There are also shortcuts for specific connection managers, such as the OLE DB connection manager.
Figure 2. SSIS built-in connection managers
Choosing a connection manager opens a list of data providers. The list of providers in the Provider list reflects the type of connection manager that you are using. ADO and OLE DB connection managers display a list of OLE DB providers installed on the local computer. OLE DB provider information lives in the Microsoft Windows® registry. The ADO.NET provider dialog box displays a list of ADO.NET 2.0 data providers. You can also choose to use any of the OLE DB data providers through ADO.NET by using the ADO.NET OleDb bridge provider, although there is usually no reason to do this. If you need an OLE DB provider, use an OLE DB connection manager. The ODBC connection manager uses the ADO.NET Odbc bridge provider, so the Provider list simply contains the word "ODBC". A list of the available ODBC data sources is available as part of the dialog box. To build a custom ODBC data source, select Use Connection String and press the Build button. This invokes the ODBC Control Panel item.
To specify connection information, the main body of the dialog box contains the most commonly used configuration parameters. There will be a custom dialog page if one exists for the provider; an example of a provider with a custom dialog page is the SQL Native Client OLE DB provider. Some providers may not have a dialog page with commonly used parameters at all. To specify advanced connection properties or to specify all of the connection properties for providers without a dialog page, click the Advanced tab in the main Connection Manager dialog box. Providers are allowed to specify as many connection properties as are useful, each as a name/value pair. The standard connection properties vary by API. For example, the Data Source property in OLE DB is called the Server property in ADO.NET.
Data-Flow Sources and Destinations
Data-flow sources and destinations are used to extract and load data in the Data Flow task. Data-flow source and destination connection information is configured by using connection managers. Once the database is selected, you have a variety of ways to specify the details of how data will be extracted or loaded; these are called data-access modes.
The OLE DB source supports the following data-access modes:
- A table or view name
- A table of view name from an SSIS variable
- An SQL command
- An SQL command from an SSIS variable
The OLE DB destination supports the following data-access modes:
- An SQL command
- A table or view name
- A table or view name from an SSIS variable
- A table or view name with fast load option
- A table or view name from an SSIS variable with fast load option
We'll talk about the fast load option later in this white paper. You usually choose a table or view by choosing it from a drop-down list, but this requires the provider to supply a list of tables and views. If the provider does not support listing tables and views, you need to specify a table name or use an SSIS variable containing the table name manually. Some providers have trouble with parameterized queries or with stored procedures that return rowsets. See Table 2 for a list of provider limitations, and the database-specific sections for details.
In addition to consuming data from a data destination in SSIS itself, you can also use the datasets produced from SSIS in external programs. A simple example of this technique is provided in Loading Data-Flow Results into a Client Application in SQL Server Books Online. You can also use a DataReader destination as a data source in SQL Server 2005 Reporting Services, though this is technically unsupported by both SSIS and Reporting Services.
Import and Export Wizard
A developer's first exposure to SSIS often comes through using the Import and Export Wizard. You invoke the Import and Export Wizard from within an SSIS project in BI Development Studio. To start the wizard, you can either right-click the SSIS Package node in the Solution Explorer pane or invoke it from the Project menu. You can also invoke the wizard in SQL Server Management Studio by right-clicking a database in Object Explorer, and selecting Import Data or Export Data. The wizard has a data-source chooser that's different from anything else in SSIS. Your data source can be any of the OLE DB providers or ADO.NET 2.0 data providers listed. You can use an ODBC data source by choosing the ADO.NET data provider for ODBC. Microsoft Office Excel, Microsoft Office Access, and flat files are also separately available as data sources without having to configure the OLE DB Provider for Jet 4.0. Your choice of database destinations includes only the OLE DB providers, Office Excel, Office Access, and flat files. Analysis Services through the 9.0 provider and the Microsoft Data Mining provider are available as either sources or destinations. As with SSIS in general, when you retrieve data from a database, you can choose Tables and Views or Write a SQL query to retrieve the data. If your data destination is a database, you can choose to create a new table, delete the existing rows, or append the new rows to the existing rows. With SQL Server databases there are additional options; for example, you can generate or insert the existing values of identity columns. When you are finished, the wizard creates an SSIS package and can catalog it to the msdb database in the current instance or save it to a file.
SSIS does few implicit data conversations; if needed it uses its own internal data-type system that is independent of any database. We'll discuss SSIS internal data types later in this paper. If a specific conversion is required to ensure data-type integrity, the wizard may add a Data Conversion transformation component to the Data Flow task in the resulting package. But because the Import and Export Wizard just moves data from one database to another, it has a special built-in, but extensible mechanism to preserve maximum data-type fidelity when moving data between heterogeneous databases. A set of XML files in C:\Program Files\Microsoft SQL Server\90\DTS\MappingFiles is used to map data types between databases. These files name a provider or set of providers that they apply to. The granularity is such that you can have a separate file to map types when using the ODP.NET data provider or MSDAORA provider. Files are provided for mapping among all the tested database combinations. An example would be OracleToMSSql.XML, but the files do not need to have special names. You can change the files if you have a set of preferred mappings or you can add additional files. For example, if I have a direct set of mappings from Teradata to other databases, I can simply code up analogous files and place them in the directory. They would then be used by the Import and Export Wizard when Teradata is selected as a data source or destination.
The data-type mapping files are only used for the Import and Export Wizard; other components of SSIS don't currently use them, although they may use them in future releases. Even though the file named SQLServerToSSIS sounds like it could be used for mapping data types in the data source, for example, the data sources use a hard-coded list of data-type equivalences that are documented in Integration Services Data Types in SQL Server Books Online.
Executing Commands with Execute SQL Task and OLE DB Command
You can issue SQL commands to databases by using either the Execute SQL task or OLE DB Command transformation data-flow component. The Execute SQL task is usually used for pre- or post-processing at the beginning or end of a data flow, but is not restricted to this function. A typical pre-processing task when using SQL Server in an SSIS package might consist of creating or dropping and recreating tables to load data into. The data load would use an associated Data Flow task. The Execute SQL task can use certain connection types: ADO, ADO.NET, EXCEL, ODBC, OLE DB and SQLMOBILE. Some components (ADO and ADO.NET sources) permit you to specify whether or not the command is a raw set of SQL statements or is a stored procedure. If you specify a stored procedure, it uses the ADO or ADO.NET API's CommandType.Procedure value; otherwise, it uses the CommandType.Text value. With SQL Server, using CommandType.Procedure submits the command as a (faster) RPC call, while specifying CommandType.Text uses a statement. This can be seen in SQL Profiler. When using OLE DB with the Execute SQL task, you can't specify whether you're using a stored procedure or SQL statement. However, you can use the ODBC call syntax, which (with SQL Server) uses RPC:
{ call ? = myproc(?,?) }
When using providers for Oracle and DB2, you must specify a stored procedure with parameters by using ODBC syntax.
The OLE DB Command transformation differs from the Execute SQL task in that it uses only OLE DB data sources. The OLE DB Command transformation also requires an SSIS input dataset; the statement that you specify will be executed once per row in the input dataset. As with the Execute SQL task, some providers do not permit parameterized statements, because there is no way for the provider to derive parameters from an SQL statement. A workaround for this limitation (that also works with the Execute SQL task) is to build the SQL statement dynamically using a property expression. The SqlCommand property of SQLCommand or the SQLStatementSource property of the Execute SQL task would then be set to the value of the property expression. You get the flexibility of a parameterized query, even if the data source does not support parameterization correctly. Because you are using the equivalent of dynamic SQL, the usual warnings about checking for potential SQL injection apply. Take care when using the OLE DB Command transformation; unless there is a special command that you need to perform to load the data, using the OLE DB destination will result in better performance.
Transformation Reference Tables
The lookup transformations (for example, the Lookup transformation) use reference tables to do value matching during the lookup process. When you use a table as a reference table, the entire table is cached in memory during the transformation by default. For large tables, you can specify partial caching or no caching. This is useful for large lookup tables when loading the entire table would consume excessive memory. The dialog box you use to control caching (Advanced tab in the Lookup Transformation Editor) is shown in Figure 3.
Figure 3. Dialog box that controls caching in Lookup Transformation Editor (Click on the picture for a larger image)
Partial caching requires that you specify a parameterized query so that SSIS can execute individual SQL statements to look up rows that match. Note that partial caching and no caching changes the case-sensitivity of the comparison to that of SQL. When the selected provider does not support deriving parameters for parameterized queries (for example, Oracle OLE DB providers, and the Microsoft OLE DB Provider for DB2), you cannot specify partial caching or no caching. In the Partial Caching dialog box of the Lookup component, the statement that appears in the Override SQL statement text box when this option is selected is SQL Server–specific. It can be edited manually for other providers. Some of the lookup transformations require a SQL Server 2005 instance that is used for work tables. No other database type can be substituted.
Additional Database Tools in BI Development Studio
SSIS contains a Query Designer tool. For this tool to be available, the provider must support the metadata that is used by the tool. Although the Query Designer is available with SQL Server and Oracle databases (only when the MSDAORA provider is used), most other OLE DB providers do not supply the needed metadata. The ability to check the syntax of a query without executing it (as is available by using the Parse Query button in the Execute SQL Task dialog box or the Verify SQL button in the Query Designer) is also provider-specific.
Table 2. User-interface features and supported OLE DB providers
Database product | Company | Result set metadata from stored procedures | Derive parameters for parameterized SQL statements | Partial caching with Lookup component | Can use Query Designer | Can use Verify SQL in Query Designer | Can use Parse Query in dialog boxes |
SQL Server | Microsoft | Y | Y | Y | Y | Y | Y |
Access | Microsoft | Y | Y | Y | Y | Y | Y |
DB2 | Microsoft | N | N | N | Y | N | Y |
DB2 | IBM | N | Y | Y | Y | N | Y |
Oracle | Microsoft | Y‡ | N | N | Y | N | Y |
Oracle | Oracle Corp | Y‡ | N | N | Y | N | Y |
‡Oracle returns results from stored procedures through a special type of output parameter, REFCURSOR. Oracle can return metadata for strongly typed REFCURSORs.
Development and Run Time Considerations for
64-bit Connectors
To understand connectivity on 64-bit computers, we need to distinguish between which SSIS-related components are shipped on Itanium (IA64) systems vs. x64-based systems. We also need to understand the difference between designing SSIS packages in BI Development Studio and running SSIS packages on the command line.
The SSIS command-line program that runs packages exists in x86, x64, and IA64 versions, and all SQL Server databases can be used to store and schedule SSIS packages. Neither ODBC nor Microsoft Office data (Office Excel, Office Access, Jet 4.0, ACE, and so on) is supported in 64-bit SSIS scripts. You can, however, run scripts that use these types of data by using the 32-bit command line program on any 64-bit server.
Almost all SSIS development and debugging will use the Business Intelligence Development Studio tool. BI Development Studio is a 32-bit only tool, although it provides the facility to debug and deploy packages that run in 64-bit mode. BI Development Studio is only supported on 32-bit systems and x64 computers in Windows-on-Windows (WOW) mode. BI Development Studio cannot be used on Itanium (IA64) systems, even in WOW mode.
When developing in BI Development Studio, 32-bit data providers are always used for graphic-user interface operations such as providing metadata for drop-down list boxes and using the Query Designer. Therefore, if 64-bit providers will be used in production packages, the equivalent 32-bit provider must be available at design time. Each SSIS project does have a "Run64BitRuntime" switch however, so that packages will use either 32- or 64-bit mode when executing in the designer. Using the 64-bit runtime is the default. This does require that BI Development Studio execute on an x64 server, however.
Generic Connectivity Issues
With most databases you can choose different providers and different provider types. A common design problem is deciding which provider to use. This section addresses connectivity at a high level; that is, independently of SSIS components and databases. Your choice of provider should be based on the richness of SSIS support for different provider types, data-type fidelity concerns, and provider support for the database metadata that SSIS and BI Development Studio need.
Data-Access APIs and SSIS
Data-access APIs can be divided into two types: data source–specific APIs and data source–neutral or generic APIs. A data-source–specific API is usually procedural-based, and is only useful with one data source. Examples of data source–specific APIs are OCI (Oracle Call Interface) and DBlib (database library used by Sybase, as well as by SQL Server 6.0 and earlier versions). Data source–neutral APIs are usually designed by a committee to be useful with multiple data sources. Database manufacturers write providers or drivers that hook their database into this type of API. In most cases, the driver is layer that calls down to the database-specific API (for example, the Oracle OLE DB provider calls OCI), but drivers can also be written to call directly to the underlying network protocol (for example, the OLE DB Provider for SQL Server). Examples of data source–neutral APIs are the following:
- OLE DB, a COM-based API
- ADO.NET, a .NET-based API
- ODBC, a procedural-based API
- JDBC, a Java-based API
Although SSIS components can be written to a database-specific API, most built-in SSIS connectivity works by using database-neutral APIs. Therefore, the quality of connectivity to different data sources is based on the richness of generic data-access support. Because SSIS is a component-based model it is possible to choose whether, for example, the ODBC driver or OLE DB provider for a given database is most compatible with SSIS and choose the SSIS connection manager appropriately. SSIS supports OLE DB, ADO.NET, and ODBC. There is no built-in support for JDBC as SSIS is not written in Java and is not JVM-aware. This means designers cannot work with JDBC-based data sources in SSIS.
SSIS has most support for OLE DB. There is a built-in OLE DB connection manager, and an OLE DB data-flow source and data-flow destination. In addition, there is an OLE DB–specific Data Flow transformation, the OLE DB Command transformation. The OLE DB and SQL Server data-flow destinations are the only destinations that support direct database population from SSIS data.
There are an Office Excel–specific data-flow source and data-flow destination. These components use the OLE DB Provider for Jet 4.0 to produce and consume data in SSIS. Analysis Services dimension and partition processing can use the Analysis Services 9.0 OLE DB Provider or connect to an Analysis Services project that is part of the same BI Development Studio solution. Data-mining APIs are also based on OLE DB.
ADO.NET support consists of an ADO.NET data-flow source. The ADO.NET destination produces an ADO.NET DataReader object; there is no direct support to insert data into the database using ADO.NET. ODBC support is provided by means of an ADO.NET-to-ODBC bridge provider, the ODBC. NET data provider. This means that ODBC support goes through the ADO.NET layer and has the same functionality as the ADO.NET components. To load data from an ADO.NET DataReader destination, you must write a custom component or use the SSIS Script component. This destination is most often used by external applications that want to source data from SSIS such as the Integration Services Data Processing Extension for Reporting Services.
ADO.NET drivers are known as data providers or managed providers. To be visible in the ADO.NET connection manager, the data provider must be ADO.NET 2.0 compliant. ADO.NET 1.0 and 1.1 did not provide a central place for the providers to be registered, and therefore do not appear in the list. ADO.NET 2.0 keeps its master list of data providers in the machine.config file, although providers are also allowed to be registered in program-specific configuration files. For more information, see the technical article Generic Coding with the ADO.NET 2.0 Base Classes and Factories on the Microsoft Developer Network (MSDN). Microsoft shipped four ADO.NET data providers with ADO.NET 2.0; these are all visible in the ADO.NET connection manager:
- SqlClient Data Provider
- OracleClient Data Provider
- Odbc Data Provider
- OleDb Data Provider
Because the OleDb Data Provider is an additional layer over OLE DB, using ADO.NET to talk to OLE DB providers in SSIS, you should always use an OLE DB connection manager instead.
Using the.NET Framework Data Provider for ODBC is the only way that SSIS supports ODBC input. In addition to the ODBC bridge ADO.NET provider, there is also an OLE DB provider for ODBC (COM component named MSDASQL). This was the ODBC connectivity mechanism used by Data Transformation Services (DTS) 2000. It is specifically not supported in SSIS. Not only does it not appear in the BI Development Studio list of OLE DB providers, but attempting to use DTS packages that are converted to SSIS and name this provider will result in errors.
Data Types
One of the most important issues in any ETL system is to maintain the fidelity and precision of incoming data. Although the ISO/ANSI SQL standard specifies a standard set of data types for relational databases to support, each database has its slightly different set of data types. In addition to the ISO/ANSI standard and database specific types, each API (OLE DB, ODBC, and ADO.NET) has its own set of standard data types. Instead of aligning itself with any of these standards, SSIS defines its own set of data types for use in the data flow. Each SSIS data type supports the equivalent of a NULL value. SSIS types are also used when SSIS saves information in files using its internal representation. These files are called raw files. For a list of SSIS internal data types, see Integration Services Data Types in SQL Server Books Online. That topic also contains a chart of mappings to databases with supported providers and databases, "Mapping Integration Services Data Types to Database Data Types." Be sure to reference the information in this section when planning which data types to use for database loading and extracting, and using database data with SSIS transformations. The Import and Export Wizard uses its own set of files to define mappings of data types between different databases.
One issue when using ADO.NET is that there is no ADO.NET data type corresponding to VARCHAR (non-Unicode character) in relational databases. Therefore, ADO.NET data sources output all character data as Unicode. This can cause problems with simple components that use ADO.NET sources and destinations that expect single-byte character types. You can work around this by using a Data Conversion transformation. When comparing string data, SSIS converts character data to Unicode using the code page reported for the data. Some data providers report the code page on a column, table, or database basis; some providers do not report the code page but allow specification as part of the connection information. If a provider does not report the code page or have a way to specify it at the provider level, you can choose the AlwaysUseDefaultCodePage property (which works in conjunction with the DefaultCodePage property) in the SSIS component if the component permits this specification. Some components attempt to use the default code page by default or deduce the code page from the component's Locale property.
SSIS will do as few data conversions as possible, because data conversation is expensive, especially with date and time data types. If there is a data type supported by the database that is "close enough" but not exactly the same as an SSIS data type, this can cause problems. An example of this problem is different data sources' representation and richness of support for DECIMAL data types, leading to loss in data fidelity. SSIS uses a built-in mapping for the supported databases and providers, as mentioned previously. For all other databases, SSIS attempts to use the metadata exposed through the database API. OLE DB has the richest API in this regard, and SSIS uses the OLE DB metadata table DBSCHEMA_PROVIDER_TYPES. With ODBC and ADO.NET, or when the OLE DB provider does not specify the standard metadata table, SSIS looks for well-known data types like INTEGER and VARCHAR. A series of data types introduced with SQL-1999, including time-duration data types and "locator" types are not directly supported by SSIS. If SSIS cannot deduce the format of these types, they can be mapped to binary or character by using a Data Conversion transformation component.
Data-Source Metadata
SSIS is heavily dependent on standard database metadata. In addition to connection information, which we'll mention next, SSIS components attempt to provide lists of databases (catalog, in ISO/ANSI syntax) for a database instance, lists of tables and views, and especially information about columns in tables as previously mentioned. Every database API has a standard way to expose this information to consumers. For example, OLE DB uses schema rowsets that are obtainable via a standard provider interface while ODBC uses a series of standard functions such as GetTableInfo. ADO.NET added the ability to expose data-source metadata in a provider-neutral manner in ADO.NET 2.0. ADO.NET data providers must be 2.0-compliant to be usable with SSIS. For more information about ADO.NET provider-neutral metadata, see Schemas in ADO.NET 2.0 on MSDN. Most SSIS components allow direct input of database metadata such as a table name and also permit database metadata as SSIS variables. In some cases, as with column information, the information supplied by the database can be manually overridden in the SSIS component.
Three special cases of metadata inference that are used by SSIS but not supported by all providers bear mention, because they can limit the SSIS data-flow designer's choices. One case is the ability of the provider to deduce parameter information for arbitrary parameterized queries (for example, "SELECT empno, name FROM emp WHERE empno = ?
"). This provider-specific ability is used in components such as the Lookup component. The second case is the ability of the provider to return metadata information about rowsets that will be returned after invoking a stored procedure. This would be used when extracting rowsets of information from a stored procedure in a data flow or the Execute SQL task. If providers do not supply a way to obtain metadata information in these two cases, an error is raised at design time. Currently there is no way to manually provide the missing information. A related problem with metadata inference in stored procedures occurs because SSIS prepares the statement without executing it to obtain metadata. A limitation of this is that metadata based on objects that are created during execution (for example, a temporary table that is created in a stored procedure) might not return metadata.
Database Connector-Specific Details
This section covers SSIS support for SQL Server and other popular databases, detailing idiosyncrasies of specific databases, specific providers, and specific SSIS components when appropriate. I'll also lightly touch upon SSIS support for common non-database sources, such as flat files and XML data. Only a subset of the providers mentioned is supported by the Microsoft SSIS team, and we'll start with those.
SSIS Supported Data Sources
Following is a list of the data sources that have been tested and are supported by Microsoft to work with SSIS. Only the Microsoft-supplied providers for Oracle and DB2 are supported directly; working through issues with SSIS using the Oracle and IBM providers may entail coordinating support between Microsoft and the database manufacturer. Other providers may work completely or partially with SSIS but are not specifically supported by Microsoft. There is no formal SSIS compliance test suite for provider manufacturers, but the manufacturers may certify that they have tested the providers with SSIS.
Supported OLE DB providers
- SQL Native Client (SQLNCLI)
- Microsoft OLE DB Provider for SQL Server (SQLOLEDB)
- Microsoft Jet 4.0 Data Provider
- Microsoft OLE DB Provider for Oracle (MSDAORA)
- Microsoft OLE DB Provider for DB2 (DB2OLEDB)
Supported ADO.NET data providers
- Microsoft SqlClient data provider 2.0
- Microsoft OracleClient data provider 2.0
- OleDb bridge data provider 2.0
- Odbc bridge data provider 2.0
- Microsoft SAP data provider (As of this writing, this provider is in beta and is not currently certified by SAP.)
Supported ODBC drivers (via the ADO.NET data provider for ODBC 2.0)
- None
SQL Server–Specific Built-In Support
Because SSIS is part of the SQL Server 2005 family, SQL Server, especially SQL Server 2005, has the richest and most flexible support in SSIS. A number of specialized SSIS components support only SQL Server data. These include the following:
- Bulk Insert task
- SQL Server destination
- Transfer tasks (databases, error messages, jobs, logons, master stored procedures, and SQL Server objects)
- Database Maintenance tasks (SQL Server 2005 only)
- SQL Server Mobile (SQL Server Compact Edition) destination
- Fast Load options in the OLE DB destination
No other data sources work directly or indirectly with these control flow tasks, Data Flow components, or fast load options.
The Bulk Insert task uses the Transact-SQL BULK INSERT statement for speed when loading large amounts of data. The SQL Server destination uses logic similar to but even more optimized than the logic used by the SQL Server built-in BULK INSERT command. Because the SQL Server destination uses shared memory for maximum loading speed, the package containing the SQL Server destination must execute on the same server as the database engine. Both the Bulk Insert task and the SQL Server destination can specify options that would be specified in the Transact-SQL BULK INSERT statement. Using the SQL Server destination is faster than either the Bulk Insert task or using the OLE DB destination with one of the fast load options. The fast load option of the OLE DB destination works by using optional parameters of SQL Server's BCP (bulk copy) utility; these optional parameters include:
- Keep identity values from the imported data file or use unique values assigned by SQL Server.
- Retain a null value during the bulk load operation.
- Check constraints on the target table or view during the bulk import operation.
- Acquire a table-level lock for the duration of the bulk load operation.
- Specify the number of rows in the batch and the commit size. The number of rows in a batch defaults to 100.
Database transfer tasks and database maintenance tasks are components that are specifically built in for use with SQL Server 2000 and SQL Server 2005. These components exist to simplify database migration and also to allow a workflow for SQL Server maintenance that can be extended through BI Development Studio and scheduled through the SSIS scheduler. Database maintenance tasks are used by the Database Maintenance Plan facility of SQL Server Management Studio (SSMS). In addition, SSMS contains a designer and a wizard for creating database maintenance plans that simply create SSIS packages. The database maintenance plan packages are stored in the SQL Server database and can be edited with BI Development Studio, as well as SSMS.
SQL Server Compact Edition is a different database than SQL Server, providing many but not all of the SQL Server features, but featuring a very small memory footprint. It is designed for use with mobile devices or as a personal data store on individual workstations. The SQL Server Mobile destination is written in ADO.NET using the SqlServerCE data provider and is specific to SQL Server Compact Edition.
DB2 Connectivity
There are many OLE DB providers for DB2 data. The ones that I've seen used and tested the most are the Microsoft OLE DB Provider for DB2 and IBM OLE DB Provider for DB2. The Microsoft OLE DB Provider for DB2 is part of the SQL Server 2005 Feature Pack and is downloadable here. This provider is approved for use with SSIS in SQL Server 2005 Enterprise Edition and Developer Edition. The Microsoft OLE DB Provider for DB2 is available for SSIS x86, x64, and IA64 versions. Installing the provider on an x64 computer installs both the provider for 32- and 64-bit versions, so that it can be used with SSIS in native 64-bit mode and Windows-on-Windows (WOW) mode. A recent update has been made to the provider to speed processing when doing loads to a DB2 database. It's preferred to use the Feature Pack provider rather than the OLE DB provider for DB2 that is part of Host Integration Services 2004 (HIS 2004), as the Feature Pack provider contains enhancements for faster data extraction and loading that are not yet available in the HIS 2004 provider. The Microsoft OLE DB Provider for DB2 from the Feature Pack is only useable with the Enterprise Edition of SQL Server Integration Services. Host Integration Services 2006 will contain a provider with these features. Attunity also produces an OLE DB provider for DB2 that has been tested with SSIS.
Some programmers use the DB2 ODBC driver, perhaps because ODBC drivers were directly supported by the SSIS predecessor, DTS. It's suggested that you convert those jobs to use an OLE DB provider, but as long as ODBC continues to work, DB2 ODBC will continue to be the API of choice for some customers. IBM has recently released, along with DB2/UDB version 9.0, an ADO.NET 2.0–compliant data provider. This provider works relatively well with SSIS, although it has no better stored procedure result set support than the DB2 ODBC driver. Compared to the Microsoft OLE DB DB2 provider, it is not as optimized for fast buffered output and the input of large amounts of DB2 data, so the OLE DB provider is still the provider of choice unless speed is not an issue.
The IBM OLE DB DB2 provider and Microsoft OLE DB DB2 provider cannot use parameterized stored procedures (for example, in the SSIS OLE DB source or Execute SQL task) that return rowsets, because SSIS has no way to derive the metadata for the output rowset by way of the DB2 APIs. When this kind of command is used, the OLE DB source cannot create the column metadata and, as a result, the data-flow components that follow the OLE DB source in the data flow have no column data available prior to execution time, and the execution of the data flow fails. Another interesting observation is that, while the DB2 database's dialect of SQL ordinarily requires the standard trailing semicolon on SQL statements, SSIS Execute SQL task adds a semicolon. You should leave the semicolon out in the designer, so that the statement will execute without errors.
Oracle Connectivity
As with DB2 connectivity, there are a quite a few different choices for Oracle connectivity. Each has its strengths and weaknesses. A Microsoft-supported OLE DB provider for Oracle (MSDAORA) has been available since Oracle 7.4. This provider is relatively fast and works with any version of Oracle from 7.4 to 10gR2, but has not been enhanced since Oracle 8. Therefore, it does not support any of the Oracle data types introduced since version 8, including CLOB, BLOB, BFILE, and UROWID. An attempt to use tables or views that contain these data types will fail. Another limitation of this provider is that it exists in a 32-bit version only; no 64-bit version of MSDAORA exists.
Although Microsoft supports MSDAORA, the Oracle provider for OLE DB (OraOLEDB.Oracle.1) is the one suggested for best results. This provider has been tested by the SSIS team, although it is not as well stress-tested as MSDAORA. OraOLEDB.Oracle.1 is available in 32-bit and 64-bit versions, however, and at this time is the only released 64-bit provider for Oracle. Although the OraOLEDB.Oracle.1 provider supports the new Oracle data types, neither this provider nor MSDAORA supports executing stored procedures with parameters that provide result sets. SSIS is unable to use the metadata APIs against the Oracle database.
There were some problems using the Oracle 64-bit OLE DB provider when this provider was released because the Oracle client would not work with programs in directories that contained parentheses. To distinguish between 32- and 64-bit versions when SSIS is installed on a 64-bit computer, it uses directories that contain "(x64)" and "(x86)" as part of the directory name. These directories are used by the Windows file redirection service to accommodate separate 32-bit and 64-bit versions of applications. Some programmers were able to work around this problem by copying the SSIS executables to a directory without parentheses and changing the SSIS jobs to use this alternate location. Oracle Corporation currently has an updated provider that fixes this problem.
Just as SQL Server has a bulk insert command and a bulk copy utility to perform fast loading based on the physical structure of the database, Oracle has a high performance loader. Though an equivalent of the Bulk Insert task does not exist for the Oracle loader utility, a third party component by ETI (http://www.eti.com) is available that encapsulates this functionality. This is a good alternative to using the OLE DB destination if loading data into Oracle using SSIS is desired.
In addition to OLE DB providers for Oracle, at least two ADO.NET data providers for Oracle exist, and are useable with SSIS. The Microsoft .NET Data Provider is part of the .NET base-class libraries; it lives in System.Data.OracleClient. Oracle has a .NET 2.0–compliant provider, ODP.NET. Both can support the latest Oracle data types, and ODP.NET has some Oracle-specific optimizations, such as the ability to use multiple parameter sets. .NET data providers work with 64-bit versions of .NET 2.0, as well as with 32-bit versions. ODBC drivers for Oracle are available, but not commonly used with SSIS.
Problems have been reported when using SSIS and the Oracle NUMBER data type. In versions of the Oracle database that do not support a true INTEGER data type, specifying the INTEGER data type results in a column specification of NUMBER(38). Unfortunately, with providers other than MSDAORA, these columns are represented as NUMBER (DECIMAL) without a precision and scale specification, and must be manually mapped to the SSIS data type DT_I4. It has also been reported that using the System.Data.OracleClient ADO.NET data provider with SSIS truncates all NUMBER columns, losing decimal places. Using the Oracle ODBC provider with ADO.NET and SSIS translates all NUMBER columns to FLOAT, an imprecise numeric data type. When using the Oracle or the Microsoft OLE DB provider, you must manually map all NUMBER columns that contain decimal places to the correct known precision and scale.
Because Oracle metadata APIs list system tables along with user tables, the drop-down list of tables in SSIS components when using Oracle is sometimes unreasonably long; there is no way to filter or order this list. Care should be taken when using table names with Oracle destinations. The OCI API capitalizes all SQL object names like tables whenever a statement is submitted (this happens regardless of the tool used) unless the name is surrounded by double quotes. Using double quotes is the only way to create case-sensitive names. SSIS-generated CREATE TABLE statements for new tables use SQL Server conventions, so be sure to change the table name to all capital letters. Attempting to specify case-sensitive double-quotes names in the Execute SQL task will result in an error during design time with some databases (such as Oracle) unless the DelayValidation property is set to False.
Single Oracle SQL statements do not require a semicolon in SSIS (or, for that matter, in SQL*Plus). Including a semicolon will cause a statement error. A PL/SQL batch can be submitted for execution in an Oracle database by using an Oracle anonymous block. This PL/SQL syntax requires trailing semicolons in SSIS, as it does in SQL*Plus. You cannot execute multiple batches, either in SQL Server using the GO keyword or in Oracle with the forward slash (/) as you can in the built-in utilities for these databases; only single batches are permitted.
Connectivity to Office Excel, Office Access, and File-Based Databases
Office Access and Office Excel are often used to store data that is either loaded into or extracted from a SQL Server database. SSIS contains an Office Excel–specific connection manager, source, and destination components, making it particularly easy to set up and configure. The custom components for Office Excel work only with Office Excel 2003 and earlier versions and these versions of Office Excel are also directly supported in SSIS through the OLE DB Provider for Jet 4.0. When using Office Excel as a data source, the data types are deduced from the values in the first eight rows by default, although the data types are configurable in SSIS. An Office Excel connection string parameter which is very useful when using Office Excel with SSIS is IMEX=1, which indicates that mixed numeric and text data in the same column should be treated as text.
There is not a custom connection manager or custom components for Office Access. To use Office Access, you need to use the OLE DB Source and specify the Jet 4.0 provider. Office Access uses the OLE DB data source and destination. The OLE DB Provider for Jet is only available in a 32-bit version. In order to use Jet on 64-bit hardware, you must use the 32-bit version of SSIS runtime. This is easily configured by changing the directory to which the SSIS jobs point to the 32-bit directory. dBase files work for SSIS and the Jet 4.0 provider; information on enabling the configuration of dBase-based data sources is available in SQL Server Books Online. For Visual FoxPro databases with later versions of FoxPro, using the FoxPro-specific OLE DB provider is required because the Jet 4.0 provider does not support the newer FoxPro database file format.
Microsoft Office 2007 files have a new format and are not supported by the Jet 4.0 OLE DB provider. Specifically, the Office Excel data source and data destination cannot be used with Office Excel 2007 files. Microsoft Office 2007 shipped with a new OLE DB provider, the Office 12 Microsoft Access Engine OLE DB Provider (ACE). The ACE provider must be used with Office Excel 2007 and Office Access 2007 data, but ACE supports only Microsoft Office 2007 data. SSIS has not been specifically tested with other data that is compatible with ACE, such as Office SharePoint data.
Connectivity and Other Databases
Information on connectivity to untested databases is mostly available through newsgroups and customer feedback. The information is sketchy and may be a little skewed because information consists mostly of problem-based questions and third-party software documentation. DTS supported other databases through either OLE DB or ODBC. In SSIS, OLE DB is the API of choice and ODBC should be considered a backup choice.
Some larger shops are either converting from Teradata databases or using SQL Server as an adjunct to Teradata for data warehouse data. Teradata provides an OLE DB provider in 32-bit and 64-bit versions, as well as a newer ADO.NET data provider. Using the Teradata providers permits integration with SSIS by using tables and views, but does not permit using commands. In order to use commands with Teradata, some customers have resorted to defining a linked server to Teradata using SQL Server 2005 linked servers and then issuing commands through the SQL Server provider against the linked server. A high-performance data source for Teradata is available through an SSIS partner, ETI Software. The ADO.NET provider has not yet been tested with SSIS.
There have been reported successes using the Sybase Adaptive Server database with the Sybase OLE DB provider and obtaining data from mySQL databases into SSIS using the mySQL ODBC driver. OLE DB providers for mySQL exist, but do not provide the correct metadata for SSIS. There have been a few reported problems obtaining column rowset metadata from the Sybase SQL Anywhere provider.
Third-party vendors provide diverse data providers and other connectivity options for SSIS. Attunity provides a set of OLE DB providers specifically tested with SSIS for relational or non-relational databases. ETI provides a "build-to-order" service with many data sources available. DataDirect Technologies is developing a set of 64-bit OLE DB providers for Oracle, DB2, and Sybase data sources, specifically for use with SSIS. This series of providers will be available early in 2007.
Non-Database Data Connectors
SSIS data providers are not limited to relational databases. Data connectors can be built over any data source that can expose data as rows and columns. In this section, I'll look at data connectors for non-RDBMS sources.
Application System Connectors
Two ADO.NET data providers for SAP are available. The Microsoft .NET Data Provider for mySAP Business Suite is available as a technology preview from Microsoft. This provider has been tested and is supported for use with SSIS, but has not yet been certified by Software AG for use with SAP. The provider uses the standard RFC SAP interface and supports executing Remote-Function Modules (RFMs) and Business APIs (BAPIs) in SAP. A provider-specific subset of SQL that supports both SQL statements and executing RFMs and BAPIs as stored procedures is supported. The provider features a column and table name "friendly mode" that allows mapping SAP standard names to friendly names with an XML configuration file. Another data provider for SAP is the Xtract IS component, which is available from Theobald Software. The vendor has tested the provider with SSIS and supports it for that use. This product supports additional functionality, such as integration with cube-processing transformations in SSIS.
The Flat-File Provider
SSIS provides rich support for flat files in a variety of formats. SSIS does not use the OLE DB Jet 4.0 provider's flat-file support; the Flat File engine is built from scratch and is richer than the Jet provider support. Because all databases include a tool or utility to dump the database data to a flat file (for example, the bcp utility in SQL Server and Sybase), a flat file can be used as a data source if direct connectivity to the database is not possible or is not desirable because of database contention considerations. Files can even be processed in groups directly from the file system by using the File System tasks and the Foreach Loop container. After flat files have been retrieved and are available for SSIS, you can use a built-in Flat File source and Flat File destination. These are used along with the Flat File and Multiple Flat Files (MULTIFLATFILE) connection managers. This connection manager provides support for the three most commonly used types of flat files: delimited files (sometimes referred to as comma-separated value files), fixed width files, and ragged-right style files. A variety of options are available, including the specification or column and row delimiters, code page specification, skipping header rows, and files with the column names in the first row. The Flat File data source can read a set of sample rows at the beginning of the file and suggest data types. The default data type is a character type with a length of 50 characters.
SSIS Raw Files
The SSIS native data format can be saved to disk. This format is known as a raw file. A data source and destination for raw files are built in. If you are going to store data from SSIS to use as input to SSIS later on (say, by sending it to a business partner who also uses SSIS), the raw file is the best, lightest in size, and most performant choice, as there is no need for any translation or other processing. The only cost is I/O cost alone. This has significant savings over other file formats.
XML Data Support
Because XML is supported by almost every computing platform and directly supported by many databases, including SQL Server 2005 and SQL Server 2000, XML is a good fit for a data integration program. Some XML-based databases even use XML as the database storage format. XML divides data into elements and attributes and provides support for multiple rowsets arranged hierarchically. The main problem with using XML is that, because of the repeating tags, it's similar to a flat file with a column header row included with every data row; this tends to make XML a "heavy" data format.
SSIS comes with tasks to enable the processing of XML data (the XML task) and obtaining data using Web services (the Web Services task). The XML task can perform a variety of validations, queries, and transformations on the XML data. The transformations are Diff, Merge, and Patch (against a pair of XML files), XML Schema validation, XPath queries, and XSLT transformations. Diff, Merge, and Patch are based on the XMLDiff utility available on the Web and supported by the Microsoft XML team. The remaining operations are provided by the .NET 2.0 XML library System.Xml. In addition to XML transformations, SSIS native expressions can be applied to a selection of XML nodes. The Web Services task uses standard WSDL (Web Service Description Language) to determine the data types used and also to obtain a list of callable Web service methods. No facility is supplied to allow specific credentials to be passed to the Web service, however.
Although there is no XML connection manager as such, there is a built-in XML source. The XML source requires an XML Schema to supply the data types of the XML nodes, although an XML Schema can be implied by SSIS at design time by using the schema inference tool supported by the Microsoft XML team. This tool does not use the same type of schema inference that the ADO.NET DataSet uses, although a DataSet-generated XML Schema can be supplied. You can decompose the XML document into multiple rowsets by using the XML source. These rowsets can be used individually in successive SSIS data-flow steps. Additional key columns are provided on each rowset to maintain integrity, so that the value of the hierarchical structure is not lost.
Extensibility
A powerful feature of SSIS is that, if the component or data source that you need to use is not built in to the product, there are standard interfaces available that allow you to hook into SSIS. The two main ways to extend the product are by building custom components and by using the built-in Script component and writing customized code in scripts.
Accessing Databases with Custom Components and with the Script Component
You can build custom SSIS components that use your data provider, such as a source, destination, transformation, task, or connection manager. The SQL Server 2005 samples include a sample destination that writes to an ADO.NET DataSet object. Examples of commercially available custom data components include ETI's Bulk Extract and Bulk Load components and Persistent's bulk Data Flow component. Attunity has a Data Change component that allows sourcing only the changed data to speed ETL operations such as loading into data warehouses.
You can also populate a database with a Script-based destination. This is particularly handy when your end product is an ADO Recordset or an ADO.NET DataReader destination. Although you cannot populate or update the database from either of these destinations, you can use Visual Basic.NET script with a DataReader as input to populate the database using ADO.NET. An example ADO.NET Script destination is published in the Creating a Destination with the Script Component topic in SQL Server 2005 Books Online. You can also use a Visual Basic.NET Script destination component to populate a database using an ADO Recordset.
Another use of the Script component can be to write code that provides the equivalent functionality of partial caching in the Lookup transformation. This transformation would accomplish the lookup one row at a time, "by hand" using custom Visual Basic.NET script code. By doing the lookups one at a time, only the rows that are needed are fetched from the database. This is usually only required when the lookup tables are very large, the machine on which SSIS is running is memory-constrained, and the OLE DB provider being used does not support the necessary metadata for the Lookup's partial caching option.
Using the Script component has some advantages over using the provided components or building a custom component, such as ease of use and rapid development. But it's also worth mentioning that the Script component has some disadvantages. Reusability is one disadvantage, as the code needs to be copied in every Script component. In SQL Server 2005 SSIS, you are also limited to using Visual Basic.NET only. Visual C# scripting is slated for a future version of VSA Script called VSTA. You must also be concerned about intellectual property rights when shipping solutions that contain source code in script form. For more information on creating SSIS Sources and Destinations using Script, see the book Scripting SQL Server 2005 Integration Services by Donald Farmer.
Conclusion
SQL Server Integration Services supports many heterogeneous data sources, including generic provider-based interfaces. The richness of the functionality exposed depends on the database, provider, problem space, and data types used. Often there is more than one way of approaching data extraction or data loading when using Integration Services. In addition to the provider-based functionality that is built in to the system and the Business Intelligence Development Studio design tool, you can use flat files, XML, custom scripting components, or even SQL Server linked servers to extract and load data from heterogeneous data sources. Integration Services has the best and richest built-in support for SQL Server 2005, but plenty of room exists for third parties and programmers to extend the model to their data sources of choice. Third-party integration components are already available and the number of extensions provided by the Integration Services community is growing continuously.
For more information:
https://msdn2.microsoft.com/en-us/sql/aa336312.aspx
Appendix: Data-Connectivity Providers and SSIS
Note The following is not a definitive list of all data providers, and not all data providers on this list have been tested by Microsoft with SSIS. Only current Beta or shipping products are listed. Information about third-party products was provided by the product vendors and could not be independently verified.
Data source |
Released by |
Data-access API |
Supported database versions |
Vendor support with SSIS? |
64-bit support? |
SQL Server | Microsoft | OLE DB | 2000 and later versions |
Y | x64, IA64 |
SQL Server | Microsoft | ADO.NET | 2000 and later versions |
Y | x64, IA64 |
SQL Server | Microsoft | ODBC | 2000 and later versions |
Y | x64, IA64 |
SQL Server | Attunity | OLE DB | 2000 and later versions |
Y | |
SQL Server | DataDirect | OLE DB | 2000 and later versions |
Y | |
DB2 | Microsoft | OLE DB | All DRDA-compliant versions | Y | x64, IA64 |
DB2 | IBM | OLE DB | z/OS and UDB 7.0 and later, AIX 5.0 and later, AS400 4.5 and later versions |
Y | |
DB2 | IBM | ADO.NET | DB2 UDB 9.0 | N | |
DB2 | IBM | ODBC | z/OS and UDB 7.0 and later versions AIX 5.0 and later |
N | N |
DB2 | Attunity | OLE DB | 6.1, 7.x, 8.0 on z/OS 7.x, 8.0 on UNIX 7.x, 8.0 on Windows |
Y | |
DB2/400 | Attunity | OLE DB | On AS/400 5.1 and later versions |
Y | |
DB2 | DataDirect | OLE DB | z/OS and UDB 7.0 and later versions AIX 5.x AS400 4.5 and later |
N | |
DB2 | HIT | OLE DB | z/OS and UDB 8.0 and later versions AIX 5.x, AS400 4.5 |
N | |
DB2 | DataDirect | ADO.NET | z/OS and UDB 7.0 and later AIX 5.x, AS400 4.5 |
Y | |
DB2 | ETI | Bulk Load | 8.0 and later versions |
Y | |
DB2¹ | Persistent | Data Flow Component
Bulk Write & Bulk Read |
- | N | |
Oracle | Microsoft | OLE DB | 7.3.4 and later versions² |
Y | N |
Oracle | Microsoft | ADO.NET | 8.0 and later versions |
Y | x64, IA64 |
Oracle | Oracle Corp | OLE DB | 8i and later versions | Y | x64, IA64 |
Oracle | Oracle Corp | ADO.NET | 8i and later versions | Y | x64, IA64 |
Oracle | Oracle Corp | ODBC | 8i and later versions | N | |
Oracle | Microsoft | ODBC | 8i and later versions | N | |
Oracle | Attunity | OLE DB | 9i and later versions | Y | |
Oracle | DataDirect | OLE DB | 8i and later versions | Y | |
Oracle | DataDirect | ADO.NET | 8i and later versions | Y | |
Oracle | ETI | Bulk Load | 9.0 and later versions |
Y | |
Oracle | Persistent | Data Flow Component
Bulk Write |
8i and later versions | N | |
SAP¹ | Microsoft | ADO.NET | R/3 4.6C and later versions |
Y | |
SAP | Theobald | OLE DB | R/3 | Y | |
Office Access | Microsoft | OLE DB | 2003 and earlier versions |
Y | |
Office Excel | Microsoft | OLE DB | 2003 and earlier versions |
Y | |
Office 2007 | Microsoft | OLE DB | 2007 | N | |
Sybase | Sybase | OLE DB | 11.5 and later versions |
N | |
Sybase | Sybase | ADO.NET | 11.5 and later versions |
N | |
Sybase | Attunity | OLE DB | 12.0 and later versions |
Y | |
Sybase | DataDirect | OLE DB | 11.5 and later versions |
Y | |
Sybase | DataDirect | ADO.NET | 11.5 and later versions |
Y | |
Informix | IBM | OLE DB | 7.3 and later versions |
N | |
Informix | Attunity | OLE DB | 7.31, 9.x, 10 | Y | |
Informix¹ | Persistent | Data Flow Component
Bulk Write & Bulk Read |
- | N | |
Teradata | Teradata | OLE DB | 2.6 and later versions |
N | |
Teradata | Teradata | ADO.NET | 2.6 and later versions |
N | |
Teradata | ETI | Bulk Load | 2.5 and later versions |
Y | |
Teradata | ETI | Bulk Extract | 2.5 and later versions |
Y | |
FoxPro | Microsoft | OLE DB | 8.0 and later versions |
N | |
File DBs | Microsoft | OLE DB | Any Jet 4.0– compatible version | N | |
Adabas | Attunity | OLE DB | 6.2.2 to 7.4.x on z/OS 3.3 to 5.1 on Open |
Y | |
CISAM | Attunity | OLE DB | On UNIX | Y | |
DISAM | Attunity | OLE DB | On UNIX, Linux, and Windows |
Y | |
Ingres II | Attunity | OLE DB | 2 to 2.56 | Y | |
Oracle Rdb | Attunity | OLE DB | 7.1.x, on OpenVMS Alpha and Integrity (Itanium) |
Y | Y (HP Integrity) |
RMS | Attunity | OLE DB | On OpenVMS Alpha and Integrity (Itanium) |
Y | Y (HP Integrity) |
Enscribe | Attunity | OLE DB | On HP NonStop G- Series and H-Series |
Y | Y (HP Integrity) |
SQL/MP | Attunity | OLE DB | On HP NonStop G- Series and H-Series |
Y | Y (HP Integrity) |
IMS/DB | Attunity | OLE DB | 6.1 and later versions |
Y | |
VSAM | Attunity | OLE DB | On z/OS 1.1 and later versions |
Y | |
LDAP | Persistent | ODBC | All LDAP-compliant servers |
N | Y |
¹This product is in beta.
²Although this provider can connect to and use versions of Oracle up to and including Oracle 10gR2, it does not support database constructs (such as BLOB/CLOB data types) introduced after Oracle 8.0).