Share via


1.3 Overview

This document describes the persistence file structure for Microsoft SQL Server Integration Services (SSIS). Integration Services is used to orchestrate activities to move and conform data from disparate sources in an enterprise into a convenient location and shape, often for analysis and reporting. The DTSX structure describes these orchestration activities as a pathway that is comprised of a set of steps, each of which can involve processing of the data as it flows through the step. Steps can be conditional or iterative, and not all of the steps that are described in this document are necessarily executed on any one passing of data through the defined steps.

The set of steps starts with a reference to a source from which to import data to begin the set of operations that the data will be passed through. The DTSX structure then allows the user to specify that the data is to pass through an executable that performs a processing task upon the data, or that the data is to pass through a pipeline component that modifies and/or transforms the data. The structure also allows the ability to define a pathway for the data with conditional elements that can change the data flow pathways or operate iteratively in a loop.

Finally, the structure allows the ability to define a final destination for the post-processed data flow. A file that conforms to the DTSX structure as defined in this document can then be processed by the data transformation processing engine, or it can be opened in a UI to make modifications to the definition.

The following diagram shows one typical use of Integration Services: to consolidate a data flow from multiple sources, and while consolidating, also transform and assure consistency of all the data.

Integration Services package data flow

Figure 1: Integration Services package data flow

The intermediate step in the data flow can be one of the following:

  • Executable Task: Performs a defined type of work along the data flow. Custom tasks can be written by end users to perform any task that is not included with the Integration Services product. Examples of standard tasks that ship with Integration Services are tasks that manipulate an external file, process Microsoft SQL Server Analysis Services objects, run a data mining query, perform file transfers with FTP, perform a database backup, and execute an external executable.

  • Component: Components perform data transformations along the data flow. Examples of components that ship with Integration Services are components that perform aggregation; create columns derived by a specified formula; and perform row counts, sampling, lookups, and many other tasks. End users can also create custom components to perform additional transformations that are not included with the standard product.

The following diagram is a detailed example of a data flow described by the DTSX structure format. The diagram shows the presence of two data sources (an OLE DB Source and a Flat File Source) and three final data destinations (a Raw File Destination, an OLE DB destination, and a Microsoft SQL Server Destination). The steps in the data flow, based on conditional processing, determine which destination the data ultimately flows to and how it is transformed and processed on the route to its final destination.

Some of the intermediate steps are conditional. For example, in the diagram there are two arrows between Lookup and Fuzzy Lookup. One arrow represents the data flow for erroneous data, and the other represents the flow for standard output.

pictafe6ea1a-0900-4ece-b77c-e6c9ddc02074

Figure 2: Data flow described by DTSX structure

Differences between DTSX and DTSX 2.0

DTSX version 2.0 (DTSX 2.0) is an enhanced refactoring of the XML that is documented in [MS-DTSX]. Both [MS-DTSX] and [MS-DTSX2] describe the package files that are used by Integration Services. DTSX 2.0 is used for package files starting with Microsoft SQL Server 2012.

DTSX 2.0 has the following XML refactoring changes from DTSX:

  • DTSX has Property elements with a Name attribute to specify the property name. In DTSX 2.0, named properties are XML attributes of the parent element. However, a small number of named Property elements remain in DTSX 2.0.

  • Some new attributes have been added to DTSX 2.0.

  • To organize the larger number of attributes in DTSX 2.0, AttributeGroup groups of attributes have been defined, and are referenced by the complex type definitions in the XSD of DTSX 2.0.

  • DTSX 2.0 specifies defaults for most attribute values, and sets the XSD usage attribute to "optional" for attributes with defaults. In DTSX, all property elements are always present, even if the element value is equal to the default value.

  • In places where repeated element instances were allowed in DTSX, these repeated elements have been placed inside a parent element to hold the collection in DTSX 2.0, in most cases. For example, if multiple Executable elements appear within a package file, the Executable elements are contained in a parent Executables element in DTSX 2.0.