Pivot Transformation
The Pivot transformation makes a normalized data set into a less normalized but more compact version by pivoting the input data on a column value. For example, a normalized Orders data set that lists customer name, product, and quantity purchased typically has multiple rows for any customer who purchased multiple products, with each row for that customer showing order details for a different product. By pivoting the data set on the product column, the Pivot transformation can output a data set with a single row per customer. That single row lists all the purchases by the customer, with the product names shown as column names, and the quantity shown as a value in the product column. Because not every customer purchases every product, many columns may contain null values.
When a dataset is pivoted, input columns perform different roles in the pivoting process. A column can participate in the following ways:
The column is passed through unchanged to the output. Because many input rows can result only in one output row, the transformation copies only the first input value for the column.
The column acts as the key or part of the key that identifies a set of records.
The column defines the pivot. The values in this column are associated with columns in the pivoted dataset.
The column contains values that are placed in the columns that the pivot creates.
The following diagram shows a data set before the data is pivoted on the Product column.
The following diagram shows a data set after the data has been pivoted on the Product column.
To pivot data efficiently, which means creating as few records in the output dataset as possible, the input data must be sorted on the pivot column. If the data is not sorted, the Pivot transformation might generate multiple records for each value in the set key, which is the column that defines set membership. For example, if the dataset is pivoted on a Name column but the names are not sorted, the output dataset could have more than one row for each customer, because a pivot occurs every time that the value in Name changes.
The input data might contain duplicate rows, which will cause the Pivot transformation to fail. "Duplicate rows" means rows that have the same values in the set key columns and the pivot columns. For example, if you use the data set before the data is pivoted on the Product column, as shown in the diagram, and add a row with Kate in the Cust column and Soda in the Product column, these duplicates values would cause the Pivot transformation to fail, regardless of the quantity in the Qty column. To avoid failure, you can either configure the transformation to redirect error rows to an error output or you can pre-aggregate values to ensure there are no duplicate rows. For example, in the sample data set, you could sum the values in the Qty column by customer and product.
The Pivot transformation uses the properties on its input and output columns to define the pivot operation.
The Pivot transformation includes the PivotKeyValue custom property. This property can be updated by a property expression when the package is loaded. For more information, see Integration Services Expression Reference, Using Property Expressions in Packages, and Transformation Custom Properties.
This transformation has one input, one regular output, and one error output.
Configuring the Sample Dataset
The sample dataset shown in the diagram was configured as follows: the PivotUsage property of the Cust column was set to 1, to indicate that it is a set key column; the PivotUsage property of the Product input column was set to 2, to indicate that a column must be created for each product; the PivotUsage property of the Qty input column was set to 3, to indicate that quantity values are placed into the pivot column.
The transformation output was configured to include six columns. The columns, which can be added by using the Advanced Editor dialog box, were named Cust, Ham, Soda, Milk, Beer, and Chips. The PivotKeyValue property of the Ham column was set to Ham, to indicate that the transformation should look for that value in the input column. Similarly, the PivotKeyValue property of the Soda column was set to Soda, and so on.
Columns in the transformation input were then mapped to columns in the output.
The SourceColumn property of the Cust column was configured to use the lineage identifier of the Cust input column. The SourceColumn properties of the Ham, Soda, Milk, Beer, and Chips columns were configured to use the lineage identifier of the Qty input column. Another way to configure this would be to set the SourceColumn property of the Ham, Soda, Milk, Beer, and Chips columns to -1, which would insert the value True instead of the data value. For example, instead of the values 12 and 24, the Beer column would then contain the value True, to indicate only that the customer had purchased the product, instead of showing the quantity purchased.
The rows in the transformation output contain the values from the Cust and Qty input columns.
Pivot Options
You set the PivotUsage property of the input columns to specify the role each column performs in the pivoting process. The valid values of PivotUsage are 0, 1, 2, and 3.
The following table describes the PivotUsage options.
Option |
Description |
---|---|
0 |
The column is not pivoted, and the column values are passed through to the transformation output. |
1 |
The column is part of the set key that identifies one or more rows as part of one set. All input rows with the same set key are combined into one output row. |
2 |
The column is a pivot column. At least one column is created from each column value. |
3 |
The values from this column are placed in columns that are created as a result of the pivot. |
Configuring the Pivot Transformation
You can set properties through SSIS Designer or programmatically.
For more information about the properties that you can set in the Advanced Editor dialog box or programmatically, click one of the following topics:
For more information about how to set the properties, see How to: Set the Properties of a Data Flow Component.
|