Mapping individual array values to new values based on a lookup table
In Azure Data Factory, how can I perform a lookup on each item in a string array or line of delimited text to map each item to a new value? I have a set of codes in one column. I would like to map these codes to their descriptions and have the descriptions end up in another column in the same format. I have a dataset that contains rows with each code and description on separate lines. How can I use the dataset to map each code to its description?
e.g.["SG", "DB", "TP"] map to ["single", "double", "triple"]or
"SG"~"DB"~"TP" map to "single"~"double"~"triple"
My lookup table:
Code | Description |
---|---|
SG | single |
DB | double |
TP | triple |
Azure Data Factory
-
phemanth • 13,070 Reputation points • Microsoft Vendor
2024-06-06T05:16:22.38+00:00 @David Lang Welcome to Microsoft Q&A platform and thanks for posting your question.
Here's how you can perform a lookup on each item in a string array or line of delimited text in Azure Data Factory to map them to descriptions:
Method 1: Using ForEach Activity and Lookup Activity
ForEach Activity:
Add a ForEach Activity to your pipeline.
Set the Items property to the expression that represents your string array or delimited text.
- For an array:
@activity('PreviousActivity').output.value
(assuming the previous activity outputs the array). - For delimited text: Use a split function like
split(activity('PreviousActivity').output.value, '~')
to split the line by your delimiter (~
in this case).
Lookup Activity (Inside ForEach):
Inside the ForEach loop, add a Lookup Activity.
Set the Source property to your dataset containing codes and descriptions.
Configure the Lookup Activity to perform a lookup on the current item from the ForEach loop (code in this case). You can achieve this in two ways:
- Lookup Key: Set the lookup key to the current item from the ForEach loop. Use the expression
@item
inside the ForEach loop. - Filter Condition: Alternatively, use a filter condition like
code eq @item
to filter the lookup table based on the current item.
Set the Select property to choose the desired output column (Description in this case).
Set Variable Activity (Optional):
- You can add a Set Variable Activity after the ForEach loop to capture the final output array containing descriptions.
- Set the variable value to an expression like
concat(arrayVar, item('LookupActivity').output.value)
. Here,arrayVar
is a variable you define to store the descriptions andconcat
function combines them iteratively.
Method 2: Using Data Flow with Split and Lookup
Data Flow:
- Use a data flow activity in your pipeline.
Split Transformation:
- Add a Split transformation to the data flow.
- Use the split function like
split(source_column, '~')
to split the input string array or delimited text into individual codes (source_column being your original column).
Lookup Transformation:
Add a Lookup transformation after the split.
Set the Source property to your dataset containing codes and descriptions.
Configure the lookup similar to method 1:
- Use the code from the split output as the lookup key.
- Set the Select property to choose the Description column.
Select Transformation (Optional):
- You can add a Select transformation to rename or format the output column containing descriptions.
Output:
Both methods will achieve the desired outcome. The first method uses a loop and a separate lookup activity for each item, while the second method leverages data flow transformations for a potentially more efficient approach, especially for larger datasets.
Hope this helps. Do let us know if you any further queries.
- For an array:
-
phemanth • 13,070 Reputation points • Microsoft Vendor
2024-06-07T14:52:16.99+00:00 @David Lang We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. In case if you have any resolution please do share that same with the community as it can be helpful to others. Otherwise, will respond with more details and we will try to help.
-
David Lang • 40 Reputation points
2024-06-09T19:49:22.0833333+00:00 xxxxxxxxxx
-
David Lang • 40 Reputation points
2024-06-09T19:50:09.58+00:00 Hi @phemanth
My entire ETL is in a dataflow so I'd like to use method #2. The problem is that I don't know how many codes will be in the original code column. I can make method #2 work if I manually enter the array index of the code array I create but is there some way to use method #2 without having to know the number of codes? I need an ADF function that will iterate over a string array regardless of the size of the array.
-
phemanth • 13,070 Reputation points • Microsoft Vendor
2024-06-10T07:29:08.7+00:00 @David Lang Certainly, you can achieve code-to-description mapping in ADF using Data Flow even without knowing the upfront size of the string array (Method 2). Here's how you can modify the approach to handle dynamic arrays:
1. Split Transformation:
- Maintain the Split transformation as described earlier. Use a split function like
split(source_column, '~')
to divide the input string array or delimited text into rows for each code (source_column being your original column).
2. Add an Expand Transformation:
- After the Split transformation, introduce an Expand transformation.
- Set the
Expansion Operation
property toSplit
. - In the
Source Expression
field, provide the output column from the Split transformation (containing individual codes). This essentially expands each row from the Split into separate rows, one for each code.
3. Lookup Transformation:
- Keep the Lookup transformation as mentioned before.
- Set the
Source
property to your dataset containing codes and descriptions. - Configure the lookup key using the expanded code from the Expand transformation.
4. Select Transformation (Optional):
- You can include a Select transformation after the Lookup to rename or format the output column containing descriptions.
Benefits:
- This approach remains within the Data Flow paradigm, aligning with your preference for method #2.
- It handles dynamic code arrays efficiently without requiring manual intervention for array indexing.
- Maintain the Split transformation as described earlier. Use a split function like
-
David Lang • 40 Reputation points
2024-06-10T13:57:00.0866667+00:00 @phemanth I don't believe there is such a thing as an Expand transformation in ADF. Can you show me where it is?
-
phemanth • 13,070 Reputation points • Microsoft Vendor
2024-06-10T17:50:11.65+00:00 @David Lang I apologize for confusion on earlier response Here's the exact way how you can adapt Method 2 in Azure Data Factory (ADF) to handle string arrays of variable size without knowing the maximum number of codes:
Using Single Split and Apply To Each
Split Transformation: Maintain the Split transformation to divide the input string or delimited text into individual codes. Use
split(source_column, '~')
(replace~
with your delimiter).Apply To Each Transformation:
- Add an Apply To Each transformation after the Split. This transformation iterates over each element (code) in the split output array.
- Within Apply To Each, you'll perform the lookup for the current code.
Lookup Transformation (Inside Apply To Each):
- Set the Source property to your dataset containing codes and descriptions.
- Configure the Lookup similar to the previous methods:
- Use the current item from Apply To Each as the lookup key (accessible as
item()
within Apply To Each). This refers to the individual code being processed in the loop. - Set the Select property to choose the "Description" column.
Select Transformation (Optional):
- You can add a Select transformation after Apply To Each to rename or format the output column containing descriptions.
Output:
This approach iterates through each code in the split array using Apply To Each. Inside the loop, the Lookup retrieves the description for the current code, providing the desired mapping without needing a predefined array index.
Benefits:
- Handles arrays of any size.
- Leverages ADF's built-in data flow capabilities for potentially better performance.
- Ensure your source data doesn't contain empty elements within the string array, as Apply To Each might iterate over them as well. You can add a conditional split or filter before Apply To Each to handle empty elements if necessary.
By following these steps, you can effectively use Method 2 in your ADF data flow to map codes to descriptions in string arrays of variable size. This approach provides a scalable and efficient solution for your ETL process.
- Add an Apply To Each transformation after the Split. This transformation iterates over each element (code) in the split output array.
-
David Lang • 40 Reputation points
2024-06-10T20:08:58.6633333+00:00 @phemanth There's also no such thing as an Apply To Each transformation in Azure Data Factory.
-
phemanth • 13,070 Reputation points • Microsoft Vendor
2024-06-11T06:17:05.71+00:00 thank you for addressing the point
Here’s a revised approach:
Split Transformation: Use the Split transformation to divide the input string or delimited text into individual codes. Use
split(source_column, '~')
(replace~
with your delimiter).Lookup Transformation: After the Split transformation, add a Lookup transformation. Set the Source property to your dataset containing codes and descriptions. Configure the lookup key using the code from the Split transformation output. Set the Select property to choose the “Description” column.
Aggregate Transformation: After the Lookup transformation, add an Aggregate transformation. This will group all the rows back together. In the Group By field, specify the original identifier for the row (if any). In the Aggregates field, use the collect() function to create an array of descriptions.
Select Transformation (Optional): You can add a Select transformation after the Aggregate to rename or format the output column containing descriptions.
This approach will handle arrays of any size and leverages ADF’s built-in data flow capabilities for potentially better performance. Please ensure your source data doesn’t contain empty elements.
-
David Lang • 40 Reputation points
2024-06-12T16:42:37.44+00:00 @phemanth This will not work. After I use the split function I will have my codes values in an array string. But to use the Lookup transformation I will have to manually add an index number to specify which array index I am looking up (e.g. myCodes[1] == DescriptionTable.code).
As I said in an earlier comment, I have no way of knowing the number of codes that will be sent which means I don't know how big the string array will be, so I can't hard-code index values. I need a way to iterate over the string array regardless of its size.
-
phemanth • 13,070 Reputation points • Microsoft Vendor
2024-06-13T06:38:31.8466667+00:00 @David Lang Here’s a high-level overview of the steps:
Lookup Activity: Use a Lookup activity to retrieve the entire mapping table (code to description) from your dataset.
ForEach Activity: For each item in your input array or delimited text, perform the following steps:
- Split the string into an array if it’s a delimited text.
- Use an
If Condition
activity inside theForEach
loop to check if the current item matches any code in the lookup dataset.- If it matches, replace the code with the corresponding description.
- Use an
Here’s a pseudo-code representation of the process:
{ "name": "Pipeline", "properties": { "activities": [ { "name": "LookupMappingTable", "type": "Lookup", "typeProperties": { "source": { "type": "SqlSource", "sqlReaderQuery": "SELECT * FROM MappingTable" }, "dataset": { "referenceName": "MappingTableDataset", "type": "DatasetReference" } } }, { "name": "ForEachCode", "type": "ForEach", "typeProperties": { "items": "@split('SG~DB~TP', '~')", "activities": [ { "name": "IfCondition", "type": "IfCondition", "typeProperties": { "expression": { "value": "@equals(item(), activity('LookupMappingTable').output.value[0].Code)", "type": "Expression" }, "ifTrueActivities": [ { "name": "SetVariable", "type": "SetVariable", "typeProperties": { "variableName": "Description", "value": "@activity('LookupMappingTable').output.value[0].Description" } } ] } } ] } } ] } }
Remember to replace
'SG~DB~TP'
with your actual input and"SELECT * FROM MappingTable"
with your actual SQL query to retrieve the mapping table. Also, replaceMappingTableDataset
with the name of your actual dataset in Azure Data Factory.This pipeline will replace each code in your input with its corresponding description from the mapping table. The descriptions will be stored in a variable named
Description
. You can then use this variable in subsequent activities as needed. - Split the string into an array if it’s a delimited text.
Sign in to comment