Hi @phemanth regarding the conditional select method.. I have 3 streams as per your resolution. Now how can we merge 3 streams into 1 to continue the flow?
In ADF, I am not able to merge data from multiple streams with different columns with different data types
Hello, I am not able to merge data from multiple streams in ADF DATAFLOWS. Stream 1) flow let - only 1 column. Stream 2) flow let - only 1 column. Stream 3) source columns (54 columns) Stream 4) regular data flow. Now I want to merge 1,2, 3 with 4th stream and continue the flow in dataflow? Iam not able to do. I have tried to do some join, lookup but that doesn't help because I don't have any matching condition.. Tried union but that doesn't Suport? Please help?
Azure Synapse Analytics
Azure Data Factory
-
phemanth 6,885 Reputation points • Microsoft Vendor
2024-04-25T07:32:05.05+00:00 Thanks for using MS Q&A platform and posting your query.
Use the "Select" transformation in ADF Dataflows to select only the required columns from each stream. This will ensure that all the streams have the same columns.
Use the "Derived Column" transformation in ADF Dataflows to convert the data types of the columns to a common data type. This will ensure that all the streams have the same data types.
Use the "Union" transformation in ADF Dataflows to merge the data from all the streams. The "Union" transformation requires that all the streams have the same number of columns and the same data types.
f you need to perform any additional transformations on the merged data, you can use the "Derived Column" or other transformations in ADF Dataflows.
if the above suggestion doesn't work for you, please provide more details such as your present dataflow configuration.
-
phemanth 6,885 Reputation points • Microsoft Vendor
2024-04-26T06:02:23.4933333+00:00 @neeraj aitha We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. In case if you have any resolution please do share that same with the community as it can be helpful to others. Otherwise, will respond with more details and we will try to help.
-
neeraj aitha 0 Reputation points
2024-04-26T10:55:46.11+00:00 @phemanth but my scenario is like the 3 streams which is mentioned are not the source data and they are inner streams. Also if I do union I don't have same number of columns. I have 54 columns coming from stream number 3)
-
phemanth 6,885 Reputation points • Microsoft Vendor
2024-04-29T05:25:19.9433333+00:00 @neeraj aitha Here's how to tackle merging data from inner streams with different column counts in ADF Dataflows:
1. Leverage Derived Column for Stream Replication (if applicable):
- If Streams 1 and 2 contain single values or a limited set of data points, you can use the "Derived Column" transformation to replicate those values into new columns within each stream. This approach ensures all streams have the same number of columns for the union.
- For example, if Stream 1 has a value "source1" and Stream 2 has "source2", create 54 new columns in each stream filled with the respective source value (e.g., "source1" for all 54 columns in Stream 1).
2. Conditional Column Selection (if applicable):
- If Streams 1 and 2 have more than a few data points, but the number of columns is still manageable, you can use a combination of "Select" and "Derived Column" transformations:
- Use "Select" to pick the desired columns from Streams 1 and 2.
- Employ "Derived Column" to create new columns with null values (or a predefined default value) for the remaining 54 columns in each stream.
- This method ensures all streams have the required number of columns (potentially with null values for Stream 1 and 2's extra columns).
3. Custom Script-Based Transformation (for complex scenarios):
- If Streams 1 and 2 have a large number of columns or complex data structures, consider using a script-based transformation. You can write code (e.g., Python, C#) to manipulate the data and create a new stream with the desired column structure matching Stream 3.
- This approach requires scripting expertise but offers greater flexibility for intricate data handling.
4. Data Flow with Multiple Sinks (alternative approach):
- If merging into a single stream isn't feasible due to structural differences, you might consider creating a data flow with multiple sinks.
- Design the data flow to process Streams 1, 2, and 3 independently.
- Configure separate sinks (e.g., Azure Data Lake Storage, Azure SQL Database) to store the output from each stream.
- This approach provides separate datasets for each stream, allowing further processing or analysis individually.
Choosing the Best Approach:
The most suitable method depends on the complexity of Streams 1 and 2's data structures and the number of columns they contain:
- For simple cases with single values or a few data points, use Derived Column replication or conditional selection with Derived Column.
- For complex data structures or a large number of columns, explore scripting-based transformations or a multi-sink data flow
-
phemanth 6,885 Reputation points • Microsoft Vendor
2024-04-30T04:42:46.8866667+00:00 @neeraj aitha just checking back to see if you have a resolution yet. In case if you have any resolution please do share that same with the community as it can be helpful to others. Otherwise, will respond with more details and we will try to help.
-
neeraj aitha 0 Reputation points
2024-04-30T07:50:18.2166667+00:00 @phemanth Iam trying to understand your solutions. Will test once I got scenario match and get back to you here. I may take a week to test.
-
phemanth 6,885 Reputation points • Microsoft Vendor
2024-05-02T10:23:39.09+00:00 @neeraj aitha Here's how to merge 3 streams into 1 using the conditional select method with Derived Column in ADF Dataflows:
Select Desired Columns:
- For each of Streams 1 and 2, use the Select transformation to pick only the specific columns you want to include in the merged stream.
Create New Columns with Null Values:
- For Streams 1 and 2, utilize the Derived Column transformation to generate new columns that match the remaining columns from Stream 3.
- Within Derived Column:
- Define new column names corresponding to the missing columns in Streams 1 and 2.
- Set the expression for each new column to a constant value (e.g.,
null
or a default value) to maintain consistency.
Example (assuming Stream 1 has 1 column and Stream 2 has 2 columns):
- Stream 1 has a column named "source1_data". You want to include it in the merged stream.
- Stream 2 has columns "source2_col1" and "source2_col2". You want both in the merged stream.
- Stream 3 has 54 columns, including "col1", "col2", ..., "col54".
Stream 1 Transformation:
- Select: Choose "source1_data"
- Derived Column: Create 52 new columns (col1 to col52) with an expression like
null
or a specific default value.
Stream 2 Transformation:
- Select: Choose "source2_col1" and "source2_col2"
- Derived Column: Create 50 new columns (col3 to col52) with an expression like
null
or a specific default value.
Merging Streams:
After applying Select and Derived Column to Streams 1 and 2, use the Union transformation to combine all three streams into a single one.
- Union requires all input streams to have the same number of columns (even if some contain null values).
Continuing the Flow:
Once you have the merged stream, you can connect it to subsequent transformations in your data flow for further processing.
-
neeraj aitha 0 Reputation points
2024-05-07T07:34:15.89+00:00 Hi @phemanth , Is it possible to bring columns from inner streams to main stream of the derived columns as per your resolution? What would be the expression to bring those 2 columns into derived column in main stream to merge the data... Please let me know.
-
phemanth 6,885 Reputation points • Microsoft Vendor
2024-05-08T05:27:07.78+00:00 @neeraj aitha you can bring columns from inner streams (stream 1 and 2 in this case) into the main stream (stream 3) using Derived Column transformations in ADF Dataflows. Here's how to achieve that based on the conditional column selection approach:
Select Desired Columns:
Use the Select transformation on streams 1 and 2 to choose the specific columns you want to include in the merged stream.
Create New Columns with Expressions:
- Apply the Derived Column transformation to streams 1 and 2.
- Within Derived Column, define new column names corresponding to the missing columns in streams 1 and 2.
- Set the expression for each new column to reference the desired column from the respective stream.
Expression Example:
Assuming Stream 1 has a column named "source1_data" and Stream 2 has a column named "source2_data", and you want to merge them with Stream 3 which has additional columns "col1", "col2", ..., "col54":
Stream 1 Transformation:
- Select: Choose "source1_data"
- Derived Column:
Create 52 new columns (col1 to col52) with expressions like:
iif(col(source1_data) IS NULL, null, col(source1_data)) // Replace 'source1_data' with your actual column name
This expression checks if "source1_data" is null. If yes, it sets the new column to null. Otherwise, it copies the value from "source1_data".
Stream 2 Transformation:
- Select: Choose "source2_data"
- Derived Column:
Create 51 new columns (col2 to col52) with expressions like:
iif(col(source2_data) IS NULL, null, col(source2_data)) // Replace 'source2_data' with your actual column name
Explanation of the Expression:
-
iif
is a function that checks a condition and returns different values based on the outcome. -
col(source1_data)
(orcol(source2_data)
) retrieves the value from the selected column in the respective stream. -
IS NULL
checks if the value is null. - If the value is null, the new column is set to null (to maintain consistency with Stream 3).
- If the value is not null, it's copied to the new column.
Merging Streams:
- After applying Select and Derived Column to streams 1 and 2, use the Union transformation to combine all three streams.
Following this approach, you'll have a merged stream with all columns from Stream 3, where the missing columns from streams 1 and 2 are filled with null values (or a specific default value you choose in the expression).
Sign in to comment
1 answer
Sort by: Most helpful
-
neeraj aitha 0 Reputation points
2024-04-30T07:55:12.39+00:00