Created a data flow that joins a SQL Server table with a Snowflake table and writes to a Snowflake table. The data flow runs fine in debug mode for one run. In triggered mode it shows "In progress" forever with no progress, even on reading the data sources

Suma Nvss Toyyeti01 0 Reputation points
2023-05-11T12:53:09.84+00:00

The data flow does not work when it joins a SQL Server table with a Snowflake table.

The same data flow works when two Snowflake tables are joined.

Is there any limitation in data flows with heterogeneous data sources?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Sedat SALMAN 13,345 Reputation points
    2023-05-13T16:42:26.8766667+00:00

    Please follow this link to check how to join different streams in ADF

    https://learn.microsoft.com/en-us/azure/data-factory/data-flow-join

    and this one to see how to create a data flow

    https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow

    and finally this one for supported data sources

    https://learn.microsoft.com/en-us/azure/data-factory/connector-overview

    Azure Data Factory's mapping data flows do support operations on heterogeneous data sources. However, there are certain factors that may influence the successful execution of such operations.

    Here are a few things you might consider:

    • Data Integration Runtime: Ensure that the Azure Integration Runtime instance is correctly configured. The Integration Runtime is responsible for the movement of data between different data stores and for dispatching and monitoring of data flow activities.
    • Data Store Connectivity: Validate the connection and access permissions to both SQL Server and Snowflake.
    • Schema Compatibility: Check the compatibility of the schemas of the SQL Server and Snowflake tables. They need to be compatible in terms of data types and structure for the join operation to work correctly.
    • Query Optimization: Heterogeneous operations are generally more resource-intensive and slower than operations on homogeneous data sources. Review your data flow design and queries to ensure they are optimized for performance.
    • Debug Mode vs. Triggered Run: In debug mode, data flows run on an already-warm debug cluster, so they start faster than triggered runs, which must first spin up a cluster. That startup delay, however, should not leave the data flow stuck in progress forever.
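
    To tell normal cluster startup delay apart from a run that is really stuck, one option is to poll the run status against a deadline. The following is only a minimal sketch: `wait_for_run` and its parameters are hypothetical names, and `get_status` is a callable you supply yourself (for example, a wrapper around `pipeline_runs.get(...).status` from the `azure-mgmt-datafactory` package). The terminal status names are assumed to match the values shown in the ADF monitoring view.

    ```python
    import time
    from typing import Callable

    # Statuses after which a pipeline run will not change any more (assumed set).
    TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


    def wait_for_run(
        get_status: Callable[[], str],
        max_wait_s: float,
        poll_s: float = 30.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> str:
        """Poll get_status() until the run finishes or max_wait_s elapses.

        Returns the terminal status, or raises TimeoutError carrying the last
        non-terminal status that was seen (e.g. "InProgress").
        """
        deadline = time.monotonic() + max_wait_s
        while True:
            status = get_status()
            if status in TERMINAL_STATUSES:
                return status
            if time.monotonic() >= deadline:
                raise TimeoutError(f"run still {status} after {max_wait_s}s")
            sleep(poll_s)
    ```

    A triggered run that is still "InProgress" well past the usual cluster startup time is worth inspecting (or cancelling) rather than waiting on indefinitely.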

  2. ShaikMaheer-MSFT 38,406 Reputation points Microsoft Employee
    2023-05-17T17:19:11.6433333+00:00

    Hi,

    Thank you for posting your query on the Microsoft Q&A platform.

    It should work with heterogeneous data sources as well, provided each data source is supported as a source transformation in data flows.

    You mentioned it works fine in debug mode, so it should work in triggered mode too. This might be an intermittent issue.

    Please consider retrying, and also try a different IR with more cores, to see if that helps.
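
    As a rough illustration of that suggestion, the sketch below builds an Execute Data Flow activity definition with a larger compute size, a retry, and an explicit timeout so that a hung run fails instead of staying in progress. The helper name `data_flow_activity`, the activity and data flow names, and the chosen values are all hypothetical; the property names are assumed to follow the Execute Data Flow activity JSON, so check them against your own pipeline definition before using them.

    ```python
    import json


    def data_flow_activity(name, data_flow_name, core_count=16,
                           compute_type="General", timeout="0.02:00:00", retry=1):
        """Return an Execute Data Flow activity definition as a plain dict."""
        return {
            "name": name,
            "type": "ExecuteDataFlow",
            # Fail the activity after `timeout` (d.hh:mm:ss) and retry it once.
            "policy": {
                "timeout": timeout,
                "retry": retry,
                "retryIntervalInSeconds": 60,
            },
            "typeProperties": {
                "dataflow": {
                    "referenceName": data_flow_name,
                    "type": "DataFlowReference",
                },
                # More cores for the Spark cluster that runs the data flow.
                "compute": {"coreCount": core_count, "computeType": compute_type},
            },
        }


    if __name__ == "__main__":
        # Hypothetical names, for illustration only.
        activity = data_flow_activity("JoinSqlAndSnowflake", "JoinDataFlow")
        print(json.dumps(activity, indent=2))
    ```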

    Hope this helps. Please let me know if you have any further queries.


    Please consider clicking the Accept Answer button. Accepted answers help the community as well.