Data Factory Dataflow with Snowflake warehouse - Schema drift

AzRobo 16 Reputation points
2021-08-25T19:14:19.707+00:00

Hi,

we are doing a POC to use ADF dataflow mappings to load data in Snowflake WH tables. We tried to create a generic dataflow to read from the snowflake table and write to another snowflake table using schema drift option enable.
source table: 10 columns - 5 column names matches with target
Target Table: 5 columns

But the dataflow is failing at the source column which is not present in target table with error as
"Message":"Job failed due to reason: at Sink 'SinkHub': java.sql.BatchUpdateException: SQL compilation error: error line 1 at position 3\ninvalid identifier 'XYZ'"

Dataflow is successful only if we keep exact column names in source table like target table. We are expecting that columns matched should get populated when Auto Mapping is enabled.
That means dataflow schema drift option is not working with snowflake database.

anyone faced similar issue?.

Thanks

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,002 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. ShaikMaheer-MSFT 38,466 Reputation points Microsoft Employee
    2021-08-26T11:27:31.397+00:00

    Hi @AzRobo ,

    Welcome to Microsoft Q&A Platform. Thank you for posting your query here.

    In Data flows, for auto mapping to work, your column names between source and sink should match exactly same.

    If your column names not matches between source and sink then you should disable auto mapping inside Sink transformation mappings tab and manual map columns accordingly.

    Schema drift is the case where your sources often change metadata.

    In a source transformation, schema drift is defined as reading columns that aren't defined your dataset schema. When schema drift is enabled, all incoming fields are read from your source during execution and passed through the entire flow to the Sink. By default, all newly detected columns, known as drifted columns, arrive as a string data type.

    In a sink transformation, schema drift is when you write additional columns on top of what is defined in the sink data schema.

    So, schema drift is case where you get additional columns or reading columns which are not defined in your dataset schema. It is not the case of mapping columns between source and sink when column names doesn't match.

    You can refer below small video also to know about schema drift in Mapping data flows.
    https://www.youtube.com/watch?v=LS0u7DxhpDI

    Click here to know more about schema drift.

    Hope this helps. Please let us know if any further queries. Thank you.

    ---------------------------

    • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification.

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.