ADF copy activity - ignore new columns when the source schema drifts instead of throwing an error

Saket Detroja 1 Reputation point Microsoft Employee
2020-10-08T04:38:05.437+00:00

I have a pipeline that copies data from Kusto tables to Azure SQL Data Warehouse tables. A ForEach activity iterates over the list of tables, and the data is copied with an ADF copy activity using "Bulk insert". The copy is also incremental, which is achieved by using a SQL query to load the data incrementally. Because Kusto is a dynamic-schema store, new columns get added to the Kusto tables over time. When that happens, I want to ignore the new columns and copy only the columns that exist in the corresponding SQL Data Warehouse table. The column names are guaranteed to match exactly; I just want to discard the new columns. How can this be achieved? (I know this can be done in a data flow by using auto mapping, but data flow doesn't support Kusto as a source.)

Azure Data Explorer
Azure Synapse Analytics
Azure Data Factory

3 answers

  1. HimanshuSinha-msft 19,476 Reputation points Microsoft Employee
    2020-10-08T23:32:28.737+00:00

    Hello @SaketDetroja-6582,

    Thanks for the question and for using the forum.

    Can you please elaborate on how you are fetching the table names? Are you using a Lookup activity? I think we could keep the query limited to the required column names and see if that helps. Also, you mentioned that you pull the data incrementally. Are you using a watermark while pulling the data, or do you pull everything into Synapse first and then do a second pull into the relevant tables?

    Thanks
    Himanshu


  2. Saket Detroja 1 Reputation point Microsoft Employee
    2020-10-21T17:28:31.71+00:00

    Hello @HimanshuSinha-msft ,

    Thanks for the reply. We actually already solved the problem using a JSON-based dynamic mapping in the copy data activity. The challenge with generating that JSON for all the tables is the same one you would face with your suggestion of querying only the required columns: knowing which columns are required. We cannot do that manually for every table, so we use a Python script to fetch the schemas from the sink beforehand and generate the JSON mapping from them.
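
    Roughly, the idea looks something like the sketch below (the connection details, schema, and table names are placeholders rather than our actual setup):

    ```python
    # Sketch: build an ADF TabularTranslator mapping from the sink table's schema,
    # so only columns that exist in the SQL Data Warehouse table get mapped and any
    # newly drifted Kusto columns are simply left out.
    import json
    import pyodbc

    # Placeholder connection string for the Synapse (SQL DW) sink.
    CONN_STR = (
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=tcp:<your-server>.database.windows.net,1433;"
        "Database=<your-dw>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
    )

    def build_mapping(schema: str, table: str) -> dict:
        """Return a copy-activity mapping that lists only the sink's columns."""
        with pyodbc.connect(CONN_STR) as conn:
            rows = conn.cursor().execute(
                "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS "
                "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ? "
                "ORDER BY ORDINAL_POSITION",
                schema, table,
            ).fetchall()
        columns = [row[0] for row in rows]
        return {
            "type": "TabularTranslator",
            "mappings": [
                {"source": {"name": c}, "sink": {"name": c}} for c in columns
            ],
        }

    if __name__ == "__main__":
        # The resulting JSON can be stored per table (for example in a control table
        # that the ForEach iterates over) and passed to the copy activity's mapping
        # as dynamic content, e.g. @json(item().mapping).
        print(json.dumps(build_mapping("dbo", "MyKustoTable"), indent=2))
    ```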

    To answer your question, we pull data incrementally using a datetime field in the tables.
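
    For illustration, the incremental source query for each table can be composed along these lines (the table name, watermark column, and column list are placeholders):

    ```python
    # Sketch: compose an incremental Kusto source query that filters on a datetime
    # watermark and projects only the columns that exist in the sink table.
    def build_kusto_query(table, sink_columns, watermark_column, last_watermark):
        projected = ", ".join(sink_columns)
        return (
            f"{table} "
            f"| where {watermark_column} > datetime({last_watermark}) "
            f"| project {projected}"
        )

    # Example: only Id, Name and CreatedAt reach the copy, regardless of schema drift.
    print(build_kusto_query(
        "MyKustoTable", ["Id", "Name", "CreatedAt"], "CreatedAt",
        "2020-10-01T00:00:00Z",
    ))
    ```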

    Thanks,
    Saket


  3. Billy Switzer 0 Reputation points
    2023-09-15T21:50:58.5266667+00:00

    I ran into the same problem and solved it by using Data Flow instead of the Copy Activity. Set the Source and Sink but leave the Mapping blank. It will copy any columns whose names match exactly and ignore any others.

    Be aware that this may increase your pipeline cost.

