ADF copy activity - ignore new columns when the source schema drifts instead of throwing an error

Saket Detroja 1 Reputation point Microsoft Employee
2020-10-08T04:38:05.437+00:00

I have a pipeline that copies data from Kusto tables to Azure SQL Data Warehouse tables. A ForEach activity iterates over the list of tables, and the data is copied with an ADF copy activity using "Bulk insert". The copy is also incremental, which is achieved by using a SQL query to load the data incrementally. Because Kusto is a dynamic-schema store, new columns get added to the Kusto tables over time. When that happens, I want to ignore the new columns and copy only the columns that exist in the corresponding SQL Data Warehouse table. The column names are guaranteed to match exactly; I just want to discard the new columns. How can this be achieved? (I know this can be done in a data flow by using auto mapping, but data flow doesn't support Kusto as a source.)

Azure Data Explorer
Azure Synapse Analytics
Azure Data Factory

3 answers

  1. HimanshuSinha-msft 19,476 Reputation points Microsoft Employee
    2020-10-08T23:32:28.737+00:00

    Hello @SaketDetroja-6582,

    Thanks for the question and for using the forum.

    Can you please elaborate on how you are fetching the table names? Are you using a Lookup activity? I think we could keep the query limited to the required column names and see if that helps. Also, you mentioned that you pull the data incrementally. Are you using a watermark while pulling the data, or do you pull everything into Synapse first and then do a second pull into the relevant tables?

    Thanks
    Himanshu


  2. Saket Detroja 1 Reputation point Microsoft Employee
    2020-10-21T17:28:31.71+00:00

    Hello @HimanshuSinha-msft ,

    Thanks for the reply. We actually already solved the problem using a JSON-based dynamic mapping in the copy data activity. The challenge with generating that JSON for all the tables is the same one you would face with your suggestion of querying only the required columns: knowing which columns are required. We cannot do that manually for every table, so we use a Python script to fetch the schemas from the sink beforehand and generate the JSON mapping from them.
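
    Roughly, the idea looks something like the sketch below (the connection details, schema, and table names are placeholders rather than our actual setup):

    ```python
    # Sketch: build an ADF TabularTranslator mapping from the sink table's schema,
    # so only columns that exist in the SQL Data Warehouse table get mapped and any
    # newly drifted Kusto columns are simply left out.
    import json
    import pyodbc

    # Placeholder connection string for the Synapse (SQL DW) sink.
    CONN_STR = (
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=tcp:<your-server>.database.windows.net,1433;"
        "Database=<your-dw>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
    )

    def build_mapping(schema: str, table: str) -> dict:
        """Return a copy-activity mapping that lists only the sink's columns."""
        with pyodbc.connect(CONN_STR) as conn:
            rows = conn.cursor().execute(
                "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS "
                "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ? "
                "ORDER BY ORDINAL_POSITION",
                schema, table,
            ).fetchall()
        columns = [row[0] for row in rows]
        return {
            "type": "TabularTranslator",
            "mappings": [
                {"source": {"name": c}, "sink": {"name": c}} for c in columns
            ],
        }

    if __name__ == "__main__":
        # The resulting JSON can be stored per table (for example in a control table
        # that the ForEach iterates over) and passed to the copy activity's mapping
        # as dynamic content, e.g. @json(item().mapping).
        print(json.dumps(build_mapping("dbo", "MyKustoTable"), indent=2))
    ```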

    To answer your question, we pull data incrementally using a datetime field in the tables.
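
    For illustration, the incremental source query for each table can be composed along these lines (the table name, watermark column, and column list are placeholders):

    ```python
    # Sketch: compose an incremental Kusto source query that filters on a datetime
    # watermark and projects only the columns that exist in the sink table.
    def build_kusto_query(table, sink_columns, watermark_column, last_watermark):
        projected = ", ".join(sink_columns)
        return (
            f"{table} "
            f"| where {watermark_column} > datetime({last_watermark}) "
            f"| project {projected}"
        )

    # Example: only Id, Name and CreatedAt reach the copy, regardless of schema drift.
    print(build_kusto_query(
        "MyKustoTable", ["Id", "Name", "CreatedAt"], "CreatedAt",
        "2020-10-01T00:00:00Z",
    ))
    ```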

    Thanks,
    Saket


  3. Billy Switzer 0 Reputation points
    2023-09-15T21:50:58.5266667+00:00

    I ran into the same problem and solved it by using Data Flow instead of the Copy Activity. Set the Source and Sink but leave the Mapping blank. It will copy any columns whose names match exactly and ignore any others.

    Be aware that this may increase your pipeline cost.

