Loading files with ever changing columns

bk 466 Reputation points
2020-11-05T13:53:37.39+00:00

Hi All,
I am using ADF to load files from a sftp site into ADLS. The issue is the incoming files are not consistent , column wise. Some times they add columns and there is no way we get notified. Is there a way in ADF where it automatically copies the new columns without manual intervention?
Please advice.
Thanks

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,566 questions
{count} votes

Accepted answer
  1. HimanshuSinha-msft 19,471 Reputation points Microsoft Employee
    2020-11-10T20:13:11.413+00:00

    Hello @bk ,

    Thanks for the being so detailed in the ask , believe me that helps .
    So in your case the sink is flat files and the source is SQLMI . This is what I was thinking , do you
    really need to have the new columns ? if not you can use the tabular source , please read about that here
    (https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#tabular-source-to-tabular-sink)

    Now let go ahead with the second possiblity where you need to append new columns to the SQL . As far I know
    you can use the autocreate table option in data flow to create the staging table and then you can use the Stored proc
    to read the Staging table and Alter the actual table .

    let me know if this helps .

    Updated on 11/17/2020

    After talking to few members in the team we are sure that the schema drft(supported) in the source does not mean the we will have the schema drift on the sink side(not supported)
    . But then I did tried the below work around in ADF and as per the testing it seems to work . I assume that the file is like below today date

    11,22,33,44,55,66,77,88

    and tmmrw it will be

    11,22,33,44,55,66,77,88,999,10000,11
    21,22,23,24,25,26,27,28,299,20000,21

    Here is the logic

    1. Copy the data to an intermediate table with the option with "Auto create table" .
    2. The second copy will copy the data from intermediate table to the main table , but we will use the pre-copy script option to columns
      check if the schema to the intermediate and final table is not same , if it is not we will add the missing columns .

    Here is below piece of SQL scripts ( in my example the name of the final table is foofinalTable and intermediate table is fooSourceTable.

    DECLARE @SQlCOmmand table ( id int identity , command varchar(100) )

    INSERT INTO @SQlCOmmand ( command)
    SELECT 'ALTER TABLE ' + Table_Schema +'.foofinalTable ADD ' +COLUMN_NAME + ' nvarchar(100)' as command
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = N'fooSourceTable'
    and COLUMN_NAME not in (
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = N'foofinalTable')

    DECLARE @I INT
    DECLARE @CMD VARCHAR(1000)
    SELECT @I = 1
    DECLARE @Max INT
    SELECT @Max = MAX(ID) FROM @SQLCOMMAND

    WHILE @I <= @Max BEGIN
    SELECT @CMD = COMMAND FROM @SQLCOMMAND WHERE ID = @I
    EXEC (@CMD )
    SET @I = @I + 1
    END

    40504-changingcol.gif

    Thanks Himanshu
    Please do consider to click on "Accept Answer" and "Upvote" on the post that helps you, as it can be beneficial to other community members.


0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.