Loading dynamic files

Niren Adhikary 96 Reputation points
2023-06-16T06:46:11.4+00:00

Hello, We are trying to load data from an output file to a delta table in azure but there will be some additional columns added everytime the output file is created. What is the best possible solution to load the output file to a delta table in case the file has additional columns in it.. @aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Azure Storage
Azure Storage
Globally unique resources that provide access to data management services and serve as the parent namespace for the services.
3,542 questions
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,380 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,655 questions
{count} votes

Accepted answer
  1. ShaikMaheer-MSFT 38,551 Reputation points Microsoft Employee Moderator
    2023-06-20T09:08:23.4+00:00

    Hi Niren Adhikary,

    Thank you for posting query in Microsoft Q&A Platform.

    If you are loading data from an output file to a delta table in Azure and the output file has additional columns added to it each time it is created, you can use the following approach to handle the additional columns:

    Create the delta table with a schema that includes all the columns that are present in the output file. You can use the CREATE TABLE statement to create the delta table with the required schema.

    When you load the data from the output file to the delta table, use the MERGE INTO statement to merge the data from the output file with the data in the delta table. The MERGE INTO statement allows you to insert new rows, update existing rows, or delete rows based on a condition.

    In the MERGE INTO statement, use the WHEN NOT MATCHED BY TARGET clause to insert new rows into the delta table. This clause allows you to specify the columns that are present in the output file but not in the delta table. You can use the NULL value to insert default values for these columns.

    In the MERGE INTO statement, use the WHEN MATCHED clause to update existing rows in the delta table. This clause allows you to specify the columns that are present in the output file but not in the delta table. You can use the SET clause to update the values of these columns.

    Here is an example of how you can use the MERGE INTO statement to load data from an output file to a delta table:

    MERGE INTO delta_table AS target
    USING output_file AS source
    ON target.id = source.id
    WHEN MATCHED THEN
      UPDATE SET target.column1 = source.column1, target.column2 = source.column2, ...
    WHEN NOT MATCHED BY TARGET THEN
      INSERT (id, column1, column2, ...) VALUES (source.id, source.column1, source.column2, ...)
    
    
    

    In this example, delta_table is the name of the delta table, output_file is the name of the output file, and id, column1, column2, etc. are the names of the columns in the delta table and the output file. The MERGE INTO statement merges the data from the output file with the data in the delta.

    Hope this helps. Please let me know if any further queries.


    Please consider hitting Accept Answer button. Accepted answers help community as well. Thank you.


0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.