Azure Data Factory Dataflow: Parquet to Delta Lake

Gokhan Varol 26 Reputation points
2022-08-30T14:29:07.343+00:00

We are looking for a way to accomplish the following in an Azure Data Factory (ADF) data flow.
We have numerous Parquet files per table. Each Parquet file has the same single primary key column (_PK). Apart from that fixed primary key column, the column definitions differ from file to file.
We are trying to upsert these Parquet files into Delta format in an ADF data flow.
We want to use only one data flow to accomplish this.
We want the data flow to upsert all columns in the Parquet files into Delta, matching on the primary key column (_PK).
We get failures whenever the data flow detects the columns automatically: it does not keep the primary key column when it autodetects columns.
Can a single ADF data flow upsert arbitrary Parquet files into Delta, given that the Parquet and Delta files all share the same primary key column?
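
To make the intended behaviour concrete, here is a minimal plain-Python sketch of the upsert semantics described above. It is not ADF or Delta Lake code, only an illustration: the names `upsert_rows` and `all_columns` and the sample rows are hypothetical, and it assumes that a column missing from an incoming file should leave the existing value in the target untouched.

```python
from typing import Any

Row = dict[str, Any]


def upsert_rows(target: dict[Any, Row], incoming: list[Row], key: str = "_PK") -> dict[Any, Row]:
    """Upsert incoming rows into target, matching on the key column.

    Rows with a new key are inserted; rows with an existing key are
    updated column by column. Columns absent from an incoming row keep
    their current value (an assumption of this sketch).
    """
    for row in incoming:
        if key not in row:
            raise KeyError(f"row is missing the key column {key!r}: {row}")
        # Fetch the existing row for this key, or start a new empty one.
        merged = target.setdefault(row[key], {})
        merged.update(row)
    return target


def all_columns(target: dict[Any, Row]) -> list[str]:
    """Return the union of column names across all rows, in first-seen order."""
    columns: list[str] = []
    for row in target.values():
        for name in row:
            if name not in columns:
                columns.append(name)
    return columns


# Two hypothetical "files" with different columns but the same key column.
file_a = [{"_PK": 1, "name": "alpha"}, {"_PK": 2, "name": "beta"}]
file_b = [{"_PK": 2, "price": 9.5}, {"_PK": 3, "price": 1.0}]

table: dict[Any, Row] = {}
upsert_rows(table, file_a)
upsert_rows(table, file_b)
```

After both calls, the row with `_PK` 2 carries the columns from both files, and the table's column set is the union of the two schemas. This is the behaviour we would like the single data flow to reproduce for any input file.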

Azure Data Factory

1 answer

  1. Bhargava-MSFT 31,196 Reputation points Microsoft Employee
    2022-08-31T23:04:14.813+00:00

    Hello @Gokhan Varol ,

    Welcome to the MS Q&A platform.

    Please correct me if my understanding is wrong. You are trying to upsert all columns from the Parquet files (source) into Delta (sink) in the data flow, matching on the primary key to do the upsert. During the upsert, you get an error when the columns are detected automatically.

    Could you please provide the detailed error message?

    Also, could you please try unchecking Auto mapping and mapping the columns manually to see whether that helps?

    [Image: 236676-image.png]

