Error: 'Key columns' expression should return string[], when passing array of parameters in DataFlow with "Delta" as Sink

ManiK 36 Reputation points
2022-02-04T15:27:43.013+00:00

Hello,

I have set of parquet files that I need to Upsert (Update else Insert) into "Delta" (sink) using Data Flow with mainly following 3 transformations:

SourceParquetFile==>AlterRowUpsert:true()==>SinkDelta (AllowUpsert)

I am trying to make it a generic data flow since for some sources there could be only one key column and for others more than one.
Hence, have defined the parameter for the Data Flow as type String Array = pKColumnsForUpsert (example: ['id', 'name'])

And in the Delta sink, under Key columns - using Custom Expression as:- split(toString($pKColumnsForUpsert), ',') ; where the Update method is selected as Allow Upsert - without any errors.

However, during validation of the data flow it results into error as :- 'Key columns' expression should return string[]

Have already followed and tried following links for suggestions/workarounds which looks like works for other "database" sink types, but probably not with "Delta" as sink type:

https://stackoverflow.com/questions/67163563/passing-the-dataflow-parameter-to-sink-key-column-in-azure-data-factory/67202095#67202095

https://learn.microsoft.com/en-us/answers/questions/133996/how-to-pass-a-parameter-with-more-than-one-key-fie.html

171386-delta-key-col-expr-error.png171348-generic-parquet-to-delta-upserts.png171461-upsert-true.png171442-delta-upsert-custom-expr.png

Please help/suggest.

Thanks!!

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,341 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,538 questions
{count} votes

Accepted answer
  1. ShaikMaheer-MSFT 37,896 Reputation points Microsoft Employee
    2022-02-08T11:13:44.783+00:00

    Hi @ManiK ,

    Thank you for posting query in Microsoft Q&A Platform.

    I could see your dataflow parameter is already array type, hence we no need to perform split operation here.

    172196-image.png

    By any chance if you wish to use your parameter "pKColumnsForUpsert" as string type, then we will need to perform split operation to take column names from string( 'id','name' string passed in to your parameter) as array values.

    Hope this will help. Please let us know if any further queries.

    ----------------

    Please consider hitting Accept Answer. Accepted answers helps community as well.


0 additional answers

Sort by: Most helpful