Job failed due to reason: at Sink ***: cannot resolve 'source.`***`' due to data type mismatch: cannot cast struct<histories:array<struct<author:struct<accountId:string,accountType:string,active:boolean,avatarUrls:struct<16x16:string,24x24:string,32x32:st…

Ryan Ruenroeng 30 Reputation points
2023-07-18T18:35:13.12+00:00

Hi there,

I'm getting a PySpark error when trying to use a data flow to add data to my Delta Lake using Azure Data Factory. The error seems to occur because there is a mismatch between the schema returned by the API I'm accessing and the schema of the target Delta Lake table. It works for several data flow runs and then intermittently stops working.

  • As this is the first time I'm adding data to a new table in the Delta Lake, I am only inserting, not upserting, rows.
  • My response has an array of objects that I flattened and added to the data flow. I am inserting on the following criterion: isNull(id)==false()
  • Changelog is a complex type.
  • In the Sink, I allow schema drift and have the following settings set (I don't actually have an upsert condition listed in my alter row activity). Can you please provide me with some guidance on why I'm getting this error?
Job failed due to reason: at Sink 'IssueDeltaLakeSink': cannot resolve 'source.`changelog`' due to data type mismatch: cannot cast struct<histories:array<struct<author:struct<accountId:string,accountType:string,active:boolean,avatarUrls:struct<16x16:string,24x24:string,32x32:string,48x48:string>,displayName:string,self:string,timeZone:string>,created:string,id:int,items:array<struct<field:string,fieldId:string,fieldtype:string,from:string,fromString:string,tmpFromAccountId:string,tmpToAccountId:string,to:string,toString:string>>>>,maxResults:smallint,startAt:boolean,total:smallint> to struct<histories:array<struct<author:struct<accountId:string,accountType:string,active:boolean,avatarUrls:struct<16x16:string,24x24:string,32x32:string,48x48:string>,displayName:string,self:string,timeZone:string>,created:string,id:int,items:array<struct<field:string,fieldId:string,fieldtype:string,from:string,fromString:string,tmpFromAccountId:string,tmpToAccountId:string,to:string,toString:string>>,historyMetadata:struct<activ
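The mismatch is easier to spot once you diff the two struct types in the error: the source's `changelog.histories` element struct is missing the `historyMetadata` field that the target schema expects, which is the usual symptom of importing a schema from one API response while other responses include extra properties. As a rough illustration (plain Python, not ADF or PySpark, with the field names heavily simplified from the error above), a recursive schema diff looks like:

```python
def diff_schemas(source, target, path=""):
    """Recursively compare two schema trees ({field: type-string or dict})
    and report every field or type that differs."""
    problems = []
    for field in sorted(set(source) | set(target)):
        full = f"{path}.{field}" if path else field
        if field not in source:
            problems.append(f"missing in source: {full}")
        elif field not in target:
            problems.append(f"missing in target: {full}")
        elif isinstance(source[field], dict) and isinstance(target[field], dict):
            problems.extend(diff_schemas(source[field], target[field], full))
        elif source[field] != target[field]:
            problems.append(f"type mismatch at {full}: "
                            f"{source[field]} vs {target[field]}")
    return problems

# Heavily simplified shapes of the two changelog structs from the error
source_schema = {"histories": {"author": "struct", "created": "string",
                               "id": "int", "items": "array"}}
target_schema = {"histories": {"author": "struct", "created": "string",
                               "id": "int", "items": "array",
                               "historyMetadata": "struct"}}

print(diff_schemas(source_schema, target_schema))
# ['missing in source: histories.historyMetadata']
```

Running a diff like this against the full struct types printed in the error message pinpoints exactly which nested field the cast is tripping over.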
Azure Data Factory

Accepted answer
  1. QuantumCache 20,366 Reputation points Moderator
    2023-07-19T22:46:03.3733333+00:00

Hello @Ryan Ruenroeng, welcome to the Q&A forum,

    Thank you for sharing the scenario on this forum and helping others! This is really helpful.

    Updated: Resolution from @Ryan Ruenroeng

This was happening for ~90% of my runs. I was able to resolve my issue. To do this, I pored over the failing examples to find which API call would be representative of all the rest of the calls (i.e., not omitting a property). I imported the schema using that example instead of the one I'd arbitrarily chosen. What would be neat would be if you could allow users to input some example APIs or an array of default values for the input parameters so that the full schema can be gleaned when importing.

Please post your idea/feedback on the Microsoft Ideas portal. We will upvote it and others can comment; the product team reviews feedback regularly and triages it.


Is this issue frequent or intermittent?

    I think we need to make sure that the data types of the source data match the data types of the target schema in your Delta Lake table.

    As a troubleshooting step, you could temporarily modify your data flow to work with a static, predetermined schema that matches the expected structure. This will help identify if the issue lies with the dynamic schema or the data itself.
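One way to prototype that test outside the data flow designer is to project every incoming record onto a fixed field list before it reaches the sink, so a drifted response cannot change the inferred type. A minimal sketch in plain Python (the field names and defaults here are illustrative, not taken from the actual pipeline):

```python
# Illustrative static schema: field name -> default used when the API omits it
EXPECTED_FIELDS = {"id": None, "maxResults": 0, "startAt": 0, "total": 0}

def conform(record):
    """Project a record onto the expected schema: drifted extra keys are
    dropped and missing keys are filled with defaults, so every row has
    the same shape regardless of what the API returned."""
    return {field: record.get(field, default)
            for field, default in EXPECTED_FIELDS.items()}

print(conform({"id": 42, "total": 7, "unexpectedProperty": True}))
# {'id': 42, 'maxResults': 0, 'startAt': 0, 'total': 7}
```

If the sink stops failing once every row is conformed like this, the problem is the dynamically inferred schema rather than the data itself.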

As the original poster can't accept their own answer on this forum, I am posting the resolution from **Ryan Ruenroeng** as an answer.

Please click "Accept Answer" so that we can close this thread.

    1 person found this answer helpful.

1 additional answer

  1. Ryan Ruenroeng 30 Reputation points
    2023-07-20T22:04:43.0966667+00:00

This was happening for ~90% of my runs. I was able to resolve my issue. To do this, I pored over the failing examples to find which API call would be representative of all the rest of the calls (i.e., not omitting a property).

    I imported the schema using that example instead of the one I'd arbitrarily chosen.

    What would be neat would be if you could allow users to input some example APIs or an array of default values for the input parameters so that the full schema can be gleaned when importing.
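The manual hunt described above (finding the one response that exercises every property, then importing the schema from it) can also be scripted. A hedged sketch, assuming you have saved a handful of raw API responses as parsed JSON; `key_paths` and `most_representative` are hypothetical helper names, not part of any ADF tooling:

```python
def key_paths(obj, prefix=""):
    """Collect every nested key path in a parsed JSON value;
    list elements are unwrapped so array-of-object fields count too."""
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(obj, list):
        for item in obj:
            paths |= key_paths(item, prefix)
    return paths

def most_representative(samples):
    """Pick the response that covers the most key paths -- the best
    candidate to import the data flow schema from."""
    return max(samples, key=lambda sample: len(key_paths(sample)))

samples = [
    {"id": "1"},                                           # sparse response
    {"id": "2", "changelog": {"histories": [{"id": 9}]}},  # fuller response
]
print(sorted(key_paths(most_representative(samples))))
# ['changelog', 'changelog.histories', 'changelog.histories.id', 'id']
```

Importing the schema from the sample this picks gives the inference step the fullest structure your API actually returns, which is exactly the fix described above.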

