Azure Data Factory sink using Polybase getting validation error.

Brian Pierschbacher 1 Reputation point
2021-11-08T22:17:53.497+00:00

I have a linked service connection to azure storage v2 using Managed Identity as authentication method. The first part of the activity loads data from on-premise into blob storage, the dataset has null value " " and treatEmptyasNull as false. The data loads has expected with source query setting null values to " " and that is what in blob in storage.

The second part copies that data from blob storage into dedicated sql pool (formerly SQL DW) using Polybase on the on the sink (into sql table). When validate the pipeline the validation output has "Source format: Empty string should be treated as Null and Null value must be set to empty string. Conditions are not met to run PolyBase copy directly, please enable staging or fix these conditions."

The sink activity does have information "Please make sure to assign Storage Blob Contributor RBAC role to the SQL Database Server"

Any suggestions would be greatly appreciated.

PS. We have this scenario working an older Data Factory using Azure Storage connector instead of the current Azure Blob Storage connector.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
6,343 questions
{count} votes

2 answers

Sort by: Most helpful
  1. HimanshuSinha-msft 17,726 Reputation points Microsoft Employee
    2021-11-16T18:53:07.197+00:00

    Hello @Brian Pierschbacher ,
    Thanks for the patiance .
    As the error message indicates, if you want to use polybase/copy statement to direct load data to Synapse, the ‘null value’ setting in Source dataset must be set as default or set to empty string (""). See doc. The null values you set is “ ”, which doesn’t match the requirement. Suggest you to follow the guidance, either to remove the ‘null value’ settings or ‘enable staging’ in ‘Settings’ tab for a staged copy.
    Let me know if you have futher questions

    Thanks
    Himanshu

    -------------------------------------------------------------------------------------------------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    No comments

  2. Brian Pierschbacher 1 Reputation point
    2021-11-16T19:55:13.863+00:00

    Did you get it to work?

    I did to some digging and I think I found the one issue. In capturing the output from the sys.dm_pdw_exec_requests, I found the location of the external file is set as Data Lake Storage Gen 2 . We are not using Data Lake storage.

    Using Azure Storage connector in ADF LOCATION ='wasbs://policy@aegdevstoragev2.blob.core.windows.net' (working)
    Using Azure Blob Storage connector with staging enabled in ADF LOCATION ='abfss://staging@staegedwdev001.dfs.core.windows.net' (not working)
    Using Azure Blob Storage connector without staging enabled in ADF LOCATION ='abfss://policy@staegedwdev001.dfs.core.windows.net' {not working)

    @HimanshuSinha-msft

    No comments