Issue when running notebook from pipeline

Shao Peng Sun 91 Reputation points
2023-04-19T16:19:12.6466667+00:00

I have a notebook in which I update the Spark session configuration to connect to a storage account with a service principal, because I am only allowed to write to the storage account through that service principal.
To make this work, I also need to change the Spark session configuration to enable "Run as managed identity".
When I run the notebook directly, it works fine. The problem is that when I create a pipeline and run the notebook from the pipeline, it fails with this error:

Operation failed: "This request is not authorized to perform this operation using this permission.", 403

Could you please help with this? What should I do so that the notebook also works when it is run from a pipeline? The configuration I set in the notebook is:

spark.conf.set("fs.azure.account.auth.type.kynsightprodweudlsa01.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.kynsightprodweudlsa01.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.kynsightprodweudlsa01.dfs.core.windows.net", sp_clientid)
spark.conf.set("fs.azure.account.oauth2.client.secret.kynsightprodweudlsa01.dfs.core.windows.net", sp_secretvalue)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.kynsightprodweudlsa01.dfs.core.windows.net", client_endpoint)
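
The five settings above differ only in the key prefix and value, so they can be generated in one place. The following is a minimal sketch, not the original poster's code: the helper names `service_principal_conf` and `apply_conf` are hypothetical, and `spark`, `sp_clientid`, `sp_secretvalue` and `client_endpoint` are assumed to be defined in the notebook as in the snippet above.

```python
def service_principal_conf(account_fqdn, client_id, client_secret, client_endpoint):
    """Build the ABFS OAuth settings for one storage account as a dict.

    account_fqdn is the full endpoint name, for example
    "<account>.dfs.core.windows.net". Helper name is hypothetical.
    """
    return {
        f"fs.azure.account.auth.type.{account_fqdn}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{account_fqdn}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{account_fqdn}": client_id,
        f"fs.azure.account.oauth2.client.secret.{account_fqdn}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{account_fqdn}": client_endpoint,
    }


def apply_conf(spark, settings):
    """Apply each key/value pair to the given Spark session (hypothetical helper)."""
    for key, value in settings.items():
        spark.conf.set(key, value)


# Usage inside the notebook (assumes the variables from the snippet above exist):
# settings = service_principal_conf(
#     "kynsightprodweudlsa01.dfs.core.windows.net",
#     sp_clientid, sp_secretvalue, client_endpoint,
# )
# apply_conf(spark, settings)
```

Building the settings as a plain dict first makes it easy to print the keys (not the secret) and confirm that the same configuration is applied in an interactive run and in a pipeline run.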
Azure Synapse Analytics

1 answer

  1. Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2023-04-20T21:37:57.9233333+00:00

    Hello Shao Peng Sun, please correct me if my understanding is wrong: the notebook fails when you run it via the pipeline, but it runs fine when you run it from Synapse Studio.

    Synapse notebooks use Azure Active Directory (Azure AD) pass-through to access ADLS Gen2 accounts. If you run the notebook directly in Synapse Studio, your user account needs the Storage Blob Data Contributor role on the ADLS Gen2 account (or folder).

    If you run the notebook via a pipeline, the Synapse workspace managed identity needs the Storage Blob Data Contributor role on the ADLS Gen2 account (or folder).

    I hope this helps. Please let me know if you have any further questions.
