Using the account key for authentication in Synapse to write to ADLS Gen2

Victor Seifert 46 Reputation points
2022-02-25T11:10:59.787+00:00

Is it possible to use the account key of an ADLS Gen2 storage account to write PySpark DataFrames from within Synapse?

Background: I want to develop and run my code locally in my IDE (notebooks are, in my opinion, poorly suited to larger applications and make the code very hard to test) and only later execute it as a package within Synapse. Since I don't have linked services or service principals locally, I want to use the account key to authenticate. However, the same code that works locally fails with a 403 error in Synapse.

I can execute the following code locally without a problem, but I get authentication errors in Synapse. The only difference is that in Synapse I don't build a SparkSession myself; one is supplied automatically.

Error: "java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, "

Here is the local code which runs successfully:

from pyspark.sql import SparkSession

# hadoop-azure supplies the ABFS (abfss://) connector for ADLS Gen2
spark = SparkSession \
    .builder \
    .config('spark.jars.packages', 'org.apache.hadoop:hadoop-azure:3.3.1') \
    .getOrCreate()


adls_account_key = "<myaccountkey>"
adls_container_name = "<mycontainername>"
adls_account_name = "<myaccountname>"
filepath = "/Data/Contacts"

# Authenticate to the storage account with the shared account key
spark.conf.set(f"fs.azure.account.key.{adls_account_name}.dfs.core.windows.net", adls_account_key)
base_path = f"abfss://{adls_container_name}@{adls_account_name}.dfs.core.windows.net"
df = spark.read.parquet(base_path + filepath)
df.show(10, False)

Here is the Synapse Code which crashes with 403 errors:

adls_account_key = "<myaccountkey>"
adls_container_name = "<mycontainername>"
adls_account_name = "<myaccountname>"
filepath = "/Data/Contacts"

# Same account-key configuration as above; in Synapse this fails with the 403
spark.conf.set(f"fs.azure.account.key.{adls_account_name}.dfs.core.windows.net", adls_account_key)
base_path = f"abfss://{adls_container_name}@{adls_account_name}.dfs.core.windows.net"
df = spark.read.parquet(base_path + filepath)
df.show(10, False)

Accepted answer
  1. MartinJaffer-MSFT 26,061 Reputation points
    2022-03-08T17:51:28.453+00:00

    Hello @Victor Seifert

    I am consolidating our conversation for the betterment of the community.

    Databricks allows authentication with the account key, but Synapse does not. Both are built on Spark and have much in common, so it is easy to confuse the two. Synapse integrates more tightly with Azure to give you better security options. Writing secrets such as the account key into your code exposes you to risk: the account key makes sense for a temporary test, but it is not suitable for production.
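
    For production, one alternative is service-principal (OAuth) authentication through the ABFS driver, with the client secret pulled from Key Vault instead of hard-coded. This is a minimal sketch, not a definitive recipe: the Key Vault, secret, client-id and tenant-id names (<mykeyvault>, <sp-client-secret>, <sp-client-id>, <tenant-id>) are placeholders to replace with your own, and it assumes the service principal has been granted access to the storage account.

    from notebookutils import mssparkutils  # available on Synapse Spark pools

    adls_account_name = "<myaccountname>"
    adls_container_name = "<mycontainername>"
    host = f"{adls_account_name}.dfs.core.windows.net"

    # Fetch the service principal's secret from Key Vault (placeholder names)
    client_secret = mssparkutils.credentials.getSecret("<mykeyvault>", "<sp-client-secret>")

    # Standard hadoop-azure OAuth client-credentials configuration
    spark.conf.set(f"fs.azure.account.auth.type.{host}", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{host}",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{host}", "<sp-client-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{host}", client_secret)
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{host}",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

    df = spark.read.parquet(f"abfss://{adls_container_name}@{host}/Data/Contacts")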

    Thanks,
    Martin


1 additional answer

  1. Vidya Narasimhan 2,126 Reputation points Microsoft Employee
    2022-02-27T15:54:53.783+00:00

    @Victor Seifert please go through this link for details on the RBAC permissions required:
    https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-data-operations-portal

    You might be running the script locally as a user or service principal that already has the right RBAC access to the storage account.
    For Synapse, assign the required RBAC role to the Synapse MSI on the storage account container/blob, as sketched below.
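
    Once that role assignment is in place (for example, Storage Blob Data Contributor on the container), the Synapse code needs no account key at all. A minimal sketch, assuming the workspace MSI already holds the role:

    # No account-key conf: Synapse authenticates to ADLS Gen2 with the
    # workspace managed identity, assuming it has an RBAC role such as
    # Storage Blob Data Contributor on the container.
    adls_container_name = "<mycontainername>"
    adls_account_name = "<myaccountname>"

    base_path = f"abfss://{adls_container_name}@{adls_account_name}.dfs.core.windows.net"
    df = spark.read.parquet(base_path + "/Data/Contacts")
    df.show(10, False)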