Accessing Azure ADLS Gen2 with PySpark on Windows

alexander tikhomirov 31 Reputation points
2022-04-07T06:32:44.833+00:00

Hello
This may not be the correct tag for Azure Synapse, but my team is evaluating a local development experience for writing code for Synapse Spark job definitions. What is the best practice in this case?
I have a working local dev environment (VS Code, PySpark) and can execute PySpark code locally, so in theory I could use the same scripts in a Synapse Spark job definition. However, I would like to use the command

spark.read.load('abfss://XXX@X.dfs.core.windows.net/XXX/file.csv', format='csv')

to read a file from ADLS Gen2 in both environments, locally and on the remote Spark cluster. Is that possible? Does anyone have experience with this?

Locally, it doesn't work.


1 answer

  1. ShaikMaheer-MSFT 37,896 Reputation points Microsoft Employee
    2022-04-08T09:34:22.013+00:00

    Hi @alexander tikhomirov,

    Thank you for posting your query on the Microsoft Q&A platform.

    To access ADLS Gen2 from a Windows machine, you need to perform the following high-level steps:

    • Set up the environment
    • Configure your storage account in Hadoop
    • Connect to your storage account.

    All of the above steps are documented in detail in the link below. Kindly follow them and see if that helps. Please let us know how it goes.
    https://learn.microsoft.com/en-us/dotnet/spark/how-to-guides/connect-to-azure-storage
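
    For illustration, here is a minimal sketch of what steps 2 and 3 can look like in a local PySpark session, assuming storage-account-key authentication. The account name mystorageaccount, container mycontainer, connector version, and file path are placeholders, not values from your environment, and the sketch assumes the environment setup from step 1 (Java, and winutils.exe/HADOOP_HOME on Windows) is already in place.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("local-adls-gen2")
        # Pull in the Hadoop Azure connector that provides the abfss:// filesystem.
        # The version should match the Hadoop version bundled with your local Spark.
        .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.1")
        # Authenticate with the storage account access key. A service principal
        # (OAuth) is the usual alternative for shared or production setups.
        .config(
            "spark.hadoop.fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
            "<storage-account-access-key>",
        )
        .getOrCreate()
    )

    # With the configuration above, the same abfss:// read works locally;
    # in a Synapse Spark pool the connector and credentials are provided for you.
    df = spark.read.load(
        "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/folder/file.csv",
        format="csv",
        header=True,
    )
    df.show(5)

    The key points are putting the hadoop-azure connector on the classpath and supplying the fs.azure.account.key.* setting, which is essentially what the linked guide walks through in more detail.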