Accessing Azure ADLS Gen2 with PySpark on Windows

alexander tikhomirov 31 Reputation points
2022-04-07T06:32:44.833+00:00

Hello
This may not be the correct tag for Azure Synapse, but my team is evaluating a local development experience for writing code for Synapse Spark job definitions. What is the best practice in this case?
I have a working local dev environment (VS Code, PySpark) and can execute PySpark code locally, so in theory I could use the same scripts in a Synapse Spark job definition. However, I would like to use the command

spark.read.load('abfss://XXX@X.dfs.core.windows.net/XXX/file.csv', format='csv')

to read a file from ADLS Gen2 in both environments, locally and on the remote Spark cluster. Is that possible? Does anyone have experience with this?

Locally, it doesn't work.


1 answer

  1. ShaikMaheer-MSFT 37,896 Reputation points Microsoft Employee
    2022-04-08T09:34:22.013+00:00

    Hi @alexander tikhomirov,

    Thank you for posting your query on the Microsoft Q&A platform.

    To access ADLS Gen2 from a Windows machine, you need to perform the following high-level steps:

    • Set up the environment
    • Configure your storage account in Hadoop
    • Connect to your storage account.

    All of the above steps are documented in detail in the link below. Kindly follow them and see if that helps. Please let us know how it goes.
    https://learn.microsoft.com/en-us/dotnet/spark/how-to-guides/connect-to-azure-storage
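
    For illustration, here is a minimal sketch of what steps 2 and 3 can look like in a local PySpark session, assuming storage-account-key authentication. The account name mystorageaccount, container mycontainer, connector version, and file path are placeholders, not values from your environment, and the sketch assumes the environment setup from step 1 (Java, and winutils.exe/HADOOP_HOME on Windows) is already in place.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("local-adls-gen2")
        # Pull in the Hadoop Azure connector that provides the abfss:// filesystem.
        # The version should match the Hadoop version bundled with your local Spark.
        .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.1")
        # Authenticate with the storage account access key. A service principal
        # (OAuth) is the usual alternative for shared or production setups.
        .config(
            "spark.hadoop.fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
            "<storage-account-access-key>",
        )
        .getOrCreate()
    )

    # With the configuration above, the same abfss:// read works locally;
    # in a Synapse Spark pool the connector and credentials are provided for you.
    df = spark.read.load(
        "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/folder/file.csv",
        format="csv",
        header=True,
    )
    df.show(5)

    The key points are putting the hadoop-azure connector on the classpath and supplying the fs.azure.account.key.* setting, which is essentially what the linked guide walks through in more detail.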