synapse mount point using python no file found error (synfs:/)

Sukumar, Udayakiran (Allianz UK) 20 Reputation points
2023-09-25T20:48:27.9366667+00:00

Hi,

I am trying to use the Synapse mssparkutils.fs.mount function to mount ADLS Gen2 and read a CSV/text file using Python to do text operations. The mount was created using a linked service, but reading the file through Python throws a "file not found" error, whereas the same path works for reading the CSV file using Spark.

Could you please help me understand why this happens? This would be a setback for us in terms of operating efficiently, as the majority of our user base is Python based and would like to work with the storage without having to download/copy the file, do the operations, and then move the file back.

Please note that the linked service is created using Terraform, and using access keys and SAS tokens is not a viable option for us to authenticate, as they have been disabled and we are only allowed to access the storage account through linked services.

# This works
path = mssparkutils.fs.getMountPath("/test/csv/file.csv") # equals to /synfs/{jobId}/test
print(path)

# This works using spark
df = spark.read.load("synfs:/15/test/csv/file.csv", format='csv') 

# This works 
mssparkutils.fs.ls("synfs:/15/test/csv/file.csv")

# This doesn't work
with open("synfs:/15/test/csv/file.csv") as f:
    print(f.read())

# This doesn't work
with open("synfs/15/test/csv/file.csv") as f:
    print(f.read())

# This doesn't work
with open("/test/csv/file.csv") as f:
    print(f.read())

%%spark
// The mount was created like this, using the linked service
mssparkutils.fs.mount(
    "abfss://containerName@storageaccountName.dfs.core.windows.net",
    "/test",
    Map("LinkedService" -> "linkedservicename")
)

Accepted answer
  1. Amira Bedhiafi 23,251 Reputation points
    2023-09-26T09:13:17.3366667+00:00

    I guess the difference lies in how the Synapse native utilities (mssparkutils) and Python's built-in file I/O functions (open()) handle file paths and access methods.

    Azure Synapse has utilities to interact with Azure Data Lake Storage (ADLS) and other Azure storage options natively, and those utilities know how to handle synfs:/ paths.

    On the other hand, Python's built-in open() function doesn't understand the synfs:/ path prefix. It expects traditional file system paths, and it cannot open files from ADLS directly unless you use an appropriate library or SDK that provides this functionality.
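
    Because the mount is exposed on the local file system under /synfs/{jobId}, one workaround is to give open() the local path returned by getMountPath instead of the synfs:/ URI. A minimal sketch, reusing the mount point "/test" and the file path from your question:

        from notebookutils import mssparkutils

        # getMountPath returns the local path of the mount, e.g. /synfs/{jobId}/test
        local_root = mssparkutils.fs.getMountPath("/test")

        # Plain Python file I/O works against the local path, not the synfs:/ scheme
        with open(f"{local_root}/csv/file.csv") as f:
            print(f.read())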

    Since you can read the file with Spark using spark.read.load(), consider doing your file operations directly on the DataFrame.

        df = spark.read.load("synfs:/15/test/csv/file.csv", format='csv')
    
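    For example, a simple text operation on the DataFrame (a sketch; _c0 is the default column name Spark assigns when the CSV is read without headers, and the search string is a placeholder):

        # Keep only the rows whose first column contains a given substring
        matches = df.filter(df["_c0"].contains("some text"))
        matches.show()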

    If you really need to read the file content directly in Python without using Spark, consider using Azure SDK libraries. The Azure Data Lake Storage SDK for Python allows you to read files from ADLS Gen2. However, this might involve authenticating in a different way, which may or may not be suitable for your environment.
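
    A minimal sketch with the azure-storage-file-datalake package, assuming Azure AD authentication through DefaultAzureCredential (the account, container, and file names below are placeholders taken from your mount command):

        from azure.identity import DefaultAzureCredential
        from azure.storage.filedatalake import DataLakeServiceClient

        # Assumes the running identity has a data-plane role
        # (e.g. Storage Blob Data Reader) on the storage account
        service = DataLakeServiceClient(
            account_url="https://storageaccountName.dfs.core.windows.net",
            credential=DefaultAzureCredential(),
        )

        file_client = service.get_file_system_client("containerName") \
                             .get_file_client("test/csv/file.csv")

        # Download the whole file into memory and decode it as text
        content = file_client.download_file().readall().decode("utf-8")
        print(content)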

    Another option: if you only want the content of the file as a string in Python (especially if the file is not too large), you can read the file using Spark and then collect it into a Python string:

        rdd = spark.sparkContext.textFile("synfs:/15/test/csv/file.csv")
        content = "\n".join(rdd.collect())
        print(content)
    
