The difference comes down to how the Synapse native utilities (mssparkutils) and Python's built-in file I/O functions such as open() handle file paths and access methods.
Azure Synapse ships utilities that interact natively with Azure Data Lake Storage (ADLS) and other Azure storage options, and those utilities know how to resolve synfs:/ paths.
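For example (a hedged sketch: the helper name is mine, and mssparkutils is only available inside a Synapse notebook), mssparkutils.fs.head() accepts a synfs:/ path directly and returns the leading bytes of the file as a string:

```python
def read_head_with_mssparkutils(path: str, max_bytes: int = 1024 * 1024) -> str:
    """Hypothetical helper: return up to max_bytes of a synfs:/ file as text.

    Works only inside a Synapse notebook, where the Synapse-provided
    notebookutils module is importable.
    """
    from notebookutils import mssparkutils  # available in Synapse runtimes only

    # mssparkutils.fs understands synfs:/ paths natively
    return mssparkutils.fs.head(path, max_bytes)


# Usage inside a Synapse notebook (not runnable elsewhere):
# content = read_head_with_mssparkutils("synfs:/15/test/csv/file.csv")
```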
Python's built-in open() function, on the other hand, doesn't understand the synfs:/ prefix. It expects a traditional file system path, so it can't open files from ADLS directly unless you use a library or SDK that provides that functionality.
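You can see this locally: open() simply treats the synfs:/ prefix as part of an ordinary (relative) path, so the call fails before it ever talks to storage:

```python
# open() knows nothing about synfs:/ -- it looks for a local directory
# literally named "synfs:" and fails with FileNotFoundError.
try:
    open("synfs:/15/test/csv/file.csv")
    result = None
except FileNotFoundError as e:
    result = type(e).__name__

print(result)  # FileNotFoundError
```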
Since you can already read the file with Spark via spark.read.load(), consider performing your file operations directly on the DataFrame:
df = spark.read.load("synfs:/15/test/csv/file.csv", format='csv')
If you really need to read the file content directly in Python without Spark, consider the Azure SDK libraries: the Azure Data Lake Storage client library for Python can read files from ADLS Gen2. Note that this involves authenticating separately from the notebook session, which may or may not be suitable for your environment.
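As a sketch (the function name is mine; it assumes the azure-storage-file-datalake and azure-identity packages are installed and the identity has read access to the storage account), reading a file with the ADLS Gen2 SDK looks roughly like this:

```python
def read_adls_file(account_name: str, filesystem: str, path: str) -> str:
    """Hypothetical helper: download an ADLS Gen2 file and return it as text.

    Requires: pip install azure-storage-file-datalake azure-identity
    """
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=DefaultAzureCredential(),
    )
    file_client = service.get_file_system_client(filesystem).get_file_client(path)
    # download_file() returns a stream; readall() pulls the whole file into memory
    return file_client.download_file().readall().decode("utf-8")


# Usage (account, container, and path are placeholders):
# content = read_adls_file("mystorageaccount", "mycontainer", "test/csv/file.csv")
```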
Another option: if you only need the content of the file as a string in Python (and the file is not too large), read it with Spark and then collect it into a Python string:
rdd = spark.sparkContext.textFile("synfs:/15/test/csv/file.csv")
# collect() pulls every line to the driver, so only use this for small files
content = "\n".join(rdd.collect())
print(content)