Resolving FileNotFoundError When Reading Parquet Files in Synapse Notebook

Question

I am trying to read Parquet files in a Synapse notebook, but I get a FileNotFoundError when the path contains a wildcard. The folder structure I want to read is 'test/year={yyyy}/month={MM}/day={dd}/*.parquet'. Here is the code I ran:

df = pd.read_parquet('abfss://xxx@xxx.dfs.core.windows.net/test/*/*/*/*.parquet', storage_options='')

Any insights on resolving this issue would be appreciated.

Accepted Answer

Hi @Clover J

Thanks for the question and for using the MS Q&A platform.

The pd.read_parquet() function does not support the wildcard character * in the path, which is why you are getting the FileNotFoundError. Instead, you can use spark.read.parquet(), which accepts wildcard paths, to read all the files under the specified folder.

Here's the corrected code snippet:

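# Spark expands each * itself: year folders / month folders / day folders / files
df = spark.read.parquet('abfss://xxx@xxx.dfs.core.windows.net/test/*/*/*/*')
# Print the first rows of the resulting DataFrame
df.show()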

This code reads every Parquet file under the test folder that follows the year={yyyy}/month={MM}/day={dd} structure. df.show() then prints the first rows of the DataFrame (20 by default).
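If only part of the data is needed, the wildcard path can be narrowed to a specific year, month, or day instead of matching everything. The following is a minimal sketch, not part of the original answer: partition_glob is a hypothetical helper, and it assumes the month and day folders are zero-padded, as the {MM} and {dd} placeholders in the question suggest.

```python
from typing import Optional

# Base path from the question; the container and account names stay elided.
BASE = "abfss://xxx@xxx.dfs.core.windows.net/test"


def partition_glob(base: str,
                   year: Optional[int] = None,
                   month: Optional[int] = None,
                   day: Optional[int] = None) -> str:
    """Build a wildcard path for the year=/month=/day= folder layout.

    Hypothetical helper: a value of None means "match any folder"
    at that level. Zero-padding of month and day is an assumption.
    """
    y = f"year={year:04d}" if year is not None else "year=*"
    m = f"month={month:02d}" if month is not None else "month=*"
    d = f"day={day:02d}" if day is not None else "day=*"
    return f"{base.rstrip('/')}/{y}/{m}/{d}/*.parquet"


# All days of March 2024:
print(partition_glob(BASE, 2024, 3))
# abfss://xxx@xxx.dfs.core.windows.net/test/year=2024/month=03/day=*/*.parquet
```

The returned string can be passed to spark.read.parquet() in place of the fully wildcarded path. Note that when Spark reads through a path that points inside the partition folders, the year, month, and day columns may not appear in the DataFrame; if they are needed, Spark's documented basePath read option (pointing at the test folder) is the usual way to get them back.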

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".
