If you are using Spark pools in Azure Synapse, you can read multiple Parquet files at once by pointing the Spark DataFrame API at a directory path or at a wildcard pattern; every matching file is loaded into a single DataFrame.
# Read multiple Parquet files from a directory
df = spark.read.parquet("/path/to/your/directory/")
# Or using a wildcard pattern to match specific files
df = spark.read.parquet("/path/to/your/directory/prefix*.parquet")
Another approach is reading multiple Parquet files with the serverless SQL pool (SQL on-demand). You generally specify the folder path, and Azure Synapse reads all Parquet files within that folder. Wildcards are also supported directly in the BULK path of the OPENROWSET function, for example *.parquet to match files by name or ** to include subfolders.
SELECT *
FROM OPENROWSET(
    BULK 'https://yourstorageaccount.dfs.core.windows.net/yourfilesystem/path/to/your/directory/',
    FORMAT = 'PARQUET'
) AS [result]
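To read only a subset of the files, the same wildcard idea applies in the BULK path. A sketch using the same placeholder storage account, filesystem, and path as above:
SELECT *
FROM OPENROWSET(
    -- prefix*.parquet matches files by name; use ** to also match files in subfolders
    BULK 'https://yourstorageaccount.dfs.core.windows.net/yourfilesystem/path/to/your/directory/prefix*.parquet',
    FORMAT = 'PARQUET'
) AS [result]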