How to read multiple CSV files into one DataFrame using Azure Synapse notebooks

2024-04-03T12:39:56.23+00:00

Hi Team,

Could you please help us read multiple CSV files from different subfolders into a single DataFrame in an Azure Synapse notebook using PySpark?

For example: 'abfss://test@testsalesdatalake.dfs.core.windows.net/Bronze/properties/2024/01/26/test1.csv', 'abfss://test@testsalesdatalake.dfs.core.windows.net/Bronze/properties/2024/02/02/test1.csv', 'abfss://test@testsalesdatalake.dfs.core.windows.net/Bronze/properties/2024/02/03/test2.csv'

We need to read the files above in an Azure Synapse notebook. Please share your thoughts.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. BhargavaGunnam-MSFT 25,796 Reputation points Microsoft Employee
    2024-04-03T17:35:19.2266667+00:00

    Hello SaiSekhar, MahasivaRavi (Philadelphia),

    Reading multiple CSV files with PySpark is discussed here: https://sparkbyexamples.com/spark/spark-read-multiple-csv-files/

    You can try the code below and let me know whether it works for you.

    ```python
    # `spark` is the SparkSession that Synapse notebooks predefine.

    # Define the file paths (note the double slash after "abfss:")
    file_paths = [
        "abfss://test@testsalesdatalake.dfs.core.windows.net/Bronze/properties/2024/01/26/test1.csv",
        "abfss://test@testsalesdatalake.dfs.core.windows.net/Bronze/properties/2024/02/02/test1.csv",
        "abfss://test@testsalesdatalake.dfs.core.windows.net/Bronze/properties/2024/02/03/test2.csv",
    ]

    # Read the CSV files into a single DataFrame
    df = (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(file_paths)
    )

    # Show the DataFrame
    df.show()
    ```
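
If the list of files grows, typing every path by hand gets tedious. Here is a minimal sketch that builds the folder paths from a date range instead. It assumes your folders follow the `YYYY/MM/DD` layout shown in the question. The helper names `daily_folders` and `read_csv_folders` are made up for this example and are not part of any library. Passing a folder path to `load` should pick up the files inside that folder, so the individual file names do not need to be listed.

```python
from datetime import date, timedelta

# Base folder taken from the question; adjust to your own container and account.
BASE = "abfss://test@testsalesdatalake.dfs.core.windows.net/Bronze/properties"


def daily_folders(base, start, end):
    """Return one 'base/YYYY/MM/DD' folder path per day from start to end, inclusive."""
    days = (end - start).days
    return [
        f"{base}/{(start + timedelta(days=i)):%Y/%m/%d}"
        for i in range(days + 1)
    ]


def read_csv_folders(spark, folders):
    """Load the CSV files under the given folders into a single DataFrame.

    `spark` is the SparkSession (predefined in a Synapse notebook).
    """
    return (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(folders)
    )


# Hypothetical date range covering two of the folders from the question.
folders = daily_folders(BASE, date(2024, 2, 2), date(2024, 2, 3))
print(folders)
```

In the notebook you would then call `df = read_csv_folders(spark, folders)`. As far as I know, Spark fails with a "path does not exist" error if any folder in the list is missing, so remove dates that have no data before loading.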
