How to read csv.gz files in Azure Synapse notebooks

2024-03-18T10:36:58.6133333+00:00

How do I read a .gz file in a Synapse notebook?

file name - flat_event_params_20240202-000000000000.csv.gz

Tags: Azure Storage Accounts, Azure Synapse Analytics

1 answer

  1. PRADEEPCHEEKATLA
    2024-03-19T07:29:31.67+00:00

    @SaiSekhar, MahasivaRavi (Philadelphia) - Thanks for the question and for using the MS Q&A platform.

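    Apache Spark can read gzip-compressed CSV files directly into DataFrames. Decompression happens automatically based on the .gz file extension; setting the compression option to gzip makes the intent explicit, but it is only required when writing, not when reading.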
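Spark's gzip support assumes the data really is gzip-compressed: a file named .csv.gz that is actually a ZIP archive or plain text typically fails to decompress. If you have a local copy, a quick sanity check is to look for the gzip magic bytes 0x1f 0x8b defined in RFC 1952. This is a minimal standard-library sketch, not part of Spark or Synapse; the function names and the sample column names are hypothetical.

```python
import gzip

# Every gzip member starts with these two bytes (RFC 1952, ID1/ID2 fields).
GZIP_MAGIC = b"\x1f\x8b"


def has_gzip_magic(data: bytes) -> bool:
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def file_has_gzip_magic(path: str) -> bool:
    """Read only the first two bytes of a local file and check the gzip magic."""
    with open(path, "rb") as f:
        return has_gzip_magic(f.read(2))


if __name__ == "__main__":
    # Hypothetical sample content, compressed in memory.
    sample = gzip.compress(b"event_name,param_key\n")
    print(has_gzip_magic(sample))         # True
    print(has_gzip_magic(b"PK\x03\x04"))  # False: a ZIP local file header
```

A ZIP archive starts with "PK" instead, which is a common cause of confusion when a file is described as "zipped" but named .gz.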

    Let's walk through an example of reading flat_event_params_20240202-000000000000.csv.gz in an Azure Synapse notebook.

    Step 1: Upload flat_event_params_20240202-000000000000.csv.gz to the storage account that was created with the Synapse workspace; the example reads the file from there.

    Note: Copy the file's ABFSS path; it replaces the placeholder path in Step 2.
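An ABFSS path for a file in Azure Data Lake Storage Gen2 follows the pattern abfss://<container>@<account>.dfs.core.windows.net/<path>. If you prefer to assemble it in code instead of copying it, here is a minimal sketch; the helper name and the container, account, and folder names are hypothetical placeholders, not values from this workspace.

```python
def abfss_path(container: str, account: str, relative_path: str) -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    if not container or not account:
        raise ValueError("container and account must be non-empty")
    # Strip a leading slash so the result never contains '//' after the host.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical container, storage account, and folder names.
    print(abfss_path("mycontainer", "mystorageaccount",
                     "raw/flat_event_params_20240202-000000000000.csv.gz"))
```

The resulting string can be passed to .load(...) in place of the placeholder path.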

    Step 2: The following code snippet reads a gzip-compressed CSV file in a Synapse notebook:

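    # Read a gzip-compressed CSV file directly with Spark.
    # The "compression" option is optional when reading: Spark infers
    # the gzip codec from the .gz file extension.
    df_zipped = (
        spark.read
        .format("csv")
        .option("compression", "gzip")
        .option("header", True)  # first line contains the column names
        .load("dataset/tmp/sales.csv.gz")  # replace with the copied ABFSS path
    )
    display(df_zipped)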
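Before choosing options such as header=True, it can help to look at the first lines of the file. If a local copy is available, Python's gzip and csv modules can preview it without Spark. This is a minimal sketch assuming a UTF-8, comma-delimited file; the function name and the demo columns are hypothetical.

```python
import csv
import gzip
import os
import tempfile
from itertools import islice


def preview_gzip_csv(path: str, n_rows: int = 5, delimiter: str = ","):
    """Return (header, rows) with up to n_rows data rows from a local .csv.gz file."""
    # Open in text mode so gzip decompresses and decodes in one step.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows


if __name__ == "__main__":
    # Self-contained demo: write a tiny gzip CSV with made-up columns, then preview it.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.csv.gz")
        with gzip.open(sample, "wt", newline="", encoding="utf-8") as f:
            f.write("event_name,param_key\npage_view,page_title\n")
        print(preview_gzip_csv(sample))
```

Only the first few rows are decompressed, so the preview stays cheap even for large files.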
    


    Hope this helps. Do let us know if you have any further queries.


