How to read csv.gz in Azure Synapse notebooks

Question

SaiSekhar, MahasivaRavi (Philadelphia) 140

How do I read a .gz file in a Synapse notebook?

file name - flat_event_params_20240202-000000000000.csv.gz

Olaf Helper 47,436 Reputation points

2024-03-18T12:55:02.3366667+00:00

A GZ file is a compressed file, like ZIP; you have to uncompress it first.

PRADEEPCHEEKATLA 90,641 Reputation points Moderator

2024-03-21T04:21:51.76+00:00

@SaiSekhar, MahasivaRavi (Philadelphia) - Just checking in to see if the answer below helped. If it answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further queries, do let us know.

1 answer

Answer 1

@SaiSekhar, MahasivaRavi (Philadelphia) - Thanks for the question and for using the MS Q&A platform.

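Apache Spark can read gzip-compressed files directly into DataFrames, so the file does not need to be uncompressed first. The example below also sets the compression option explicitly, although on read Spark normally picks the codec from the .gz file extension.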

Let's walk through an example of reading the file flat_event_params_20240202-000000000000.csv.gz in an Azure Synapse notebook.

Step 1: We will read the file flat_event_params_20240202-000000000000.csv.gz, which has been uploaded to the storage account created with the Synapse workspace.

Note: Copy the file's ABFSS path; it replaces the file path in Step 2.
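
If the path cannot be copied from the UI, it can also be assembled by hand. The sketch below assumes the usual ADLS Gen2 URI layout for the public Azure cloud (abfss://<container>@<account>.dfs.core.windows.net/<path>). The helper name build_abfss_path, the container and account names, and the folder are all hypothetical placeholders, not values from this thread.

```python
def build_abfss_path(container: str, account: str, relative_path: str) -> str:
    """Build an ABFSS URI for a file in an ADLS Gen2 storage account.

    Assumes the public Azure cloud endpoint suffix (dfs.core.windows.net).
    """
    # Strip a leading slash so the URI does not contain a double slash.
    relative_path = relative_path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path}"


# Hypothetical container, account, and folder; replace them with your own.
path = build_abfss_path(
    container="mycontainer",
    account="mystorageaccount",
    relative_path="dataset/tmp/flat_event_params_20240202-000000000000.csv.gz",
)
print(path)
```

The resulting string is what would be passed to .load(...) when reading the file.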

Step 2: The following snippet reads a gzip-compressed CSV file in a Synapse notebook:

# Read the gzip-compressed CSV file directly with Spark
df_zipped = spark \
    .read \
    .format("csv") \
    .option("compression", "gzip") \
    .option("header", True) \
    .load("dataset/tmp/sales.csv.gz")  # Replace with the copied ABFSS path
display(df_zipped)
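
One thing to keep in mind with larger files: a gzip file cannot be split, so Spark reads each .gz file in a single task and the resulting DataFrame starts out with one partition per file. A minimal sketch of a wrapper that repartitions after loading is shown below. The function name read_gzip_csv and the default of 8 partitions are illustrative assumptions, and the sketch leaves out the compression option, relying on the .gz extension instead.

```python
def read_gzip_csv(spark, path, num_partitions=8):
    """Read a gzip-compressed CSV and spread its rows over several partitions.

    `spark` is the SparkSession that a Synapse notebook provides by default.
    `num_partitions` is an illustrative default, not a recommended value.
    """
    df = (
        spark.read.format("csv")
        .option("header", True)  # the first line holds the column names
        .load(path)              # the .gz extension selects the gzip codec
    )
    # A gzip file is not splittable, so a single task reads it; repartitioning
    # afterwards lets later transformations run in parallel.
    return df.repartition(num_partitions)
```

In a notebook cell this could be called as read_gzip_csv(spark, "<copied ABFSS path>") and the result passed to display().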

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".
