@SaiSekhar, MahasivaRavi (Philadelphia) - Thanks for the question and for using the MS Q&A platform.
Apache Spark natively supports reading gzip-compressed files directly into DataFrames. On read, Spark infers the codec from the .gz file extension, so setting the compression option is optional, and the file name must keep its .gz suffix.
Let's check that with an example of how to read the file flat_event_params_20240202-000000000000.csv.gz in an Azure Synapse notebook.
Step1: We will read the file flat_event_params_20240202-000000000000.csv.gz, which has been uploaded to the storage account that was created with the Synapse workspace.
Note: Copy the ABFSS path of the file; it is used to replace the file name in Step2.
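For reference, an ABFSS path generally has the form abfss://<container>@<storage-account>.dfs.core.windows.net/<path>. The following is a minimal sketch of assembling such a path in Python. The helper name build_abfss_path and the container and storage account names are hypothetical placeholders, not values from the question; the path copied from the Synapse UI should be preferred when it is available.

```python
def build_abfss_path(container: str, account: str, relative_path: str) -> str:
    """Assemble an ABFSS URI for a file in an ADLS Gen2 storage account."""
    # Strip any leading slash so the separator is not doubled
    clean_path = relative_path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical container and storage account names, for illustration only
path = build_abfss_path(
    "mycontainer",
    "mystorageaccount",
    "/flat_event_params_20240202-000000000000.csv.gz",
)
print(path)
```

The resulting string can be passed to .load(...) in place of the sample file name.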
Step2: Here's an example code snippet that shows how to read a compressed CSV file in Synapse Notebooks:
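# Read the gzip-compressed CSV file directly with Spark.
# The compression option is optional on read: Spark infers gzip from the .gz extension.
df_zipped = spark \
    .read \
    .format("csv") \
    .option("compression", "gzip") \
    .option("header", True) \
    .load("dataset/tmp/sales.csv.gz")  # Replace with the copied ABFSS path
# Render the DataFrame in the notebook
display(df_zipped)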
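If the result looks unexpected, it can help to confirm outside Spark that a .csv.gz file is just ordinary CSV text wrapped in gzip. Below is a minimal local sketch using only the Python standard library. The sample rows and the temporary file are made up for illustration and are not taken from the question's data.

```python
import csv
import gzip
import os
import tempfile

# Create a small gzip-compressed CSV in a temporary directory (illustrative data)
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(sample_path, "wt", newline="") as f:
    csv.writer(f).writerows([["event", "value"], ["click", "1"]])

# Read it back: gzip.open in text mode decompresses transparently
with gzip.open(sample_path, "rt", newline="") as f:
    rows = list(csv.reader(f))

header, data = rows[0], rows[1:]
print(header, data)
```

The same check can be pointed at a downloaded copy of the real file to inspect its header row before loading it in Synapse.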
Hope this helps. Do let us know if you have any further queries.
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".