Append row to existing parquet file Spark + Java

Sweetnesh Dholariya 1 Reputation point
2022-10-19T07:36:37.18+00:00

We read data from a Kafka stream and create a Parquet file in Azure Data Lake once every clock hour. We want to reduce that delay: create a Parquet file, keep appending new records to the same file for the rest of the clock hour, and start a new Parquet file at the next clock hour.

If we write to the existing Parquet file in append mode every 5 minutes, a new Parquet file is created each time, so we end up with 12 Parquet files per clock hour.

Please suggest a way to append records to an existing Parquet file.

Azure Data Lake Storage

1 answer

  1. ShaikMaheer-MSFT 38,466 Reputation points Microsoft Employee
    2022-10-20T17:01:53.49+00:00

    Hi @Sweetnesh Dholariya,

    Thank you for posting your query on the Microsoft Q&A platform.

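    A Parquet file in a storage account cannot be appended to in place. Saving with the same name always overwrites the file, and Spark's append mode adds new files to the target folder rather than modifying existing ones.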

    In your case, each write produces a file with a different name, so you see 12 different files.

    If you want to read the data in all 12 files as a single dataset, you can easily do so in your downstream processes using Azure Data Factory or Azure Synapse Analytics.

    So, we cannot append to the same file in a storage account. Having 12 different files in the same folder is actually a good way of handling this.
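
    As a rough illustration of keeping each hour's files together, the sketch below maps an event timestamp to one folder per clock hour, so that every 5-minute batch from the same hour lands in the same folder. The class name `HourlyParquetPath`, the `yyyy/MM/dd/HH` layout, the use of UTC, and the storage URL are all assumptions made for this example, not something taken from your pipeline.

    ```java
    import java.time.Instant;
    import java.time.ZoneOffset;
    import java.time.format.DateTimeFormatter;

    // Hypothetical helper (not part of Spark or any Azure SDK): maps an event
    // timestamp to the folder holding every batch file for that clock hour.
    class HourlyParquetPath {
        // Assumed layout for illustration, e.g. <base>/2022/10/19/07 (UTC).
        private static final DateTimeFormatter HOUR_FOLDER =
                DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC);

        static String folderFor(String basePath, Instant eventTime) {
            // Avoid a double slash when the base path already ends with one.
            String base = basePath.endsWith("/") ? basePath : basePath + "/";
            return base + HOUR_FOLDER.format(eventTime);
        }

        public static void main(String[] args) {
            // Placeholder URL; substitute your own container and account.
            String base = "abfss://container@account.dfs.core.windows.net/events";
            // The first two timestamps share a clock hour and print the same folder.
            System.out.println(folderFor(base, Instant.parse("2022-10-19T07:05:00Z")));
            System.out.println(folderFor(base, Instant.parse("2022-10-19T07:55:00Z")));
            // The next clock hour maps to a new folder.
            System.out.println(folderFor(base, Instant.parse("2022-10-19T08:00:00Z")));
        }
    }
    ```

    With a layout like this, each batch could be written with `df.write().mode("append").parquet(folder)`, and a Spark reader could load the whole hour as one dataset with `spark.read().parquet(folder)`.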

    If you would like to know exactly how to read the data from all the files as a single dataset using Azure Data Factory or Azure Synapse Analytics, please let me know and I can share more details.

    Hope this helps. Please let me know if you have any further queries.

