UPSERT IN CSV FILES IN AZURE BLOB STORAGE

Rakesh Kumar
2023-11-22T17:55:39.3266667+00:00

Hi, I have 5 tables whose data I need to store in the data lake, using a folder structure like:

tablename/year/month
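As an illustration only, the layout above can be expressed as a helper that maps a table name and a record's date to its folder. The function name `partition_path`, the zero-padded month, and the `data.csv` file name are hypothetical choices for this sketch, not part of the question.

```python
from datetime import date


def partition_path(table_name: str, record_date: date, file_name: str = "data.csv") -> str:
    """Build a tablename/year/month/file path for one record.

    Hypothetical helper: the zero-padded month (e.g. "03") and the
    default file name are assumptions made for this sketch.
    """
    return f"{table_name}/{record_date.year}/{record_date.month:02d}/{file_name}"


# A customers row dated 5 March 2023 lands in the customers/2023/03/ folder
print(partition_path("customers", date(2023, 3, 5)))  # → customers/2023/03/data.csv
```

If the date used for partitioning doesn't change when a row is updated, an update only affects the file in that row's year/month folder, so only that file would need to be rewritten.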

I have implemented the incremental load using Azure Data Factory. But if someone updates existing data, how can we apply those updates (an upsert) to the files in Azure Blob Storage?

Azure Data Lake Storage
Azure Data Factory

1 answer

  1. Amira Bedhiafi
    2023-11-23T11:57:27.5333333+00:00

    Based on this thread, I don't think an in-place update (upsert) of a file in Blob Storage is possible: https://stackoverflow.com/questions/61174460/append-data-to-existing-file-in-azure-data-lake-storage-from-rest-api/61189794#61189794

    A possible workaround: https://stackoverflow.com/questions/71238452/how-to-update-csv-file-placed-at-blob-storage-in-azure

    You can apply incremental updates to a file in Blob Storage using Azure Databricks, and call the notebook from ADF with the Databricks Notebook activity.

    First, mount the Azure storage account on the Databricks cluster.

    Then read the original file and the incremental data as DataFrames.
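    To make the "read both sources" step concrete without a Spark cluster, here is a plain-Python sketch that uses the standard `csv` module in place of Databricks DataFrames. It assumes each table has a single primary-key column (`id` in the example is a hypothetical name) and that, if the incremental extract contains the same key more than once, the last occurrence is the newest version.

```python
import csv
import io


def load_keyed(csv_text: str, key: str) -> dict[str, dict[str, str]]:
    """Parse CSV text (with a header row) into {primary key: row}.

    If a key appears more than once, the later row wins; this assumes
    rows are ordered from oldest to newest.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


# Hypothetical sample data: the original file and one incremental batch
original = load_keyed("id,name\n1,Alice\n2,Bob\n", key="id")
incremental = load_keyed("id,name\n2,Robert\n3,Carol\n", key="id")
```

Keying both sides by the primary key is what makes the following join step a simple lookup: a key present in both sides is an update, a key only in the incremental data is an insert.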

    Join the two DataFrames to build a merged DataFrame (unchanged original rows plus the new and updated rows) and overwrite the original file with it. The result is the original data with the incremental changes applied, still in a single file.
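    The join-and-overwrite step amounts to an upsert keyed on the primary key: incremental rows replace original rows with the same key, rows with new keys are appended, and the whole file is rewritten. Below is a minimal local-file sketch of that logic in plain Python, not Databricks code; the function name `upsert_csv`, the single key column, and writing to a temporary file before replacing the original are assumptions. In a Databricks notebook the same merge would be done on DataFrames before writing back to the mounted storage.

```python
import csv
import os
import tempfile


def upsert_csv(path: str, incremental_rows: list[dict[str, str]], key: str) -> None:
    """Merge incremental rows into the CSV at `path` and overwrite it.

    Hypothetical helper. Rows whose key already exists are replaced
    (update); rows with a new key are appended (insert). Columns come
    from the original header, so an incremental row with an extra
    column makes csv.DictWriter raise ValueError.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        merged = {row[key]: row for row in reader}

    for row in incremental_rows:
        merged[row[key]] = row  # update an existing key or insert a new one

    # Write to a temp file in the same directory, then swap it in, so a
    # failure mid-write doesn't leave a half-written local original.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(merged.values())
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

For example, upserting `[{"id": "2", "name": "Robert"}, {"id": "3", "name": "Carol"}]` into a file holding ids 1 and 2 rewrites it with Bob's row replaced by Robert's, Alice's row untouched, and Carol's row appended.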