Delta Lake vs Azure Data Lake

Samy Abdul 3,376 Reputation points
2021-07-17T16:52:01.14+00:00

Hi all, I understand that we ingest data from varied sources through ADF and build the ADLS Gen2 data lake. But a data lake doesn't allow ACID transactions, whereas Delta Lake, which is mostly built through Databricks, does provide ACID transactions. I understand that by using Synapse we could overcome this challenge. But I am a bit confused: why can't we go straight for Delta Lake and use this feature? What scenarios call for a data lake instead of Delta Lake? Please elaborate on this. Thanks.

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.

Accepted answer
  1. MartinJaffer-MSFT 26,106 Reputation points
    2021-07-19T23:53:27.947+00:00

    Hello @Samy Abdul and welcome to Microsoft Q&A.

    My understanding is that a data lake is basically just a place to store all your less-structured data (compared to a relational database).
    Delta Lake, if I understand correctly, is a practice where, inside your data lake, you focus on writing the changes made rather than updating the data itself, similar to change data capture. So the only reason Delta is atomic is that you are just appending a line stating what changed, rather than having to update a flat file.
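The idea above of appending a record of what changed, rather than updating a file in place, can be sketched in a few lines of Python. This is only a toy illustration and not Delta Lake's actual implementation: the `commit` and `live_files` helpers, the file names, and the log layout are all assumptions made for the example. It shows how a table can be described by an ordered log of "add" and "remove" actions over immutable data files, so that an update becomes a single new log entry.

```python
import json
import os
import tempfile


def commit(log_dir, version, actions):
    """Publish one commit file; fails if that version already exists."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    # O_CREAT | O_EXCL gives put-if-absent behaviour: if another writer
    # already claimed this version, the call raises FileExistsError.
    # (A real system must also make the file contents appear all at once;
    # this sketch ignores that.)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def live_files(log_dir):
    """Replay the log in version order to find the current set of data files."""
    files = set()
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"])
                elif "remove" in action:
                    files.discard(action["remove"])
    return files


# Hypothetical table: the data file names are placeholders, nothing is read from them.
log_dir = tempfile.mkdtemp()
commit(log_dir, 0, [{"add": "part-0000.parquet"}])
# An "update" writes the changed rows to a new file and swaps it in with one commit.
commit(log_dir, 1, [{"remove": "part-0000.parquet"}, {"add": "part-0001.parquet"}])
snapshot = live_files(log_dir)
print(snapshot)
```

Readers only ever see the files named by complete commits, so a writer that crashes before its commit leaves the previous snapshot intact. A second writer that tries to reuse version 1 gets a `FileExistsError` and has to retry with the next version.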

    To any experts out there, feel free to correct me if I got something wrong.


1 additional answer

  1. Tiago Moraes 16 Reputation points
    2021-08-05T20:59:14.087+00:00

    In my opinion, Azure Data Lake Storage is not competing against Delta Lake; in fact, Delta Lake is built on top of ADLS. They serve different purposes. For example, you should always keep your raw data so that you can reprocess it, and this can reside in ADLS. If you are already using Databricks, you can read the raw data from ADLS, transform it, and ingest it into Delta Lake. You can add this as a step in your ADF or Synapse pipeline. But if you are not using Databricks, perhaps because you want to save money, you can use a different strategy for rollback operations; for example, you could reprocess only one day or one hour. First, as I always say, we need to understand the business need before defining the architecture.

    I found the following link explaining how ADLS and Delta Lake can work together - https://techcommunity.microsoft.com/t5/analytics-on-azure/simplify-your-lakehouse-architecture-with-azure-databricks-delta/ba-p/2027272
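As a rough illustration of the "reprocess only one day" strategy mentioned above for setups without Delta Lake, here is a minimal Python sketch. Everything in it is an assumption for the example: the day-partitioned folder layout, the `reprocess_day` and `transform` helpers, and the local temporary directory standing in for ADLS. The point is only that keeping raw data untouched lets you rebuild one partition of the curated output and rerun the job safely.

```python
import os
import shutil
import tempfile


def transform(line):
    # Placeholder transformation (assumption): normalise to upper case.
    return line.strip().upper()


def reprocess_day(raw_root, curated_root, day):
    """Rebuild the curated output for one day from the untouched raw files."""
    raw_dir = os.path.join(raw_root, day)
    final_dir = os.path.join(curated_root, day)
    staging_dir = final_dir + ".staging"

    # Build the new output off to the side so a failed run leaves the old output intact.
    shutil.rmtree(staging_dir, ignore_errors=True)
    os.makedirs(staging_dir)
    for name in sorted(os.listdir(raw_dir)):
        with open(os.path.join(raw_dir, name)) as src, \
                open(os.path.join(staging_dir, name), "w") as dst:
            for line in src:
                dst.write(transform(line) + "\n")

    # Swap the rebuilt day into place. Unlike a Delta commit, this is two
    # steps, so readers can briefly see the day missing.
    shutil.rmtree(final_dir, ignore_errors=True)
    os.rename(staging_dir, final_dir)


# Hypothetical local folders standing in for the raw and curated zones.
root = tempfile.mkdtemp()
raw_root = os.path.join(root, "raw")
curated_root = os.path.join(root, "curated")
os.makedirs(os.path.join(raw_root, "2021-08-05"))
with open(os.path.join(raw_root, "2021-08-05", "events.csv"), "w") as f:
    f.write("a,1\nb,2\n")

reprocess_day(raw_root, curated_root, "2021-08-05")
reprocess_day(raw_root, curated_root, "2021-08-05")  # rerunning rebuilds the same day

with open(os.path.join(curated_root, "2021-08-05", "events.csv")) as f:
    result = f.read()
```

The trade-off is the one the answer points at: this costs nothing extra to run, but the swap is not transactional, which is the gap Delta Lake's commit log is meant to close.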

    3 people found this answer helpful.
