
delta format in ADLS

Anshal 2,251 Reputation points
2024-01-08T10:22:26.5466667+00:00

Hi Friends, I am getting very confused about the Delta format in Azure Data Lake. I read that Delta Lake is the default format of Databricks. To store curated or gold layer data in Delta format in Azure Data Lake we can use an Azure Data Flows inline dataset, but if my data is of huge volume that could be a costly solution. Is it better to use Databricks for this, or Data Flows? What is the commonly used technology for writing the Delta format to Azure Data Lake using ADF? Please share links or information to help me understand Delta Lake format storage in the data lake.

Azure Data Lake Storage

An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.

Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


Answer accepted by question author
  1. Smaran Thoomu 33,840 Reputation points Microsoft External Staff Moderator
    2024-01-08T13:26:05.2033333+00:00

    Hi @Anshal ,

    Thank you for reaching out to the Azure community forum with your query about the Delta format in Azure Data Lake. I'll do my best to help you out.

    Delta Lake is an open-source storage layer that brings reliability to data lakes. It provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake is the default table format on Databricks, but it can also be read and written by other Azure services such as Azure Data Factory (ADF) and its Mapping Data Flows.
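    To give an intuition for how Delta Lake achieves ACID behavior: a Delta table is a folder of Parquet data files plus an ordered `_delta_log` directory of JSON commit files, and readers reconstruct the current table state by replaying those commits. The sketch below is a simplified, stdlib-only toy for illustration (the 20-digit commit file names and the `add`/`remove` action shapes mirror the real layout, but the actual protocol has many more action types and fields):

    ```python
    import json
    import os
    import tempfile

    def commit(table_path: str, version: int, actions: list) -> str:
        """Write one Delta-style commit file: _delta_log/<20-digit version>.json."""
        log_dir = os.path.join(table_path, "_delta_log")
        os.makedirs(log_dir, exist_ok=True)
        commit_file = os.path.join(log_dir, f"{version:020d}.json")
        with open(commit_file, "w") as f:
            for action in actions:  # one JSON action per line, as Delta does
                f.write(json.dumps(action) + "\n")
        return commit_file

    def live_files(table_path: str) -> set:
        """Replay the commits in order to find the files that make up the table."""
        log_dir = os.path.join(table_path, "_delta_log")
        files = set()
        for name in sorted(os.listdir(log_dir)):
            with open(os.path.join(log_dir, name)) as f:
                for line in f:
                    action = json.loads(line)
                    if "add" in action:
                        files.add(action["add"]["path"])
                    elif "remove" in action:
                        files.discard(action["remove"]["path"])
        return files

    table = tempfile.mkdtemp()
    commit(table, 0, [{"add": {"path": "part-000.parquet"}}])
    commit(table, 1, [{"add": {"path": "part-001.parquet"}},
                      {"remove": {"path": "part-000.parquet"}}])
    print(live_files(table))  # only part-001.parquet is still live
    ```

    Because every writer appends a new numbered commit rather than mutating files in place, concurrent readers always see a consistent snapshot of the table.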

    Mapping Data Flows in ADF is a low-code data integration capability that provides a visual interface for building data transformation pipelines. To store curated or gold layer data in Delta format in Azure Data Lake, you can use an inline Delta dataset in a data flow sink. However, because data flows execute on managed Spark clusters, processing very large volumes can become costly, and Data Flows may not be as flexible or powerful as Databricks for complex data processing tasks.
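    For reference, an inline Delta sink in a Mapping Data Flow appears roughly like this in the underlying data flow script; the stream name, container (`fileSystem`), and `folderPath` here are made-up placeholders for illustration:

    ```
    CuratedSales sink(allowSchemaDrift: true,
        format: 'delta',
        fileSystem: 'gold',
        folderPath: 'sales') ~> DeltaSink
    ```

    The key point is `format: 'delta'` on the inline dataset: Delta is not a regular dataset type in ADF, so it is only available inline within a data flow.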

    Databricks, on the other hand, is a powerful data processing and analytics platform that provides a wide range of tools and features for working with Delta Lake. Databricks is well-suited for complex data processing tasks and large-scale data analytics. However, Databricks can be expensive and may require specialized skills to use effectively.
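    If you do go the Databricks route, registering and maintaining a gold-layer Delta table that lives in ADLS Gen2 is typically a few SQL (or equivalent PySpark) statements. A minimal sketch, assuming a hypothetical `gold` schema, storage account `mylake`, and container `curated`:

    ```sql
    -- Register a Delta table over an ADLS Gen2 path
    CREATE TABLE IF NOT EXISTS gold.sales
    USING DELTA
    LOCATION 'abfss://curated@mylake.dfs.core.windows.net/gold/sales';

    -- Compact small files for faster reads
    OPTIMIZE gold.sales;

    -- Inspect the table's transaction history (one row per commit)
    DESCRIBE HISTORY gold.sales;
    ```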

    Regarding your question about whether to use Databricks or Data Flows for storing curated or gold layer data in Delta format in Azure Data Lake, it depends on your specific use case and requirements. For a huge volume of data, an inline dataset in Mapping Data Flows may not be the most cost-effective solution; Databricks is often the better option there, since its clusters can be sized and tuned for large-scale processing. When choosing between the two, consider the complexity of your data processing tasks, the size of your data, your budget, and the skills of your team.

    Here are some links to documentation and resources that may be helpful for understanding Delta Lake format storage in Azure Data Lake:

    1. Delta Lake documentation
    2. Azure Data Factory documentation on Delta Lake integration
    3. Databricks documentation on Delta Lake
    4. Delta format on Databricks

    I hope this helps. Do let us know if you have any further queries.




0 additional answers