Storage Approach Comparison in Azure ADLS Gen2

Relay 200 Reputation points
2025-07-02T16:54:29.0666667+00:00

Hello,

Could someone please give me a suggestion on which approach I should take? It would be great if you could share your thoughts with pros and cons.

Scenario 1:

[architecture diagram for Scenario 1]

Scenario 2:

[architecture diagram for Scenario 2]

Scenario 1 details:

1. Silver layer stored in Azure SQL / Cosmos DB (normalized, transactional).

2. ADF exports data from CIDH to the satellite ADLS.

3. The satellite uses Databricks to build Delta Silver and Gold layers and shares data with other satellites or the core.

4. Operational and analytical systems are logically separated.

Scenario 2 details:

1. Lakehouse architecture with Databricks and ADLS in the CIDH subscription itself.

2. SQL layer still needed, but can be reconsidered for the long term.

3. Curated data is shared with the satellite as needed.

4. The satellite still performs selective enrichment.

Please help me choose the better approach.

Thanks

Azure Data Lake Storage

Accepted answer
    Amira Bedhiafi 34,731 Reputation points Volunteer Moderator
    2025-07-02T20:22:14.5666667+00:00

    Hello Relay!

    Thank you for posting on Microsoft Learn.

    I have tried to break your problem down as follows:

    What is the main difference in how data is stored and accessed?

    Scenario 1 stores the Silver layer in Azure SQL or Cosmos DB, which is great for structured, transactional workloads and systems that require relational integrity. Data is then exported via ADF to satellite ADLS for further transformation.

    On the other hand, Scenario 2 keeps the Silver layer natively in ADLS Gen2 as Delta tables and processes it using Databricks directly in the CIDH subscription, adopting a lakehouse architecture. This removes a layer of redundancy and simplifies the architecture by using a single format and platform.
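
    To make the Scenario 2 pattern concrete, here is a minimal PySpark sketch of curating raw files straight into a Delta Silver table on ADLS Gen2, with no intermediate SQL/Cosmos hop. The storage account, container, and column names are hypothetical placeholders, not part of your setup.

    ```python
    # Minimal sketch (placeholder names): build the Silver layer directly in ADLS
    # Gen2 as a Delta table, instead of landing it in Azure SQL / Cosmos DB first.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    bronze_path = "abfss://bronze@cidhdatalake.dfs.core.windows.net/orders/"
    silver_path = "abfss://silver@cidhdatalake.dfs.core.windows.net/orders/"

    silver_df = (
        spark.read.format("json").load(bronze_path)        # raw ingested files
             .dropDuplicates(["order_id"])                  # basic cleansing
             .withColumn("ingest_date", F.current_date())   # audit column
    )

    (silver_df.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("ingest_date")
        .save(silver_path))                                 # Silver exists only as Delta in ADLS
    ```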

    Which approach is better for analytics and large-scale data?

    Scenario 2 is better suited for large-scale analytical queries. Databricks with Delta Lake can scale to petabytes using distributed processing and optimizations like Z-ordering and caching. Scenario 1 relies on Azure SQL or Cosmos DB, which are not as efficient or scalable for analytics, especially if data volumes grow. So, if the primary purpose of your Silver layer is analytics, Scenario 2 provides better performance and flexibility.
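
    As a small illustration of those optimizations, a Delta table can be compacted and Z-ordered on a commonly filtered column in a single statement (the table and column names below are hypothetical and assume the table is registered in the metastore):

    ```python
    # Sketch only: compact small files and co-locate rows by a frequently filtered
    # column so large analytical scans skip more data. Names are placeholders.
    spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")

    # Optionally remove data files no longer referenced by the table
    # (subject to the default 7-day retention window).
    spark.sql("VACUUM silver.orders")
    ```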

    Which one is more cost-effective?

    Scenario 2 is more cost-efficient because it avoids data duplication. In Scenario 1, you store the same data twice, once in SQL/Cosmos and again in ADLS. Plus, you pay for compute and maintenance across both systems. Scenario 2 avoids these duplicated costs by storing curated data only in ADLS and using compute resources like Databricks for processing as needed. This approach aligns better with pay-as-you-use object storage and decoupled compute, lowering your TCO over time.

    How well does each handle real-time or streaming data?

    Scenario 2 is natively more suitable for near-real-time or streaming pipelines. It integrates easily with Event Hub, Kafka, and Delta Live Tables to handle continuous data ingestion and transformation. Scenario 1, while capable of handling streaming via SQL and ADF, introduces latency due to extra hops between ingestion and transformation. If your use case involves streaming data, IoT feeds, or real-time dashboards, Scenario 2 is the more appropriate choice.
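
    For reference, a near-real-time path in Scenario 2 could look like the following Structured Streaming sketch, reading from Event Hubs through its Kafka-compatible endpoint and appending to a Delta table in ADLS. The namespace, topic, connection string, and paths are placeholders, and the shaded login-module class name applies when running on Databricks.

    ```python
    # Sketch of a streaming ingestion path: Event Hubs (Kafka endpoint) -> Delta in ADLS.
    # All names, secrets, and paths below are hypothetical placeholders.
    event_hub_conf = {
        "kafka.bootstrap.servers": "cidh-eventhubs.servicebus.windows.net:9093",
        "subscribe": "orders",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": (
            'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
            'required username="$ConnectionString" password="<event-hubs-connection-string>";'
        ),
    }

    raw_stream = (
        spark.readStream.format("kafka")
             .options(**event_hub_conf)
             .load()
    )

    (raw_stream.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
        .writeStream
        .format("delta")
        .option("checkpointLocation",
                "abfss://silver@cidhdatalake.dfs.core.windows.net/_checkpoints/orders_stream/")
        .outputMode("append")
        .start("abfss://silver@cidhdatalake.dfs.core.windows.net/orders_stream/"))
    ```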

    What about data governance and security?

    Both scenarios can support governance through tools like Azure Purview or Unity Catalog. However, Scenario 2 offers more modern governance capabilities such as Delta Sharing, fine-grained access control, and time travel with Delta transaction logs. It’s easier to apply centralized RBAC and data policies in the lakehouse model, especially when using Unity Catalog across workspaces. In Scenario 1, managing access across SQL and ADLS requires more manual coordination.
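
    To illustrate, the fine-grained access control and time travel mentioned above look roughly like this with Unity Catalog and Delta (the catalog, schema, table, and group names are hypothetical):

    ```python
    # Sketch only: grant a satellite team read access to a curated table via Unity
    # Catalog, then query an earlier version of the same Delta table (time travel).
    spark.sql("GRANT SELECT ON TABLE cidh.silver.orders TO `satellite-analysts`")

    # Delta time travel: read the table as it looked at an earlier version,
    # e.g. for audits or reproducing a past report.
    previous_snapshot = spark.sql("SELECT * FROM cidh.silver.orders VERSION AS OF 10")
    previous_snapshot.show(5)
    ```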

    In my opinion, Scenario 2 is more aligned with the future direction of cloud data architectures. Microsoft and Databricks are both investing heavily in lakehouse technologies such as Fabric, Delta Sharing, and Unity Catalog, and more third-party tools and BI platforms now support Parquet and Delta natively. Scenario 1, while functional today, introduces dependencies on legacy patterns that may limit agility and performance in the long run.
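
    As a final illustration, this is roughly what consuming a Delta Sharing share looks like from the recipient side with the open-source client (the credential file and share/schema/table names are hypothetical):

    ```python
    # Sketch of reading a shared table outside the CIDH workspace via Delta Sharing.
    # Requires the open-source client: pip install delta-sharing
    # The profile path and share/schema/table names are placeholders.
    import delta_sharing

    profile = "config/cidh_share_profile.share"        # recipient credential file
    table_url = profile + "#cidh_share.silver.orders"  # format: share.schema.table

    # Load the shared table into pandas (use load_as_spark on a cluster for large tables).
    orders_df = delta_sharing.load_as_pandas(table_url)
    print(orders_df.head())
    ```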

