GDPR Handling In ADLS Gen2

Relay 200 Reputation points
2025-06-18T13:43:02.8566667+00:00

I am creating a centralised data lakehouse, as shown in the attached diagram.

[attached architecture diagram]

I have created a second ADLS Gen2 account so that it connects easily with Databricks.

I am seeking your help in designing the ADLS Gen2 storage for the silver layer.

  1. Is this approach good?
  2. Do I need to always duplicate data from SQL into ADLS Gen2, or is there any caching mechanism available in Azure?
  3. How can I improve cost efficiency?
  4. How does ADLS Gen2 handle PII information?
  5. How can we ensure there is no duplication in ADLS Gen2?

Please share your expert thoughts.

Thanks.

Azure Data Lake Storage

Accepted answer
Nandan Hegde 36,151 Reputation points MVP Volunteer Moderator
2025-06-18T14:13:38.2433333+00:00

    In my opinion, duplicating the data across an Azure SQL database and ADLS Gen2 is a bad design.

    What is the significance of loading the data into the Azure SQL database?

    You can create another container for the silver layer directly within the existing ADLS Gen2 account, rather than introducing an Azure SQL Database as a bridge in between.
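    On the duplication point (question 5): within that silver-layer container, deduplication is usually enforced at write time in Databricks, for example with Spark's `dropDuplicates` or a Delta Lake `MERGE` keyed on a business key. A minimal pure-Python sketch of the underlying keep-latest-record-per-key logic (the `id` and `updated_at` column names are illustrative, not from the question):

    ```python
    from datetime import datetime

    def dedupe_latest(records, key="id", ts="updated_at"):
        """Keep only the most recent record per business key --
        the same logic a Delta MERGE (or dropDuplicates after an
        ordering step) applies at scale in Databricks."""
        latest = {}
        for rec in records:
            k = rec[key]
            if k not in latest or rec[ts] > latest[k][ts]:
                latest[k] = rec
        return list(latest.values())

    rows = [
        {"id": 1, "updated_at": datetime(2025, 6, 1), "status": "new"},
        {"id": 1, "updated_at": datetime(2025, 6, 15), "status": "updated"},
        {"id": 2, "updated_at": datetime(2025, 6, 10), "status": "new"},
    ]
    deduped = dedupe_latest(rows)  # one row per id, latest version wins
    ```

    In Spark this collapses to a one-liner against the silver table, but the idea is the same: dedupe on a stable business key before data is committed to the silver container.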

    Also, is there any specific reason for the second ADLS Gen2 account other than connectivity to Databricks?

    And when you say handling PII data in ADLS, what do you mean?

    In Azure SQL Database there are multiple mechanisms, such as Dynamic Data Masking or column-level encryption, to handle PII data, but we do not have that flexibility in ADLS Gen2.
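    Because ADLS Gen2 has no built-in masking, a common pattern is to pseudonymise or hash PII columns in Databricks before the data lands in the lake. A minimal sketch using a salted SHA-256 hash (the salt value and column names are illustrative assumptions, not part of the question):

    ```python
    import hashlib

    # Illustrative salt -- in practice, store and rotate it in Azure Key Vault.
    SALT = b"example-salt-keep-in-key-vault"

    def pseudonymise(value: str) -> str:
        """Deterministic salted hash: the same input always maps to the
        same token, so joins on the column still work, but the raw PII
        value itself never lands in ADLS Gen2."""
        return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

    record = {"customer_id": 42, "email": "alice@example.com"}
    record["email"] = pseudonymise(record["email"])
    ```

    Applied as a transformation on the PII columns during the bronze-to-silver step, this keeps the silver layer joinable without storing cleartext PII; reversible protection would instead need encryption with keys held outside the lake.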

