Hello azure_learner,
Greetings! Welcome to Microsoft Q&A Platform.
Azure Data Lake Storage Gen2 isn't a dedicated service or account type. It's a set of capabilities that support high-throughput analytic workloads. The Data Lake Storage Gen2 documentation provides best practices and guidance for using these capabilities: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices
It’s common to use Blob Storage as the initial landing zone for raw data. This allows for scalable and cost-effective storage. Using Databricks for data transformations is a good practice. It provides a powerful platform for processing large datasets and performing complex transformations. Alternatively, you can ingest data directly into ADLS Gen2 and use Databricks to read and transform data from there.
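To make the Databricks path concrete, here is a minimal PySpark sketch (assuming a Databricks notebook where `spark` is already available; the storage account, container names such as `landing` and `curated`, and folder paths are placeholders you would replace with your own, and authentication to the storage account is assumed to be configured):

```python
# Minimal sketch for a Databricks notebook (`spark` is provided by the runtime).
# Storage account, containers, and folders below are placeholders, not real resources.
from pyspark.sql import functions as F

raw_path = "abfss://landing@<storage-account>.dfs.core.windows.net/sales/2024/"

# Read the raw CSV files landed in the storage account
raw_df = (
    spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(raw_path)
)

# Apply a simple transformation, then persist the result to ADLS Gen2 as Delta
cleaned_df = (
    raw_df
    .dropDuplicates()
    .withColumn("ingested_at", F.current_timestamp())
)

(
    cleaned_df.write
    .format("delta")
    .mode("overwrite")
    .save("abfss://curated@<storage-account>.dfs.core.windows.net/sales/")
)
```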
If you need to preserve raw data for reprocessing, you can store it in Blob Storage and then move it to ADLS Gen2. This might look like duplication, but it can be managed by:
Deleting Raw Data: After transformation, you can delete the raw data from Blob Storage to avoid duplication (a small sketch of this follows the list).
Bronze Layer: Holds the raw, as-landed data so it can be reprocessed if needed.
Silver Layer: Allows for intermediate transformations and quality checks.
Gold Layer: Provides final, ready-to-use data for reporting.
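For the "Deleting Raw Data" option above, here is a minimal sketch using the azure-storage-blob SDK (the account URL, the `landing` container, the blob prefix, and the use of DefaultAzureCredential are assumptions for illustration only):

```python
# Minimal sketch: delete already-processed raw blobs with the azure-storage-blob SDK.
# Account URL, container, prefix, and credential choice are placeholders/assumptions.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import ContainerClient

container = ContainerClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    container_name="landing",
    credential=DefaultAzureCredential(),
)

# Delete only the raw files that have already been transformed and copied onward
for blob in container.list_blobs(name_starts_with="sales/2024/"):
    container.delete_blob(blob.name)
```

Alternatively, a Blob Storage lifecycle management policy can archive or expire raw blobs automatically after a set number of days, so no deletion code is needed.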
Perform transformations in Databricks, moving data from the Bronze to the Silver and then to the Gold layer (a sketch of this flow follows below). Ensure raw data in the Bronze layer is archived or deleted after processing to avoid unnecessary storage costs. This approach balances the need for data integrity, transformation flexibility, and storage efficiency.
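As a rough illustration of that Bronze → Silver → Gold flow, here is a PySpark/Delta sketch for a Databricks notebook (`spark` and `dbutils` are provided by the runtime; the lake paths, folder names, and columns such as `order_id` and `order_amount` are hypothetical):

```python
# Minimal medallion-style sketch for a Databricks notebook.
# Paths, table folders, and column names are placeholders for illustration only.
from pyspark.sql import functions as F

base = "abfss://lake@<storage-account>.dfs.core.windows.net"

# Bronze -> Silver: clean and standardize the raw data
bronze_df = spark.read.format("delta").load(f"{base}/bronze/orders")
silver_df = (
    bronze_df
    .dropDuplicates(["order_id"])
    .filter(F.col("order_amount") > 0)
)
silver_df.write.format("delta").mode("overwrite").save(f"{base}/silver/orders")

# Silver -> Gold: aggregate into a reporting-ready table
gold_df = (
    silver_df
    .groupBy("customer_id")
    .agg(F.sum("order_amount").alias("total_spend"))
)
gold_df.write.format("delta").mode("overwrite").save(f"{base}/gold/customer_spend")

# Optionally remove the processed raw landing files (or archive them via a
# lifecycle management policy) to avoid paying for duplicate storage
dbutils.fs.rm(f"{base}/bronze/orders_raw_landing", True)
```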
For more details, refer to:
https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-overview
https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/cloud-scale-analytics/best-practices/data-ingestion
https://learn.microsoft.com/en-us/azure/databricks/ingestion/lakeflow-connect/workday/workday-reports/
Hope this answer helps! Please let us know if you have any further queries; I'm happy to assist you further.
Please "Accept the answer” and “up-vote” wherever the information provided helps you, this can be beneficial to other community members