Where to store delta-files

Christian Gert Hansen 1 Reputation point
2020-10-05T13:05:54.297+00:00

Hey guys!

I have been wondering about the question below for a while, and I hope you can help me get a good night's sleep again.

Question:
When you are working in Azure Databricks to do transformations, you can save the results in Delta format. I'm in doubt about where to store these delta-files:

  1. You can store the delta-files in your Data Lake.
  2. You can store the delta-files in your DBFS.

Maybe I'm missing something along the way, but I would choose to store the files in DBFS. Not many tools can access delta-files, so does storing them in your Data Lake make sense? Is there any advantage in cost or performance that I am missing?
I guess it will be faster to read the files in Azure Databricks when they are stored in DBFS.

I hope someone will comment on my thoughts.

br,
Christian

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. HimanshuSinha-msft 19,386 Reputation points Microsoft Employee
    2020-10-05T23:36:36.16+00:00

    Hello @Christian Gert Hansen,
    Thanks for the question and also for using this forum.

    Just to be sure: are you asking whether to save files in the DBFS root versus the data lake?

    Databricks File System (DBFS) is a distributed file system mounted into an Azure Databricks workspace and available on Azure Databricks clusters. It allows you to mount storage objects so that you can seamlessly access data without requiring credentials. Basically, it is just a wrapper, and you can put any storage supported by Azure Databricks (ADB) behind it.

    It is not recommended to save any user files or objects in the DBFS root. Instead, create a different blob storage directory and mount it to DBFS.
    https://kb.databricks.com/dbfs/dbfs-root-permissions.html

    But yes, since the file is locally available, data retrieval is going to be fast.

    Thanks
    Himanshu
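
The mount-and-write approach recommended in the answer can be sketched roughly as follows. This is only a minimal illustration, not an official recipe: the storage account `mystorageacct`, the container `curated`, the mount point `/mnt/curated`, the table name `sales`, and the helper functions `build_mount_args` and `delta_path` are all hypothetical names, and account-key authentication over `wasbs://` is assumed. The calls that need a Databricks runtime (`dbutils.fs.mount` and the Delta write) are left as comments so the rest runs as plain Python.

```python
def build_mount_args(account: str, container: str, mount_point: str, account_key: str) -> dict:
    """Assemble keyword arguments for dbutils.fs.mount (Blob Storage, account-key auth assumed)."""
    host = f"{account}.blob.core.windows.net"
    return {
        "source": f"wasbs://{container}@{host}",
        "mount_point": mount_point,
        "extra_configs": {f"fs.azure.account.key.{host}": account_key},
    }


def delta_path(mount_point: str, table_name: str) -> str:
    """Build a Delta table location under the mount, keeping data out of the DBFS root."""
    return f"{mount_point.rstrip('/')}/delta/{table_name}"


# Hypothetical names, for illustration only. In a real notebook the key would
# normally come from a secret scope rather than a literal string.
args = build_mount_args("mystorageacct", "curated", "/mnt/curated", "<storage-account-key>")
target = delta_path(args["mount_point"], "sales")

# Inside a Databricks notebook, where `dbutils`, `spark`, and a DataFrame `df` exist:
#   dbutils.fs.mount(**args)
#   df.write.format("delta").mode("overwrite").save(target)
#   spark.read.format("delta").load(target)
```

With this layout the delta-files physically live in your own storage account, while notebooks still address them through a DBFS path such as `/mnt/curated/delta/sales`.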