Is it possible to access Databricks DBFS from Azure Data Factory?

NewUser 1 Reputation point
2020-11-09T13:07:49.41+00:00

I would like to use the Copy Data activity in Data Factory to move data to and from Databricks DBFS. I have my Databricks workspace successfully linked to Data Factory as a linked service.

If I select Azure Databricks Delta Lake as a dataset source or sink, I am able to access the tables in the cluster (not the DBFS) and preview the data, but validation fails with an error because the tables are not Delta tables.

Is there a way to directly connect to DBFS?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,236 Reputation points
    2020-11-09T17:52:24.787+00:00

    Hello @NewUser and welcome to Microsoft Q&A.

    I think the easiest way to make the data available in DBFS is to first mount a Blob Storage or Data Lake Storage Gen2 account in DBFS, and then have Data Factory write the files to that storage account.

    This has a few upsides over trying to write directly to the DBFS root. Confining Data Factory's writes to the mounted storage account protects the rest of DBFS from accidental overwrites, and it keeps your data more secure because Data Factory cannot reach anything outside that silo.
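    The mount step above can be sketched as follows. This runs inside a Databricks notebook, where `dbutils` is predefined; the storage account, container, secret scope, and key names are all placeholders, not values from the question.

    ```python
    # Sketch: mount a Blob Storage container in DBFS so files that
    # Data Factory copies into it show up under /mnt/... in DBFS.
    # Assumes a Databricks notebook environment (dbutils is built in).
    # All names below are hypothetical placeholders.

    storage_account = "mystorageaccount"   # placeholder storage account
    container = "adf-landing"              # placeholder container
    mount_point = "/mnt/adf-landing"

    configs = {
        f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
            # Read the account key from a secret scope rather than
            # hard-coding it; scope/key names are placeholders.
            dbutils.secrets.get(scope="my-scope", key="storage-key")
    }

    # Mount only if this mount point does not already exist.
    if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.mount(
            source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
            mount_point=mount_point,
            extra_configs=configs,
        )

    # Files Data Factory writes to the container now appear here.
    display(dbutils.fs.ls(mount_point))
    ```

    With the mount in place, the Copy Data activity only needs an ordinary Blob Storage or ADLS Gen2 dataset pointing at that container; no direct DBFS connector is required.
    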

    1 person found this answer helpful.

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users know the answer solved the author's problem.