How to connect data from one cluster to another cluster in Databricks SQL

Question

Agrani Nimshani 0 Reputation points

I need to load a table from one cluster to another cluster.

Olaf Helper 47,436 Reputation points

2023-04-03T07:44:14.65+00:00

And what's the problem doing it?

PRADEEPCHEEKATLA 90,641 Reputation points Moderator

2023-04-06T05:51:23.2033333+00:00

Agrani Nimshani - Just checking in to see whether the answer below from @Susheel Bhatt helped. If it answers your question, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further questions, let us know.

Answer 1

To make data from one Azure Databricks cluster available on another in Databricks SQL, you can use the following steps. Both clusters must be connected to the same Azure Data Lake Storage account, and each cluster needs the appropriate permissions to access the same data in that account.
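One way to check this prerequisite is to run a direct file query on each cluster before creating any tables. This is a minimal sketch that assumes the source data is stored as Parquet under the example mount path used in the steps below; substitute your own path.

```sql
-- Run on BOTH clusters. If the mount or the permissions are missing,
-- this query fails instead of returning a count.
-- The path is the example path from this answer, not a real location.
SELECT COUNT(*) AS readable_rows
FROM parquet.`dbfs:/mnt/source-data`;
```

If the query succeeds on one cluster but fails on the other, the storage configuration or permissions are the likely cause, not the table definitions.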

Connect to the source cluster and create an external table that represents the source data. Use the CREATE TABLE statement with the USING clause, and specify the storage location of the source data. For example:

CREATE TABLE source_table
USING parquet
OPTIONS (
  'path' 'dbfs:/mnt/source-data'
);
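After creating the table, it can help to confirm that it points where you expect. The following sketch assumes the table was created as source_table, as in the example above.

```sql
-- Show the table metadata; the Location and Provider rows should match
-- the path and format given in the CREATE TABLE statement.
DESCRIBE TABLE EXTENDED source_table;

-- Preview a few rows to confirm the schema was inferred as expected.
SELECT * FROM source_table LIMIT 10;
```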

Connect to the target cluster and create a database to hold the imported data. You can use the CREATE DATABASE statement to create the database. For example:

CREATE DATABASE target_db;

Create a table in the target cluster to hold the imported data. Use the CREATE TABLE statement with the USING clause, and specify the location where the data will be written. If that location is still empty, there are no Parquet files to infer a schema from, so you may need to add an explicit column list to the statement. For example:

CREATE TABLE target_table
USING parquet
OPTIONS (
  'path' 'dbfs:/mnt/target-data'
);
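As an alternative, if the target location is empty and you would rather not write out the column list by hand, a CREATE TABLE ... AS SELECT statement can take the schema from the source table and copy the rows in one step. This is only a sketch: it replaces both this step and the INSERT INTO step that follows, and it assumes the example table names and paths used in this answer.

```sql
-- Alternative to a separate CREATE TABLE plus INSERT INTO:
-- the schema comes from the query, and the rows are written
-- to the target location as Parquet files.
CREATE TABLE target_table
USING parquet
LOCATION 'dbfs:/mnt/target-data'
AS SELECT * FROM source_table;
```

Use one approach or the other, not both; running the INSERT INTO afterwards would load the rows a second time.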

In the target cluster, use the INSERT INTO statement to copy the data from the external source table into the target table. A SELECT query reads the rows from the external table. For example:

INSERT INTO target_table SELECT * FROM source_table;
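A quick sanity check after the load is to compare row counts on both sides. This sketch assumes the example table names from the steps above and that no other process is writing to either table while it runs.

```sql
-- The two counts should match if the copy completed
-- and the target table was empty beforehand.
SELECT
  (SELECT COUNT(*) FROM source_table) AS source_rows,
  (SELECT COUNT(*) FROM target_table) AS target_rows;
```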
