To connect data from one Azure Databricks cluster to another Azure Databricks cluster in Databricks SQL, you can use the following steps. But keep in mind that you need to ensure these two clusters are connected to the same Azure Data Lake Storage account, and that the appropriate permissions are set for the two clusters to access the same data in the storage account.
- Connect to the source cluster and create an external table to represent the source data. You can use the
CREATE TABLE
statement with theUSING
clause to create the external table, and specify the location of the data in the source cluster. For example:
CREATE TABLE source_table
USING parquet
OPTIONS (
'path' 'dbfs:/mnt/source-data'
)
- Connect to the target cluster and create a database to hold the imported data. You can use the
CREATE DATABASE
statement to create the database. For example:
CREATE DATABASE target_db;
- Create a table in the target cluster to hold the imported data. You can use the
CREATE TABLE
statement with theUSING
clause to create the table, and specify the location where the data will be imported. For example:
CREATE TABLE target_table
USING parquet
OPTIONS (
'path' 'dbfs:/mnt/target-data'
)
- In the target cluster, use the
INSERT INTO
statement to import the data from the external table in the source cluster into the table in the target cluster. You can use a subquery to select the data from the external table. For example:
INSERT INTO target_table SELECT * FROM source_table