Use IntelliJ IDEA with Databricks Connect for Scala
Note
This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
This article covers how to use Databricks Connect for Scala and IntelliJ IDEA with the Scala plugin. Databricks Connect enables you to connect popular IDEs, notebook servers, and other custom applications to Azure Databricks clusters. See What is Databricks Connect?.
Note
Before you begin to use Databricks Connect, you must set up the Databricks Connect client.
To use Databricks Connect and IntelliJ IDEA with the Scala plugin to create, run, and debug a sample Scala sbt project, follow these instructions. These instructions were tested with IntelliJ IDEA Community Edition 2023.3.6. If you use a different version or edition of IntelliJ IDEA, the following instructions might vary.
Make sure that the Java Development Kit (JDK) is installed locally. Databricks recommends that your local JDK version match the version of the JDK on your Azure Databricks cluster.
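To confirm which JDK your local process actually uses, you can print the JVM version from any Scala scratch file or from the sample app later in this walkthrough, then compare it with the JDK version of your cluster's Databricks Runtime. A minimal sketch:

// Prints the version of the JVM the local process is running on, for example "17.0.10".
println(System.getProperty("java.version"))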
Start IntelliJ IDEA.
Click File > New > Project.
For Name, enter a meaningful name for your project.
For Location, click the folder icon, and complete the on-screen directions to specify the path to your new Scala project.
For Language, click Scala.
For Build system, click sbt.
In the JDK drop-down list, select an existing installation of the JDK on your development machine that matches the JDK version on your cluster, or select Download JDK and follow the on-screen instructions to download a JDK that matches the JDK version on your cluster.
Note
Choosing a JDK installation that is above or below the JDK version on your cluster might produce unexpected results, or your code might not run at all.
In the sbt drop-down list, select the latest version.
In the Scala drop-down list, select the version of Scala that matches the Scala version on your cluster.
Note
Choosing a Scala version that is below or above the Scala version on your cluster might produce unexpected results, or your code might not run at all.
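Similarly, to confirm the Scala version your project resolves at runtime, you can print it from the sample app. A minimal sketch:

// Prints the Scala version the code was compiled against, for example "version 2.12.18".
println(scala.util.Properties.versionString)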
For Package prefix, enter a package prefix value for your project's sources, for example org.example.application.
Make sure that the Add sample code box is checked.
Click Create.
Add the Databricks Connect package: with your new Scala project open, in your Project tool window (View > Tool Windows > Project), open the file named build.sbt at the root of project-name.
Add the following code to the end of the build.sbt file, which declares your project's dependency on a specific version of the Databricks Connect library for Scala:

libraryDependencies += "com.databricks" % "databricks-connect" % "14.3.1"

Replace 14.3.1 with the version of the Databricks Connect library that matches the Databricks Runtime version on your cluster. You can find the Databricks Connect library version numbers in the Maven central repository.
Click the Load sbt changes notification icon to update your Scala project with the new library location and dependency.
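For orientation, a complete build.sbt for this walkthrough might look like the following. The project name, Scala version, and Databricks Connect version shown here are placeholders; use the versions that match your cluster:

// A minimal sketch of the full build file; adjust the versions to match your cluster.
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "2.12.18" // must match the cluster's Scala version

lazy val root = (project in file("."))
  .settings(
    name := "databricks-connect-example" // placeholder project name
  )

// Match this to the Databricks Runtime version on your cluster.
libraryDependencies += "com.databricks" % "databricks-connect" % "14.3.1"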
Wait until the sbt progress indicator at the bottom of the IDE disappears. The sbt load process might take a few minutes to complete.
Add code: in your Project tool window, open the file named Main.scala, in project-name > src > main > scala.
Replace any existing code in the file with the following code and then save the file:
package org.example.application

import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    val spark = DatabricksSession.builder().remote().getOrCreate()
    val df = spark.read.table("samples.nyctaxi.trips")
    df.limit(5).show()
  }
}
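Because DataFrame operations are evaluated on the cluster, you can extend the sample with heavier work without changing anything locally. A minimal sketch, assuming the standard samples.nyctaxi.trips schema (pickup_zip and fare_amount columns), that you could add inside main:

// Aggregate on the cluster; only the small result is returned locally.
import org.apache.spark.sql.functions.{avg, count, desc}

val byZip = spark.read.table("samples.nyctaxi.trips")
  .groupBy("pickup_zip")
  .agg(count("*").as("trips"), avg("fare_amount").as("avg_fare"))
  .orderBy(desc("trips"))

byZip.limit(10).show()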
Run the code: start the target cluster in your remote Azure Databricks workspace.
After the cluster has started, on the main menu, click Run > Run ‘Main’.
In the Run tool window (View > Tool Windows > Run), on the Main tab, the first 5 rows of the samples.nyctaxi.trips table appear.
All Scala code runs locally, while all Scala code involving DataFrame operations runs on the cluster in the remote Azure Databricks workspace, and responses are sent back to the local caller.
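This split between local and remote execution is easiest to see with an action such as collect(). A minimal sketch using the same table:

// Builds a logical plan locally; nothing runs on the cluster yet.
val trips = spark.read.table("samples.nyctaxi.trips").limit(5)

// collect() sends the plan to the cluster, executes it there, and returns
// the resulting rows to the local JVM, where they can be processed normally.
val rows = trips.collect()
rows.foreach(println)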
Debug the code: start the target cluster in your remote Azure Databricks workspace, if it is not already running.
In the preceding code, click the gutter next to df.limit(5).show() to set a breakpoint.
After the cluster has started, on the main menu, click Run > Debug ‘Main’.
In the Debug tool window (View > Tool Windows > Debug), in the Console tab, click the calculator (Evaluate Expression) icon.
Enter the expression df.schema and click Evaluate to show the DataFrame’s schema.
In the Debug tool window’s sidebar, click the green arrow (Resume Program) icon.
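Besides df.schema, other read-only expressions can be useful at this breakpoint. A few examples, assuming the df from the sample (note that expressions such as count() trigger a job on the cluster):

df.schema.fieldNames.mkString(", ") // column names as a single string
df.count()                          // row count; runs on the cluster
df.limit(1).collect()               // fetch one row for local inspection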
In the Console pane, the first 5 rows of the samples.nyctaxi.trips table appear. As before, all Scala code runs locally, while all Scala code involving DataFrame operations runs on the cluster in the remote Azure Databricks workspace, and responses are sent back to the local caller.