Use Visual Studio Code with Databricks Connect for Scala

Note

This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.

This article covers how to use Databricks Connect for Scala with Visual Studio Code. Databricks Connect enables you to connect popular IDEs, notebook servers, and other custom applications to Azure Databricks clusters. See What is Databricks Connect?. For the Python version of this article, see Use Visual Studio Code with Databricks Connect for Python.

Note

Before you begin to use Databricks Connect, you must set up the Databricks Connect client.

To use Databricks Connect and Visual Studio Code with the Scala (Metals) extension to create, run, and debug a sample Scala sbt project, follow these instructions. You can also adapt this sample to your existing Scala projects.

  1. Make sure that the Java Development Kit (JDK) and Scala are installed locally. Databricks recommends that your local JDK and Scala versions match the JDK and Scala versions on your Azure Databricks cluster.
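
    To check your local versions, you can run a small Scala program such as the following sketch (the object name VersionCheck is only an example) and compare the output against the release notes for your cluster's Databricks Runtime version:

    // Prints the local JVM and Scala versions for comparison with the cluster.
    object VersionCheck extends App {
      println(s"Java:  ${System.getProperty("java.version")}")
      println(s"Scala: ${scala.util.Properties.versionNumberString}")
    }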

  2. Make sure that the latest version of sbt is installed locally.

  3. Install the Scala (Metals) extension for Visual Studio Code.

  4. In Visual Studio Code, open the folder where you want to create your Scala project (File > Open Folder).

  5. On the sidebar, click the Metals extension icon, and then click New Scala project.

  6. In the Command Palette, choose the template named scala/hello-world.g8, and complete the on-screen instructions to finish creating the Scala project in the specified folder.

  7. Add project build settings: In the Explorer view (View > Explorer), open the build.sbt file from the project’s root, replace the file’s contents with the following, and save the file:

    scalaVersion := "2.12.15"
    
    libraryDependencies += "com.databricks" % "databricks-connect" % "14.0.0"
    

    Replace 2.12.15 with your installed version of Scala, which should match the version that is included with the Databricks Runtime version on your cluster.

    Replace 14.0.0 with the version of the Databricks Connect library that matches the Databricks Runtime version on your cluster. You can find the Databricks Connect library version numbers in the Maven central repository.

  8. Add Scala code: Open the src/main/scala/Main.scala file relative to the project’s root, replace the file’s contents with the following, and save the file:

    import com.databricks.connect.DatabricksSession
    import org.apache.spark.sql.SparkSession
    
    object Main extends App {
      // Create a Spark session that connects to the remote Azure Databricks cluster.
      val spark = DatabricksSession.builder().remote().getOrCreate()

      // Read the sample table and display its first 5 rows.
      val df = spark.read.table("samples.nyctaxi.trips")
      df.limit(5).show()
    }
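
    As an optional variation, you can add more DataFrame operations to Main.scala. The following sketch assumes the trip_distance, pickup_zip, and fare_amount columns of the samples.nyctaxi.trips sample table; the filter and aggregation execute on the cluster, and only the small result is returned for display:

    import com.databricks.connect.DatabricksSession
    import org.apache.spark.sql.functions._

    object Main extends App {
      val spark = DatabricksSession.builder().remote().getOrCreate()
      val df = spark.read.table("samples.nyctaxi.trips")

      // The filter and aggregation run on the cluster; only the top 5 rows come back.
      df.filter(col("trip_distance") > 10)
        .groupBy(col("pickup_zip"))
        .agg(avg(col("fare_amount")).as("avg_fare"))
        .orderBy(desc("avg_fare"))
        .limit(5)
        .show()
    }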
    
  9. Build the project: Run the command >Metals: Import build from the Command Palette.

  10. Add project run settings: In the Run & Debug view (View > Run), click the link labeled create a launch.json file.

  11. In the Command Palette, select Scala Debugger.

  12. Add the following run configuration to the launch.json file, and then save the file:

    {
      // Use IntelliSense to learn about possible attributes.
      // Hover to view descriptions of existing attributes.
      // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
      "version": "0.2.0",
      "configurations": [
        {
          "type": "scala",
          "request": "launch",
          "name": "Scala: Run main class",
          "mainClass": "Main",
          "args": [],
          "jvmOptions": []
        }
      ]
    }
    
  13. Run the project: Click the play (Start Debugging) icon next to Scala: Run main class. In the Debug Console view (View > Debug Console), the first 5 rows of the samples.nyctaxi.trips table appear. The Scala code itself runs locally, while the Scala code involving DataFrame operations runs on the cluster in the remote Azure Databricks workspace, and the results are sent back to the local caller.
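
    To see this split in practice, you can mix plain Scala with DataFrame calls. In the following sketch (continuing from the df defined in Main.scala), count() executes on the cluster and returns a single number, while the println runs locally:

    // Runs on the cluster: the count is computed remotely.
    val tripCount = df.count()

    // Runs locally: ordinary Scala on the returned value.
    println(s"The sample table contains $tripCount trips.")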

  14. Debug the project: Set breakpoints in your code, and then click the play icon again. The Scala code is debugged locally, while the DataFrame operations continue to run on the cluster in the remote Azure Databricks workspace. The core Spark engine code cannot be debugged directly from the client.
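
    For example, to inspect individual rows at a breakpoint, you can bring a small sample back to the local JVM with collect() (a sketch continuing from the df in Main.scala; keep the sample small, because the collected rows are held in local memory):

    // Executes on the cluster and returns an Array[Row] to the local JVM.
    val rows = df.limit(5).collect()

    // Set a breakpoint on the next line to inspect the rows array locally.
    rows.foreach(println)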