Important
Remote Development is in Beta.
Databricks Remote Development allows you to access your workspace and interactively run workloads on Databricks compute from IDEs using an SSH tunnel. It is simple to set up, eliminates the need for environment management, and keeps all code and data secure within your Databricks workspace.
Requirements
To use Remote Development, you must have:
- Databricks CLI version 0.269 or higher installed on your local machine and authentication configured. See Install.
- A dedicated (single-user) cluster running Databricks Runtime 17.0 or above. See Dedicated compute overview. In addition:
- Unity Catalog must be enabled.
- If a compute policy exists, it must not prohibit job execution.
Set up the SSH connection
First, set up the SSH tunnel using the databricks ssh setup command. Replace <connection-name> with a name for the connection, for example, my-connection.
databricks ssh setup --name <connection-name>
The CLI prompts you to select a cluster. You can also specify one directly with --cluster <cluster-id>:
databricks ssh setup --name <connection-name> --cluster <cluster-id>
Note
For IntelliJ users, Databricks recommends adding --auto-start-cluster=false to the setup command and starting the cluster manually before connecting. This is because JetBrains IDEs start all configured clusters on launch, which can result in unexpected compute charges.
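For example, a complete setup command might look like the following (the cluster ID is a placeholder):
databricks ssh setup --name my-connection --cluster 0123-456789-abcdef12 --auto-start-cluster=false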
Connect to Databricks
Next, connect to Databricks using an IDE or terminal.
Connect using Visual Studio Code or Cursor
For Visual Studio Code, install the Remote SSH extension. Cursor includes a remote SSH extension by default.
- In the IDE main menu, click View > Command Palette and select Remote-SSH: Settings. Alternatively, select Preferences: Open User Settings (JSON) to modify settings.json directly.
- Under Remote.SSH: Default Extensions (or remote.SSH.defaultExtensions in settings.json), add ms-python.python and ms-toolsai.jupyter. If you are modifying settings.json (a combined example follows these steps):
  "remote.SSH.defaultExtensions": [ "ms-python.python", "ms-toolsai.jupyter" ]
Note
Optionally, increase the value of Remote.SSH: Connect Timeout (or remote.SSH.connectTimeout in settings.json) to reduce the chance of timeout errors. The default timeout is 360 seconds.
- In the Command Palette, select Remote-SSH: Connect to Host.
From the dropdown, select the connection you set up in the first step. The IDE proceeds to connect in a new window.
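If you edit settings.json directly, the combined settings might look like the following sketch (the timeout value mirrors the default mentioned above; adjust it to your needs):

```json
{
  "remote.SSH.defaultExtensions": [
    "ms-python.python",
    "ms-toolsai.jupyter"
  ],
  "remote.SSH.connectTimeout": 360
}
```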
Connect using IntelliJ IDEs
- Follow the Remote server tutorial to get set up.
- On the new connection screen, enter:
  - Username: root
  - Host: <connection-name>
Connect using terminal
ssh <connection-name>
Open projects
After connecting, use Open Folder from the Command Palette and navigate to /Workspace/Users/<your-username>.
Note
Files in /Workspace, /Volumes, and /dbfs persist across cluster restarts. Files in /home, /root, and other local paths are ephemeral and lost on restart.
Run code (Visual Studio Code or Cursor)
To run code using Remote Development, ensure that the Databricks virtual environment is set up. This environment includes all built-in Databricks Runtime (DBR) libraries and compute-scoped libraries.
Run echo $DATABRICKS_VIRTUAL_ENV from a terminal within the IDE. Example output:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-xxx/bin/python
Open the Command Palette and choose Python: Select Interpreter. Paste the output from above.
Open a new terminal and the virtual environment should automatically activate.
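To verify, compare the interpreter on your PATH with the environment path; both should point into the same pythonEnv-xxx directory (a quick check, not required):

```bash
# Both should resolve to the same environment once it activates.
echo $DATABRICKS_VIRTUAL_ENV
which python
```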
To run a Jupyter notebook, make sure that the virtual environment is selected as the kernel. Click Select Kernel in the top right of the notebook.
Python files and .ipynb notebooks can be run and debugged using the standard Python and Jupyter extensions.
Manage Python dependencies
Python dependencies can be managed globally at the cluster level or scoped to individual projects using notebooks.
Cluster libraries (recommended)
Install dependencies via the workspace UI under Compute > Libraries. These persist across cluster restarts and are available in pythonEnv-xxx. See Cluster libraries.
Project-specific notebook setup
For project-scoped dependencies, run a notebook containing %pip install commands at the start of each session:
# Install from pyproject.toml
%pip install .
# Install from a requirements file
%pip install -r requirements.txt
# Install a wheel from Volumes or Workspace
%pip install /Volumes/catalog/schema/volume/your_library.whl
%pip commands include Databricks-specific guardrails and propagate dependencies to Spark executor nodes. This enables user-defined functions (UDFs) with custom dependencies.
For more examples, see Manage libraries with %pip commands.
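As a sketch of the executor propagation described above, the following assumes a hypothetical package (here, emoji) was installed with %pip earlier in the same session:

```python
# Assumes `%pip install emoji` ran earlier in this session; because %pip
# propagates dependencies to executors, the import inside the UDF succeeds.
from databricks.sdk.runtime import spark
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(returnType=StringType())
def emojize(text):
    import emoji  # resolved on the executor
    return emoji.emojize(text)

df = spark.createDataFrame([(":thumbs_up:",)], ["raw"])
df.select(emojize("raw")).show()
```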
You do not need to re-run the notebook if the session reconnects within 10 minutes. This is configurable using the shutdown-delay setting in your SSH configuration.
Note
Multiple SSH sessions on the same cluster share one virtual environment.
Limitations
Databricks Remote Development has the following limitations:
- Shared (multi-user) clusters and serverless compute are not yet supported.
- The Databricks extension for Visual Studio Code and Remote Development are not yet compatible and should not be used together.
- Files edited outside /Workspace, /Volumes, and /dbfs are lost on cluster restart.
- A maximum of 10 SSH connections are allowed per cluster.
- Inactive sessions may drop after 1 hour.
Databricks Notebooks differences
There are some differences in notebooks when using Remote Development:
- Python files don't define any Databricks globals (like spark or dbutils). You must import them explicitly with from databricks.sdk.runtime import spark (see the sketch after this list).
- For .ipynb notebooks, these features are available:
  - Databricks globals: display, displayHTML, dbutils, table, sql, udf, getArgument, sc, sqlContext, spark
  - %sql magic command to execute SQL cells
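For example, a plain .py file that needs Spark imports the global itself (a minimal sketch):

```python
# Plain .py files don't get Databricks globals injected; import them explicitly.
from databricks.sdk.runtime import spark

df = spark.range(5)
df.show()
```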
To work with Python source “notebooks”:
- Search for jupyter.interactiveWindow.cellMarker.codeRegex and set it to: ^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])
- Search for jupyter.interactiveWindow.cellMarker.default and set it to: # COMMAND ----------
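For reference, a Python source notebook matching these cell markers looks like this:

```python
# Databricks notebook source
print("first cell")

# COMMAND ----------

print("second cell")
```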
Troubleshooting
This section contains information about resolving common issues.
SSH connection fails or times out
- Verify the cluster is running in the workspace UI.
- Check that outbound port 22 is open and allowed on your laptop, network, and VPN.
- Increase the SSH timeout. See Connect using Visual Studio Code or Cursor.
- For key mismatch errors, delete ~/.databricks/ssh-tunnel-keys and re-run databricks ssh setup.
- For "remote host identification has changed" errors, check the ~/.ssh/known_hosts file and delete entries related to your cluster (one approach is shown after this list).
- SSH sessions may drop after 1 hour, and no more than 10 SSH connections can be made to a single cluster. See Limitations.
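One way to remove a stale entry is ssh-keygen's -R option; pass the host shown in the error message (a hypothetical connection name here):

```bash
# Remove the offending host entry from ~/.ssh/known_hosts.
ssh-keygen -R my-connection
```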
CLI authentication errors
- Confirm your Databricks CLI profile is valid using databricks auth login (example after this list).
- Confirm you have CAN MANAGE permissions on the cluster.
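For example, to re-authenticate and list configured profiles (the workspace URL is a placeholder):

```bash
# Re-authenticate interactively, then verify which profiles exist.
databricks auth login --host https://<workspace-url>
databricks auth profiles
```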
My code doesn't work
- Ensure you've set up the Databricks virtual environment. See Run code (Visual Studio Code or Cursor).
- .ipynb notebooks and .py Databricks notebooks (files beginning with # Databricks notebook source) have access to Databricks globals, but plain Python .py files don't. See Databricks Notebooks differences.
Files disappear or environment resets after cluster restart
- Files in /Workspace, /Volumes, and /dbfs mounts persist across cluster restarts. Files in /home, /root, and other local paths are ephemeral and lost on restart.
- Use cluster library management for persistent dependencies. Automate reinstalls using init scripts if needed (a sketch follows this list). See What are init scripts?.
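A minimal init script sketch, assuming your dependencies live in a requirements file on a Unity Catalog volume; the volume path is a placeholder, and /databricks/python/bin/pip is the conventional location of the cluster's pip:

```bash
#!/bin/bash
# Hypothetical init script: reinstall project dependencies on every cluster start.
# Replace the volume path below with your own.
/databricks/python/bin/pip install -r /Volumes/catalog/schema/volume/requirements.txt
```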
SSH setup fails on Windows (WSL)
Run databricks ssh setup directly on Windows, not within WSL. The Windows VS Code instance cannot find SSH configurations created on the WSL side.
FAQ
How is Remote Development different from Databricks Connect?
Databricks Connect allows you to write code using Spark APIs and run it remotely on Databricks compute instead of in a local Spark session. The Databricks Visual Studio Code extension uses Databricks Connect to provide built-in debugging of user code on Databricks.
Remote Development allows you to access the workspace from your IDE and moves your entire development environment onto the cluster — Python, kernel, and all execution runs on Databricks with full access to cluster resources.
How is my code and data secured?
All code runs within your Databricks cloud VPC. No data or code leaves your secure environment. SSH traffic is fully encrypted.
Which IDEs are supported?
Visual Studio Code and Cursor are officially supported. Any IDE with SSH capabilities is compatible, but only VS Code and Cursor are tested.
Are all Databricks notebook features available from the IDE?
Some features such as display(), dbutils, and %sql are available with limitations or manual setup. See Databricks Notebooks differences.
Will my cluster start automatically when I connect using the SSH tunnel?
Yes, but if the cluster takes longer to start than the connect timeout, the connection attempt fails. To prevent this, increase the value of Remote.SSH: Connect Timeout from the Command Palette (or remote.SSH.connectTimeout in settings.json).
How do I know if my cluster is running?
Navigate to Compute in the Databricks workspace UI and check the status of the cluster. The cluster must show Running for the SSH connection to work.
How do I disconnect my SSH/IDE session?
You can disconnect a session by closing your IDE window, using the Disconnect option in your IDE, closing your SSH terminal, or running the exit command in the terminal.
How do I stop the cluster and avoid charges when I'm not working?
To stop immediately, terminate the cluster from the workspace UI. Navigate to Compute in the Databricks workspace UI, find your cluster, and click Terminate or Stop.
Set a short auto-termination policy on your cluster from the workspace UI. After you disconnect, the SSH server waits for the shutdown-delay period (default: 10 minutes), then the cluster's idle timeout applies.
How should I handle persistent dependencies?
Dependencies installed during a session are lost after cluster restart. Use persistent storage (/Workspace/Users/<your-username>) for requirements and setup scripts. Use cluster libraries or init scripts for automation.
What authentication methods are supported?
Authentication uses the Databricks CLI and your ~/.databrickscfg profiles file. SSH keys are handled by Databricks Remote Development.
Can I connect to external databases or services from the cluster?
Yes, as long as your cluster networking allows outbound connections and you have the necessary libraries.
Can I use additional IDE extensions?
Most extensions work when installed within your remote SSH session, depending on your IDE and cluster. Visual Studio Code by default doesn’t install local extensions on remote hosts. You can manually install them by opening the extensions panel and enabling your local extensions on the remote host. You can also configure Visual Studio Code to always install certain extensions remotely. See Connect to Databricks.
Does Remote Development support Private Link?
Yes; however, workspace admins must allowlist the URLs of the VS Code and Cursor extension marketplaces, and users' local machines must be able to access the internet.