Important
The Databricks SSH tunnel is in Beta.
The Databricks SSH tunnel allows you to connect your IDE to your Databricks compute. It is simple to set up, enables you to run and debug code interactively on the cluster, reduces environment mismatches, and keeps all code and data secure within your Databricks workspace.
Requirements
To use the SSH tunnel, you must have:
- The Databricks CLI version 0.269 or higher installed on your local machine and authentication configured. See Install.
- Compute in your Databricks workspace with dedicated (single user) access mode. See Dedicated compute overview.
- The compute must use Databricks Runtime 17.0 or above.
- Unity Catalog must be enabled.
- If a compute policy exists, it must not prohibit job execution.
Set up the SSH tunnel
First, set up the SSH tunnel using the databricks ssh setup command. Replace <connection-name> with the name for the tunnel, for example, my-tunnel.
databricks ssh setup --name <connection-name>
The CLI prompts you to choose a cluster, or you can provide a cluster ID by passing --cluster <cluster-id>.
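For example, to skip the prompt and target a specific cluster (the cluster ID below is illustrative):
databricks ssh setup --name my-tunnel --cluster 0123-456789-abcdefgh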
Note
For IntelliJ, Databricks recommends that you include --auto-start-cluster=false in the setup command. Starting a JetBrains IDE automatically starts all clusters, which can result in unintended compute costs. If you set this option, you must start the cluster in the workspace before the SSH tunnel can connect.
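For example (the connection name is illustrative):
databricks ssh setup --name my-tunnel --auto-start-cluster=false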
Connect to Databricks
Next, connect to Databricks using an IDE or terminal.
Connect using Visual Studio Code or Cursor
For Visual Studio Code, install the Remote SSH extension. Cursor includes a remote SSH extension.
1. In the IDE main menu, click View > Command Palette and select Remote-SSH: Settings. Alternatively, select Preferences: Open User Settings (JSON) to modify settings.json directly.
2. Under Remote.SSH: Default Extensions (or remote.SSH.defaultExtensions in settings.json), add ms-python.python and ms-toolsai.jupyter. If you are modifying settings.json:
   "remote.SSH.defaultExtensions": [ "ms-python.python", "ms-toolsai.jupyter" ]
   Note
   Optionally, increase the value of Remote.SSH: Connect Timeout (or remote.SSH.connectTimeout in settings.json) to further reduce the chance of timeout errors. The default timeout is 360.
3. In the Command Palette, select Remote-SSH: Connect to Host.
From the dropdown, select the tunnel you set up in the first step. The IDE proceeds to connect in a new window.
Note
If the compute is not running, it is started automatically. However, if the compute takes longer than the connect timeout to start, the SSH connection attempt fails.
Select Linux when prompted for the server type.
Connect using IntelliJ IDEs
Follow the Remote development tutorial to get set up.
On the new connection screen, enter the following:
- Username: root
- Host: <connection-name>
Connect using terminal
To connect to Databricks from the command line, pass the name of your connection to the ssh command, for example:
ssh my-tunnel
Open projects
- The initial connection opens an empty IDE window without any open folder. In Visual Studio Code, use the Open Folder command from the Command palette to open a desired project.
- Use the workspace mount (/Workspace/Users/<your-username>) for persistent storage.
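For example, from the remote terminal you can create a project folder under the workspace mount so it survives cluster restarts (the folder name is illustrative):
mkdir -p /Workspace/Users/<your-username>/my-project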
Run code (Visual Studio Code)
- If you open a Python project, the Python extension can automatically detect virtual environments, but you still need to manually activate the right one. Run the Python: Select Interpreter command from the Command Palette and choose the pythonEnv-xxx environment. It has access to all built-in Databricks Runtime libraries, as well as anything you've installed globally on the cluster.
- In some cases the Python extension can't automatically detect virtual environments (venv), such as when you open a folder that isn't recognized as a Python project. To fix this, open a terminal and run echo $DATABRICKS_VIRTUAL_ENV, then copy the path and use it in the Python: Select Interpreter command.
After the venv is selected, Python files or notebooks can be executed with normal run or debug actions provided by the Python or Jupyter extensions.
Manage Python dependencies
The simplest way to install required dependencies is using the workspace UI. See Compute-scoped libraries. With this approach, you install dependencies globally for the cluster. You don't need to reinstall libraries each time the cluster is restarted.
However, for a more programmatic setup that is scoped to a specific project, use a notebook-scoped installation.
Project-specific setup notebook
To manage dependencies for a specific project:
1. Create a setup.ipynb file in your project.
2. The ssh CLI creates a Python environment (pythonEnv-xxx), which already has the built-in Databricks Runtime libraries and any compute-scoped libraries. Attach the notebook to this pythonEnv-xxx environment.
3. Use %pip install commands to install your dependencies:
   - %pip install . if you have a pyproject.toml (%pip install .<group> to scope it down)
   - %pip install -r dependencies.txt if you have a dependencies.txt
   - %pip install /Volumes/your/wheel.whl (or /Workspace paths) if you built and uploaded a custom library as a wheel
   %pip commands have Databricks-specific logic with additional guardrails. The logic also ensures that dependencies are available to all Spark executor nodes, not just the driver node that you are connected to. This enables user-defined functions (UDFs) with custom dependencies. For more usage examples, see Manage libraries with %pip commands.
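For example, a minimal setup.ipynb cell might contain a single install command (the path is illustrative):
%pip install -r /Workspace/Users/<your-username>/my-project/dependencies.txt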
Run this notebook every time you establish a new ssh session. You don't need to reinstall dependencies if an existing ssh session is dropped and reconnected to the cluster within 10 minutes. (The time is configurable with the -shutdown-delay=10m option in your local ssh config.)
Note
If you have multiple ssh sessions connected to the same cluster at the same time, they use the same virtual environment.
Limitations
The Databricks SSH tunnel has the following limitations:
- The Databricks extension for Visual Studio Code and the Databricks SSH tunnel are not yet compatible and should not be used together.
- Any Git folder you created in your workspace through the Databricks workspace UI will not be recognized as a git repository by the git CLI and IDE git integrations, as these folders lack .git metadata. To work around this, see How do I use Git with the SSH Tunnel?.
- The home and root mounts on the cluster you connect to are ephemeral. Content stored there is not preserved when the cluster is restarted.
Databricks Notebooks differences
There are some differences in notebooks when using the SSH tunnel:
- Python files don't define any Databricks globals (like spark or dbutils). You must import them explicitly with from databricks.sdk.runtime import spark, as shown in the sketch after this list.
- For ipynb notebooks, these features are available:
  - Databricks globals: display, displayHTML, dbutils, table, sql, udf, getArgument, sc, sqlContext, spark
  - The %sql magic command to execute SQL cells
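For example, a plain .py file might import the globals explicitly before using them (a minimal sketch; it assumes dbutils is also exported by databricks.sdk.runtime):
from databricks.sdk.runtime import spark, dbutils

# spark and dbutils are not predefined in plain .py files, so import them explicitly.
spark.sql("SELECT 1 AS ok").show()
print(dbutils.fs.ls("/"))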
To work with Python source “notebooks”:
1. Search for jupyter.interactiveWindow.cellMarker.codeRegex and set it to:
   ^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])
2. Search for jupyter.interactiveWindow.cellMarker.default and set it to:
   # COMMAND ----------
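If you edit settings.json directly, the equivalent entries might look like the following sketch (values copied from the steps above; the double backslashes are the JSON-escaped form):
"jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])",
"jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------"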
Troubleshooting
This section contains information about resolving common issues.
SSH connection fails or times out
- Make sure your cluster is RUNNING in the Databricks UI and not just stopped or starting.
- Check outbound port 22 is open and allowed on your laptop/network/VPN.
- Increase SSH connect timeout in your IDE. See Connect using Visual Studio Code or Cursor.
- If you see public or private key mismatch errors, try deleting the ~/.databricks/ssh-tunnel-keys folder.
- If you see "remote host identification has changed" errors, check the ~/.ssh/known_hosts file and delete the entries related to your cluster.
- If the SSH session is dropped after 1 hour, this is a known limitation. See Limitations.
- No more than 10 ssh connections are allowed to a single cluster.
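If the cause isn't obvious, running the standard OpenSSH client in verbose mode against your connection can show where the handshake stops (the connection name is illustrative):
ssh -v my-tunnel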
CLI authentication errors
- Confirm your Databricks CLI profile is valid and authenticated (databricks auth login).
- Make sure you have proper cluster permissions, such as CAN MANAGE.
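For example, to re-authenticate and then list the configured profiles (the host URL and profile name are illustrative):
databricks auth login --host https://<your-workspace-url> --profile <profile-name>
databricks auth profiles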
Files disappear or environment resets after cluster restart
- Only the /Workspace, /Volumes, and /dbfs mounts are persistent. All data in /home, /root, etc. is erased after a restart.
- Use cluster library management for persistent dependencies. Automate reinstalls using init scripts if needed. See What are init scripts?.
"Not a git repository" error or missing git features in IDE
Git works only if you clone into /Workspace/Users/<your-username> using the terminal. Web-created folders don’t have .git metadata. See How do I use Git with the SSH Tunnel?.
My code doesn’t work
- Make sure you select the right Python interpreter that has access to all Databricks Runtime dependencies.
- If you open a Python project, the Python extension can automatically detect virtual environments, but you still need to manually activate the right one. Run the Python: Select Interpreter command and choose the pythonEnv-xxx environment. It has access to all built-in Databricks Runtime libraries, as well as anything you've installed globally on the cluster.
- In some cases the Python extension can't automatically detect virtual environments, such as when you open a folder that isn't recognized as a Python project. You can open a terminal and run echo $DATABRICKS_VIRTUAL_ENV, then copy the path and use it in the Python: Select Interpreter command.
- ipynb notebooks and *.py Databricks notebooks have access to Databricks globals, but plain Python *.py files don't. See Databricks Notebooks differences.
Can't set up an SSH connection on Windows under WSL
Databricks recommends performing the SSH setup directly on Windows. If you set it up on the WSL side but then use a Windows version of Visual Studio Code, it won't find the necessary SSH configuration.
FAQ
How is my code and data secured?
All code runs within the virtual private cloud (VPC) that hosts your Databricks compute. No data or code leaves your secure environment. SSH traffic is fully encrypted.
Which IDEs are supported?
Visual Studio Code and Cursor are officially supported, but the Databricks SSH tunnel is compatible with any IDE with SSH capabilities.
Are all Databricks notebook features available from the IDE?
Some features such as display(), dbutils, and %sql are available with limitations or manual setup. See Databricks Notebooks differences.
Can multiple users develop on the same cluster at once?
No.
Will my cluster start automatically when I connect via SSH Tunnel?
Yes, but if it takes longer to start the cluster than the connect timeout, the connection attempt will fail.
How do I know if my cluster is running?
Navigate to Compute in the Databricks workspace UI, and check the status of the cluster. The cluster must show Running for SSH tunnel connections to work.
How do I disconnect my SSH/IDE session?
You can disconnect a session by closing your IDE window, using the Disconnect option in your IDE, closing your SSH terminal, or running the exit command in the terminal.
Does disconnecting SSH automatically stop my cluster?
No. The SSH server has a configurable shutdown delay and continues running in the background for a specified amount of time (10m by default; change it by modifying the -shutdown-delay option in the ssh config ProxyCommand). After that timeout the server terminates, which starts the cluster idle timeout that you configured during cluster creation.
How do I stop the cluster to avoid unnecessary charges?
Navigate to Compute in the Databricks workspace UI, find your cluster, and click Terminate or Stop.
How should I handle persistent dependencies?
Dependencies installed during a session are lost after cluster restart. Use persistent storage (/Workspace/Users/<your-username>) for requirements and setup scripts. Use cluster libraries or init scripts for automation.
What authentication methods are supported?
Authentication uses the Databricks CLI and your ~/.databrickscfg profiles file. SSH keys are handled by the Databricks SSH tunnel.
Can I connect to external databases or services from the cluster?
Yes, as long as your cluster networking allows outbound connections and you have the necessary libraries.
Can I use additional IDE extensions?
Most extensions work when installed within your remote SSH session, depending on your IDE and cluster. Visual Studio Code by default doesn’t install local extensions on remote hosts. You can manually install them by opening the extensions panel and enabling your local extensions on the remote host. You can also configure Visual Studio Code to always install certain extensions remotely. See Connect to Databricks.
How do I use Git with the SSH Tunnel?
Currently Git folders created using the Databricks workspace UI are not recognized as git repositories in IDEs. To work around this, clone repositories using the git CLI from your SSH session into your persistent workspace folder:
- Open a terminal and navigate to a desired parent directory (for example, cd /Workspace/Users/<your-username>).
- Clone your repository in that directory.
- In Visual Studio Code, open the cloned folder in a new window by running code <repo-name>, or open the folder in a new window using the UI.
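Put together, the terminal session might look like this (the repository URL and folder name are illustrative):
cd /Workspace/Users/<your-username>
git clone https://github.com/<org>/<repo-name>.git
code <repo-name>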