What happened to Databricks Repos?
Azure Databricks rolled out new user interface elements that allow users to work directly with Git repo-backed folders from the Workspace UI, effectively replacing the prior, separate “Repos” feature functionality.
What does this change mean for me?
If you are a user of the Databricks Repos feature for co-versioned Git-based source control of project assets, the core functionality has not changed. The most notable difference is that many contextual UI operations now refer to “Git folders” rather than “Repos”.
For example, you previously created a Databricks folder backed by a Git repo by selecting New and then Repo from the UI. Now, you select New and choose Git folder. Same thing, different name!
This change provides some improvements that simplify working with version-controlled folders:
- Better folder organization: Git folders can be created at any level of the workspace file tree, allowing you to organize your Git folders in a way that works best for your project. For example, you can create Git folders at /Workspace/Users/<user email>/level_1/level_2/level_3/<Git folder name>. Repos can only be created at a fixed directory level, such as the root of the Repos user folder, like /Workspace/Repos/<user email>/<Repo name>.
  - Note: Git folders can contain or collocate with other assets that are not supported by Repos today. Unsupported asset types like DBSQL assets and MLflow experiments can be moved into Git folders. Serialization support for additional assets will be added over time.
- Simplified UI behaviors: This change brings a common workspace interaction–working with Git–directly into your Databricks workspace, and reduces time spent navigating between your workspace and your version-controlled Git folders.
What has changed, specifically?
- Git folders can be created outside of the /Repos directory.
- Git folders are created by selecting New > Git folder in a Databricks workspace. This creates a new Git folder under /Workspace/Users/<user-email>/.
- Git folders can be created at various depths of the workspace file tree as long as they are under /Workspace/Users/<user-email>. For example, you can create Git folders at /Workspace/Users/<user-email>/level_1/level_2/level_3/<git-folder-name>. You can have multiple Git folders under /Workspace/Users/<user-email>.
- Unsupported assets are allowed in Git folders. Serialization support for other asset types will be added over time.
- Unlike Repos, you cannot create a new Git folder in Databricks without a remote repository URL.
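The placement rule in the list above can be sketched as a small path check. This is only an illustration of the rule as stated here, not a Databricks API: the function name `is_valid_git_folder_path` and the example email address are hypothetical.

```python
from pathlib import PurePosixPath


def is_valid_git_folder_path(path: str, user_email: str) -> bool:
    """Return True if `path` lies strictly below /Workspace/Users/<user_email>.

    Hypothetical helper: it encodes the placement rule described in the text
    and does not contact a Databricks workspace.
    """
    user_root = PurePosixPath("/Workspace/Users") / user_email
    candidate = PurePosixPath(path)
    # Any depth below the user's folder is acceptable, so check all ancestors.
    return user_root in candidate.parents


email = "someone@example.com"  # placeholder, not a real user

# A deeply nested Git folder under the user's folder passes the check.
print(is_valid_git_folder_path(
    f"/Workspace/Users/{email}/level_1/level_2/level_3/my-git-folder", email))

# A legacy Repos location does not.
print(is_valid_git_folder_path(f"/Workspace/Repos/{email}/my-repo", email))
```

A check like this may be useful in provisioning scripts that compute target paths before any folder is created.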
What happens to my current Repos?
If you have Repos defined for your Azure Databricks workspace, they are not going away, and you are not required to migrate those existing Repos to Git folders. Instead, Repos have been integrated into the Azure Databricks workspace user interface and are no longer presented as a separate set of folders organized under a top-level Repo node. They can now be found under the /Workspace root folder as /Workspace/Repos.
- Existing /Repos references will continue to work. Paths that start with either /Repos or /Workspace/Repos refer to the same folder, and declared paths in jobs, dbutils.notebook.run, and %run references can remain unchanged.
- In a rare case, you must make a one-time modification in your workspace for this redirection to work. For more details about this modification, see References to workspace objects.
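Databricks handles this redirection for you, so existing references need no changes. If your own tooling compares or deduplicates workspace paths, though, it may help to treat both prefixes as one. The following is a minimal sketch under that assumption; `canonical_repo_path` is a hypothetical helper name, not part of any Databricks library.

```python
def canonical_repo_path(path: str) -> str:
    """Rewrite a legacy /Repos/... path to its /Workspace/Repos/... form.

    Paths that do not start with the /Repos prefix are returned unchanged.
    """
    # Match the exact prefix only, so "/ReposArchive/..." is left alone.
    if path == "/Repos" or path.startswith("/Repos/"):
        return "/Workspace" + path
    return path


# Both spellings compare equal once normalized (example paths are made up).
legacy = "/Repos/someone@example.com/my-repo/notebooks/etl"
current = "/Workspace/Repos/someone@example.com/my-repo/notebooks/etl"
print(canonical_repo_path(legacy) == canonical_repo_path(current))
```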
Databricks recommends that users create new Git folders instead of Repos if they need to connect to Git source control from the Databricks workspace. Colocating Git repos and other workspace assets makes Git folders more discoverable and easier to manage than Repos.
Git folder permissions
Git folders have the same workspace folder permissions as other workspace folders. Users must have the CAN_MANAGE permission in order to perform most Git operations.
Which DBR should I use for running code in Git folders?
For consistent code execution between Git folders and legacy Repos, Databricks recommends users run code only in Git folders with DBR 15+.
Current working directory (CWD) behavior
Databricks Runtime (DBR) version 14 or greater allows for the use of relative paths and provides the same current working directory (CWD) experience for all notebooks: the CWD is the directory that contains the notebook being run. On older versions of the Databricks Runtime, CWD behavior might be inconsistent between notebooks in a Git folder and notebooks in a non-Git folder.
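To see what a relative path resolves to in a given notebook, you can inspect the CWD directly. The sketch below uses only the Python standard library; the helper name `resolve_from_cwd` and the file name `data/config.json` are illustrative assumptions.

```python
import os
from pathlib import Path


def resolve_from_cwd(relative_path: str) -> Path:
    """Resolve a relative path against the current working directory."""
    return (Path(os.getcwd()) / relative_path).resolve()


# Print the CWD the interpreter is using for this notebook or script.
print(os.getcwd())

# Show the absolute location a relative reference would point at.
# "data/config.json" is a hypothetical file next to the notebook.
print(resolve_from_cwd("data/config.json"))
```

Running this in a notebook inside a Git folder and in one outside it is a quick way to confirm that both resolve relative paths the same way on your runtime version.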
Python sys.path behavior
Databricks Runtime (DBR) version 14.3 or greater provides the same sys.path behavior in Git folders as in legacy Repos. With earlier DBR versions, Git folder behavior differs from legacy Repos as the root repo directory is not automatically added to sys.path for Git folders. For Python, sys.path contains a list of directories that the interpreter searches when importing modules. If you cannot use DBR 15 or above, you can manually append a folder path to sys.path as a workaround.
For examples on how to add directories to sys.path using relative paths, see Import Python and R modules.
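A minimal sketch of the manual workaround might look like the following. It assumes the notebook sits one level below the Git folder root, so that `..` points at the root; adjust the relative path for your own layout. The helper name `add_to_sys_path` is hypothetical.

```python
import os
import sys


def add_to_sys_path(directory: str) -> str:
    """Append `directory` to sys.path as an absolute path, once only."""
    absolute = os.path.abspath(directory)
    # Avoid duplicate entries when the cell is re-run.
    if absolute not in sys.path:
        sys.path.append(absolute)
    return absolute


# Assumption: the Git folder root is the parent of the current directory.
repo_root = add_to_sys_path("..")
print(repo_root in sys.path)
```

Appending, rather than inserting at the front, leaves the existing import order intact, so modules already resolvable keep resolving to the same location.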
Python library precedence
Databricks Runtime (DBR) version 14.3 or greater provides the same Python library precedence in Git folders as in legacy Repos.
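If you suspect that a different copy of a library is being picked up, one way to check is to ask the import system where a module would be loaded from. This is a generic Python sketch, not a Databricks-specific feature; `module_origin` is a hypothetical helper, and `json` is used only because it is always available.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, if any."""
    spec = importlib.util.find_spec(name)
    # find_spec returns None when no top-level module with that name is found.
    return spec.origin if spec is not None else None


# Shows which installed copy of the module wins on the current sys.path.
print(module_origin("json"))
```

Comparing the result for the same module in a Git folder and in a legacy Repo can help confirm that both environments resolve the library identically.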