Azure Databricks Git folders is a visual Git client and API that integrates Git repositories within your workspace. Use Git folders to develop code in notebooks and files while following software development best practices using Git for version control, collaboration, and CI/CD. Git folders supports common Git operations such as cloning a repository, committing and pushing, pulling, branch management, and visually comparing diffs when committing.
Git folders capabilities
Azure Databricks Git folders provide source control for data and AI projects by integrating with Git providers.
Use Git functionality from your Azure Databricks workspace to:
- Clone, push to, and pull from a remote Git repository.
- Create and manage branches for development work, including merging, rebasing, and resolving conflicts.
- Create notebooks, including IPYNB notebooks, and edit them and other files.
- Visually compare differences upon commit and resolve merge conflicts.
For step-by-step instructions, see Run Git operations on Databricks Git folders.
Git folders API
Azure Databricks Git folders have an API to integrate with your CI/CD pipeline. For example, programmatically update a workspace Git folder so that it always has the most recent version of the code. For information about best practices for code development using Azure Databricks Git folders, see CI/CD with Databricks Git folders.
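As an illustration of the programmatic update mentioned above, the following is a minimal sketch, not a definitive implementation. It assumes the Repos REST endpoint `PATCH /api/2.0/repos/{repo_id}` with a `branch` field in the request body, and a personal access token passed as a bearer token. The workspace URL, token, and Git folder ID are placeholders, and the function names are hypothetical helpers introduced only for this example.

```python
import json
import urllib.request


def build_update_request(host: str, token: str, repo_id: int, branch: str) -> urllib.request.Request:
    """Build a PATCH request asking the workspace to check out the
    latest commit of `branch` in the Git folder identified by `repo_id`."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def update_git_folder(host: str, token: str, repo_id: int, branch: str = "main") -> dict:
    """Send the request and return the parsed JSON response.
    Raises urllib.error.HTTPError if the workspace rejects the call."""
    request = build_update_request(host, token, repo_id, branch)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call from a CI/CD job (all values are placeholders):
# update_git_folder("https://<your-workspace-url>", "<access-token>", 123456, "main")
```

Separating request construction from sending keeps the sketch easy to inspect in a pipeline without making a network call. In a real pipeline, the token would normally come from the CI system's secret store rather than from source code.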
Git providers
A Git provider is a service that hosts a Git-based source control system. These platforms come in two main forms: a cloud service hosted by the vendor, or an on-premises service that your organization installs and manages on its own hardware. Many providers, including GitHub, Microsoft, GitLab, and Atlassian, offer both cloud SaaS and on-premises (often called “self-managed”) options.
Azure Databricks Git folders use an integrated Git repository. Any of the cloud or enterprise Git providers listed in the following sections can host the repository.
When selecting a Git provider during configuration, make sure that you understand the differences between cloud (SaaS) and on-premises systems. Organizations often host self-managed providers behind a VPN, which can make them inaccessible from the public internet. These versions often include “Server” or “Self-Managed” in their names. If you’re unsure which one your organization uses, check your provider’s documentation or ask your company admins.
If your cloud Git provider doesn’t appear in the supported provider list, choosing GitHub might work as a fallback, although this isn’t guaranteed.
Note
If you're using GitHub as a provider and are still uncertain whether you're using the cloud or on-premises version, see About GitHub Enterprise Server in the GitHub docs.
Supported cloud Git providers
Azure Databricks Git folders integrate with the following cloud-based Git providers:
- GitHub, GitHub Advanced Enterprise, and GitHub Enterprise Cloud
- Atlassian Bitbucket Cloud
- GitLab and GitLab Enterprise Edition
- Microsoft Azure DevOps (Azure Repos)
Supported on-premises Git providers
Azure Databricks Git folders integrate with the following on-premises Git providers:
- GitHub Enterprise Server
- Atlassian Bitbucket Server and Data Center
- GitLab Self-Managed
- Microsoft Azure DevOps Server: A workspace admin must explicitly allowlist the URL domain prefixes for your Microsoft Azure DevOps Server if the URL doesn't match dev.azure.com/* or visualstudio.com/*. See Git URL allowlists.
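To make the Azure DevOps Server rule above concrete, here is a small sketch that checks whether a repository URL falls under `dev.azure.com` or `visualstudio.com`. It is only an illustration of the rule as stated here: the function name is hypothetical, and treating subdomains such as `myorg.visualstudio.com` as a match is an assumption, not a description of how Azure Databricks evaluates URLs.

```python
from urllib.parse import urlparse

# Domains named in the rule above; URLs under these do not need an extra allowlist entry.
DEFAULT_AZURE_DEVOPS_DOMAINS = ("dev.azure.com", "visualstudio.com")


def needs_allowlist_entry(repo_url: str) -> bool:
    """Return True if the URL's host is outside the default Azure DevOps domains.

    Assumption: a host matches when it equals a default domain or is a
    subdomain of it (for example, myorg.visualstudio.com).
    """
    host = (urlparse(repo_url).hostname or "").lower()
    for domain in DEFAULT_AZURE_DEVOPS_DOMAINS:
        if host == domain or host.endswith("." + domain):
            return False
    return True
```

A check like this can help decide, before configuring a Git folder, whether to ask a workspace admin to add the server's domain to the allowlist.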
If you're integrating an on-premises Git repo that isn't accessible from the internet, you must also install a proxy for Git authentication requests within your company's VPN. See Set up private Git connectivity for Azure Databricks Git folders (Repos).
To learn how to use access tokens with your Git provider, see Configure Git credentials & connect a remote repo to Azure Databricks.