The following sections specify limits for Databricks Git folders and Git integration. For general information, see Resource limits.
To learn about Databricks asset types supported in Git folders, see What asset types are supported by Git folders?.
File and repo limits
Azure Databricks doesn't enforce a limit on repository size. However:
- Working branches are limited to 1 GB.
- You can't view files larger than 10 MB in the Azure Databricks UI.
- Individual workspace files have separate size limits. See Limitations.
- Local branches can remain in the associated Git folder for up to 30 days after the remote branch is deleted. To remove a local branch completely, delete the repository.
Databricks recommends keeping the total number of workspace assets and files under 20,000.
Each Git operation is limited to 2 GB of memory and 4 GB of disk writes. Since limits apply per operation, cloning a 5 GB repository fails, but cloning a 3 GB repository and later adding 2 GB succeeds.
If your repository exceeds these limits, you might receive an error or a timeout during cloning, though the operation might still complete in the background.
To work with larger repositories, try sparse checkout or Git CLI commands.
To write temporary files that don't persist after cluster shutdown, use $TEMPDIR. This avoids exceeding branch size limits and offers better performance than writing to a working directory (CWD) in the workspace filesystem. See Where should I write temporary files on Azure Databricks?.
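As a minimal sketch of this pattern, the following Python snippet writes scratch output under $TEMPDIR, falling back to the system temp directory when the variable isn't set; the file name and payload are hypothetical:

```python
import os
import tempfile

# $TEMPDIR points at ephemeral cluster-local storage (per the
# recommendation above); fall back to the system default if unset.
tmp_root = os.environ.get("TEMPDIR", tempfile.gettempdir())

scratch_path = os.path.join(tmp_root, "intermediate-results.csv")  # hypothetical name
with open(scratch_path, "w") as f:
    f.write("id,value\n1,42\n")  # placeholder payload

print(f"Wrote scratch data to {scratch_path}")
```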
Recovering deleted files
File recoverability varies by action. Some actions allow recovery through the Trash folder, while others don't. Files previously committed and pushed to a remote branch can be restored using the remote repository's Git commit history:
| Action | Is the file recoverable? |
|---|---|
| Delete file with workspace browser | Yes, from the Trash folder |
| Discard a new file with the Git folder dialog | Yes, from the Trash folder |
| Discard a modified file with the Git folder dialog | No, the file is gone |
| `reset` (hard) for uncommitted file modifications | No, file modifications are gone |
| `reset` (hard) for uncommitted, newly created files | No, file modifications are gone |
| Switch branches with the Git folder dialog | Yes, from the remote Git repo |
| Other Git operations, such as commit or push, from the Git folder dialog | Yes, from the remote Git repo |
| `PATCH` operations updating `/repos/id` from the Repos API | Yes, from the remote Git repo |
Monorepo support
Databricks recommends against creating Git folders backed by monorepos—large, single-organization Git repositories with thousands of files across many projects.
Configuration
This section covers Git folder storage, server support, and general setup questions.
Repository content storage
Azure Databricks temporarily clones repository contents to disk in the control plane. Notebook files are stored in the control plane database, just like notebooks in the main workspace. Non-notebook files are stored on disk for up to 30 days.
On-premises and self-hosted Git servers
Databricks Git folders support GitHub Enterprise, Bitbucket Server, Azure DevOps Server, and GitLab Self-managed if the server is internet-accessible. See Git Proxy Server for Git folders for on-premises integration.
To integrate with a Bitbucket Server, GitHub Enterprise Server, or GitLab self-managed instance that isn't internet-accessible, contact your Azure Databricks account team.
Supported asset types
For details on supported artifact types, see What asset types are supported by Git folders?.
Do Git folders support .gitignore files?
Yes. To prevent Git from tracking a file, add the filename (including extension) to a .gitignore file. Either create one or use an existing file cloned from your remote repository.
.gitignore works only for untracked files. Adding an already-tracked file to .gitignore doesn't stop Git from tracking it.
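For illustration, a minimal .gitignore might look like the following; the entries are hypothetical and should match files your project actually generates:

```
# Compiled Python artifacts
*.pyc
__pycache__/

# Local scratch output (hypothetical names)
scratch/
local-config.json
```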
Git submodule support
Standard Git folders don't support Git submodules, but Git folders with Git CLI access can use them. See Use Git CLI commands (Beta).
Does Azure Data Factory (ADF) support Git folders?
Yes.
Source management
This section covers branching, merging, and how Git folders handle notebooks and dependencies.
Notebook dashboards and branch changes
Azure Databricks source format notebooks don't store dashboard information.
To preserve dashboards, change the notebook format to .ipynb (Jupyter format), which supports dashboard and visualization definitions by default. To preserve visualization data, commit the notebook with outputs.
See Manage IPYNB notebook output commits.
Do Git folders support branch merging?
Yes. You can also create a pull request and merge through your Git provider.
Deleting branches
To delete a branch, you must work in your Git provider.
Python dependency precedence
Python libraries in a Git folder take precedence over libraries stored elsewhere. For example, if a library is installed on your Databricks compute and a library with the same name exists in a Git folder, the Git folder library is imported. See Python library precedence.
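As a sketch of this behavior: suppose a module named utils (a hypothetical name) is both installed on the cluster and present as utils.py at the root of your Git folder. Importing it from a notebook in that Git folder resolves to the Git folder copy:

```python
# Assumes a file utils.py at the Git folder root shadowing an installed
# library of the same name (both names are hypothetical).
import utils

# The Git folder directory appears earlier on sys.path, so this prints a
# workspace path such as /Workspace/Repos/<user>/<repo>/utils.py rather
# than a site-packages path.
print(utils.__file__)
```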
Security, authentication, and tokens
This section covers encryption, token storage, and authentication issues with Git providers.
Issue with a conditional access policy (CAP) for Microsoft Entra ID
You might get a "denied access" error when cloning a repository if:
- Your Azure Databricks workspace uses Azure DevOps with Microsoft Entra ID authentication.
- You've enabled a conditional access policy in Azure DevOps and a Microsoft Entra ID conditional access policy.
To resolve this, add an exclusion to the conditional access policy (CAP) for Azure Databricks IP addresses or users.
For more information, see Conditional access policies.
Allowlist with Microsoft Entra ID tokens
If you use Microsoft Entra ID for authenticating with Azure DevOps, the default allowlist restricts Git URLs to:
- dev.azure.com
- visualstudio.com
For more information, see Allow lists restrict remote repo usage.
Git folder encryption
Azure Databricks encrypts Git folder contents using a default key. Customer-managed keys are only supported for encrypting Git credentials.
GitHub token storage and access
- The Azure Databricks control plane stores authentication tokens. Employees can only access them through audited temporary credentials.
- Azure Databricks logs token creation and deletion, but not usage. Git operation logging lets you audit token usage by the Azure Databricks application.
- GitHub Enterprise audits token usage. Other Git services might also offer server auditing.
Do Git folders support GPG signing of commits?
No.
Do Git folders support SSH?
No. Git folders support only HTTPS.
Azure DevOps cross-tenancy errors
When connecting to DevOps in a separate tenancy, you might see Unable to parse credentials from Azure Active Directory account. If the Azure DevOps project is in a different Microsoft Entra ID tenancy than Azure Databricks, use an Azure DevOps access token. See Connect to an Azure DevOps repo using a token.
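One way to register such a token is the Databricks Git credentials REST API. The sketch below uses a hypothetical workspace URL and placeholder environment variables; verify the endpoint and field values against the current API reference:

```python
import os
import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace URL

resp = requests.post(
    f"{host}/api/2.0/git-credentials",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json={
        "git_provider": "azureDevOpsServices",
        "git_username": "user@example.com",              # hypothetical
        "personal_access_token": os.environ["ADO_PAT"],  # PAT created in Azure DevOps
    },
)
resp.raise_for_status()
print(resp.json())
```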
CI/CD and MLOps
This section covers how Git operations affect notebook state, MLflow experiments, and job execution.
Incoming changes clear the notebook state
Git operations that alter notebook source code result in loss of notebook state, including cell outputs, comments, version history, and widgets. For example, git pull can change notebook source code, requiring Databricks Git folders to overwrite the existing notebook. Operations like git commit, push, or creating a new branch don't affect source code and preserve notebook state.
Important
MLflow experiments don't work in Git folders with DBR 14.x or lower versions.
MLflow experiments in Git folders
There are two types of MLflow experiments: workspace and notebook. See Organize training runs with MLflow experiments.
Workspace experiments: You can't create workspace MLflow experiments in Git folders. Log MLflow runs to an experiment created in a regular workspace folder. For multi-user collaboration, use a shared workspace folder.
Notebook experiments: You can create notebook experiments in a Databricks Git folder. If you check your notebook into source control as an `.ipynb` file, MLflow runs log to an automatically created experiment. Source control doesn't check in the experiment or its runs. See Create notebook experiment.
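Following the workspace-experiment recommendation above, here is a minimal sketch of logging runs from a Git folder notebook to an experiment that lives in a regular workspace folder (the experiment path, parameter, and metric are hypothetical):

```python
import mlflow

# Point MLflow at an experiment in a regular workspace folder; the path
# is hypothetical. set_experiment creates it if it doesn't exist yet.
mlflow.set_experiment("/Shared/my-team-experiment")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.93)
```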
Prevent data loss in MLflow experiments
Notebook MLflow experiments created using Lakeflow Jobs with source code in a remote repository are stored in temporary storage. These experiments persist initially after workflow execution, but risk deletion during scheduled cleanup. Databricks recommends using workspace MLflow experiments with Jobs and remote Git sources.
Warning
Switching to a branch that doesn't contain the notebook risks losing the associated MLflow experiment data. This loss becomes permanent if you don't access the prior branch within 30 days.
To recover missing experiment data before the 30-day expiry, restore the original notebook name, open the notebook, and click the experiment icon in the right pane. This triggers `mlflow.get_experiment_by_name()` and recovers the experiment and runs. After 30 days, Azure Databricks purges orphaned MLflow experiments for GDPR compliance.
To prevent data loss, avoid renaming notebooks in a repository. If you rename a notebook, immediately click the experiment icon in the right pane.
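The programmatic equivalent of that UI step is calling `mlflow.get_experiment_by_name()` with the notebook's workspace path yourself; the path below is hypothetical:

```python
import mlflow

# Notebook experiments are named after the notebook's workspace path.
exp = mlflow.get_experiment_by_name("/Repos/user@example.com/my-repo/train")
if exp is not None:
    runs = mlflow.search_runs(experiment_ids=[exp.experiment_id])
    print(runs[["run_id", "status"]])
```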
Running jobs during Git operations
During a Git operation, some notebooks might be updated while others aren't yet, causing unpredictable behavior.
For example, if notebook A calls notebook Z using %run and a job starts during a Git operation, the job might run the latest notebook A with an older notebook Z. The job might fail or run notebooks from different commits.
To avoid this, configure job tasks to use your Git provider as the source instead of a workspace path. See Use Git with jobs.
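As a minimal sketch, a job defined through the Jobs 2.1 API can carry a git_source block in place of a workspace path; the workspace URL, repo URL, names, and cluster ID below are hypothetical, so check the Jobs API reference for current fields:

```python
import os
import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical

payload = {
    "name": "nightly-etl",
    "git_source": {
        "git_url": "https://github.com/example-org/example-repo",  # hypothetical
        "git_provider": "gitHub",
        "git_branch": "main",
    },
    "tasks": [
        {
            "task_key": "etl",
            "existing_cluster_id": "0000-000000-example",  # hypothetical
            # Each run checks out a single commit at start, so all tasks
            # in the run see a consistent snapshot of the repo.
            "notebook_task": {"notebook_path": "notebooks/etl"},
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["job_id"])
```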
Resources
For details on Databricks workspace files, see What are workspace files?.