The following sections specify limits for Databricks Git folders and Git integration. For general information, see Resource limits.
To learn about Databricks asset types supported in Git folders, see What asset types are supported by Git folders?.
File and repo limits
Azure Databricks doesn't enforce a limit on repository size. However:
- Working branches are limited to 1 GB.
- You can't view files larger than 10 MB in the Azure Databricks UI.
- Individual workspace files have separate size limits. See Limitations.
- Local branches can remain in the associated Git folder for up to 30 days after the remote branch is deleted. To remove a local branch completely, delete the repository.
Databricks recommends keeping the total number of workspace assets and files under 20,000.
Each Git operation is limited to 2 GB of memory and 4 GB of disk writes. Since limits apply per operation, cloning a 5 GB repository fails, but cloning a 3 GB repository and later adding 2 GB succeeds.
If your repository exceeds these limits, you might receive an error or a timeout during cloning, though the operation might still complete in the background.
To work with larger repositories, try sparse checkout or Git CLI commands.
To write temporary files that don't persist after cluster shutdown, use $TEMPDIR. This avoids exceeding branch size limits and offers better performance than writing to a working directory (CWD) in the workspace filesystem. See Where should I write temporary files on Azure Databricks?.
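As a minimal sketch of this pattern, the following Python snippet writes scratch output under $TEMPDIR, falling back to the system temp directory when the variable isn't set; the file name and payload are hypothetical:

```python
import os
import tempfile

# $TEMPDIR points at ephemeral cluster-local storage (per the
# recommendation above); fall back to the system default if unset.
tmp_root = os.environ.get("TEMPDIR", tempfile.gettempdir())

scratch_path = os.path.join(tmp_root, "intermediate-results.csv")  # hypothetical name
with open(scratch_path, "w") as f:
    f.write("id,value\n1,42\n")  # placeholder payload

print(f"Wrote scratch data to {scratch_path}")
```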
Recovering deleted files
File recoverability varies by action. Some actions allow recovery through the Trash folder, while others don't. Files previously committed and pushed to a remote branch can be restored using the remote repository's Git commit history:
| Action | Is the file recoverable? |
|---|---|
| Delete file with workspace browser | Yes, from the Trash folder |
| Discard a new file with the Git folder dialog | Yes, from the Trash folder |
| Discard a modified file with the Git folder dialog | No, the file is gone |
| `reset` (hard) for uncommitted file modifications | No, file modifications are gone |
| `reset` (hard) for uncommitted, newly created files | No, file modifications are gone |
| Switch branches with the Git folder dialog | Yes, from the remote Git repo |
| Other Git operations, such as commit or push, from the Git folder dialog | Yes, from the remote Git repo |
| `PATCH` operations updating `/repos/id` from the Repos API | Yes, from the remote Git repo |
Monorepo support
Databricks recommends against creating Git folders backed by monorepos—large, single-organization Git repositories with thousands of files across many projects.
Configuration
This section covers Git folder storage, server support, and general setup questions.
Repository content storage
Azure Databricks temporarily clones repository contents to disk in the control plane. Notebook files are stored in the control plane database, just like notebooks in the main workspace. Non-notebook files are stored on disk for up to 30 days.
On-premises and self-hosted Git servers
Databricks Git folders support GitHub Enterprise, Bitbucket Server, Azure DevOps Server, and GitLab Self-managed if the server is internet-accessible. See Git Proxy Server for Git folders for on-premises integration.
To integrate with a Bitbucket Server, GitHub Enterprise Server, or GitLab self-managed instance that isn't internet-accessible, contact your Azure Databricks account team.
Supported asset types
For details on supported artifact types, see What asset types are supported by Git folders?.
Do Git folders support .gitignore files?
Yes. To prevent Git from tracking a file, add the filename (including extension) to a .gitignore file. Either create one or use an existing file cloned from your remote repository.
.gitignore works only for untracked files. Adding an already-tracked file to .gitignore doesn't stop Git from tracking it.
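For illustration, a minimal .gitignore might look like the following; the entries are hypothetical and should match files your project actually generates:

```
# Compiled Python artifacts
*.pyc
__pycache__/

# Local scratch output (hypothetical names)
scratch/
local-config.json
```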
Git submodule support
Standard Git folders don't support Git submodules, but Git folders with Git CLI access can use them. See Use Git CLI commands (Beta).
Does Azure Data Factory (ADF) support Git folders?
Yes.
Source management
This section covers branching, merging, and how Git folders handle notebooks and dependencies.
Notebook dashboards and branch changes
Azure Databricks source format notebooks don't store dashboard information.
To preserve dashboards, change the notebook format to .ipynb (Jupyter format), which supports dashboard and visualization definitions by default. To preserve visualization data, commit the notebook with outputs.
See Manage IPYNB notebook output commits.
Do Git folders support branch merging?
Yes. You can also create a pull request and merge through your Git provider.
Deleting branches
To delete a branch, you must work in your Git provider.
Python dependency precedence
Python libraries in a Git folder take precedence over libraries stored elsewhere. For example, if a library is installed on your Databricks compute and a library with the same name exists in a Git folder, the Git folder library is imported. See Python library precedence.
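As a sketch of this behavior: suppose a module named utils (a hypothetical name) is both installed on the cluster and present as utils.py at the root of your Git folder. Importing it from a notebook in that Git folder resolves to the Git folder copy:

```python
# Assumes a file utils.py at the Git folder root shadowing an installed
# library of the same name (both names are hypothetical).
import utils

# The Git folder directory appears earlier on sys.path, so this prints a
# workspace path such as /Workspace/Repos/<user>/<repo>/utils.py rather
# than a site-packages path.
print(utils.__file__)
```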
Security, authentication, and tokens
This section covers encryption, token storage, and authentication issues with Git providers.
Issue with a conditional access policy (CAP) for Microsoft Entra ID
You might get a "denied access" error when cloning a repository if:
- Your Azure Databricks workspace uses Azure DevOps with Microsoft Entra ID authentication.
- You've enabled a conditional access policy in Azure DevOps and a Microsoft Entra ID conditional access policy.
To resolve this, add an exclusion to the conditional access policy (CAP) for Azure Databricks IP addresses or users.
For more information, see Conditional access policies.
Allowlist with Microsoft Entra ID tokens
If you use Microsoft Entra ID for authenticating with Azure DevOps, the default allowlist restricts Git URLs to:
- dev.azure.com
- visualstudio.com
For more information, see Allow lists restrict remote repo usage.
Git folder encryption
Azure Databricks encrypts Git folder contents using a default key. Customer-managed keys are only supported for encrypting Git credentials.
GitHub token storage and access
- The Azure Databricks control plane stores authentication tokens. Employees can only access them through audited temporary credentials.
- Azure Databricks logs token creation and deletion, but not usage. Git operation logging lets you audit token usage by the Azure Databricks application.
- GitHub Enterprise audits token usage. Other Git services might also offer server auditing.
Do Git folders support GPG signing of commits?
No.
Do Git folders support SSH?
No. Git folders support only HTTPS.
Azure DevOps cross-tenancy errors
When connecting to DevOps in a separate tenancy, you might see Unable to parse credentials from Azure Active Directory account. If the Azure DevOps project is in a different Microsoft Entra ID tenancy than Azure Databricks, use an Azure DevOps access token. See Connect to an Azure DevOps repo using a token.
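One way to register such a token is the Databricks Git credentials REST API. The sketch below uses a hypothetical workspace URL and placeholder environment variables; verify the endpoint and field values against the current API reference:

```python
import os
import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace URL

resp = requests.post(
    f"{host}/api/2.0/git-credentials",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json={
        "git_provider": "azureDevOpsServices",
        "git_username": "user@example.com",              # hypothetical
        "personal_access_token": os.environ["ADO_PAT"],  # PAT created in Azure DevOps
    },
)
resp.raise_for_status()
print(resp.json())
```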
CI/CD and MLOps
This section covers how Git operations affect notebook state, MLflow experiments, and job execution.
Incoming changes clear the notebook state
Git operations that alter notebook source code result in loss of notebook state, including cell outputs, comments, version history, and widgets. For example, git pull can change notebook source code, requiring Databricks Git folders to overwrite the existing notebook. Operations like git commit, push, or creating a new branch don't affect source code and preserve notebook state.
Important
MLflow experiments don't work in Git folders with DBR 14.x or lower versions.
MLflow experiments in Git folders
There are two types of MLflow experiments: workspace and notebook. See Organize training runs with MLflow experiments.
Workspace experiments: You can't create workspace MLflow experiments in Git folders. Log MLflow runs to an experiment created in a regular workspace folder. For multi-user collaboration, use a shared workspace folder.
Notebook experiments: You can create notebook experiments in a Databricks Git folder. If you check your notebook into source control as an `.ipynb` file, MLflow runs log to an automatically created experiment. Source control doesn't check in the experiment or its runs. See Create notebook experiment.
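Following the workspace-experiment recommendation above, here is a minimal sketch of logging runs from a Git folder notebook to an experiment that lives in a regular workspace folder (the experiment path, parameter, and metric are hypothetical):

```python
import mlflow

# Point MLflow at an experiment in a regular workspace folder; the path
# is hypothetical. set_experiment creates it if it doesn't exist yet.
mlflow.set_experiment("/Shared/my-team-experiment")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.93)
```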
Prevent data loss in MLflow experiments
Notebook MLflow experiments created using Lakeflow Jobs with source code in a remote repository are stored in temporary storage. These experiments persist initially after workflow execution, but risk deletion during scheduled cleanup. Databricks recommends using workspace MLflow experiments with Jobs and remote Git sources.
Warning
Switching to a branch that doesn't contain the notebook risks losing the associated MLflow experiment data. This loss becomes permanent if you don't access the prior branch within 30 days.
To recover missing experiment data before the 30-day expiry, restore the original notebook name, open the notebook, and click the experiment icon in the right pane. This triggers `mlflow.get_experiment_by_name()` and recovers the experiment and runs. After 30 days, Azure Databricks purges orphaned MLflow experiments for GDPR compliance.
To prevent data loss, avoid renaming notebooks in a repository. If you rename a notebook, immediately click the experiment icon in the right pane.
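The programmatic equivalent of that UI step is calling `mlflow.get_experiment_by_name()` with the notebook's workspace path yourself; the path below is hypothetical:

```python
import mlflow

# Notebook experiments are named after the notebook's workspace path.
exp = mlflow.get_experiment_by_name("/Repos/user@example.com/my-repo/train")
if exp is not None:
    runs = mlflow.search_runs(experiment_ids=[exp.experiment_id])
    print(runs[["run_id", "status"]])
```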
Running jobs during Git operations
During a Git operation, some notebooks might be updated while others aren't yet, causing unpredictable behavior.
For example, if notebook A calls notebook Z using %run and a job starts during a Git operation, the job might run the latest notebook A with an older notebook Z. The job might fail or run notebooks from different commits.
To avoid this, configure job tasks to use your Git provider as the source instead of a workspace path. See Use Git with jobs.
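As a minimal sketch, a job defined through the Jobs 2.1 API can carry a git_source block in place of a workspace path; the workspace URL, repo URL, names, and cluster ID below are hypothetical, so check the Jobs API reference for current fields:

```python
import os
import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical

payload = {
    "name": "nightly-etl",
    "git_source": {
        "git_url": "https://github.com/example-org/example-repo",  # hypothetical
        "git_provider": "gitHub",
        "git_branch": "main",
    },
    "tasks": [
        {
            "task_key": "etl",
            "existing_cluster_id": "0000-000000-example",  # hypothetical
            # Each run checks out a single commit at start, so all tasks
            # in the run see a consistent snapshot of the repo.
            "notebook_task": {"notebook_path": "notebooks/etl"},
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["job_id"])
```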
Resources
For details on Databricks workspace files, see What are workspace files?.