Migrate to Git from centralized version control
Migrating a team to Git from centralized version control requires more than just learning new commands. To support distributed development, Git stores file history and branch information differently than a centralized version control system. Planning and implementing a successful migration to Git from a centralized version control system requires understanding these fundamental differences.
Microsoft has helped migrate many internal teams and customers from centralized version control systems to Git. This experience has produced the following guidance based on practices that consistently succeed.
Steps for successful migration
For a successful migration, teams should:
- Evaluate current tools and processes.
- Select a Git branching strategy.
- Decide whether and how to migrate history.
- Maintain the previous version control system.
- Remove binary files, executables, and tools from source control.
- Train teams in Git concepts and practices.
- Test the migration to Git.
Evaluate current tools and processes
Changing version control systems naturally disrupts the development workflow with new tools and practices. This disruption can be an opportunity to improve other aspects of the DevOps process.
Teams should consider adopting the following practices as they migrate to the new system:
- Required code reviews before checking in code. In the Git branching model, pull request code review is part of the development process. Code reviews complement a continuous integration (CI) workflow.
- Continuous delivery (CD) to automate deployment processes. Changing version control tools requires deployment process changes, so a migration is a good time to adopt a modern release pipeline.
Select a Git branching strategy
Before migrating code, the team should select a branching strategy.
In Git, short-lived topic branches allow developers to work close to the main branch and integrate quickly, avoiding merge problems. Two common topic branch strategies are GitFlow and a simpler variation, GitHub Flow.
Git discourages long-lived, isolated feature branches, which tend to delay merges until integration becomes difficult. By using modern CD techniques like feature flags, teams can integrate code into the main branch quickly, but still keep in-progress features hidden from users until they're complete.
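As a minimal sketch of the feature flag idea, the following Python example keeps an in-progress code path on the main branch but switched off by default. The flag name `new_checkout`, the `FLAGS` dictionary, and both checkout functions are hypothetical; real teams usually read flags from configuration or a flag service.

```python
# Hypothetical flag store; in practice this might come from config or a flag service.
FLAGS = {"new_checkout": False}


def is_enabled(flag_name: str) -> bool:
    """Return True only if the flag exists and is switched on."""
    return FLAGS.get(flag_name, False)


def legacy_checkout(total: float) -> str:
    return f"legacy checkout: {total:.2f}"


def new_checkout(total: float) -> str:
    # In-progress feature: merged to main, but hidden until the flag is on.
    return f"new checkout: {total:.2f}"


def checkout(total: float) -> str:
    """Route to the new code path only when its flag is enabled."""
    if is_enabled("new_checkout"):
        return new_checkout(total)
    return legacy_checkout(total)
```

Because the unfinished path ships dark, developers can merge small changes to main every day instead of holding them on a long-lived branch.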
Teams that currently use a long-lived feature branch strategy can adopt feature flags before migrating to Git. Using feature flags simplifies migration by minimizing the number of branches to migrate. Whether they use feature branches or feature flags, teams should document the mapping between legacy branches and new Git branches, so everyone understands where to commit their new work.
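One lightweight way to document the branch mapping is to keep it as data that can be both printed and queried. The sketch below assumes legacy branches are folders; the folder names and Git branch names are illustrative placeholders, not a recommended naming scheme.

```python
# Hypothetical mapping from legacy branch folders to Git branch names.
BRANCH_MAP = {
    "/trunk": "main",
    "/branch/one": "release/one",
    "/branch/two": "release/two",
}


def git_branch_for(legacy_path: str) -> str:
    """Return the Git branch that replaces the legacy folder containing legacy_path."""
    # Check longer prefixes first so nested folders resolve to the closest match.
    for prefix in sorted(BRANCH_MAP, key=len, reverse=True):
        if legacy_path == prefix or legacy_path.startswith(prefix + "/"):
            return BRANCH_MAP[prefix]
    raise KeyError(f"No Git branch documented for {legacy_path!r}")


def mapping_table() -> str:
    """Render the mapping as plain text that can be committed alongside the code."""
    rows = [f"{legacy} -> {git}" for legacy, git in sorted(BRANCH_MAP.items())]
    return "\n".join(rows)
```

The rendered table can be committed with the code so that the answer to "where does my work go now?" lives in the repository itself.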
Decide whether to migrate history
Teams might be tempted to migrate their existing source code history to Git. Several tools claim to migrate a complete history of all branches from a centralized tool to Git. A Git commit appears to map relatively well to the changeset or check-in model that the previous version control tool used.
However, this mapping has some serious limitations.
- In most centralized version control systems, branches exist as folders in the repository. For example, the main branch might be a folder named /trunk, and other branches are folders like /branch/one and /branch/two. In a Git repository, branches include the entire repository, so a 1:1 translation is difficult.
- In some version control systems, a tag or label is a collection that can contain various files in the tree, even files at different versions. In Git, a tag is a snapshot of the entire repository at a specific point in time. A tag can't represent a subset of the repository or combine files at different versions.
- Most version control systems store details about the way files change between versions, including fine-grained change types like rename, undelete, and rollback. Git stores versions as snapshots of the entire repository, and metadata about the way files changed isn't available.
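To make the snapshot model concrete, the following Python sketch reproduces how Git computes the ID of a file's content (a blob) with SHA-1, its default object format. The file paths and contents are hypothetical. A renamed file keeps the same content and therefore the same blob ID, so a rename can be inferred by comparing two snapshots, but nothing in either snapshot records that a rename happened. Git's real rename detection also handles files whose content changed slightly, which this sketch doesn't attempt.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two snapshots of a tiny repository, as {path: blob ID}. Paths are hypothetical.
before = {"src/util.py": git_blob_id(b"print('hi')\n")}
after = {"src/helpers.py": git_blob_id(b"print('hi')\n")}

# Neither snapshot contains a "rename" record. The rename can only be inferred
# by noticing that the same blob ID appears under a different path.
inferred_renames = [
    (old, new)
    for old, old_id in before.items()
    for new, new_id in after.items()
    if old_id == new_id and old != new
]
```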
These differences mean that a full history migration is lossy at best and possibly misleading. Given the lossiness, the effort involved, and how rarely teams consult old history, most teams should avoid importing history. Instead, teams should do a tip migration, bringing only a snapshot of the most recent branch version into Git. For most teams, time is better spent on areas of the migration that have a higher return on investment, such as improving processes.
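A tip migration can be sketched as a handful of standard Git commands run inside a folder that already contains the latest files exported from the legacy system. The Python helper below is an illustration under assumptions: the export step has already happened, the remote URL is a placeholder, the commit message is arbitrary, and a Git user name and email are configured. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def tip_migration_commands(remote_url: str, branch: str = "main") -> list[list[str]]:
    """Build the Git commands that import one snapshot as a single commit."""
    return [
        ["git", "init"],
        ["git", "add", "--all"],
        ["git", "commit", "-m", "Tip migration from legacy version control"],
        ["git", "branch", "-M", branch],  # name the branch regardless of Git defaults
        ["git", "remote", "add", "origin", remote_url],
        ["git", "push", "-u", "origin", branch],
    ]


def run_tip_migration(snapshot_dir: Path, remote_url: str, dry_run: bool = True) -> None:
    """Print the commands, or run them inside the exported snapshot directory."""
    for command in tip_migration_commands(remote_url):
        if dry_run:
            print(shlex.join(command))
        else:
            subprocess.run(command, cwd=snapshot_dir, check=True)
```

For example, `run_tip_migration(Path("export"), "https://example.invalid/team/repo.git")` prints the plan without touching anything, which is a convenient way to review it before a test migration.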
Maintain the old version control system
During and after a migration, developers might still need access to the previous version control history. Although the previous version control history becomes less relevant over time, it's still important to be able to refer to it. Highly regulated environments might have specific legal and auditing requirements for version control history.
Especially for teams that do only a tip migration, it's highly recommended to maintain the previous system indefinitely. Set the old version control system to read-only after you migrate.
Large development teams and regulated environments can place breadcrumbs in Git that point back to the old version control system. A simple example is a text file added as the first commit at the root of a Git repository, before the tip migration, that points to the URL of the old version control server. If many branches migrate, a text file in each branch should explain how the branches migrated from the old system. Breadcrumbs are also helpful for developers who start working on a project after it's been migrated and aren't familiar with the old version control system.
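A breadcrumb can be as simple as a generated text file. The sketch below writes one at the repository root; the file name `LEGACY_HISTORY.txt`, the wording, and the parameters are hypothetical and should be adapted to whatever the team's auditors and developers need.

```python
from pathlib import Path


def write_breadcrumb(repo_root: Path, legacy_url: str, legacy_branch: str) -> Path:
    """Write a text file that points readers back to the legacy system."""
    breadcrumb = repo_root / "LEGACY_HISTORY.txt"  # hypothetical file name
    breadcrumb.write_text(
        "History before the Git migration is kept in the legacy version control system.\n"
        f"Server: {legacy_url}\n"
        f"Migrated from legacy branch: {legacy_branch}\n",
        encoding="utf-8",
    )
    return breadcrumb
```

Committing this file first, before the migrated snapshot, keeps the pointer at the very start of the Git history.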
Remove binary files and tools
Git's storage model is optimized for versioning text files and source code, which are compact and highly compressible. Binary files are usually large, and once they're added to a repository, they remain in the repository history and in every future clone. Because of the way Git stores history, developers should avoid adding binary files to repositories, especially binaries that are very large or that change often. Migrating to Git is an opportunity to remove these binaries from the codebase.
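Before migrating, it can help to scan the legacy working folder for files that are likely to bloat a Git repository. The following Python sketch flags files that look binary or exceed a size threshold. Both the NUL-byte heuristic and the 1 MB default are assumptions for illustration, not thresholds from Git or from this guidance.

```python
import os
from pathlib import Path


def looks_binary(path: Path, sample_size: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    with path.open("rb") as handle:
        return b"\0" in handle.read(sample_size)


def find_migration_risks(root: Path, max_bytes: int = 1_000_000) -> list[tuple[str, int, bool]]:
    """List (path, size, is_binary) for binary or oversized files, largest first."""
    risks = []
    for folder, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata if the scan runs inside an existing repository.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(folder) / name
            size = path.stat().st_size
            binary = looks_binary(path)
            if binary or size > max_bytes:
                risks.append((path.relative_to(root).as_posix(), size, binary))
    return sorted(risks, key=lambda item: item[1], reverse=True)
```

The resulting list is a starting point for deciding what to delete, what to move to a package feed, and what to keep.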
It's also recommended to exclude libraries, tools, and build output from repositories. Instead, use package management systems like NuGet to manage dependencies.
Assets like icons and artwork might need to align with a specific version of source code. Small, infrequently-changed assets like icons won't bloat history, and you can include them directly in a repository. To store large or frequently-changing assets, use the Git Large File Storage (LFS) extension. For more information about managing large files in GitHub, see Managing large files. For Azure Repos, see Manage and store large files in Git.
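Git LFS is enabled per file pattern, and the `git lfs track` command records each pattern as a line in the repository's `.gitattributes` file. As a small illustration, the Python sketch below generates lines in that format. The asset patterns are hypothetical; choose ones that match the team's own large files.

```python
def lfs_attribute_lines(patterns: list[str]) -> list[str]:
    """Return .gitattributes lines that route matching files through Git LFS."""
    return [f"{pattern} filter=lfs diff=lfs merge=lfs -text" for pattern in patterns]


# Hypothetical asset types for illustration only.
LARGE_ASSET_PATTERNS = ["*.psd", "*.mp4"]

gitattributes_text = "\n".join(lfs_attribute_lines(LARGE_ASSET_PATTERNS)) + "\n"
```

In practice, running `git lfs track` is the safer route because it also validates that the LFS extension is installed; the sketch only shows what ends up in the file.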
Train teams in Git concepts and practices
One of the biggest challenges in migrating to Git is helping developers understand how Git stores changes and how commits form development history. It's not enough to prepare a cheat sheet that maps old commands to Git commands. Developers need to stop thinking about version control history in terms of a centralized, linear model, and understand Git's history model and the commit graph.
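A toy model can help when explaining the commit graph in training. In the Python sketch below, each commit records its parent commits, a merge commit has two parents, and a branch is only a name that points at one commit. The commit IDs and messages are made up, and the model leaves out everything else a real commit stores.

```python
# Each commit lists its parent commits. IDs and messages are hypothetical.
COMMITS = {
    "a1": {"parents": [], "message": "Initial commit"},
    "b2": {"parents": ["a1"], "message": "Add login page"},
    "c3": {"parents": ["a1"], "message": "Fix build script"},  # parallel work
    "d4": {"parents": ["b2", "c3"], "message": "Merge topic branch"},
}

# A branch is only a name that points at one commit.
BRANCHES = {"main": "d4"}


def history(commit_id: str) -> set[str]:
    """Return every commit reachable from commit_id by following parent links."""
    seen: set[str] = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(COMMITS[current]["parents"])
    return seen
```

Here `b2` and `c3` were created in parallel from `a1`, so there is no single linear sequence of changes; the history of `main` is whatever can be reached from `d4`.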
People learn in different ways, so you should provide several types of training materials. Live, lab-based training with an expert instructor works well for some people. The Pro Git book is an excellent starting point that is available free online.
Available free hands-on training courses include:
- Introduction to version control with Git learning path.
- The Get started with Git in Azure Repos quickstart.
- GitHub's Git and GitHub learning resources.
Organizations should work to identify Git experts on teams, empower them to help others, and encourage other team members to ask them questions.
Test the migration
Once teams update their processes, analyze their code, and train their members, it's time to migrate the source code. Whether you do a tip migration or migrate history, it's important to do one or more test migrations into a test repository. Before you do a final migration, make sure:
- All code files migrate.
- All branches are available.
- There are no stray binaries in the repository.
- Users have the appropriate permissions to fetch and push.
- Builds are successful, and all tests pass.
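The first and third checks in this list can be partly automated by comparing a folder exported from the legacy system with a fresh clone of the test repository. The Python sketch below assumes both folders exist locally; the function names and report keys are hypothetical. Files that were deliberately removed, such as binaries, show up as missing, and line-ending conversion can make text files show up as changed, so the report needs human review.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its content, skipping .git."""
    digests = {}
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if path.is_file() and ".git" not in relative.parts:
            digests[relative.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(legacy_export: Path, git_checkout: Path) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or different after a test migration."""
    legacy, migrated = tree_digest(legacy_export), tree_digest(git_checkout)
    return {
        "missing": sorted(legacy.keys() - migrated.keys()),
        "unexpected": sorted(migrated.keys() - legacy.keys()),
        "changed": sorted(p for p in legacy.keys() & migrated.keys() if legacy[p] != migrated[p]),
    }
```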
Migrate the code
Do the final migration during nonwork hours, ideally between milestones when there's natural downtime. Migrating at the end of a sprint might cause issues while developers are trying to finish work. Try to migrate over a weekend, when nobody needs to check in.
Plan for a firm cutover from the old version control system to Git. Trying to operate multiple systems in parallel means developers might not know where or how to check in. Set the old version control system to read-only to help avoid confusion. Without this safeguard, a second migration that includes interim changes might be necessary.
The actual migration process varies depending on the system you're migrating from. For information about migrating from Team Foundation Version Control, see Migrate from TFVC to Git.
Migration checklist
Team workflows:
- Determine how builds will run.
- Decide when tests will run.
- Develop a release management process.
- Move code reviews to pull requests.
Branching strategy:
- Pick a Git branching strategy.
- Document the branching strategy, why it was selected, and how legacy branches map.
History:
- Decide how long to keep legacy version control running.
- Identify branches that need to migrate.
- If needed, create breadcrumbs to help engineers navigate back to the legacy system.
Binaries and tools:
- Identify which binaries and undiffable files to remove from the repo.
- Decide on an approach for large files, such as Git LFS.
- Decide on an approach for delivering tools and libraries, such as NuGet.
Training:
- Identify training materials.
- Plan training events, written materials, and videos.
- Identify members of the team to serve as local Git experts.
Code migration:
- Do multiple test runs to ensure the migration will go smoothly.
- Identify and communicate a cutover time.
- Create the new Git repo.
- Set the old system to read-only.
- Migrate the main branch first, then any other needed branches.