Volume 31 Number 9
The Source of Truth: The Role of Repositories in DevOps
By Kraig Brockschmidt | September 2016
The first article of this series, “From Code to Customer: Exploring Mobile DevOps” (msdn.com/magazine/mt767694), introduced the Microsoft DevOps stack for mobile apps and back-end services. It outlined the stages of the whole release pipeline, shown in Figure 1. A release pipeline, put succinctly, is how code that’s committed to a source repository gets transformed into customer-ready apps and services, and then delivered to customer devices and customer-accessible servers. A release pipeline, at its core, is simply a list of the steps that are necessary to make a release happen. Indeed, the practice of DevOps begins with being clear about the exact steps and the processes involved in a release. From there it’s relatively easy to incrementally automate those processes using tooling within the Microsoft DevOps stack.
Figure 1 The Source Control Repository Is the Input to the Release Pipeline
As discussed in the first article, you should always understand how to do every step in a release pipeline manually. Many app projects get started with manual processes, such as manual builds, to get feedback from testers as early as possible. Knowing how to do everything manually also provides a fallback in case some part of an automated release pipeline breaks down.
However, manual processes are expensive to scale, prone to human error and often tedious, and every manual step has to compete with all the other work that demands your employees’ attention. Automation (having a computer perform those tasks) provides much better scaling, reliability, predictability and auditing, which means higher quality at a lower cost. Automation also frees your employees for tasks that really do need human attention.
In this article I’ll explore a very important aspect of the release pipeline, one that’s probably taken for granted: source control. A project’s source code is the input to the release pipeline, as outlined in Figure 1, and most developers today accept managing code in a source control system as a matter of course. But it helps you understand the whole of DevOps better if you can clearly see the essential role source control plays in that context.
The Reasons for Source Control
The first step in automating a release pipeline is to manage your code in a source control repository of some kind. Again, the source code is the input to the release pipeline, and the next stage, Build, is what converts that code into the artifacts that are then fed into the rest of the pipeline. To automate the build process, with continuous integration especially, systems like Visual Studio Team Services (Team Services for short) must be able to detect when changes are made to that repository. As such, it’s not sufficient to manage your source code merely in some arbitrary folder on your hard drive.
Note: If you don’t already have a free Team Services account, create one by following the instructions at bit.ly/29xK3Oo. Better still, check out the Visual Studio Dev Essentials program (bit.ly/29xKCYq), which gives you a Team Services account along with easy access to many other services, including $25 in Azure credit for 12 months.
I’ll talk about build and continuous integration in the next article of the series. Here I want to focus specifically on source control itself, starting with a brief review of why source control exists in the first place, and the general role it plays in DevOps as a whole. My reason for this is to point out something you might never have thought about before: Source control is fundamentally a form of automation.
This statement may surprise you, because most developers today consider source control a given. But that wasn’t always the case. Chances are you’ve worked on projects without source control at some point in your career, especially when you were first getting started. In those projects, you likely just had a folder on your local hard drive containing all your code, which is what you get when you create a new local app project in Visual Studio. From there, you may have run a local build to produce the executables and such needed for distribution, which you then manually uploaded to a public Web server or perhaps shared with others on physical media.
In short, source control is in no way required to produce a customer-ready app and its back-end services. But having only a single local copy of your source code has a number of problems and risks, all of which I imagine you’ve experienced directly:
- If your hard drive crashes, you might lose everything.
- A single copy of the source code doesn’t maintain any change history, making it very difficult to revert to a prior working state of the project.
- Multiple people working on the code can easily overwrite each other’s changes, or introduce breaking changes, without anyone knowing.
It’s certainly possible to mitigate these risks manually, to some extent. Regular backups, for instance, help guard against code loss and provide a certain degree of history. However, the nongranular nature of whole-project backups makes it very difficult to revert only parts of the project while leaving other changes intact. It’s also possible for people on a team to make personal copies of the code to avoid conflicts, but it’s then exceptionally tedious to integrate those copies together into a working state. Team members can also communicate breaking changes to others by direct e-mail or other messaging, but this becomes burdensome to do on a consistent basis.
Of course, as developers we generally avoid such burdens as often as we can. Instead, we find creative ways to automate these tasks, which is exactly what source control is all about.
At a high level, source control means the following:
- Maintaining a shared repository of all project code on a server, with some kind of automatic backup mechanism.
- Logging changes on a per-file basis, so you can see the entire history for any given file, as well as changes to multiple files that were committed to the repository as a group. This makes it easy to associate build failures and test regressions with specific changes, and to revert individual files or groups of files to any previous state, not just the state of the last backup.
- Storing the code in a place where a build system can detect changes to the repository and automatically trigger a build, which means performing an immediate integration test (continuous integration) with those changes.
- Managing overwrites and conflicts among multiple users, either by requiring developers to lock files for exclusive access while they’re working on them, or by having tools that can automatically merge changes and detect conflicts.
- Sending automatic notifications to interested developers when certain files change, or when merge conflicts require manual resolution.
In short, source control automates many of the tedious processes involved in maintaining a dependable and auditable repository of a project’s code. It’s essential for managing project code for both single developers and teams alike, and is the basis for an automated release pipeline.
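To make the per-file history and revert ideas concrete, here is a minimal in-memory sketch. It is purely illustrative: the ToyRepository and Changeset names are hypothetical, and this is not how Git or TFVC actually store data. It only shows why grouped, per-file change records allow reverting one file while leaving other changes intact.

```python
from dataclasses import dataclass, field


@dataclass
class Changeset:
    """A group of file changes committed together, with a comment."""
    number: int
    author: str
    comment: str
    files: dict  # path -> new content for each file in this changeset


@dataclass
class ToyRepository:
    """Hypothetical in-memory stand-in for a repository (illustration only)."""
    changesets: list = field(default_factory=list)

    def commit(self, author, comment, files):
        """Record a group of file changes and return the new changeset number."""
        changeset = Changeset(len(self.changesets) + 1, author, comment, dict(files))
        self.changesets.append(changeset)
        return changeset.number

    def history(self, path):
        """Every changeset that touched one file, oldest first."""
        return [c for c in self.changesets if path in c.files]

    def content_at(self, path, number):
        """Content of a file as of a given changeset number, or None if absent."""
        content = None
        for c in self.changesets[:number]:
            if path in c.files:
                content = c.files[path]
        return content

    def revert_file(self, path, number, author):
        """Restore one file to an earlier state by committing its old content."""
        old = self.content_at(path, number)
        if old is None:
            raise ValueError(f"{path} does not exist at changeset {number}")
        return self.commit(author, f"Revert {path} to changeset {number}", {path: old})


repo = ToyRepository()
repo.commit("ana", "Initial check-in", {"app.py": "v1", "util.py": "u1"})
repo.commit("ben", "Change app only", {"app.py": "v2"})
repo.commit("ana", "Change util only", {"util.py": "u2"})

# Revert only app.py to its state at changeset 1; util.py keeps its later change.
repo.revert_file("app.py", 1, "ana")
```

Note that the revert is itself recorded as a new changeset, so the history still shows who reverted what and when. A whole-project backup cannot offer that granularity.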
Source Control Options
Given the pervasive need for source control, it’s no surprise that many different systems have evolved over the years. Here, though, I’ll discuss the two you can host directly within Team Services: Git and Team Foundation Version Control (TFVC). Hosting a repository means that it’s directly stored and managed within your Team Services account. The Team Services build system can also draw from external Git, GitHub and Subversion repositories, which I’ll talk about in the next article in the series.
Which source control system you choose for a project is really a matter of preference, the experience of your development team and cost considerations. GitHub, for example, is free for public repositories, but has a cost for private ones (see github.com/pricing). Public GitHub repositories are open to everyone by default, which is why open source projects are typically hosted on GitHub. Many of the documentation sets I work on at Microsoft, for example, are stored there, including the entire collection of Microsoft Azure documentation (github.com/Azure/azure-content), and the docs for the Visual Studio Tools for Apache Cordova (github.com/Microsoft/cordova-docs). I also use GitHub for a variety of individual sample projects, like the Altostratus sample (github.com/kraigb/Altostratus) that appeared in MSDN Magazine last year (bit.ly/29mKHiC). Similarly, I chose GitHub for the MyDriving project (github.com/Azure-Samples/MyDriving) because it was intended to be open source from the beginning. (For more on MyDriving overall, see aka.ms/iotsampleapp.)
Team Services, on the other hand, is oriented toward private repositories (Git or TFVC) by default, which means that when you first create a repository, only you have access. To give access to others, you must specifically add them as team members (see bit.ly/29QDHql). The advantage is that you can host unlimited private repositories within your Team Services account for free for as many MSDN subscribers as you want, and costs only kick in when you have more than five users without MSDN subscriptions. For these reasons, I use Team Services for personal projects, such as apps I have in app stores and for code I want to share with only specific individuals.
The primary functional difference between Git and TFVC lies in their respective source control models. A full comparison can be found in the documentation in the topic “Choosing the right version control for your project” (bit.ly/29oZKTZ), but let me summarize.
TFVC, illustrated in Figure 2, is centralized, meaning that files live in a single, central repository, for which the administrator is responsible for backups. To do work with TFVC, you typically maintain a local read-only copy of the latest version of files from the repository (called a workspace), from which you can run builds and test the app. To make changes, you check out one or more files, which makes them editable in your workspace until they’re checked back in (that is, integrated into the repository); you can also lock files on check-out if you need exclusive access. On check-in, TFVC merges the changes into the repository. If multiple developers check out the same file simultaneously (which is allowed unless the file is locked), TFVC detects merge conflicts on check-in and informs the developers if manual resolution is needed.
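The check-in conflict detection described above can be sketched with a simple version check. This is a simplified assumption, not the TFVC API or its actual algorithm: the ToyCentralServer class and its methods are hypothetical, and unlike a real server this toy only detects a conflict and never attempts an automatic merge.

```python
class ToyCentralServer:
    """Hypothetical central repository that tracks a version number per file."""

    def __init__(self):
        self.files = {}  # path -> (version, content)

    def add(self, path, content):
        """Add a new file at version 1."""
        self.files[path] = (1, content)

    def get_latest(self, path):
        """Return (version, content), as a developer's local copy would start."""
        return self.files[path]

    def check_in(self, path, base_version, new_content):
        """Accept the change only if it was based on the current server version."""
        current_version, _ = self.files[path]
        if base_version != current_version:
            # Someone else checked in first; the developer must resolve this.
            return ("conflict", current_version)
        self.files[path] = (current_version + 1, new_content)
        return ("ok", current_version + 1)


server = ToyCentralServer()
server.add("main.cs", "original")

# Two developers start from the same version of the same file.
version_a, _ = server.get_latest("main.cs")
version_b, _ = server.get_latest("main.cs")

first = server.check_in("main.cs", version_a, "edit by A")   # accepted
second = server.check_in("main.cs", version_b, "edit by B")  # based on a stale version
```

In this sketch the second check-in is rejected because the file moved on after developer B’s copy was taken, which is the point at which a real system would try to merge or ask for manual resolution.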
Figure 2 The Basic Team Foundation Version Control Relationships
Git is distributed, meaning that although the master repository lives on the host, you “clone” the entire repository (change history and all), do your work locally and then commit completed changes to the clone as illustrated in Figure 3 (see also git-scm.com for details on Git workflows). When ready to integrate changes with everyone else’s work, you submit “pull requests” from your clone into the master. In this model, every clone is effectively a backup of the repository.
Figure 3 The Basic Git Relationships
Note that both Git and TFVC use some similar words like “check out” to describe entirely different processes, which can be confusing. It’s best to simply work with each one by itself and not expect to transfer any knowledge between the two.
Git’s pull request mechanism is able to detect when changes can be merged automatically, and when manual resolution is needed. Of course, in public repositories like GitHub, you don’t necessarily want anyone and everyone to be able to merge pull requests. Typically, only certain individuals will have permission to merge pull requests, who then act as the gatekeepers for the integrity of the repository as a whole.
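The distinction between changes that merge automatically and those that need manual resolution comes down to a three-way comparison against a common ancestor. The sketch below is an assumption-laden simplification: it compares whole files, whereas Git merges at the level of lines within a file, and the three_way_merge function is a hypothetical helper, not part of Git.

```python
def three_way_merge(base, ours, theirs):
    """Merge two sets of changes against a common ancestor, file by file.

    Each argument maps path -> content. A missing path means the file does
    not exist on that side. Returns (merged, conflicts).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o            # both sides agree, or neither changed
        elif o == b:
            result = t            # only their side changed
        elif t == b:
            result = o            # only our side changed
        else:
            conflicts.append(path)  # both changed differently: needs a human
            continue
        if result is not None:      # None here means deleted on both paths
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "4"}

merged, conflicts = three_way_merge(base, ours, theirs)
```

Here a.txt and b.txt merge automatically because only one side touched each, while c.txt is reported as a conflict because both sides changed it in different ways.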
For example, take a look at the top of the MyDriving repository at github.com/Azure-Samples/MyDriving and you’ll see Pull requests (see Figure 4). Clicking on that shows currently open requests that are waiting for moderation by those with merge permissions. You can also look at the whole history of pull requests by clicking on the Closed tab in the list.
Figure 4 Pull Requests List on GitHub for the MyDriving Project
Both TFVC and Git support what’s known as branching, which essentially means creating another layer between the master or central repository and other clones or copies. This allows sub-groups of developers (and testers) to do significant work in that branch without affecting the master repository. A branch also maintains its own change history and, in the case of Git, manages its own pull requests from each clone. Only when that team is ready to integrate the work in the branch with the master do they submit a pull request into the master. For more, see “Use branches to isolate risk in TFVC” (bit.ly/29ndlQz) and “Create work in branches” (bit.ly/29VVmgY) in the documentation.
Communication and Auditing
In both Git and TFVC—and in source control systems generally—check-ins, pull requests and so forth all have an associated commenting mechanism. That’s how you communicate the changes you’ve made to everyone else working on the project, and how teams can discuss those changes. These systems typically also provide notifications and discussions around broader issues (like open bugs), work items and so forth.
On GitHub, for example, you include comments with each code commit to your clone, and then make additional comments when submitting a pull request. Those responsible for checking and merging that request can then leave comments or questions within the pull request, especially if they find potential problems with the new code. You can see plenty of examples if you click around in the Closed pull request list in the MyDriving repository mentioned earlier. Similarly, click on the Issues tab on the homepage to see more discussions. As for notifications, these are managed through your personal settings in the Notifications section.
Team Services, for its part, has a whole system for tracking work items, bugs and so on, which you can read about in the Agile tools section of the Team Services documentation (bit.ly/29tvKIE). Within a code repository (whether Git or TFVC), there’s a ubiquitous commenting control, as shown in Figure 5, that lets team members leave notes on changesets (groups of changes), individual files and so on.
Figure 5 The UI for a Changeset in Visual Studio Team Services, Showing the Commenting Button
Such comments, along with specific details about what changes were made to what files, all go into the repository’s history or change log. Having an extensive, detailed history of who made what changes at what time—along with any discussion that happened around those changes—is one of the most significant benefits of using a source control system. The history makes the entire repository auditable across time. If unexpected problems come up later on in the release pipeline, such as regressions that are revealed through unit, integration or UI testing, it’s easy to go back and see the specific changes in a particular file that caused that regression.
Making that process “easy” is actually very important where DevOps as a whole is concerned. Recall from the first article in this series that I talked about DevOps as the continuous validation of performance for an app and services, where “performance” includes both the customer experience and the cost of production. The ability to discover defects as early as possible in the release pipeline helps to minimize costs. Equally important is the time it takes to pinpoint where, exactly, a defect actually exists—the quicker the better! Indeed, you’ve probably had the experience of spending many frustrating days trying to track down a bug in some project, only to find that the fix took all of 10 seconds. In such cases, nearly all the cost came from merely locating the bug.
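One reason a detailed history shortens the search is that it can be searched systematically. As a rough sketch, assuming the oldest change is known to be good, the newest is known to be bad, and the defect persists once introduced, a binary search over the history finds the offending change in a handful of checks. The first_bad_change function is a hypothetical helper; Git offers a similar idea through its bisect command.

```python
def first_bad_change(changes, is_good):
    """Return (index, checks) for the first change where is_good is False.

    Assumes changes[0] is good, changes[-1] is bad, and that every change
    after the first bad one is also bad.
    """
    low, high = 0, len(changes) - 1  # low is known good, high is known bad
    checks = 0
    while high - low > 1:
        mid = (low + high) // 2
        checks += 1
        if is_good(changes[mid]):
            low = mid
        else:
            high = mid
    return high, checks


# 100 changes, with a regression introduced by change number 73.
changes = list(range(1, 101))
index, checks = first_bad_change(changes, lambda change: change < 73)
```

In this example the search needs no more than seven build-and-test checks to land on change 73, rather than testing up to a hundred changes one at a time.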
This is a very important consideration if you or anyone on your team objects to using source control, rather than just “winging it” by keeping your code on some simple network share. The little bit of investment you make by adopting a source control system pays huge dividends over the lifetime of the project, especially when combined with automated builds, continuous integration and automated testing, as you’ll see in future articles. Continuous integration means that every code change triggers an automatic build and runs automated tests, so if that code change causes any kind of failure (build or test), you know about it within minutes.
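The continuous integration loop described above can be sketched in a few lines. This is not the Team Services build API: run_pipeline, on_change_committed and the step names are all hypothetical, and the build and test steps are stand-in functions rather than real tools.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    step: str
    passed: bool


def run_pipeline(change_id, steps):
    """Run named steps in order for one change, stopping at the first failure."""
    results = []
    for name, action in steps:
        passed = bool(action(change_id))
        results.append(StepResult(name, passed))
        if not passed:
            break
    return results


def on_change_committed(change_id, steps, notify):
    """Called for every committed change; reports the first failing step."""
    results = run_pipeline(change_id, steps)
    failed = [r.step for r in results if not r.passed]
    if failed:
        notify(f"Change {change_id} failed at step: {failed[0]}")
    return not failed


# Stand-in steps: the build always succeeds; the tests fail for change "c42".
steps = [
    ("build", lambda change_id: True),
    ("unit tests", lambda change_id: change_id != "c42"),
]
messages = []

ok_41 = on_change_committed("c41", steps, messages.append)
ok_42 = on_change_committed("c42", steps, messages.append)
```

Because the failure message names the specific change, the team knows immediately which commit to inspect instead of discovering the problem days later.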
Team Projects and Multiple Repositories
To build an automated release pipeline for your app and services within Team Services or Team Foundation Server (TFS), you begin with what’s called a team project. A team project is the container for everything you do in Team Services, including planning, work item tracking, collaboration via team rooms, builds, continuous integration, test management, release management, and source control repositories. I say “repositories” here because a team project can directly host multiple repositories, and the Team Services build system can also draw from external Git, GitHub and Subversion repositories. (Similarly, TFS can draw from a repository hosted in Team Services, and vice-versa.)
Note that a team project should not be confused with a Visual Studio project or solution; you can have as many Visual Studio solutions as you like, along with any other code from any other development system, all within the same team project. For this reason, I like to think of a team project as a DevOps treasure box; this helps me avoid confusion between terms, and reminds me of all the DevOps goodies it can manage!
To create a team project, log into the Team Services portal and click New under Recent projects & teams. The New command brings up a dialog box in which you select either Git or TFVC as the source control model for the project’s default repository. This is really just for convenience. If you select TFVC you can create Git repositories later, as you’ll see shortly; if you select Git, you can add a TFVC repository later along with more Git repositories. And if you prefer to use an external host like GitHub, you won’t use the default repository at all and this choice is irrelevant. For example, when I set up a team project for MyDriving, I selected Git for the default repository, but all that’s there is a default readme file because the real project is hosted entirely on GitHub.
To show how a single team project can manage multiple repositories, I created an example project in my own Team Services account and selected TFVC. This project’s Code tab initially appears as shown in Figure 6. Clicking on the outlined control shows the list of existing repositories in the team project, along with the New repository and Manage repositories commands. The New repository command brings up a dialog box where you again select between Git and TFVC. Because I initially created the team project with TFVC, I can’t create another TFVC repository (only one is allowed), but I can create any number of additional Git repositories, as shown in Figure 7.
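The repository rule just described (at most one TFVC repository, any number of Git repositories) can be expressed as a small data model. This is only an illustration of the rule as stated in this article; TeamProjectSketch is a hypothetical class and has nothing to do with the actual Team Services API.

```python
class TeamProjectSketch:
    """Hypothetical model of a team project's repository list."""

    def __init__(self, name, default_kind):
        self.name = name
        self.repositories = []  # list of (repository name, kind)
        # The default repository is created along with the team project.
        self.add_repository(name, default_kind)

    def add_repository(self, repo_name, kind):
        if kind not in ("Git", "TFVC"):
            raise ValueError(f"unknown repository kind: {kind}")
        if kind == "TFVC" and any(k == "TFVC" for _, k in self.repositories):
            raise ValueError("a team project allows only one TFVC repository")
        self.repositories.append((repo_name, kind))


project = TeamProjectSketch("Example", "TFVC")
project.add_repository("MobileApp", "Git")
project.add_repository("Backend", "Git")

try:
    project.add_repository("SecondTfvc", "TFVC")
    second_tfvc_allowed = True
except ValueError:
    second_tfvc_allowed = False
```

Starting the same model with Git as the default would still allow one TFVC repository to be added later, which matches the choice offered when creating a team project.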
Figure 6 The Code Tab for a New Project Using Team Foundation Version Control
Figure 7 A Team Project with Multiple Git Repositories and One Team Foundation Version Control Repository
The Manage repositories command takes you to the Team Services control panel where you can see all the repositories (and Git branches) at once; rename or delete them; and manage access permissions. For all the details around users, groups and permissions (too many to cover here), refer to the documentation for “Permissions and groups in Team Services” (bit.ly/29nxvpd) in the “Git repository” and “TFVC” sections. Suffice it to say you can exercise very fine-grained control over permissions for every one of your team members. Of course, if you’re using an external source control system like GitHub, you’ll manage permissions on that site instead. Note, however, that permissions for the team project as a whole, as opposed to its repositories, are managed through the Security tab shown in that UI. You’ll also find those details in the same documentation page noted earlier (bit.ly/29nxvpd).
Populating a Team Services Repository with Code
Once you’ve created a repository, the big question is how you get your code into it. Team Services gives you a variety of means, as does Visual Studio, so to close this article let me briefly run through those options.
For a TFVC repository, you can upload files into the repository or create new files directly through the Team Services portal. Navigate first to the Code tab in the team project, then click the … next to the repository and select + Add File(s), as shown in Figure 8. This brings up a dialog box (not shown) through which you can create files and edit them directly in the portal, or create a list of files to upload. In the latter case you also include a check-in comment.
Figure 8 Visual Studio Team Services Command to Add Files to a Team Foundation Version Control Repository
The other way to add code to a TFVC repository is through the Visual Studio Solution Explorer. This is a very convenient way to take a local solution and transfer all the code into source control. Just right-click the solution and select Add Solution to Source Control, as shown in Figure 9. This will bring up a dialog box in which you can select the appropriate team project within your Team Services account or on a specific TFS machine. To connect to a server, which you may need to do initially, use Team Explorer in Visual Studio (the tab outlined at the bottom of Figure 9). For details, see the “Work in Team Explorer” topic in the Visual Studio documentation (bit.ly/29oGp5j).
Figure 9 Adding a Solution to Source Control in Visual Studio
For Git repositories, Team Services will automatically prompt you with a variety of options when you create the repository or navigate to it on the Code tab, a UI that speaks sufficiently for itself. Alternatively, you can create a local Git repository in Visual Studio, and then publish it to a repository in Team Services. I won’t go through that process in detail here, because it’s nicely documented in the topic “Share Your Code with Git and Visual Studio” (bit.ly/29VDYJg). The short of it, though, is to click the Publish button on the lower right of the Visual Studio status bar, which brings up the Publish tab of Team Explorer as shown in Figure 10. Here you can select the Team Services account to use. An important detail is clicking on the small Advanced text to select a team project as shown. The same UI also gives you an easy path to publish code to GitHub and other external repositories.
Figure 10 The Publish UI in Visual Studio
With a source control repository in place, you’re now ready to take the next step in automating your release pipeline by setting up builds and continuous integration. That’s what I’ll cover in my next article, where you’ll see how any given build definition within Visual Studio Team Services can draw from any of the source control repositories I’ve discussed here. A team project, furthermore, can manage any number of such build definitions, which means the team project can coordinate builds from any number of repositories to produce the necessary artifacts for the rest of the release pipeline, or pipelines, as the case may be.
Thanks to the following technical experts for reviewing this article: Donovan Brown and Gordon Hogenson