Detecting drift between ARM templates and Azure resource groups
In DevOps Utopia, all of your Azure resources are deployed from ARM templates using a Continuous Deployment tool. The ARM templates and parameters files are all stored in source control, so you can go back through the version history to determine what was changed and what was deployed at any given time. And since only your CD service principal has permission to modify resources, you can be sure that nobody made any "rogue" changes outside of the templates and release process.
While we should all aspire to live in DevOps Utopia, chances are you're not quite there yet. Getting there requires an investment in processes, tools and culture, and different organisations will have different priorities and rates of progress. So a lot of teams are investing in automation and continuous deployment but still make some changes manually, for example using the Azure portal.
For teams in this semi-automated state, a common question I hear is how to tell whether Azure resources that were originally deployed from a template have "drifted" from that configuration through manual changes. One approach is to check the Azure Activity Log, but this is far from ideal. While the Activity Log does record changes to resources, it's often hard to work out exactly what was changed and whether the change was made via a template or by some other means. Also, Activity Log entries are only retained for 90 days unless you've explicitly chosen to keep them in another location.
So I decided to build my own script to make this a little easier. The ArmConfigurationDrift script takes an ARM template and parameters file and compares them with the resources deployed in an existing Azure resource group. It then tells you if any resources exist in the template but not in the resource group (or vice versa), or if any properties differ between the two. The following diagram shows the approach.
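As a rough illustration of the first check (resources present on one side only), here is a minimal Python sketch. It is not the ArmConfigurationDrift code, which works with the Azure PowerShell cmdlets. It assumes both sides have already been reduced to lists of dicts with "type" and "name" keys, and it matches resources case-insensitively on that pair; the function names are hypothetical.

```python
from typing import Any, Iterable, List, Mapping, Tuple

# Assumed shape: each resource is a dict with at least "type" and "name" keys.
Resource = Mapping[str, Any]
ResourceKey = Tuple[str, str]


def resource_key(resource: Resource) -> ResourceKey:
    """Identify a resource by (type, name), ignoring case (an assumption of this sketch)."""
    return (str(resource["type"]).lower(), str(resource["name"]).lower())


def find_unmatched_resources(
    template_resources: Iterable[Resource],
    deployed_resources: Iterable[Resource],
) -> Tuple[List[ResourceKey], List[ResourceKey]]:
    """Return (only in template, only in resource group), each sorted."""
    template_keys = {resource_key(r) for r in template_resources}
    deployed_keys = {resource_key(r) for r in deployed_resources}
    return sorted(template_keys - deployed_keys), sorted(deployed_keys - template_keys)


# Hypothetical example data.
template = [
    {"type": "Microsoft.Storage/storageAccounts", "name": "mystore"},
    {"type": "Microsoft.Web/sites", "name": "myapp"},
]
deployed = [
    {"type": "microsoft.storage/storageaccounts", "name": "mystore"},
    {"type": "Microsoft.Sql/servers", "name": "mysql"},
]
only_in_template, only_in_group = find_unmatched_resources(template, deployed)
print(only_in_template)  # [('microsoft.web/sites', 'myapp')]
print(only_in_group)     # [('microsoft.sql/servers', 'mysql')]
```

Using set differences keeps the check symmetric: anything added manually in the portal shows up in the second list, and anything deleted (or never deployed) shows up in the first.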
The "expand template" script is particularly interesting. An ARM template can't be compared directly with deployed resources, because it includes parameters, variables and functions that need to be expanded before the resource list can be used. I've never seen this documented, but it turns out that you can use the Test-AzureRmResourceGroupDeployment cmdlet and capture the "Debug" stream to get an expanded version of the template, as shown in this sample.
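To show what "expanding" a template means, here is a deliberately tiny Python sketch that substitutes only whole-string parameters('...') and variables('...') lookups. This is a swapped-in toy technique, not how the script works: the script relies on Test-AzureRmResourceGroupDeployment, which evaluates the full ARM expression language (functions such as concat() or resourceGroup()) that this sketch leaves untouched. The function names are hypothetical.

```python
import re
from typing import Any, Dict

# Matches strings that are *entirely* a simple lookup, e.g. "[parameters('siteName')]".
# Nested functions such as concat() or resourceGroup() are NOT evaluated by this toy.
_LOOKUP = re.compile(r"^\[(parameters|variables)\('([^']+)'\)\]$")


def expand_value(value: Any, params: Dict[str, Any], variables: Dict[str, Any]) -> Any:
    """Recursively replace simple parameters()/variables() lookups with their values."""
    if isinstance(value, str):
        match = _LOOKUP.match(value)
        if match:
            kind, name = match.groups()
            return (params if kind == "parameters" else variables)[name]
        return value  # any other string, including complex expressions, is left as-is
    if isinstance(value, dict):
        return {key: expand_value(child, params, variables) for key, child in value.items()}
    if isinstance(value, list):
        return [expand_value(child, params, variables) for child in value]
    return value


def expand_template(template: Dict[str, Any], parameters_file: Dict[str, Any]) -> Any:
    """Return the template's resources with parameters and variables substituted."""
    # Start from each parameter's defaultValue, then override with the parameters file.
    params = {
        name: spec.get("defaultValue")
        for name, spec in template.get("parameters", {}).items()
    }
    params.update(
        {name: entry["value"]
         for name, entry in parameters_file.get("parameters", {}).items()
         if "value" in entry}
    )
    # Variables may use parameters; a variable referencing another variable raises KeyError here.
    variables = expand_value(template.get("variables", {}), params, {})
    return expand_value(template.get("resources", []), params, variables)


# Hypothetical example template and parameters file.
template = {
    "parameters": {
        "siteName": {"type": "string"},
        "location": {"type": "string", "defaultValue": "australiaeast"},
    },
    "variables": {"planName": "[parameters('siteName')]"},
    "resources": [
        {"type": "Microsoft.Web/serverfarms", "name": "[variables('planName')]",
         "location": "[parameters('location')]"},
        {"type": "Microsoft.Web/sites", "name": "[parameters('siteName')]",
         "location": "[parameters('location')]"},
    ],
}
parameters_file = {"parameters": {"siteName": {"value": "myapp"}}}
resources = expand_template(template, parameters_file)
print(resources)
```

Even this toy makes the point: until the lookups are resolved, "[parameters('siteName')]" can never match a deployed resource called "myapp", which is why the expansion step has to come first.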
Once you have the expanded template, it's relatively easy to pull the metadata about the resources deployed to a resource group and compare the two. Unfortunately I found that there are a number of cases where the properties in the two collections are not identical even when the resources are the same. To minimise false positives, I only report on differences where the exact same property exists in each collection and has different values. I also normalise the different representations of locations so, for example, "Australia East" and "australiaeast" are treated as the same. Even so, there may still be situations where the tool reports false positives or doesn't detect legitimate differences. If you find any such cases or feel like improving the script, please open an issue or submit a pull request on the GitHub site.
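The comparison rule described above (only report properties that exist on both sides, and normalise locations first) could be sketched in Python roughly like this. It is an illustration under assumptions, not the script's actual implementation: it assumes each resource is a nested dict, flattens nested properties into dotted paths such as "sku.name", and treats lists as single values. All function names are hypothetical.

```python
from typing import Any, Dict, Tuple


def normalise_location(location: str) -> str:
    """Treat "Australia East" and "australiaeast" as the same region."""
    return location.replace(" ", "").lower()


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into {"sku.name": ...}; lists and scalars are kept as leaf values."""
    if isinstance(value, dict):
        flat: Dict[str, Any] = {}
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
        return flat
    return {prefix: value}


def property_differences(
    template_resource: Dict[str, Any], deployed_resource: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Report only properties present in both resources whose values differ."""
    template_props = flatten(template_resource)
    deployed_props = flatten(deployed_resource)
    differences = {}
    # Properties that exist on only one side are ignored to avoid false positives.
    for path in sorted(template_props.keys() & deployed_props.keys()):
        left, right = template_props[path], deployed_props[path]
        if path == "location" and isinstance(left, str) and isinstance(right, str):
            left, right = normalise_location(left), normalise_location(right)
        if left != right:
            differences[path] = (template_props[path], deployed_props[path])
    return differences


# Hypothetical example: the SKU was changed manually in the portal.
template_resource = {
    "name": "mystore",
    "location": "Australia East",
    "sku": {"name": "Standard_LRS"},
    "properties": {"supportsHttpsTrafficOnly": True},
}
deployed_resource = {
    "name": "mystore",
    "location": "australiaeast",
    "sku": {"name": "Standard_GRS", "tier": "Standard"},
    "kind": "StorageV2",
}
print(property_differences(template_resource, deployed_resource))
# {'sku.name': ('Standard_LRS', 'Standard_GRS')}
```

Note the trade-off this rule makes: "sku.tier", "kind" and "properties.supportsHttpsTrafficOnly" appear on only one side and are silently skipped, which suppresses noise but is also one way a genuine difference could go undetected.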